The artificial intelligence division of Microsoft, the colossal technology company’s research arm, unveiled the introduction of three fundamental AI systems this past Thursday, which are capable of generating text, voice, and visual content.
This unveiling highlights Microsoft’s persistent effort to forge its proprietary collection of diverse AI models and to contend with competing AI research centers, notwithstanding its enduring affiliation with OpenAI.
MAI-Transcribe-1 converts spoken words from 25 distinct tongues into written form and operates two and a half times swifter than Microsoft’s Azure Fast offering, as stated in a corporate communiqué. MAI-Voice-1 functions as a sound-producing system. This vocal model enables the creation of a minute of sound in mere moments and facilitates the development of a personalized vocal identity for users. MAI-Image-2 is designed as a motion-picture creation system.
MAI-Image-2 was initially launched via MAI Playground, a novel platform for evaluating large language models, on the nineteenth of March. Presently, the entire trio of models is rolling out on Microsoft Foundry, with the transcription and voice models also accessible within MAI Playground.
These systems were crafted by Microsoft’s MAI Superintelligence squad, an artificial intelligence investigative group under the leadership of Mustafa Suleyman, Microsoft AI’s chief executive officer, which was established and publicized in November 2025.
“At Microsoft AI, we are developing Human-centric AI. We employ a unique perspective when devising our AI models — placing humanity at the core, refining for natural human interaction patterns, and educating for real-world application,” Suleyman stated in a weblog entry. “Expect to encounter additional models from our team shortly, both within Foundry and integrated into Microsoft’s offerings and user interactions.”
In the ever more saturated large language model arena, MAI anticipates that a key advantage for these systems will be their affordability compared to the offerings from Google and OpenAI, as indicated by the firm in its weblog entry.
Techcrunch event
San Francisco, CA
|
October 13-15, 2026
MAI-Transcribe-1 is priced beginning at $0.36 hourly. MAI-Voice-1 has an initial cost of $22 for every million characters, while MAI-Image-2 commences at $5 for a million tokens when providing text input and $33 for a million tokens for image generation.
Notwithstanding the introduction of its proprietary models, Suleyman reiterated Microsoft’s dedication to its collaboration with OpenAI during a discussion with VentureBeat — even though a fresh re-evaluation of that alliance enabled Microsoft to genuinely advance this superintelligence inquiry, Suleyman informed The Verge.
Microsoft has allocated over $13 billion to the artificial intelligence investigative facility and integrates its systems into its diverse offerings via a multi-year collaborative agreement. Microsoft adopts an identical approach concerning microprocessors; it both manufactures its proprietary units and procures from external entities.
{content}
Source: {feed_title}

