Microsoft Launches Three New MAI Foundation Models to Compete With OpenAI and Google

Microsoft AI, the research division led by DeepMind co-founder Mustafa Suleyman, has stepped out from OpenAI's shadow. This week the company released three new foundation models, now available in Microsoft Foundry and the MAI Playground: MAI-Transcribe-1 for speech-to-text across 25 languages (2.5× faster than Azure Fast, at $0.36/hr); MAI-Voice-1, which generates 60 seconds of audio in under a second and supports custom voice cloning ($22 per million characters); and MAI-Image-2 for image generation ($5/unit). Suleyman framed the effort around what he calls "Humanist AI": models built for practical human communication, with more coming directly into Microsoft products.

The timing is notable. Microsoft and OpenAI remain deeply intertwined through Azure, yet the two companies increasingly compete for the same enterprise AI dollars. The new models are priced aggressively against Google's and OpenAI's equivalents, and their appearance in Foundry alongside GitHub Copilot hints at future native integration in developer tooling. For teams already running on Azure infrastructure, MAI-Transcribe-1 and MAI-Voice-1 in particular are worth benchmarking against current spend on Whisper and ElevenLabs-style voice APIs. The multimodal stack Microsoft is assembling may reshape where enterprise developers reach first, not because it is flashier, but because it already sits in the platform they pay for.
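For teams that want a rough starting point for that spend comparison, the quoted prices can be turned into a back-of-the-envelope monthly estimate. The sketch below is illustrative only: the function names are hypothetical, it assumes "$0.36/hr" means per hour of audio processed, and it treats the article's figures as a snapshot, not a current rate card. MAI-Image-2 is left out because the article does not say what a "unit" is.

```python
# Prices as quoted in the article; verify against current pricing before relying on them.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36   # MAI-Transcribe-1 (assumed: per hour of audio)
VOICE_USD_PER_MILLION_CHARS = 22.0     # MAI-Voice-1


def transcribe_cost(audio_hours: float) -> float:
    """Estimated speech-to-text cost for a number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated text-to-speech cost for a number of input characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def monthly_estimate(audio_hours: float, characters: int) -> float:
    """Combined monthly estimate in USD, rounded to cents."""
    return round(transcribe_cost(audio_hours) + voice_cost(characters), 2)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 audio hours and 5 million characters per month.
    print(f"${monthly_estimate(1000, 5_000_000):,.2f}")
```

Running the same workload numbers through a current provider's published rates gives a like-for-like figure to compare, though it says nothing about accuracy or latency, which still need a real benchmark.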

Read the full article at TechCrunch →