Dispatch from the LLM wars
New Microsoft AI model may challenge GPT-4 and Google Gemini
In project headed by former Inflection chief, MAI-1 may have 500B parameters.
Benj Edwards – May 6, 2024 7:51 pm UTC
Image caption: Mustafa Suleyman, co-founder and chief executive officer of Inflection AI UK Ltd., during a town hall on day two of the World Economic Forum (WEF) in Davos, Switzerland, on Wednesday, Jan. 17, 2024. Suleyman joined Microsoft in March. (Getty Images)
Microsoft is working on a new large-scale AI language model called MAI-1, which could potentially rival state-of-the-art models from Google, Anthropic, and OpenAI, according to a report by The Information. This marks the first time Microsoft has developed an in-house AI model of this magnitude since investing over $10 billion in OpenAI for the rights to reuse the startup’s AI models. OpenAI’s GPT-4 powers not only ChatGPT but also Microsoft Copilot.
The development of MAI-1 is being led by Mustafa Suleyman, the former Google AI leader who recently served as CEO of the AI startup Inflection before Microsoft acquired the majority of the startup’s staff and intellectual property for $650 million in March. Although MAI-1 may build on techniques brought over by former Inflection staff, it is reportedly an entirely new large language model (LLM), as confirmed by two Microsoft employees familiar with the project.
With approximately 500 billion parameters, MAI-1 will be significantly larger than Microsoft’s previous open source models (such as Phi-3, which we covered last month), requiring more computing power and training data. This reportedly places MAI-1 in a similar league to OpenAI’s GPT-4, which is rumored to have over 1 trillion parameters (in a mixture-of-experts configuration), and well above smaller models such as Meta’s and Mistral’s 70-billion-parameter models.
The development of MAI-1 suggests a dual approach to AI within Microsoft, focusing on both small locally run language models for mobile devices and larger state-of-the-art models that are powered by the cloud. Apple is reportedly exploring a similar approach. It also highlights the company’s willingness to explore AI development independently from OpenAI, whose technology currently powers Microsoft’s most ambitious generative AI features, including a chatbot baked into Windows.
The exact purpose of MAI-1 has reportedly not been determined (even within Microsoft), and its ideal use will depend on its performance, according to one of The Information’s sources. To train the model, Microsoft has been allocating a large cluster of servers with Nvidia GPUs and compiling training data from various sources, including text generated by OpenAI’s GPT-4 and public Internet data.
Depending on progress in the coming weeks, Microsoft may preview MAI-1 as early as its Build developer conference later this month, according to one of the sources cited by The Information.

Benj Edwards is an AI and Machine Learning Reporter for Ars Technica. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Promoted Comments

Kanator: It’s shocking how much Microsoft is going all-in on large language models. They have:
The MAI team in the article
The Bing AI team (It’s based on GPT-4, but it’s a separate model)
The mentioned Phi-3 team
The WizardLM team
The partnership with OpenAI
Another partnership with Mistral
and a one hundred billion (!!!) dollar investment in new AI datacenters.
May 6, 2024 at 7:57 pm