The AI Model Wars Are Boring. UX Is Where the Real Battle Is.
Is Gemini 3 better than Claude Opus 4.5? Will GPT-5.1 be coming to Microsoft 365 Copilot? Which new models got added to Microsoft Foundry? These are the types of questions that many AI-related posts seem to focus on, but are they the right ones?
Given the overall quality of all recent frontier models, I wonder why more time isn't spent considering how AI tools differ in UX. Ultimately, how well a tool helps you do your work is far more important than how smart it is or which model powers it.
Copilot Chat seems to be on a path where its features and UI align ever more closely with ChatGPT's (just look at the recent addition of a reasoning slider), but there are plenty of other approaches to tool design and features that Microsoft could perhaps take inspiration from.
Take NotebookLM from Google. Its closest equivalent in the Microsoft AI stack is Copilot Notebooks (or, from OpenAI, ChatGPT Projects), but it looks, feels, and operates entirely differently. It seems to dwarf the OpenAI-powered options in the number of sources it can handle, it accepts a wider variety of source inputs, and it easily creates a wider variety of outputs.
Recently, I wanted to create some summaries based on the 50+ episodes of my podcast that I had only in raw audio or video format. Doing this directly in either Copilot Notebooks or ChatGPT Projects was impossible, but it took only a couple of minutes in NotebookLM. It consumed a couple of GB of MP3s, transcribed them, and converted them into insights in about 90 seconds. Copilot and ChatGPT were dumbfounded by this task.
Now, to be fair, there are ways to get around file type limitations (pre-processing to transcribe the audio) or source quantity limitations (aggregate the content into files), but these things are the human grunt work we're trying to alleviate WITH AI, not create BECAUSE OF AI!
My point here isn't that one product is better than another. All have their place, and Copilot Notebooks have integration points and governance benefits that NotebookLM can't touch. But the continuous focus on model quality sometimes, in my opinion, ignores that for the vast majority of users, the usability of these tools depends on a lot more than which model has the highest benchmark score.
What alternative tools do you use that you wish you could see more of as part of Microsoft 365 Copilot?
First posted on LinkedIn on 12/08/2025.