ChatGPT-5’s Short-Lived Simplicity: One Week of Clean Design

August 17, 2025

About a month ago I wrote about my frustration with ChatGPT-4o’s pull-down menu and its mess of mismatched models. I ended my post saying that I hoped ChatGPT-5 would address this. Last Thursday my prayers were answered. The pull-down was gone and with it the hodgepodge of models. In its place was one simple, clean button.


The backlash

Not long after the new model and its clean UX debuted, a hue and cry erupted in the web-o-sphere. The biggest complaint was the disappearance of GPT-4o. People were both angry and frustrated at the loss of friendliness and companionship that 4o brought (not to mention the fact that it had been incorporated into many workflows).  To OpenAI’s credit they listened and responded. Within days 4o was back on the home screen as a “legacy model.”


Fun fact!

When I first started writing this, I wasn’t sure which keys to hit when typing “4o.” The “0” looked a couple of points smaller than the “4,” but I couldn’t quite tell. After asking the source, I learned it wasn’t a zero at all; it was a lowercase “o.” It turns out the “o” stood for omni, signaling multimodal capability. (I’ve got to believe that I’m not the only one surprised and confused by this.)

As a follow-up I asked if GPT-5 is multimodal. The answer: “Indeed.” Which then raises another question: why isn’t it called “GPT-5o”? But I digress.


The return of the pull-down

Along with 4o came the reintroduction of the pull-down menu which, in addition to the “Legacy Models” button, presented four thinking modes to choose from. (It seems the thinking modes were added in response to what some users felt was GPT-5’s slow speed and lack of flexibility.)

  • Auto – decides how long to think
  • Fast – instant answers
  • Thinking – thinks longer for better answers 
  • Pro – Research-grade intelligence (upgrade)

While power users must have rejoiced, I found myself, once again, confronted with a set of options lacking helpful explanations. It was certainly an improvement over GPT-4o’s mess of models and naming conventions, but I was left with many questions: What criteria does Auto use to determine how long to think? What tradeoffs come with Fast? What exactly qualifies as “better” in Thinking? If I’m impatient, will I get lesser answers? And what the heck is “research-grade,” and why would I pay more for it?

Just as with GPT-4o, where I rarely ventured beyond the main model, I have so far kept GPT-5 in Auto mode.


Grading the model

Before grading the models myself I decided to get ChatGPT-5’s thoughts. I asked the AI to focus its grading on the UX, specifically on the extent to which a user would be able to quickly and confidently pick the right mode for a given task.

Surprisingly, it graded itself and its predecessor just as I would have. GPT-5 gave the GPT-4o UX a C- and its own a B-. From there it went through and critiqued the experiences in detail.

At the very end, GPT-5  offered to put together a proposal for a “redesigned hybrid menu that takes GPT-5’s simplicity and pairs it with short, task-oriented descriptions so users can choose confidently without guesswork.”  If only OpenAI had access to a tool like this!


Your mileage may vary

Over the past week I’ve used GPT-5 a fair amount. While I can’t measure results objectively, Auto seems to choose well. Writing, research, and analysis have been solid.

Compared to 4o, I found that it spent significantly more time researching and answering questions. Not only did I notice the increase in thinking time, but for the first time I witnessed “stepwise reasoning with visible sub-tasks.” Component topics were flashed on the screen as GPT-5 focused on each one in turn. The rigor was impressive, and the answers were detailed and informative.

Was it an improvement over 4o? Hard to tell—but the process felt more deliberate and transparent, even if it took longer.


Back to UX

And yet, we’re back to a cluttered pull-down. OpenAI isn’t alone: Anthropic’s Claude and Google’s Gemini also present users with a maze of choices that lack clarity. What’s surprising is how little attention is paid to basic UX. Even simple fixes, like linking to a quick FAQ or watching a handful of users struggle through the interface, would go a long way.

As LLMs become interchangeable, the real competition will move higher up the stack. At that point, user experience will outweigh minor gains in benchmarks. It makes sense to start practicing now.

Pau for now…


Meta, Llama and Malcolm in the Middle – Prime Time Open Washing

July 14, 2025

My son and I were watching a Malcolm in the Middle marathon recently when, rather than typical detergent or Nissan ads, multiple 30-second spots from Meta popped up.  Each advert highlighted the virtues of open source through their Llama LLM and ended with taglines like, “Open source AI. Available to all, not just the few.”  The message I took away was: Open Source AI benefits everyone.  Llama is Open Source. Llama benefits everyone.

These weren’t your usual niche tech ads (see two examples below); they were slick, mainstream productions airing during a popular family sitcom. Surprised and puzzled, I did some digging and learned these ads were rolled out at the end of last year and intensified around April and May to coincide with the release of Llama 4 and to leverage the momentum from Llama 3.1.

But is Llama truly open source?
No. The Open Source Initiative (OSI), the definitive authority on open source standards, notes several critical shortfalls:

  • Commercial Restrictions: Limit on large-scale commercial use excludes key competitors.
  • Redistribution Restrictions: Violates principles of unrestricted redistribution.
  • Training Data Not Public: OSI’s AI-specific definition requires open access to training datasets.
  • Regional Restrictions: Certain geographic uses (e.g., in the EU) may be prohibited.

Meta can set whatever restrictions they want on their software, but if they impose the above restrictions, Llama doesn’t qualify as “open source.”

Do the ends justify the means?
On one hand, labeling Llama as open source could dilute the definition, opening it up to interpretation and potentially undermining genuine open-source projects. Critics argue this erodes trust, blurs established norms, and disadvantages truly open projects.

On the other hand, there’s a notable benefit: Meta’s mainstream campaign significantly boosts public awareness and portrays open source as beneficial, democratizing technology and driving innovation.   

Ultimately, the challenge is balancing the immense public exposure Meta’s Llama TV ads provide to the open source movement against concerns about accurately preserving the open source definition. The key question for the open source community is not whether these TV ads cause harm—they likely don’t—but how to maintain the integrity of what “open source” really means, which in the new world of AI, has become even harder.


Two examples

Meta AI TV Spot, ‘Open Source AI: Everyone Benefits’ (prosthesis)
“Open source AI is an open invitation. To take our model and build amazing things… When AI is open source, it’s available to all, and everyone benefits.”

Meta AI TV Spot, ‘Open Source AI: Collaboration’ (start up)
“Open source AI allows universities, researchers, and scientists to collaborate using Meta’s free open-source AI Llama… potentially fast-tracking life-saving medications.”


Pau for now…


Why Storage Matters in Every Stage of the AI Pipeline

June 13, 2025

One of the companies that impressed me at AI Infrastructure Field Days was Solidigm. Solidigm, which was spun out of Intel’s storage and memory group, is a manufacturer of high-performance solid-state drives (SSDs) optimized for AI and data-intensive workloads. What I particularly appreciated about Solidigm’s presentation was that, rather than diving directly into speeds and feeds, they started by providing broader context. They spent the first part of the presentation orienting us, explaining the role storage plays and what to consider when building out an AI environment. They started by walking us through the AI data pipeline. (For the TL;DR, see “My takeaways” at the bottom.)

Breaking down the AI Data Pipeline

Solidigm’s Ace Stryker kicked off their presentation by breaking the AI data pipeline into two phases: Foundation Model Development on the front end and Enterprise Solution Deployment on the back end. Each of these phases is then made up of three discrete stages.

Phase I: Foundation Model Development

The development of foundation models is usually done by a hyperscaler working in a huge data center. Ace defined foundation models as typically being LLMs, Recommendation Engines, Chatbots, Natural Language Processing, Classifiers and Computer Vision. Within the foundation model development phase, raw data is ingested, prepped and then used to train the model. The discrete steps are:

1. Data Ingest: Raw, unstructured data is written to disk.

2. Data Preparation: Data is cleaned and vectorized to prepare it for training.

3. Training: Structured data is fed into ML algorithms to produce a base (foundation) model.

Phase II: Enterprise Solution Deployment

As the name implies, Phase II takes place inside the enterprise, whether that’s in the core data center, the near edge or the far edge. In Phase II, models are fitted and deployed with the goal of solving a specific business problem:

4. Fine-Tuning: Foundation models are customized using domain-specific data (e.g., chatbot conversations).

5. Inference: The model is deployed for real-time use, sometimes enhanced with external data (via Retrieval Augmented Generation).

6. Archive: All intermediate and final data is stored for auditing or reuse.
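
For readers who think better in code, here is a minimal sketch of the two phases and six stages above as a simple data structure. The `Stage` class, the `PIPELINE` list and the `stages_in` helper are my own illustrative names, not anything from Solidigm’s presentation; only the stage names and one-line summaries come from the talk.

```python
from dataclasses import dataclass

# Phase names as given in the presentation.
PHASE_1 = "Foundation Model Development"
PHASE_2 = "Enterprise Solution Deployment"


@dataclass(frozen=True)
class Stage:
    """One stage of the AI data pipeline (hypothetical structure for illustration)."""
    number: int
    name: str
    phase: str
    summary: str


# The six stages in pipeline order, summarized from the walk-through above.
PIPELINE = [
    Stage(1, "Data Ingest", PHASE_1, "Raw, unstructured data is written to disk"),
    Stage(2, "Data Preparation", PHASE_1, "Data is cleaned and vectorized for training"),
    Stage(3, "Training", PHASE_1, "Structured data is used to produce a base model"),
    Stage(4, "Fine-Tuning", PHASE_2, "The base model is customized with domain data"),
    Stage(5, "Inference", PHASE_2, "The model is deployed for real-time use"),
    Stage(6, "Archive", PHASE_2, "Intermediate and final data is stored for audit or reuse"),
]


def stages_in(phase: str) -> list[str]:
    """Return the stage names belonging to one phase, in pipeline order."""
    return [stage.name for stage in PIPELINE if stage.phase == phase]


if __name__ == "__main__":
    for phase in (PHASE_1, PHASE_2):
        print(f"{phase}: {', '.join(stages_in(phase))}")
```

Nothing clever here; it simply makes the split explicit: three stages live with the model builder and three live inside the enterprise.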


Data Flows and Magnitude

From there Ace took us through the above slide, which lays out how data is generated and flows through the pipeline. Every item above with a disk icon represents the substantial data that is generated during the workflow. The purple half circles give a sense of the relative size of the data sets by stage. (An aside: it doesn’t surprise me that Inference is the stage that generates the most data, but I wouldn’t have thought that Training would generate significantly less than the rest.)


Data Locality and I/O Types

Ace ended our walk-through by pointing out where all this data is stored as well as what kinds of disk activity take place at each stage.

Data Locality:

Above, Network Attached Storage is indicated in blue and Direct Attached Storage is called out in yellow, i.e., Ingest is pure NAS; Training and Tuning are all DAS; and Prep, Inference and Archive are 50/50. Basically, early and late stages rely on network-attached storage (NAS) for capacity and power efficiency. Middle stages, on the other hand, use direct-attached storage (DAS) for speed, ensuring GPUs are continuously fed data. The takeaway: direct-attached storage for high-performance workloads and network storage for larger, more complex datasets.
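
As a rough sketch, the NAS/DAS split described above can be written down as a lookup table. The numbers are simply my reading of the slide as described in the paragraph (pure NAS, all DAS, or 50/50), and the names `NAS_SHARE` and `storage_mix` are hypothetical helpers of my own, not part of any Solidigm tool.

```python
# Fraction of each stage's storage that is network-attached (NAS),
# as I read it off the slide; the remainder is direct-attached (DAS).
NAS_SHARE = {
    "Data Ingest": 1.0,        # pure NAS
    "Data Preparation": 0.5,   # 50/50
    "Training": 0.0,           # all DAS
    "Fine-Tuning": 0.0,        # all DAS
    "Inference": 0.5,          # 50/50
    "Archive": 0.5,            # 50/50
}


def storage_mix(stage: str) -> str:
    """Describe the NAS/DAS mix for a stage in plain words."""
    nas = NAS_SHARE[stage]
    das = 1.0 - nas
    if nas == 1.0:
        return "NAS"
    if nas == 0.0:
        return "DAS"
    return f"{nas:.0%} NAS / {das:.0%} DAS"


if __name__ == "__main__":
    for stage in NAS_SHARE:
        print(f"{stage}: {storage_mix(stage)}")
```

Laid out this way, the pattern is easy to see: only the two GPU-heavy stages in the middle are entirely direct-attached.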

I/O Types:

As Ace explained, it’s useful to know what kinds of disk activity are most prevalent during each stage, because knowing the I/O characteristics helps ensure the best decisions are made for the storage subsystem. For example:

  • Early stages favor sequential writes.
  • Training workloads are random read intensive.
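
If the difference between those two access patterns is fuzzy, the toy sketch below may help. It writes a small file block by block from start to finish (sequential write), then reads the same blocks back in shuffled order by seeking to each offset (random read). The block size, block count and function names are arbitrary choices of mine for illustration; this is not a benchmark, and on a real system the OS cache would hide most of the difference at this scale.

```python
import os
import random
import tempfile

BLOCK_SIZE = 4096   # arbitrary block size for the illustration
NUM_BLOCKS = 64     # arbitrary number of blocks


def sequential_write(path: str) -> None:
    """Write blocks one after another, the way an ingest-style stage streams data to disk."""
    with open(path, "wb") as f:
        for i in range(NUM_BLOCKS):
            # Fill each block with its own index so reads can be checked later.
            f.write(bytes([i % 256]) * BLOCK_SIZE)


def random_read(path: str, seed: int = 0):
    """Read every block once in shuffled order, the way a training-style stage samples data."""
    order = list(range(NUM_BLOCKS))
    random.Random(seed).shuffle(order)
    first_bytes = {}
    with open(path, "rb") as f:
        for i in order:
            f.seek(i * BLOCK_SIZE)               # jump to the block's offset
            first_bytes[i] = f.read(BLOCK_SIZE)[0]
    return order, first_bytes


def demo():
    """Run both patterns against a temporary file and return what was observed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blocks.bin")
        sequential_write(path)
        size = os.path.getsize(path)
        order, first_bytes = random_read(path)
    return size, order, first_bytes


if __name__ == "__main__":
    size, order, _ = demo()
    print(f"wrote {size} bytes sequentially; first blocks read back: {order[:5]}")
```

The point is only the shape of the access: the writer never moves backward, while the reader jumps all over the file, which is why the two stages stress storage so differently.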

Something else the presentation stressed was the significance of GPU direct storage, which can reduce CPU utilization and improve overall AI system performance by allowing direct data transfer between storage and GPU memory.


My takeaways

  1. It may sound corny, but data is the lifeblood of the AI pipeline.
  2. The AI data pipeline has both a front end and a back end. The front end usually sits in a hyperscaler where, after being ingested and prepped, the data is used to train the model. The back end is within the enterprise, where the model is tuned for business-specific use and then used for inference, with the resulting data archived for audits or reuse.
  3. Not only is there a lot of data in the pipeline but it grows (data begets data). Some stages amass more data than others.
  4. There isn’t one storage type that dominates. In stages like Data Ingest, where density and power efficiency are key, you want to go with NAS, whereas in stages like Training and Fine-Tuning, where you want performance to keep the GPUs busy, DAS is what you want.

Pau for now…