Some thoughts after a month of working on AI
Introduction
It’s been a month since I left Google and started working at Delphos Labs, an AI-focused startup. A lot happened in that time.
- A lot of new models (or access to previously announced models): GPT-4.5, Gemini 2.5, Llama 4, etc. etc.
- Huge amounts of new funding coming in, topped by Anthropic’s $3.5 billion funding round
- The continued rise of agents as the primary abstraction for LLM development, and the accompanying burst of new agent frameworks and tooling
And of course, GPT-4o image generation came out, which definitely made a splash on X. And while my little test of the new model wasn’t especially Ghibli-inspired, I was quite happy with the results.
While this post isn’t really focused on the news itself, I wanted to use the milestone as an opportunity to collect my thoughts on my experience so far. They aren’t fully fleshed out, but I think they will be common themes as Delphos and the rest of the industry continue to grow.
The rise of small projects
Over the last month, many of the AI projects that I’ve been working on can be easily described as small. Instead of building a complete, integrated system, each project focuses on completing one or more tasks. These tasks might chain together to eventually form a larger system, but each component comes with its own set of requirements and goals. Think of projects like
- Providing a GPT to another team member to help them with their work
- Or building a GPT for myself to speed up searching for information during code review
- Building a specific agent to accomplish a data collection task, often with search
- Doing extensive research with an agent to eventually configure other tools
These types of quick, small projects, often geared towards a specific goal, have really focused my thoughts on the value of AI, LLMs and the forthcoming wave of agents that we will all interact with. Often, we are thinking about a task that we would like a critically-minded, detail-oriented person to take on, but there isn’t capacity to fully hand this off. Instead, AI steps in as the next best, and often better, option.
This is a huge shift from the past, where we would often think of AI as a way to automate a task that was already being done by a human. Now, we are thinking about how to augment our own work with AI.
Text is data; data is text
All of these small projects revolve around text data in various forms.
- I need something that can reference pages in a book so that I don’t have to do the search
- I need a reviewer for a dozen or so papers
- I need to quickly search through a whole repository of code
So much work over the last couple of decades has been focused on applying structure to this type of data. A favorite example of mine is Kythe, which creates a language-agnostic graph schema for a repository of code. It powers one of the best tools within Google: the ability to search the Google monorepo.
I’m seriously questioning the long-term future of such tools. Why invest all the engineering hours into a new tool like Kythe when almost all questions can be answered by an LLM that fits the codebase into context? What other similar body-of-knowledge problems will we knock off by simply getting more powerful models over the next couple of years? And as data scientists, we should be heartened to learn how good LLMs are becoming at structured data extraction. It opens up whole new realms for applying measurement and statistics to new types of data.
The burden of choice
I swear, one of the hardest problems with switching from Google to a new startup is the sheer number of choices that you have to make. At Google, we had a lot of tools and systems that were already in place, and most of the developer environment was essentially “solved.” You don’t pick your own editor, build system, linter, formatter, etc. You just use the tools that are already there.
Now, even though most people at my startup use VS Code, Cursor or similar tools built on the same foundation, I still need to figure out
- How we configure ruff (see the pyproject.toml sketch after this list)
- Whether we’re using mypy or whether pyright is sufficient for most of our needs
- Which LSP in VS Code is supposed to tie this all together
- What to do about other code health tools, like dead code monitoring, dependency checks, etc.
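To make this concrete, here is a minimal, hypothetical pyproject.toml sketch of what consolidating the lint and type-check decisions might look like. The rule selections and settings are placeholders for illustration, not our actual configuration:

```toml
# Hypothetical pyproject.toml excerpt -- a sketch, not a recommendation.

[tool.ruff]
line-length = 100
target-version = "py312"

[tool.ruff.lint]
# A starting rule set: pycodestyle, pyflakes, import sorting, pyupgrade
select = ["E", "F", "I", "UP"]

[tool.mypy]
# If mypy (rather than pyright) wins the type-checking question, start strict
strict = true
ignore_missing_imports = true
```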
Progress in the field of AI means that this getting-started problem applies to almost all of the projects that I’ve taken on so far. While I don’t care too much about development environments, I do need to think about
- Which models to use
- Which framework to use for calling the models
- Whether we need to add tool use
- Whether to go through an existing provider that already has a UI in place
- etc. etc.
I would like to think that we’ve largely settled on PydanticAI for most use cases, but a project that I was working on just tonight revealed a gap in what I could do with that framework. I ended up using LiteLLM instead. At the moment, I think the standard set of features includes:
- Structured outputs and validation; Pydantic models are good at this, but exact support varies by LLM (see the sketch after this list)
- Connecting to different tools and especially search, which seems ever more critical
- Easy approaches to swapping out models and providers
- Chaining together different models and tools
- And probably a lot more
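As a rough illustration of the structured-output piece, here is a minimal PydanticAI sketch. The Paper schema and prompts are made up for the example, and the exact parameter names (result_type, run_sync, result.data) may differ across library versions:

```python
from pydantic import BaseModel
from pydantic_ai import Agent


# Hypothetical schema for a paper-review helper -- purely illustrative.
class Paper(BaseModel):
    title: str
    year: int
    key_findings: list[str]


agent = Agent(
    "openai:gpt-4o",              # the provider and model are just a string
    result_type=Paper,            # responses are validated against this schema
    system_prompt="Extract the paper's metadata and main findings.",
)

result = agent.run_sync("...paste the abstract or full text here...")
print(result.data)                # a validated Paper instance, not raw text
```

The validation step is what makes these small projects composable: the output of one task is typed data that the next tool in the chain can consume.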
Swapping models is especially important, as the field is moving so quickly. LMArena is a good reference point for the current landscape, but no “winning” model seems to stay at the top for more than a couple of weeks.
(Figure from lmarena.ai/price.)
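Provider-agnostic clients make that churn manageable. Here is a rough LiteLLM sketch of what “swapping the model” looks like in practice; the model names are illustrative and will be stale almost immediately:

```python
from litellm import completion

# Chasing the current leaderboard leader is (mostly) a one-line change.
MODEL = "gpt-4o"  # or "anthropic/claude-3-7-sonnet-20250219", "gemini/gemini-2.0-flash", ...

response = completion(
    model=MODEL,
    messages=[{"role": "user", "content": "Summarize the trade-offs in this design doc: ..."}],
)
print(response.choices[0].message.content)
```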
Even ICs are now managers
A lot has been said about the future of software engineering in the age of AI. While there is more to the debate than what I can get to here, one clear trend is that the role of the software engineer is changing. In the past, we were often seen as the “doers” of the team, responsible for writing code and implementing features. Now, with the rise of AI and LLMs, our role is shifting towards that of a manager or overseer. I often apply the model of the Google career ladder to better frame this trend. As a great Google joke goes:
L4s code but don’t talk; L5s talk and code; L6s talk and don’t code
LLMs are quickly becoming the “L4” of the software engineering world. They can write code, but they don’t have the same level of understanding or context that a human engineer does. As a result, we are seeing a shift towards a more managerial role for software engineers, where we are responsible for overseeing the work of LLMs and ensuring that they are producing high-quality code.
One big part of that is the importance of system design, architecture and the overall flow of information. We are still critical in understanding and describing both the nature of the problem and the steps needed to solve it. This was a big highlight from the recent episode of Cognitive Revolution on AMIE. The developers of the system spent a lot of time talking to practitioners, asking them “how do you think about” diagnosing disease, suggesting treatments, etc.
That said, while I think that writing is one of the most important skills for a good people leader, I’m still not sure what the right approach to communication is when we’re working with models. Most guides to prompt engineering suggest that we are entering a paradigm where we communicate with models in a very particular style. The exact nature of this style of communication is still being worked out. For example, how much does prompt formatting matter? With reasoning models, do we still need to spell out chain of thought? As context windows grow, is simplicity still critical? I expect a lot of this to get worked out as models continue to gain more capabilities.
Tools, tools, tools
The first wave of AI models came in the form of chatbots: ChatGPT, Gemini, Claude, etc. To be frank, I only found limited utility in this wave of tools. There was still a lot to learn about how to use them, and their value as knowledge engines was often overshadowed by things like hallucinations. That’s no longer the case, as three new augmentations have come to models over the last year. All are impactful:
- The addition of search means that information is largely grounded in sources and that responses are consistently good
- The big splash of reasoning after DeepSeek, which is now showing up everywhere
- Generic tool use, either within the apps provided by major platforms or augmented through things like the Model Context Protocol (MCP)
On a day-to-day basis, I’m seeing my own behavior shift towards an AI-first approach and away from traditional tools. Search is the most obvious of these. While I don’t necessarily have a single preferred option, I find myself using Perplexity, ChatGPT or Gemini before looking for an answer in a search engine. Moreover, the way I search seems to be changing, as I am becoming more prone to ask questions than to track down a reference that I already have in mind.
Deep research has fundamentally changed the way I work. Reading is now a collaborative activity, and I’ve found myself consistently returning to AI to brainstorm ideas, summarize literature and come up with reading plans. At the same time, it is just as important for me to make sure that articles get loaded into a NotebookLM, so I can come back and ask questions later. This is a huge increase in my personal productivity, as my ability to learn new things has accelerated.
The last of these big a-ha moments was the debut of MCP servers. While not a new idea, it really feels like they took off over the last month. My big mind-blown moment came from this video by Zen van Riel. I immediately began to dream of LLM-centric operating systems where a single model can interact with programs across my computer, both to provide assistance and to off-load much of the mental load that goes into remembering shortcuts, UIs and all the other stuff that fills your head as a knowledge worker.
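To make the idea concrete, here is a toy MCP server sketched with the official Python SDK’s FastMCP helper. The tool itself is a made-up example; a client such as Claude Desktop would discover and call it over stdio:

```python
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-helper")


@mcp.tool()
def count_todos(directory: str) -> int:
    """Count TODO comments in Python files under a directory."""
    return sum(
        f.read_text(errors="ignore").count("TODO")
        for f in Path(directory).rglob("*.py")
    )


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```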
Now, to be realistic, a lot of MCP demos are still just demos, and it will be some time until they can be relied on as day-to-day tools. For example, I could never get the Linear server to work with my API key, and my DuckDuckGo-augmented Claude wasn’t especially performant. But we are still on the cusp of something amazing, where an LLM can be improved upon by dozens of different tools and systems. Your full-featured assistant is just a prompt away.
You have agency; you are king 👑
I want to wrap this up by diving into one of the more controversial topics in the AI world: the rise of agents. So much of the conversation focuses on things like job replacement, automation and the like. But I think that the most important thing to remember is that we are still in control, and the collective we will remain in control for a very long time. AI systems are already incredibly smart, but they are ultimately guided by humans. It is up to us to steer them, enforce alignment and ultimately derive value from them. It seems a little unfair to tap Andrej Karpathy twice in the same post, but I’ll do it anyway. Here’s what he had to say on all of this.
Agency > Intelligence. I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are… https://t.co/8yvECKi7GU
— Andrej Karpathy (@karpathy) February 24, 2025
Another way to put it: you can just do stuff. And the new tools make you all the better at doing that stuff. What a time to be alive.