
After writing about 200 blogs over the past six years as AI evangelist for 3M and then Solventum, it is time to hang up my hat. This is my last blog as a solver at Solventum, as I am retiring at the end of this year. However, I will continue my AI evangelism as a private citizen.

It is the end of another tumultuous year for generative artificial intelligence (AI). In this blog I am going to highlight a few themes which captured my attention this year.

Large language model (LLM) interpretability

An LLM’s capability to generate fluent text on any topic remains a great mystery. Researchers don’t really understand how or why the technology exhibits this behavior; it is simply observed once the models get large enough. The phenomenon has been dubbed “emergence.” But why does it emerge in this fashion?

Anthropic researchers focused on “monosemanticity,” which refers to the ability of a model to encode specific concepts. Essentially, they trained a surrogate model, a sparse autoencoder (SAE), that maps the activations of one of the LLM’s layers into a much larger set of features. They showed that in the SAE one can identify specific concepts (features) like “Golden Gate Bridge,” along with a million other concepts. They also showed that you can steer the LLM’s output by manipulating the features in the SAE.
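To make the idea a bit more concrete, here is a minimal, illustrative PyTorch sketch of a sparse autoencoder in this spirit. The dimensions, loss weights and training details are my own assumptions, not Anthropic’s actual setup.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Maps one layer's activation vector into a wide, mostly-zero feature vector and back."""

    def __init__(self, d_model: int = 4096, n_features: int = 65536):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, activations: torch.Tensor):
        # The ReLU keeps only positively activated features, so most entries end up zero.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(activations, features, reconstruction, l1_coeff: float = 1e-3):
    # Reconstruction term keeps the features faithful to the original activations;
    # the L1 penalty on the features encourages sparsity (one concept per feature).
    mse = torch.mean((reconstruction - activations) ** 2)
    sparsity = l1_coeff * features.abs().mean()
    return mse + sparsity

# Training idea: collect activation vectors from one transformer layer,
# then minimize sae_loss over batches of those vectors.
```

Once such a model is trained, an individual feature that fires on, say, Golden Gate Bridge text can be located, and then amplified or zeroed out before decoding; that is the steering trick described above.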

Google DeepMind interpretability researchers were hard at work on this task as well, under the banner of “mechanistic interpretability.” In the summer of this year, they released “Gemma Scope.” Unlike Anthropic’s single SAE, they open sourced a series of SAE models (hundreds of them), covering each layer of the transformer architecture of the Gemma LLMs. In essence, a microscope for the entire transformer stack. You can play around with a cool demo here and check out this article from MIT Technology Review on this interpretability research.

NotebookLM

Google’s NotebookLM is certainly the viral sensation of the year in terms of gen AI capabilities. Promoted by Google as a study aid for students, the basic part of the app is pretty straightforward: Load a bunch of research papers or other source material and engage in a chat session to understand the material. But it also sports another feature: You can ask it to create a podcast on the material you loaded, and it does an absolutely awesome job.

Two characters – one a male voice and the other a female voice – discuss the material in an upbeat manner, even interrupting each other. It is a realistic-sounding podcast. When I created my own podcast on a paper I uploaded, I was blown away. You can try it here. MIT Technology Review did an article on this tech recently. The one knock on the podcast? It is too upbeat and uncritical – mostly focusing on the positive. Regardless, it is indeed a powerful new tool for students and researchers alike.

Agents

Another persistent theme this year is the agentification of everything. I wrote about the resurgence of the “agent” theme in a blog at the beginning of this year. The theme has now exploded into prominence. One way to think of an agent is as a narrow, focused application of AI that combines traditional rule-based AI with gen AI functionality. This blog from Gemini discusses a range of agents. An example: an agent that converses with you to answer questions about the Volkswagen user manual, like how to change a tire.
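As a concrete illustration of that hybrid recipe, here is a minimal, hypothetical Python sketch of such a user-manual agent: a couple of rule-based checks handle the cases that must stay predictable, and everything else is handed to an LLM along with retrieved manual text. The `call_llm` and `search_manual` functions are stand-ins for whatever model API and retrieval step you actually use.

```python
def search_manual(query: str, manual_sections: dict[str, str]) -> str:
    """Naive retrieval: return sections whose title words appear in the query."""
    query_words = set(query.lower().split())
    hits = [text for title, text in manual_sections.items()
            if query_words & set(title.lower().split())]
    return "\n\n".join(hits) or "No matching section found."

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (OpenAI, Gemini, Claude, etc.)."""
    raise NotImplementedError("Plug in your model provider here.")

def manual_agent(question: str, manual_sections: dict[str, str]) -> str:
    q = question.lower()
    # Rule-based branch: safety-critical questions get a fixed, vetted answer.
    if "airbag" in q or "recall" in q:
        return "For safety-related issues, please contact an authorized dealer."
    # Gen AI branch: ground the model in the retrieved manual text.
    context = search_manual(question, manual_sections)
    prompt = (f"Answer using only this excerpt from the owner's manual:\n"
              f"{context}\n\nQuestion: {question}")
    return call_llm(prompt)
```

A production agent would swap the keyword checks for proper intent routing and the naive search for vector retrieval, but the division of labor stays the same: rules where the answer must be predictable, gen AI where flexibility is needed.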

What do you do when you have a “swarm” of agents? You need to figure out how the agents can work together. OpenAI has a tool for that, and Anthropic is not to be left behind: it has introduced something called “computer use” — an agent that will navigate a website and fill out a complex form for you. Microsoft has incorporated agent creation into its Copilot platform.

Essentially, if there is a tedious task that comes up often, you can create and deploy a surrogate agent to do the heavy lifting. However, bear in mind that the tech is not error-free, and the old Reagan adage applies, adapted for our times: Don’t trust, and verify.

LLMs’ persistent math problem

One of the persistent, nagging problems with LLMs is not just that they hallucinate, but that they are so poor at math. It is not hard to see why, if you realize they simply work by predicting the next token. If a specific math problem was seen in training, the model is more likely to get it right. OpenAI’s latest model (the o1 series) was specifically built around the notion of improving math reasoning skills. It performs better by using an elaborate chain-of-thought reasoning paradigm.

However, a recent paper by researchers at Apple has shed more light on LLMs doing math. They tweaked the standard grade school math dataset (GSM8K) by simply switching names and values in the problems and saw performance go down. What is more interesting: they took OpenAI’s o1-preview model (the one that is supposed to be good at math) and added no-op statements (statements appended to a math problem that have no bearing on the question being asked). They found o1’s performance dropped from 95% to 66%.
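To make “no-op statements” concrete, here is a small, made-up illustration (my own example, not one taken from the Apple paper) of how such a clause gets spliced into a GSM8K-style problem without changing the answer.

```python
base_problem = ("Liam buys 3 packs of pencils with 12 pencils in each pack. "
                "How many pencils does he have?")

# A "no-op" clause: reads naturally but has no bearing on the arithmetic.
no_op_clause = "Two of the packs have a blue label instead of a green one. "

# Splice the irrelevant clause in just before the question.
perturbed_problem = base_problem.replace("How many", no_op_clause + "How many")

print(perturbed_problem)
# The answer is still 3 * 12 = 36 pencils; a model that adds or subtracts
# the "two packs" has been thrown off by an irrelevant detail.
```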

Coke ad

Coca-Cola released a Christmas ad — resplendent with its red truck and classic Coke themes. It was beautiful and well made. Only problem? It was made with the help of generative AI. And there was a quick backlash. The basic theme of the protest? AI coming for the jobs of creative artists. Perhaps that horse has left the barn. The New York Times reports that Coca-Cola used the services of three different AI firms to create the ad. In fact, it was not wholly created by AI — rather, it was “assisted” by AI. That same theme was voiced by the CEO of Stability AI, maker of one of the leading gen AI foundation models for image and video creation, in a panel discussion on the future of AI: gen AI will be a good assistant in human-AI collaboration.

Concluding thoughts

What does next year bring? I don’t have a crystal ball, but one thing is guaranteed: It will be eventful, and the applications that exploit AI will explode. We don’t have to wait for artificial general intelligence (AGI) – the current models are plenty good for millions of applications. However, will the foundation models improve? What would it take to build AGI? A timely paper published in Transactions on Machine Learning Research by researchers from five different premier universities has a comprehensive (127-page-long) take on it! The paper title: “How far are we from AGI? Are LLMs all we need?” Interestingly, they report a survey, taken at a workshop focused on AGI, of when people think AGI will be reached. The estimates range from 2 years (a small group) to 20+ years. The paper is also a comprehensive listing of current LLM capabilities and a good discussion of what an AGI system should have. Suffice it to say, there are significant challenges that must be overcome before we can scale the AGI barrier. And that is not going to happen in 2025.


“Juggy” Jagannathan, PhD, is an AI evangelist with four decades of experience in AI and computer science research.