
Large action models and the promise of true agency

By Alan Morrison

Image by Gerd Altmann from Pixabay

In September 2024, Salesforce AI Research EVP and Chief Scientist Silvio Savarese posted at the Salesforce 360 site on the rise of large action models (LAMs). Among other things, he noted that both AI assistants and AI agents require agency. 

In his words, agency implies “…the ability to act in meaningful ways, sometimes entirely on their own, in pursuit of an assigned goal.” AI assistants, Savarese said, dedicate themselves to a single user. But AI agents, by contrast, “…are built to be shared (and scaled)” to support a team or an organization.

Effective agency is something previous generations of chatbots have lacked. But 2024 ushered in a new generation of AI assistants and agents powered by LAMs. In an earlier 2023 post, Savarese asserted that “LAMs may soon make it possible to automate entire processes.” In other words, with the help of LAMs, AI assistants and agents could act autonomously on behalf of users.

What’s a large action model?

According to Data Science Central’s parent TechTarget, “an LAM is an artificial intelligence (AI) system that understands queries and responds by taking action.

“An LAM improves on a large language model (LLM), one of the foundational elements of modern generative AI. An LLM such as OpenAI’s GPT-4o uses natural language processing (NLP) as a core capability to power ChatGPT. However, while it generates content, it cannot perform actions. The LAM concept moves past this limitation, giving the model the ability to act.”

LAMs, TechTarget says, rely on contextual information to ascertain user goals. They harness the power of neurosymbolic AI, which blends the capabilities of neural nets and the knowledge representation in knowledge graphs to infer user goals. (See “A neurosymbolic AI approach to learning + reasoning” at https://www.datasciencecentral.com/a-neurosymbolic-ai-approach-to-learning-reasoning/ for more information.) 
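
To make that concrete, here is a minimal Python sketch of the neurosymbolic idea: a stubbed “neural” step proposes a goal from an utterance, and a toy knowledge graph supplies the symbolic context that grounds it. All of the names and triples here are invented for illustration; a production system would use a trained model and a real graph store.

# Toy knowledge graph: (subject, predicate, object) triples.
KG = {
    ("expense_report", "requires", "receipt"),
    ("expense_report", "approved_by", "manager"),
    ("receipt", "is_a", "document"),
}

def neural_intent(utterance):
    # Stub for a neural model that maps text to a candidate goal.
    return {"goal": "file", "object": "expense_report"}

def symbolic_ground(goal):
    # Use the knowledge graph to infer what achieving the goal entails.
    return [(s, p, o) for (s, p, o) in KG if s == goal["object"]]

goal = neural_intent("file my expenses for the Boston trip")
for triple in symbolic_ground(goal):
    print(triple)   # e.g. ('expense_report', 'requires', 'receipt')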

LAMs can also interact with web interfaces, call APIs and use other software systems, making it possible for assistants and agents to take action directly.
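
In practice, that means the model’s output is a structured action rather than prose, and a host application executes it. Below is a heavily simplified sketch of that loop; model_propose_action is a hypothetical stand-in for a real LAM, and only the dispatch pattern is the point.

import json
import urllib.request

def model_propose_action(query):
    # Stand-in for a LAM: returns a structured action, not prose.
    return json.dumps({"tool": "http_get",
                       "args": {"url": "https://example.com"}})

# The host application maps tool names to real capabilities.
TOOLS = {
    "http_get": lambda args: urllib.request.urlopen(args["url"]).read(),
}

proposal = json.loads(model_propose_action("check that the site is up"))
result = TOOLS[proposal["tool"]](proposal["args"])   # the action actually runs
print(len(result), "bytes fetched")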

So it seems plausible that LAMs, which blend the language abilities of LLMs, the reasoning abilities of knowledge graphs and the ability to act online, are clearing a path to bona fide autonomous systems.

Assistants and agents that learn directly from users in real time

Others have been elaborating recently on what constitutes a real agent. Supreeth Koundinya, writing for Analytics India Magazine, quoted Ketan Karkhanis, CEO of ThoughtSpot: “There are a lot of nuances to this. If you can’t coach it, then it’s not an agent. I don’t think you can coach a copilot. You can write custom prompts [but] that’s not coaching.”

This notion of coaching has to do with true agents being able to learn directly from users in real-time interactions. LAM pioneer Rabbit Inc. in November 2024 introduced a beta “teaching mode” for all users of its R1 devices. The device can record a sequence of steps the user takes, retrieve this lesson on demand, and then execute the learned task.
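
Rabbit hasn’t published its implementation in detail, but the record-and-replay pattern behind a teaching mode is easy to sketch. The snippet below stores a demonstrated sequence of (action, target) steps under a lesson name and replays it on demand; everything here is an illustrative stand-in, not Rabbit’s actual code.

lessons = {}

def record(lesson_name, steps):
    # Save a user-demonstrated sequence of UI steps.
    lessons[lesson_name] = steps

def replay(lesson_name):
    # Execute a learned task by replaying its steps.
    for action, target in lessons[lesson_name]:
        print(action, "->", target)   # a real agent would drive the UI here

record("order_coffee",
       [("open", "coffee_app"), ("tap", "reorder_latte"), ("tap", "confirm")])
replay("order_coffee")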

ThoughtSpot itself promotes an LAM-based agentic alternative to business intelligence dashboards. “True self-service means anyone can drive real business outcomes with a dedicated AI analyst who can answer any question, on any data, anywhere you work,” the company proclaims.

Salesforce, for its part, has offered smaller, domain-specific LAMs it calls xLAMs since September 2024. These smaller models can run on mobile devices and call functions from applications.
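
Function calling generally works by showing the model a machine-readable description of the available functions and having it emit a structured call instead of free-form text. Here is a generic sketch of that pattern; the schema and names are illustrative, not Salesforce’s exact xLAM format.

import json

# Machine-readable description of one available function.
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}]

# Given the tools and a user query, a function-calling model emits a
# structured call like this instead of free-form text:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

def get_weather(city):
    return f"Sunny in {city}"   # stub; a real app would call a weather API

call = json.loads(model_output)
result = {"get_weather": get_weather}[call["name"]](**call["arguments"])
print(result)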

Purpose-built automation startup Orby.ai, led by ex-UiPath product development head Bella Liu, promises to “reduce automation development costs by 50 percent or more” with the help of its enterprise LAM. One Orby use case involves invoice reconciliation, matching invoices to purchase orders and receipts; the company claims that 64 percent of matches are fully automated.
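
As a rough illustration of what such a match involves, here is a toy three-way match in Python: an invoice is matched to a purchase order and a receipt by PO number, with an amount tolerance. The field names and tolerance are assumptions made for the sketch, not Orby’s actual logic.

def match_invoice(invoice, purchase_orders, receipts, tolerance=0.01):
    # Three-way match: invoice <-> purchase order <-> receipt.
    po = next((p for p in purchase_orders
               if p["po_number"] == invoice["po_number"]), None)
    rc = next((r for r in receipts
               if r["po_number"] == invoice["po_number"]), None)
    if po is None or rc is None:
        return False   # unmatched documents go to a human reviewer
    return abs(invoice["amount"] - po["amount"]) <= tolerance * po["amount"]

invoice = {"po_number": "PO-1001", "amount": 500.0}
pos = [{"po_number": "PO-1001", "amount": 500.0}]
receipts = [{"po_number": "PO-1001"}]
print(match_invoice(invoice, pos, receipts))   # True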

The promise of real-time agents?

To date, workflow and task automation across balkanized applications and siloed data environments has been difficult. A couple of years ago, I tried using Microsoft Flow (later Power Automate) inside a large enterprise, and the challenge was finding and persuading the right people in the right places to help build the automated workflows. It was a volunteer effort, an ad hoc collaboration via SharePoint, and it wasn’t sufficient for the need. Leadership showed no apparent interest; everyone spent all their time executing the workflow manually across a dozen different applications.

Earlier, at a different company, leadership invested in and encouraged the adoption of robotic process automation (à la UiPath), establishing formal training programs and ensuring staff were trained on the tooling. But long-term adoption was fractional; most people, as far as I could tell, didn’t find the methods intuitive enough to invest more time in them. To me, RPA felt like a chewing-gum-and-baling-wire approach to workflow automation, a kludge that could fall apart if any step in the flow changed.

Given this historical context, I’m cautiously optimistic about this new crop of LAM-based agents and assistants. The main stumbling block now is the maturity of data and semantic metadata inside enterprises; both are in short supply. But I’m encouraged that end users working with agents and assistants, and seeing where they fall short, might make the need for better data inputs more evident.

With real-time interaction and learning, a virtuous feedback loop could develop and continual improvement could persist. Fingers crossed.
