The New AI Operations Framework: A Developer's Insight

An inside look at the growing AI agent operations industry from the perspective of developers and engineers building these systems.

At Definable AI, we're constantly investigating how emerging artificial intelligence technologies can address genuine challenges for the 2 billion people using services across our portfolio companies. AI-powered agents represent the latest breakthrough area in generative AI that we're actively pursuing. Our team has developed and evaluated intelligent agents across diverse applications, from interactive data analytics and personalized learning systems to smart shopping assistants that enhance the customer journey in food delivery, e-commerce, and peer-to-peer marketplaces. Through this work, we've discovered that creating effective agents presents significant challenges, yet successful implementations deliver exceptional value and enable completely reimagined user interactions. Here we outline our key insights alongside a map of the agent technology and infrastructure ecosystem (the AgentOps Landscape) that we developed.

What’s in an agent?

Understanding agentic systems requires establishing a clear definition of their fundamental nature. In essence, agents are artificial intelligence frameworks capable of autonomous decision-making and action execution based on high-level user guidance. These systems typically comprise four essential elements:

  1. Advanced language models that interpret user objectives and formulate strategic plans according to available resources and capabilities at the agent's disposal.
  2. Integrated capabilities that extend beyond the base model's functionality, including internet research, information retrieval, computational processing, data system connections, and potentially additional AI components. These resources allow the agent to perform concrete tasks such as generating documents, running database operations, producing visualizations, and more.
  3. Information storage systems encompassing access to relevant knowledge repositories (persistent memory) and the capacity to maintain context-specific details throughout the multi-step processes required to fulfill objectives (working memory).
  4. Self-evaluation mechanisms: sophisticated agents incorporate capabilities to identify and rectify errors during plan execution while dynamically adjusting task priorities.
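To make the four elements concrete, here is a minimal sketch of how they might fit together. It is an illustration under stated assumptions, not a description of any production agent: `stub_planner` stands in for a real LLM call, the sales figures are made up, and every name (`Step`, `TOOLS`, `run_agent`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Element 3a: persistent memory. The figures are made up for the example.
KNOWLEDGE = {"q1_sales": 120, "q2_sales": 150}


@dataclass
class Step:
    tool: str     # which tool to call
    arg: str      # argument passed to the tool
    save_as: str  # key under which the result lands in working memory


def stub_planner(goal: str, feedback: Optional[str] = None) -> list[Step]:
    """Element 1: stands in for an LLM call. The first plan contains a bad
    key on purpose; when given feedback, the stub returns a corrected plan."""
    second_key = "q2_sales" if feedback else "q2_sale"
    return [
        Step("lookup", "q1_sales", "a"),
        Step("lookup", second_key, "b"),
        Step("add", "a,b", "total"),
    ]


def lookup(arg: str, working: dict) -> int:
    return KNOWLEDGE[arg]  # raises KeyError for unknown keys


def add(arg: str, working: dict) -> int:
    return sum(working[name] for name in arg.split(","))


# Element 2: tools the planner is allowed to use.
TOOLS: dict[str, Callable[[str, dict], int]] = {"lookup": lookup, "add": add}


def run_agent(goal: str, max_replans: int = 2) -> dict:
    feedback = None
    for _ in range(max_replans + 1):
        working: dict = {}  # Element 3b: working memory for this attempt
        try:
            for step in stub_planner(goal, feedback):
                working[step.save_as] = TOOLS[step.tool](step.arg, working)
            return working
        except KeyError as err:
            # Element 4: notice the failure and feed it back into planning.
            feedback = f"step failed with missing key {err}"
    raise RuntimeError("could not complete goal: " + goal)


result = run_agent("total sales for Q1 and Q2")
print(result["total"])
```

The first plan fails on the misspelled key, the error is passed back to the planner, and the second plan succeeds. In a real system the replanning step would be another model call, and the retry budget would likely be tuned per task.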

Systems like our proprietary agent represent progress toward fully autonomous capabilities, and we remain committed to researching and promoting approaches that maximize their utility while ensuring operational safety.

Agents come in varying degrees of sophistication, depending on the quantity and quality of their tools, the LLM used, and the constraints and controls placed on the workflows the agents create. See below (Evolution from LLMs to Agents) for a comparison of a single-turn chatbot with two agents.

Why we build agentic systems

The motivation behind developing agentic systems stems from the limitations inherent in current AI assistants. Today's AI co-pilots remain constrained to straightforward, single-step operations such as "condense this document" or "finish this code snippet."

In contrast, agents offer the potential to handle sophisticated, multi-stage workflows like: "research and purchase the optimal running shoes for my needs," "examine this financial report to assess the company's expansion prospects," or "create a comprehensive wearables market analysis incorporating our proprietary sales figures." The difference in capability between traditional co-pilots and agents becomes evident when comparing their responses to identical queries.

Enhanced response quality alone justifies agent development. Additionally, agents deliver measurable performance improvements, as Andrew Ng recently demonstrated. The accompanying benchmark data (Coding Performance) illustrates how GPT-3.5 enhanced with agentic frameworks substantially outperforms GPT-4 – currently among the most advanced available models – on complex programming challenges. Without agentic enhancement, GPT-3.5 performs considerably worse than its more sophisticated counterpart on identical tasks.

Not all plain sailing

This naturally raises the question: if agents deliver such superior results, why haven't they become the standard for all AI interactions? While significant advances have been made, agents powered by large language models are still in their infancy, and substantial obstacles stand in the way of practical, dependable agentic solutions. We see these challenges as falling into three primary areas: technological maturity, system scalability, and infrastructure connectivity.

Opportunities with task- and industry-specific agents

Our experience shows that agents demonstrate improved performance when designed for particular domains or focused task sets. Consequently, while the field continues to mature, we're increasingly enthusiastic about specialized agents tailored to specific functions and industries, which offer customized solutions that tackle distinct challenges and requirements while helping address common agent development obstacles. A prime example is our domain-specific agent created for interactive data exploration, enabling any organization member to access internal information without requiring database query expertise.

Obtaining relevant insights in time for evidence-based decisions is difficult: information typically resides in internal systems, and extracting it demands skilled analysts who understand the data structures and can write the appropriate queries. Specialized agents focused on functions like data exploration make it simpler to navigate information repositories, connect to databases, assess data relevance, and compile findings to answer user questions. We applied this methodology in developing our Toqan analytical agent shown below. Here's its operational framework:

By implementing this process and refining it iteratively, we've developed a robust framework that delivered substantial improvements in response precision, moving from 50% perceived reliability to up to 100% accuracy in targeted critical applications. For deeper insights into the technical architecture of the Toqan Data Analyst, refer to our detailed blog post outlining our engineering discoveries and lessons learned.
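As a rough illustration of the data-exploration steps described above (connect to a database, inspect what it contains, build a query, compile an answer), here is a minimal sketch using Python's built-in `sqlite3` module. It is not the Toqan implementation: the `orders` table and its rows are made up, `stub_text_to_sql` is a hypothetical placeholder for an LLM call, and `is_safe` is a deliberately crude guard.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with made-up order data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (city, amount) VALUES (?, ?)",
        [("Amsterdam", 20.0), ("Amsterdam", 35.5), ("Berlin", 12.0)],
    )
    return conn


def describe_schema(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Step 1: discover tables and columns so a query can be grounded in them."""
    tables = [
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    ]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return {
        t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }


def stub_text_to_sql(question: str, schema: dict[str, list[str]]) -> str:
    """Step 2: placeholder for an LLM call that turns a question into SQL."""
    return "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"


def is_safe(sql: str) -> bool:
    """Step 3: crude guard that allows only a single SELECT statement."""
    stripped = sql.strip().rstrip(";")
    return stripped.lower().startswith("select") and ";" not in stripped


def answer(question: str, conn: sqlite3.Connection) -> str:
    """Step 4: run the checked query and compile the rows into a reply."""
    schema = describe_schema(conn)
    sql = stub_text_to_sql(question, schema)
    if not is_safe(sql):
        raise ValueError(f"refusing to run statement: {sql}")
    rows = conn.execute(sql).fetchall()
    return "; ".join(f"{city}: {total:.2f}" for city, total in rows)


conn = build_demo_db()
print(answer("What is the total order value per city?", conn))
```

A production agent would need far stronger safeguards than `is_safe` (read-only credentials, query limits, result validation), and would typically let the model retry when a query fails or returns nothing useful.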

The future is…AgentOps?

As previously outlined, developing agent systems involves far more than sophisticated prompt design for powerful LLMs – although ongoing model improvements like external tool integration capabilities and enhanced reasoning/planning functions in advanced LLMs are what enable agent development. Constructing a functional agent demands creating accessible resources (such as autonomous code generation and execution, internet navigation, database interaction), establishing operational environments, connecting with existing systems, implementing strategic planning and self-assessment capabilities, and additional components.

Given the intricacy of these agentic frameworks, AgentOps has emerged as a vital focus area. AgentOps seeks to lower technical obstacles for building and expanding AI agents by offering modular, ready-made functionalities and resources that can be combined to facilitate the creation of more advanced and streamlined agentic solutions. For developers working with agents, staying informed about the AgentOps ecosystem will be essential for tracking technological progress that can further enhance AI agent capabilities and broaden their potential applications.
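One way to picture "modular, ready-made functionalities that can be combined" is a small tool registry. The sketch below is purely illustrative and does not reflect any particular AgentOps product: the `tool` decorator, `REGISTRY`, and the two toy tools are all hypothetical names chosen for the example.

```python
from typing import Callable

# Hypothetical catalogue of reusable tools, keyed by name.
REGISTRY: dict[str, dict] = {}


def tool(name: str, description: str):
    """Register a function as a reusable tool under a name and description."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = {"fn": fn, "description": description}
        return fn
    return decorator


@tool("word_count", "Count the words in a text")
def word_count(text: str) -> str:
    return str(len(text.split()))


@tool("shout", "Upper-case a text")
def shout(text: str) -> str:
    return text.upper()


def tool_menu() -> str:
    """Render the catalogue a planner could be shown in its prompt."""
    return "\n".join(
        f"{name}: {entry['description']}" for name, entry in sorted(REGISTRY.items())
    )


def run_pipeline(names: list[str], text: str) -> str:
    """Combine registered tools by feeding each output into the next tool."""
    for name in names:
        text = REGISTRY[name]["fn"](text)
    return text


print(tool_menu())
print(run_pipeline(["shout", "word_count"], "agents need tools"))
```

The point of the design is that new capabilities are added by registering a function, without touching the agent loop that selects and chains them.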

While developing Toqan and other agent-driven platforms, we consistently encountered complex technical challenges and searched for suitable development tools. Consequently, we collaborated with our Prosus Ventures colleagues to compile the AgentOps landscape shown below, showcasing tools we evaluated during our process. We believe this serves as a valuable resource for others pursuing agent development.

Just the beginning

Despite the intensive development work many of us are undertaking, AI agents remain in their earliest stages of evolution. The path toward creating reliable and widespread autonomous AI systems continues to require ongoing experimentation and breakthrough discoveries. As we work through the intricacies and obstacles of bringing these agents to market, their capacity for revolutionary impact becomes more apparent. Through comprehending the existing ecosystem, organizing agents by their specialized domains, and monitoring the evolution of AgentOps infrastructure, we can better prepare for the remarkable progress that awaits in the realm of intelligent autonomous systems.

We anticipate that agent capabilities will integrate into co-pilot platforms and AI assistants throughout this year, becoming widely adopted for testing purposes and lower-stakes applications including market analysis, data visualization, and e-commerce recommendation systems. For our organization, building on Toqan and the foundational infrastructure we've developed, our objective is to maintain a leading position in agent technology as we carry our generative AI and agent research into enhanced iterations of the Toqan Agent and across products within the broader Definable ecosystem.

Wrapping Up

Definable AI is exploring how AI-powered agents can solve real problems for 2 billion users across their portfolio companies. Unlike traditional AI assistants that handle single tasks, agents can autonomously execute complex, multi-step workflows like researching products, analyzing financial reports, or conducting market research.

Key Components of Agents:

  • Advanced language models for planning
  • Integrated tools (web search, databases, code execution)
  • Memory systems (short and long-term)
  • Self-correction capabilities

Main Challenges:

  • Technology is still immature
  • Scalability issues
  • Complex tooling and integration requirements

Their Approach: Definable AI found success building specialized, domain-specific agents rather than general-purpose ones. Their Toqan Data Analyst agent improved from 50% perceived reliability to up to 100% accuracy in specific use cases by focusing on conversational data analysis, allowing non-technical users to query internal databases.

The AgentOps Ecosystem: As agent development is complex, "AgentOps" has emerged as a field providing pre-built tools and infrastructure to simplify agent creation. Definable AI mapped this landscape while building their own systems.

Future Outlook: While agents are still early-stage, Definable AI expects them to appear in mainstream AI assistants this year for non-critical applications like market research and data visualization, with their goal being to stay ahead of the curve through continued development of Toqan and integration across their portfolio.