Skip to main content

Agent

An Agent is an executor of a specific action. It interfaces with AI models or providers, such as chatGPT, gemini, stable diffusion, or any other supported models, to perform operations like text generation, speech, image processing, or vision.

Latest Improvements

The Agent implementation has been significantly enhanced with:

  • Modular Handler Architecture: Each agent type now uses specialized handlers, improving maintainability and extensibility
  • Improved Provider Compatibility: Better support across different AI model providers
  • Enhanced Error Handling: More robust error detection and recovery
  • Optimized Execution: Streamlined processing of different input and output formats

Parameters

When creating an Agent, you need to provide several parameters that define its behavior:

  • agent_type: Determines the kind of actions it can perform. Accepted types are defined in AgentTypes (e.g., text, image, vision, speech).
  • provider: The name of the AI model or service provider (e.g., openai, gemini, stability).
  • mission: A mission statement guiding the agent's task. It often serves as the system input for AI models.
  • model_params: A dictionary containing the AI model parameters, such as API keys, model names, temperature, and any additional configuration needed for operation.
  • options (optional): Additional options that may affect the agent's execution or interaction with the AI model/service.

Agent Types

The Intelli framework currently supports the following agent types:

  • 'text': Agents that generate or manipulate text.
  • 'image': Agents designed for image generation tasks.
  • 'vision': Agents that perform image to text tasks.
  • 'speech': Agents that convert text to audio.
  • 'recognition': Agents that convert speech to text.
  • 'embed': Agents that generate text embeddings.
  • 'search': Agents that perform semantic search operations.

Example

Creating an Agent

Instantiate the Agent class with the appropriate parameters:

from intelli.flow.agents.agent import Agent
text_agent = Agent(
agent_type='text',
provider='openai',
mission='write a blog post',
model_params={'key': 'YOUR_OPENAI_API_KEY', 'model': 'gpt-3.5-turbo'}
)

Executing an Agent

To execute an agent, you need to provide an input specific to the agent's type. The Intelli framework offers different input classes such as TextAgentInputand ImageAgentInput. Usually the task handle the agent execution, so you don't have to execute manually.

Here's how to execute a text agent:

from intelli.flow.input.agent_input import TextAgentInput
# creating a text input
agent_input = TextAgentInput(desc="What is the capital of France?")
# Executing the agent
result = text_agent.execute(agent_input)