Agents
Quick Start
Google recently open-sourced Gemini CLI, an AI agent that lives in the terminal. You can quickly get it running with:
npm install -g @google/gemini-cli
gemini
███ █████████ ██████████ ██████ ██████ █████ ██████ █████ █████
░░░███ ███░░░░░███░░███░░░░░█░░██████ ██████ ░░███ ░░██████ ░░███ ░░███
░░░███ ███ ░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███
░░░███ ░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███
███░ ░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███
███░ ░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███
███░ ░░█████████ ██████████ █████ █████ █████ █████ ░░█████ █████
░░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ What is an “Agent”?
✨ Insight
AI agents are systems that can think, act, and learn from results. Unlike chatbots that only generate text, agents interact with the world through tools and APIs to complete real tasks.
The core pattern is the Thought-Action-Observation (TAO) loop:
Loading diagram...
Agents are Just Programs
✨ Insight
Agents don't contain AI models! The Gemini CLI is simply a program that makes API calls to Google's external LLM servers. It's no different from a weather app calling a weather API—the intelligence lives in the "cloud", not in your terminal.
The entire “agent” is just orchestration code:
Loading diagram...
From Theory to Practice
When you ask “How many files are in the src directory?”, the agent doesn’t guess—it uses the TAO loop to gather real information. The key difference: agents perform real-world actions, while chatbots only process training data.
Loading diagram...
How Gemini Works: Streaming Architecture
Google built Gemini CLI around a streaming-first design that processes events in real-time. Instead of waiting for complete responses, it streams thoughts, actions, and results as they happen.
The Four Core Components
🔄 Conversation Manager
Handles the back-and-forth with Google's AI and manages conversation history
⚙️ Tool Scheduler
Manages when and how tools run, including safety approvals
🔧 Tool Registry
Library of available actions like reading files, running commands, web searches
🖥️ User Interface
Terminal display that shows everything happening in real-time
Loading diagram...
Safety and Control
Gemini CLI has three safety modes:
- 🛡️ DEFAULT: Ask permission for dangerous operations (delete files, run commands)
- ⚡ AUTO_EDIT: Auto-approve file edits, but ask for everything else
- 🚀 YOLO: Run everything automatically (for experienced users)
✨ Smart Safety
The system can tell the difference between safe operations (reading files) and dangerous ones (deleting files). It only asks permission when it actually matters.
Real Example: Finding TypeScript Files
When you ask “Find all TypeScript files and analyze their imports”, here’s what happens:
find_files("*.ts") and multiple read_file() commandsExtensible Tools
The Tool Registry automatically discovers and loads tools from multiple sources—built-in capabilities, project-specific discovery commands, and MCP servers. This means the AI’s abilities expand dynamically based on your project’s needs, whether that’s connecting to databases, GitHub APIs, or development environments through MCP protocol.
- Built-in tools for files, web, commands
- MCP (Model Context Protocol) server support
- Project-specific tool discovery
Loading diagram...
Key Takeaways
🎯 What Makes Agents Special
- They can actually DO things, not just talk
- They use real-time reasoning and feedback loops
- They're just programs that coordinate AI with tools
🚀 Why This Matters
- Agents will become our main AI interface
- The patterns here work for any domain
- Safety and control are built-in from the start
- Real-time streaming makes everything feel natural
The Future of AI Agents
✨ Looking Ahead
As AI models get better, agents like Gemini CLI show us the path forward: AI that can think, act, and learn in real-time while keeping humans in control. The magic isn't in the AI model itself—it's in how we connect AI reasoning to real-world actions.
The patterns from Gemini CLI—streaming responses, safety controls, extensible tools, and transparent reasoning—are the blueprint for the next generation of AI interfaces. We’re moving from “AI that talks” to “AI that works.”
This post was written with assistance from Gemini CLI and Claude Code (mainly Claude) for research tasks on the Gemini CLI Codebase and helping with blog writing + styling