Read the Chantilly white paper
Fork Chantilly’s code
The Problem with Current AI Agent Platforms
Since ChatGPT launched in late 2022, companies have been racing to add AI to their products. But despite the hype, most offerings fall short of what they promise. In practice, many organizations still can’t build agents that work reliably in production environments.
Microsoft’s Copilot Studio claims to let anyone create agents with a “low code” platform. After two years, its results remain underwhelming: the interface looks polished, but the capability to build production-grade agents isn’t there.
Google took a different approach with its Vertex AI API and Gemini. It gave developers the raw tools – function calling and grounding – but no real blueprint for building complete systems. You get a blank canvas, which is great if you’re an experienced developer with time to figure everything out from scratch. Not so great if you need something working tomorrow.
This is the gap Chantilly fills: a production-ready blueprint that works on any large language model, gives you control over your data, and actually delivers on the promise of agentic AI.
What “Agentic” Actually Means
Before diving deeper, let’s clarify some terms. An agentic AI system is one that can take actions that consistently move toward achieving goals over time, without having every step spelled out in advance.
The key word is “agenticness” – how well a system can adaptively achieve complex goals in complex environments with limited supervision. Think of it like this: a chatbot that answers questions isn’t very agentic. An agent that can research a market, create a report, learn from your feedback, and improve the next report – that’s agentic.
Chantilly is designed to maximize agenticness through proven patterns and smart architecture.
The Architecture: Five Layers Working Together
Chantilly is built on five interconnected layers. Understanding how they work together helps you see why it’s more powerful than just connecting ChatGPT to your database.
Backend (Node.js/Express) – Handles incoming messages from users and platform webhooks. Node.js was chosen for its asynchronous operations, massive developer ecosystem, and universal cloud deployment support. Express connects all layers together.
LLM – Your AI model provides semantic reasoning. The default is Gemini, but you can swap in Claude, GPT-4, or others. Cost control comes from routing: cheaper models handle routine conversations, while advanced models handle complex task creation and self-repair.
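The cost-control idea can be sketched as a small router. This is a minimal illustration, not Chantilly's actual code: the function name `pickModel`, the request kinds, and the model identifiers are all hypothetical placeholders.

```javascript
// Hypothetical model tiers; the identifiers are placeholders, not real model names.
const MODEL_TIERS = {
  routine: "cheap-model",
  advanced: "advanced-model",
};

// Request kinds that, per the text, justify the more capable model.
// The kind names themselves are assumptions made for this sketch.
const ADVANCED_KINDS = new Set(["task_creation", "self_repair"]);

// Return the model identifier to use for a given kind of request.
function pickModel(kind) {
  return ADVANCED_KINDS.has(kind) ? MODEL_TIERS.advanced : MODEL_TIERS.routine;
}

console.log(pickModel("conversation")); // cheap-model
console.log(pickModel("self_repair")); // advanced-model
```

Keeping the routing decision in one function means a model swap touches a single table rather than every call site.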
Knowledge Base – Prevents hallucinations through retrieval-augmented generation (RAG). Grounds responses in curated organizational information at five critical touchpoints: every user message, every tool execution, complex task creation, external API searches, and security sanitization. Think of it as an employee’s institutional memory, which keeps organizational knowledge from degrading.
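To make the grounding step concrete, here is a rough sketch of retrieval followed by prompt assembly. It is an illustration under stated assumptions: the knowledge base entries are invented, `retrieve` and `buildGroundedPrompt` are hypothetical names, and word overlap stands in for the vector search a real RAG pipeline would use.

```javascript
// Toy knowledge base; a real one would hold curated documents with embeddings.
const KNOWLEDGE_BASE = [
  { id: "kb-1", text: "Refunds are processed within 14 days of the return request." },
  { id: "kb-2", text: "Support hours are 9am to 5pm on weekdays." },
  { id: "kb-3", text: "Invoices are issued on the first day of each month." },
];

// Lowercase word set, used by the stand-in relevance score below.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Stand-in for vector search: rank entries by how many words they share with the query.
function retrieve(query, entries, topK = 2) {
  const queryTokens = tokenize(query);
  return entries
    .map((entry) => {
      let overlap = 0;
      for (const token of tokenize(entry.text)) {
        if (queryTokens.has(token)) overlap += 1;
      }
      return { entry, overlap };
    })
    .filter((scored) => scored.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, topK)
    .map((scored) => scored.entry);
}

// Build a prompt that instructs the model to answer only from retrieved context.
function buildGroundedPrompt(question, entries) {
  const context = retrieve(question, entries)
    .map((entry) => `[${entry.id}] ${entry.text}`)
    .join("\n");
  return [
    "Answer using only the context below. If the context is insufficient, say so.",
    "Context:",
    context || "(no matching entries)",
    `Question: ${question}`,
  ].join("\n");
}

console.log(buildGroundedPrompt("When are refunds processed?", KNOWLEDGE_BASE));
```

The same `buildGroundedPrompt` call could sit in front of each touchpoint, so that every LLM request carries the relevant organizational context with it.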
Tools and Services – Built-in capabilities include knowledge management, web search, diagram generation, task execution, and translation. Clean architecture separates concerns: Tools provide the AI interface, Services contain business logic, and Models handle data access. This prevents conflicts when multiple tools share underlying services.
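The Tool/Service/Model split might look something like the sketch below. The class names, tool names, and the in-memory store are hypothetical, chosen only to show how two tools can share one service without touching the data layer directly.

```javascript
// Model layer: data access only. An in-memory Map stands in for a real database.
class NoteModel {
  constructor() {
    this.rows = new Map();
  }
  save(id, text) {
    this.rows.set(id, text);
  }
  findAll() {
    return [...this.rows.entries()].map(([id, text]) => ({ id, text }));
  }
}

// Service layer: business logic, shared by any tool that needs it.
class NoteService {
  constructor(model) {
    this.model = model;
  }
  addNote(id, text) {
    const trimmed = text.trim();
    if (!trimmed) throw new Error("Note text must not be empty");
    this.model.save(id, trimmed);
  }
  search(term) {
    const needle = term.toLowerCase();
    return this.model.findAll().filter((note) => note.text.toLowerCase().includes(needle));
  }
}

// Tool layer: the interface exposed to the LLM (name, description, execute).
function makeTools(service) {
  return [
    {
      name: "add_note",
      description: "Store a note in the knowledge base.",
      execute: ({ id, text }) => {
        service.addNote(id, text);
        return { ok: true };
      },
    },
    {
      name: "search_notes",
      description: "Find notes containing a term.",
      execute: ({ term }) => service.search(term),
    },
  ];
}

const tools = makeTools(new NoteService(new NoteModel()));
const byName = Object.fromEntries(tools.map((tool) => [tool.name, tool]));
byName.add_note.execute({ id: "n1", text: "  Quarterly invoices go out in March.  " });
console.log(byName.search_notes.execute({ term: "invoices" }));
```

Because both tools go through the same `NoteService` instance, validation and search rules live in one place, which is the conflict-avoidance the layering is meant to provide.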
Complex Tasks – Long-running operations that execute in worker threads instead of the main conversation. Examples: analyzing customer invoices, creating marketing personas, reviewing call transcripts for coaching insights, or developing social media strategies. Users describe objectives in natural language, and the agent generates executable JavaScript code that runs in secure isolated sandboxes.
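As a rough picture of running generated code with restricted inputs and a time limit, the sketch below uses Node's built-in `vm` module. This is a stand-in, not the mechanism described here: Node's documentation states that `vm` is not a security mechanism, the sketch does not use worker threads, and `runGeneratedTask` and the sample task are hypothetical.

```javascript
const vm = require("node:vm");

// Run generated code with access only to the values passed in `inputs`.
// NOTE: node:vm is a stand-in for illustration. It is not a security boundary,
// so untrusted code needs real isolation (the article names isolated-vm).
function runGeneratedTask(code, inputs, timeoutMs = 1000) {
  // Give the task its own plain-data copy of the inputs.
  const sandbox = { inputs: JSON.parse(JSON.stringify(inputs)) };
  // The value of the last expression in `code` is returned to the caller.
  return vm.runInNewContext(code, sandbox, { timeout: timeoutMs });
}

// Hypothetical code an agent might generate for "total the unpaid invoices".
const generatedCode = `
  inputs.invoices
    .filter((invoice) => !invoice.paid)
    .reduce((sum, invoice) => sum + invoice.amount, 0);
`;

const total = runGeneratedTask(generatedCode, {
  invoices: [
    { amount: 120, paid: true },
    { amount: 80, paid: false },
    { amount: 45, paid: false },
  ],
});
console.log(total); // 125
```

The `timeout` option makes a runaway loop throw instead of hanging the process, which is one reason long-running generated code is kept away from the main conversation path.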
Self-Repair and Learning
Code rarely works perfectly on the first attempt. When errors occur during task execution, Chantilly automatically:
- Captures the complete error context
- Searches ReasoningMemory for similar past issues
- Asks the LLM to analyze and debug the code
- Attempts repair and re-execution (up to 3 cycles)
The same loop handles validation errors, security violations, and runtime failures.
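The repair steps above can be sketched as a bounded retry loop. This is an illustration rather than Chantilly's implementation: `executeWithRepair` and the three injected functions are hypothetical, and "up to 3 cycles" is read here as one initial attempt plus at most three repaired attempts.

```javascript
const MAX_REPAIR_CYCLES = 3; // "up to 3 cycles" from the text

// Injected dependencies (all hypothetical stand-ins):
//   run(code)          -> result, or throws on failure
//   findSimilar(ctx)   -> past lessons, standing in for a ReasoningMemory search
//   repair(code, ctx)  -> revised code, standing in for the LLM debugging call
function executeWithRepair(code, { run, findSimilar, repair }) {
  let currentCode = code;
  for (let repairs = 0; ; repairs += 1) {
    try {
      return { ok: true, result: run(currentCode), repairs };
    } catch (error) {
      if (repairs >= MAX_REPAIR_CYCLES) {
        return { ok: false, error: error.message, repairs };
      }
      // Capture the error context, look up similar past issues, then ask for a fix.
      const context = { message: error.message, code: currentCode };
      const lessons = findSimilar(context);
      currentCode = repair(currentCode, { ...context, lessons });
    }
  }
}

// Stubs for demonstration only.
const stubs = {
  run: (code) => {
    if (code.includes("undefinedVar")) throw new Error("undefinedVar is not defined");
    return `ran: ${code}`;
  },
  findSimilar: () => ["Declare variables before use"],
  repair: (code) => code.replace("undefinedVar", "42"),
};

console.log(executeWithRepair("undefinedVar + 1", stubs));
```

Capping the loop matters: without a limit, a fix that keeps failing would burn LLM calls indefinitely.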
ReasoningMemory is what turns those repairs into lasting improvements. Based on Google’s ReasoningBank framework, it stores lessons from every code fix and user modification. The system categorizes memories into error patterns, successful fix strategies, effective API usage patterns, general execution strategies, and template creation best practices.
When creating new tasks or repairing broken ones, the agent searches these categorized memories using vector embeddings and applies relevant lessons to the current situation. This means your agent genuinely improves over time, building institutional knowledge specific to your organization.
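A category-aware memory search could look roughly like this. Everything here is an assumption made for illustration: the category keys paraphrase the five kinds of memory listed above, the lessons are invented, and the three-dimensional vectors are hand-written stand-ins for embeddings produced by a real model.

```javascript
// Category keys paraphrasing the five memory kinds named in the text.
const CATEGORIES = ["error_pattern", "fix_strategy", "api_usage", "execution_strategy", "template_practice"];

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Toy memories with hand-written 3-dimensional "embeddings".
const memories = [
  { category: "error_pattern", lesson: "Null customer IDs crash the invoice loop.", embedding: [0.9, 0.1, 0.0] },
  { category: "fix_strategy", lesson: "Skip rows with missing IDs and log them.", embedding: [0.8, 0.2, 0.1] },
  { category: "api_usage", lesson: "Page through the CRM API 50 records at a time.", embedding: [0.1, 0.9, 0.2] },
];

// Return the most similar memories, optionally restricted to one category.
function searchMemories(queryEmbedding, { category, topK = 2, minScore = 0.5 } = {}) {
  if (category && !CATEGORIES.includes(category)) {
    throw new Error(`Unknown category: ${category}`);
  }
  return memories
    .filter((memory) => !category || memory.category === category)
    .map((memory) => ({ ...memory, score: cosineSimilarity(queryEmbedding, memory.embedding) }))
    .filter((memory) => memory.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

console.log(searchMemories([1, 0, 0]).map((memory) => memory.lesson));
```

Filtering by category before ranking lets a repair step ask specifically for fix strategies, while task creation can pull from template practices instead.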
Production-Ready Features
Security: Implements protections from the OWASP Top 10 for LLM Applications, including JWT auth, PII sanitization, prompt injection detection, SSRF protection, and sandboxed code execution via isolated-vm.
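Of the protections listed, SSRF checks are the easiest to show in a few lines. The sketch below is a simplified illustration with hypothetical function names, not Chantilly's code. It inspects only the URL itself, so a hostname that resolves to a private address would still pass; a production guard would also need to validate the resolved IP.

```javascript
// Return true if the host is a loopback, private, or link-local IPv4 literal.
function isPrivateIPv4(host) {
  const parts = host.split(".").map(Number);
  const isIPv4 = parts.length === 4 && parts.every((p) => Number.isInteger(p) && p >= 0 && p <= 255);
  if (!isIPv4) return false; // not an IPv4 literal
  const [a, b] = parts;
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // link-local, includes common cloud metadata addresses
  );
}

// Decide whether the agent may fetch this URL.
function isUrlAllowed(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable input is rejected
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  if (host.startsWith("[")) return false; // IPv6 literals are rejected outright in this sketch
  return !isPrivateIPv4(host);
}

console.log(isUrlAllowed("https://example.com/page")); // true
console.log(isUrlAllowed("http://169.254.169.254/latest/meta-data")); // false
```

Parsing with `new URL` first, rather than matching on the raw string, means the check runs against the hostname the HTTP client would actually use.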
Dual Embeddings: Task templates get two vector embeddings – one for exact name matching, one for semantic search. Search by name (“invoice report”) or description (“analyze payments”), and the system finds the right template.
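Here is one way the two-embedding lookup could work, reusing the example queries above. The details are assumptions: `embed` is a toy word-count function standing in for a real embedding model, the second template is invented, and scoring each template by the higher of its two similarities is just one plausible way to combine them.

```javascript
// Toy embedding: word counts over a tiny fixed vocabulary.
// A real system would call an embedding model instead.
const VOCAB = ["invoice", "report", "analyze", "payments", "persona", "marketing", "create"];

function embed(text) {
  const words = text.toLowerCase().match(/[a-z]+/g) || [];
  return VOCAB.map((term) => words.filter((word) => word === term).length);
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a, b) {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = Math.sqrt(a.reduce((s, v) => s + v * v, 0) * b.reduce((s, v) => s + v * v, 0));
  return norm === 0 ? 0 : dot / norm;
}

// Each template carries two embeddings: one for its name, one for its description.
function makeTemplate(name, description) {
  return { name, description, nameEmbedding: embed(name), descriptionEmbedding: embed(description) };
}

const templates = [
  makeTemplate("invoice report", "analyze payments"),
  makeTemplate("persona builder", "create marketing persona"),
];

// Score each template by the better of its two similarities; null if nothing matches.
function findTemplate(query) {
  const queryEmbedding = embed(query);
  let best = null;
  for (const template of templates) {
    const score = Math.max(
      cosine(queryEmbedding, template.nameEmbedding),
      cosine(queryEmbedding, template.descriptionEmbedding)
    );
    if (score > 0 && (best === null || score > best.score)) {
      best = { template, score };
    }
  }
  return best;
}

console.log(findTemplate("invoice report").template.name); // invoice report
console.log(findTemplate("analyze payments").template.name); // invoice report
```

Both queries land on the same template, one through the name embedding and one through the description embedding.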
Personality System: Configure communication style (formal/casual), emotional tone, and behavioral patterns across 32 traits. Ensures consistent agent behavior with optional user-specific adaptations.
Benefits and Costs
What You Gain:
- Full control – You own the code and data, not locked into a vendor’s platform
- Flexibility – Swap AI models, change cloud providers, integrate any platform
- Production-ready – Built with enterprise patterns, not an experimental prototype
- Cost-effective – Open source with no licensing fees
- Self-improving – Gets smarter through ReasoningMemory over time
Real Costs:
- Cloud hosting – Typically $10-50/month for modest usage on Google Cloud
- AI API calls – Usage-based pricing, so cost scales with how much the agent is used
- Learning curve – Requires basic web development skills
- Training time – Like onboarding employees, agents need initial setup and knowledge
Think of Chantilly as a training system for AI employees. You invest time teaching the agent about your organization, creating custom tools, and building task templates. In return, you get an agent that works continuously and improves with experience.