AI with Claude
Quill has AI syntax built directly into the language, so calling an LLM takes a single line of code. Just write ask claude and you're done.
Getting started
The fastest way to start is with the scaffolding command:
quill ai my-app
cd my-app
cp .env.example .env   # add your API key
npm install
quill run app.quill
This creates a project with the Anthropic SDK already configured.
Manual setup
- Install the Anthropic SDK: npm install @anthropic-ai/sdk
- Set your API key: export ANTHROPIC_API_KEY=your-key (or add it to a .env file)
- Write your Quill code and run it
Quill automatically loads .env files, so just create a .env file with:
ANTHROPIC_API_KEY=sk-ant-...
ask claude
The ask claude expression sends a message to Claude and returns the response text. It's a single line of code:
answer is ask claude "What is the capital of France?"
say answer
That's it. No imports, no client setup, no callbacks. Quill handles everything.
Under the hood, Quill automatically:
- Requires the @anthropic-ai/sdk package
- Creates a client (reads ANTHROPIC_API_KEY from the environment)
- Sends the message and extracts the response text
- Wraps everything in an async context
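The steps above can be sketched as a payload builder. This is purely an illustration of the request shape implied by this page's options; the buildClaudeRequest name and its internals are assumptions, not Quill's actual code:

```javascript
// Hypothetical sketch of the request payload Quill plausibly builds for
// `ask claude`. Defaults mirror the options documented on this page.
function buildClaudeRequest(prompt, options = {}) {
  const payload = {
    model: options.model || "claude-sonnet-4-20250514",
    max_tokens: options.max_tokens || 1024,
    // A string literal becomes a single-item messages array; a variable
    // already holding an array would be passed through as-is.
    messages: Array.isArray(prompt)
      ? prompt
      : [{ role: "user", content: prompt }],
  };
  if (options.system) payload.system = options.system;
  if (options.temperature !== undefined) payload.temperature = options.temperature;
  return payload;
}
```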
Options
You can customize the request by adding with followed by options:
answer is ask claude "Summarize this article" with model "claude-sonnet-4-20250514" max_tokens 500
say answer
Available options
| Option | Type | Default | Description |
|---|---|---|---|
| model | string | "claude-sonnet-4-20250514" | Which Claude model to use |
| max_tokens | number | 1024 | Maximum tokens in the response |
| system | string | (none) | System prompt to set Claude's behavior |
| temperature | number | (API default) | Randomness (0 = deterministic, 1 = creative) |
System prompts
answer is ask claude "Explain quantum computing" with system "You are a patient teacher who explains things simply" max_tokens 2000
say answer
Conversation history
For multi-turn conversations, pass a messages variable instead of a string:
messages are [
{ role: "user", content: "Hi" },
{ role: "assistant", content: "Hello! How can I help?" },
{ role: "user", content: "What did I just say?" }
]
answer is ask claude messages
say answer
When you pass a variable (not a string literal), Quill treats it as a messages array and sends it directly to the API.
Streaming responses
For long responses, stream them token-by-token with stream claude:
stream claude "Write a poem about coding":
  say chunk
The chunk variable is automatically available inside the stream block. Each chunk contains a piece of the response as it arrives.
You can also use options with streaming:
stream claude "Tell me a long story" with max_tokens 4000:
  say chunk
Example: building a chatbot
Here's a complete interactive chatbot in Quill:
-- AI Chatbot built with Quill
use "readline" as readline
rl is readline.createInterface({ input: process.stdin, output: process.stdout })
messages are []
to chat prompt:
  messages.push({ role: "user", content: prompt })
  answer is ask claude messages with system "You are a helpful assistant"
  messages.push({ role: "assistant", content: answer })
  say answer

to askQuestion:
  rl.question("You: ", with input:
    if input is "exit":
      rl.close()
      give back nothing
    chat(input)
    askQuestion()
  )
say "Chatbot ready! Type 'exit' to quit."
askQuestion()
Example: AI-powered CLI tool
-- Code explainer CLI tool
code is process.argv[2]
if code is nothing:
  say "Usage: quill run explain.quill 'your code here'"
  process.exit(1)
explanation is ask claude code with system "Explain the following code in simple terms. Be concise." max_tokens 500
say "Explanation:"
say explanation
Run it: quill run explain.quill "for (let i = 0; i < 10; i++) { console.log(i); }"
Multi-Provider Support
Quill supports multiple LLM providers with the same simple ask syntax. Just swap the provider name.
All four providers — Claude, OpenAI, Gemini, and Ollama — share identical syntax for options, streaming, structured output, and agents.
OpenAI
OpenAI provides GPT-4o and other models via their cloud API.
- Install the SDK: npm install openai
- Get an API key from platform.openai.com
- Add it to your .env file: OPENAI_API_KEY=sk-proj-...
Then use it in your code:
answer is ask openai "Explain quantum computing"
say answer
-- With options
answer is ask openai "Translate to French" with model "gpt-4o" system "You are a translator"
say answer
Google Gemini
Google Gemini provides fast, capable models through the Generative AI SDK.
- Install the SDK: npm install @google/generative-ai
- Get an API key from Google AI Studio
- Add it to your .env file (either name works):
GEMINI_API_KEY=AIza...
# or alternatively:
GOOGLE_API_KEY=AIza...
Then use it in your code:
answer is ask gemini "Write a haiku about coding"
say answer
-- With model option
answer is ask gemini "Summarize this article" with model "gemini-pro"
say answer
Ollama (local models)
Run models locally for free with Ollama. No API key needed — everything runs on your machine. Perfect for privacy-sensitive applications or offline development.
- Install Ollama from ollama.com
- Pull a model: ollama pull llama3
- Make sure Ollama is running (it serves on http://localhost:11434)
-- Uses your local Ollama instance (no API key needed)
answer is ask ollama "Summarize this text"
say answer
-- Specify a model
answer is ask ollama "Explain recursion" with model "llama3"
say answer
Default models
Each provider uses a sensible default model when you don't specify one:
| Provider | Default Model | API Key Env Variable |
|---|---|---|
| claude | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY |
| openai | gpt-4o | OPENAI_API_KEY |
| gemini | gemini-2.0-flash | GEMINI_API_KEY or GOOGLE_API_KEY |
| ollama | llama3 | (none — local) |
All provider options
These options work with every provider via the with keyword:
| Option | Type | Default | Description |
|---|---|---|---|
| model | string | (see table above) | Which model to use for this provider |
| system | string | (none) | System prompt to set the LLM's behavior |
| max_tokens | number | 1024 | Maximum tokens in the response (OpenAI, Claude, Ollama) |
| temperature | number | (API default) | Randomness: 0 = deterministic, 1 = creative (OpenAI, Claude, Ollama) |
Comparing providers
You can send the same prompt to multiple providers and compare the results:
-- Compare responses from all 4 providers
prompt is "What are the benefits of functional programming? Answer in 2 sentences."
say "--- Claude ---"
say ask claude prompt
say "--- OpenAI ---"
say ask openai prompt
say "--- Gemini ---"
say ask gemini prompt
say "--- Ollama ---"
say ask ollama prompt
Error handling
If an API key is missing or invalid, you will get a runtime error. Wrap your calls in a try block to handle failures gracefully:
try:
  answer is ask openai "Hello"
  say answer
catch err:
  say "Error: " + err.message
  -- Common: "Incorrect API key provided" or module not found
The with keyword works with every provider for options like model, system, max_tokens, and temperature. You can switch providers by changing a single word — no other code changes needed.
Streaming
Streaming delivers the LLM's response piece by piece as it is generated, rather than waiting for the entire response to complete. This is ideal for long responses, real-time UI updates, and chatbot interfaces where you want the user to see output immediately.
When to use streaming vs regular ask
- Use ask when you need the full response at once (structured output, short answers, programmatic use)
- Use stream when the response is long, when you want real-time display, or when building chat interfaces
Inside the stream block, the chunk variable contains a piece of plain text (not JSON) — typically a few characters or a word at a time. Each chunk arrives as soon as the LLM generates it.
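As a mental model (nothing here is Quill's actual machinery; fakeStream and collect are hypothetical helpers), a stream is just an async iterator that yields small text chunks as they arrive:

```javascript
// Conceptual model of streaming: an async generator that yields the
// response a few characters at a time. Illustrative only.
async function* fakeStream(text, chunkSize = 4) {
  for (let i = 0; i < text.length; i += chunkSize) {
    yield text.slice(i, i + chunkSize);
  }
}

// Consuming the stream chunk by chunk, like a `stream ...:` block does.
async function collect(stream) {
  let full = "";
  for await (const chunk of stream) full += chunk;
  return full;
}
```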
Streaming from each provider
-- Stream from Claude
stream claude "Write a poem about coding":
  say chunk

-- Stream from OpenAI
stream openai "Write a short story":
  say chunk

-- Stream from Gemini
stream gemini "Explain machine learning":
  say chunk

-- Stream from Ollama (local)
stream ollama "Describe the solar system":
  say chunk
Streaming with options
All with options work with streaming, just like regular ask:
stream openai "Tell me a long story" with model "gpt-4o" max_tokens 4000 temperature 0.9:
  say chunk
Building a progress indicator
You can collect chunks and track progress as the response streams in:
fullResponse is ""
chunkCount is 0
stream claude "Write a detailed essay about climate change" with max_tokens 2000:
  fullResponse is fullResponse + chunk
  chunkCount is chunkCount + 1
say "Received " + chunkCount + " chunks"
say "Total length: " + fullResponse.length + " characters"
To stream from another provider, swap claude for openai, gemini, or ollama. The chunk variable and block syntax are identical across providers.
Structured Output
Structured output lets you get typed, parsed data back from any LLM instead of free-form text. Add as followed by a type shape to your ask call, and Quill will instruct the LLM to return JSON and automatically parse it into a typed object.
Supported types
| Type | Description | Example value |
|---|---|---|
| text | A string value, converted with String() | "John Smith" |
| number | A numeric value, converted with Number() | 42 |
| bool | A boolean value, converted with Boolean() | true |
| list | An array of values. Non-array values are wrapped in [ ] | ["red", "blue", "green"] |
Example 1: Simple extraction
-- Extract name and age from natural language
person is ask claude "Extract: John Smith is 30 years old" as {name: text, age: number}
say person.name -- "John Smith"
say person.age -- 30
Example 2: Product information
-- Extract multiple types including booleans and lists
product is ask openai "Extract product info: The Nike Air Max 90 costs $120, is currently in stock, and comes in white, black, and red" as {name: text, price: number, inStock: bool, colors: list}
say product.name -- "Nike Air Max 90"
say product.price -- 120
say product.inStock -- true
say product.colors -- ["white", "black", "red"]
Example 3: Extracting a list of items
-- Extract structured data from a paragraph
paragraph is "Our team includes Alice (engineer, 5 years), Bob (designer, 3 years), and Carol (manager, 8 years)."
team is ask claude paragraph with system "Extract the team members" as {names: list, roles: list, years: list}
say team.names -- ["Alice", "Bob", "Carol"]
say team.roles -- ["engineer", "designer", "manager"]
say team.years -- [5, 3, 8]
How it works
Under the hood, Quill modifies your prompt to ask the LLM to return JSON matching your schema, then parses the response with __parse_structured. It handles JSON wrapped in markdown code fences or embedded in text.
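As a rough illustration of that parsing step (a sketch only, not Quill's actual __parse_structured), pulling JSON out of a reply and coercing fields to the declared shape can look like:

```javascript
// Illustrative structured-output parser. Finds the first {...} in the
// reply (works for replies embedded in prose or code fences), parses it,
// and coerces each field per the shape, e.g. {name: "text", age: "number"}.
function parseStructured(reply, shape) {
  const match = reply.match(/\{[\s\S]*\}/); // grab the JSON object, if any
  if (!match) return { error: "no JSON found", raw: reply };
  let data;
  try {
    data = JSON.parse(match[0]);
  } catch (e) {
    return { error: e.message, raw: reply };
  }
  const coerce = { text: String, number: Number, bool: Boolean };
  const out = {};
  for (const [key, type] of Object.entries(shape)) {
    const value = data[key];
    out[key] = type === "list"
      ? (Array.isArray(value) ? value : [value]) // non-arrays get wrapped
      : coerce[type](value);
  }
  return out;
}
```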
Error handling
If the LLM returns text that cannot be parsed as JSON, the result will contain an error field and the raw response:
result is ask claude "Tell me a joke" as {setup: text, punchline: text}
if result.error:
  say "Parsing failed: " + result.error
  say "Raw response: " + result.raw
otherwise:
  say result.setup
  say result.punchline
AI Agents
An AI agent is an LLM combined with tools and a loop. You give the agent a goal, and it autonomously decides which tools to call, processes the results, and repeats until the task is complete. Think of it as giving the LLM hands and eyes — it can take actions, not just generate text.
- You call agent.run(goal) with a task description
- The LLM sees the goal and available tools, then decides which tool to call
- The tool runs and returns a result
- The LLM sees the tool result and decides the next action
- This repeats until the LLM signals it is done, or maxTurns is reached
API Reference
createAgent(name, options)
Creates a new agent with the given name and configuration.
| Option | Type | Default | Description |
|---|---|---|---|
| provider | string | "claude" | Which LLM provider to use ("claude", "openai", "gemini", "ollama") |
| model | string | (provider default) | Which model to use. If not set, uses the provider's default model |
| system | string | (auto-generated) | Custom system prompt. The default instructs the agent to use JSON for tool calls |
| maxTurns | number | 10 | Maximum number of LLM calls before the agent stops. Prevents runaway loops |
agent.addTool(name, description, function)
Registers a tool that the agent can call. The description is shown to the LLM so it knows when to use this tool.
- name: A short identifier for the tool (e.g., "search", "calculate")
- description: A human-readable explanation of what the tool does. The LLM reads this to decide when to use it
- function: A Quill function that accepts arguments and returns a result
agent.run(goal)
Starts the agent loop with the given goal. Returns the final result when the agent signals it is done, or a text response if maxTurns is reached.
Basic example
-- Create a simple math agent
to calculate expression:
  give back eval(expression)
myAgent is createAgent("math-helper", {provider: "claude", maxTurns: 5})
myAgent.addTool("calculate", "Evaluate a math expression", calculate)
result is await myAgent.run("What is 15% of 280, then add 42?")
say result
Real-world example: Research agent
Here is a more complete agent that can search the web and summarize findings:
-- Research agent with search and summarize tools
to searchWeb query:
  data is await fetchJSON("https://api.duckduckgo.com/?q=" + query + "&format=json")
  give back data.AbstractText or "No results found"

to summarizeText text:
  give back ask claude "Summarize in 3 bullets: " + text
myAgent is createAgent("researcher", {provider: "claude", maxTurns: 5})
myAgent.addTool("search", "Search the web for information", searchWeb)
myAgent.addTool("summarize", "Summarize a piece of text", summarizeText)
result is await myAgent.run("Find and summarize the latest news about AI safety")
say result
Using agents with different providers
Agents work with any provider. Just change the provider option:
-- Use OpenAI as the agent's brain
agent is createAgent("helper", {provider: "openai", model: "gpt-4o"})
-- Use a local Ollama model (free, private)
agent is createAgent("local-agent", {provider: "ollama", model: "llama3", maxTurns: 3})
How the agent communicates
The agent uses JSON messages to interact with tools. When it wants to call a tool, it responds with:
{"tool": "search", "args": {"query": "AI safety news"}}
When it has finished the task, it responds with:
{"done": true, "result": "Here are the findings..."}
If the LLM responds with plain text (not JSON), the agent treats it as the final answer and returns it directly.
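That protocol can be sketched as a small dispatch loop. Everything below (the runAgent name, the fake LLM, the tool registry shape) is illustrative, not Quill's agent implementation:

```javascript
// Illustrative agent loop over the JSON protocol described above.
// `llm` is any function (history) => reply string; tools maps names
// to plain functions.
function runAgent(llm, tools, goal, maxTurns = 10) {
  const history = [{ role: "user", content: goal }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = llm(history);
    let msg;
    try {
      msg = JSON.parse(reply);
    } catch {
      return reply; // plain text: treat it as the final answer
    }
    if (msg.done) return msg.result;    // {"done": true, "result": ...}
    if (msg.tool && tools[msg.tool]) {  // {"tool": ..., "args": {...}}
      const result = tools[msg.tool](msg.args);
      history.push({ role: "assistant", content: reply });
      history.push({ role: "user", content: "Tool result: " + result });
    }
  }
  return "Agent reached max turns without completing.";
}
```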
Set maxTurns to a reasonable limit (3–10) to prevent runaway costs. The default is 10 turns. If the agent reaches maxTurns without completing, it returns "Agent reached max turns without completing."
Embeddings & Vector Search
Embeddings convert text into arrays of numbers (vectors) that capture semantic meaning. Similar text produces similar vectors, which lets you search by meaning rather than keywords. This is the foundation of RAG (Retrieval-Augmented Generation) and semantic search.
embed(text, provider, model)
Generates an embedding vector for a piece of text. Returns an array of numbers.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | The text to convert into a vector |
| provider | string | "openai" | "openai" or "ollama" |
| model | string | (see below) | Embedding model name. Defaults: "text-embedding-3-small" (OpenAI), "nomic-embed-text" (Ollama) |
-- Generate an embedding with OpenAI (default)
vec is await embed("Hello world")
say "Vector has " + vec.length + " dimensions"
-- Generate an embedding with Ollama (local, free)
vec is await embed("Hello world", "ollama", "nomic-embed-text")
cosineSimilarity(a, b)
Compares two embedding vectors and returns a similarity score. Use this when you want to compare two specific texts without a full vector store.
vec1 is await embed("I love programming")
vec2 is await embed("Coding is my passion")
vec3 is await embed("The weather is nice today")
say cosineSimilarity(vec1, vec2) -- ~0.85 (very similar)
say cosineSimilarity(vec1, vec3) -- ~0.15 (unrelated)
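Cosine similarity is just the dot product of the two vectors divided by the product of their magnitudes. A minimal sketch of the math (not Quill's built-in):

```javascript
// cos(a, b) = (a . b) / (|a| * |b|); ranges from -1 to 1,
// where 1 means the vectors point in the same direction.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```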
VectorStore API
The VectorStore is an in-memory database for embedding vectors. It handles embedding, storage, and similarity search in one convenient interface.
createVectorStore()
Creates an empty vector store.
store is createVectorStore()
store.add(text, metadata)
Adds a single text to the store. Automatically generates the embedding. The optional metadata object lets you attach extra information (source, date, category, etc.).
await store.add("Quill compiles to JavaScript", {source: "docs", page: 1})
await store.add("Quill has built-in AI support", {source: "readme"})
store.addMany(texts, metadatas)
Batch-add multiple texts at once. The metadatas array is optional and should match the length of texts.
texts are ["First document", "Second document", "Third document"]
metas are [{id: 1}, {id: 2}, {id: 3}]
await store.addMany(texts, metas)
store.search(query, topK)
Searches the store for the most similar texts to the query. Returns an array of results, each with text, metadata, and score fields. The topK parameter controls how many results to return (default: 5).
-- Search for the 3 most relevant documents
results is await store.search("How does Quill work?", 3)
for each result in results:
  say result.text + " (score: " + result.score + ")"
  say " Source: " + result.metadata.source
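Conceptually, a search like this is a brute-force scan: score every stored vector against the query and keep the top K. A sketch with pre-made vectors (the entry shape and function names are assumptions, not Quill's actual VectorStore internals):

```javascript
// Cosine similarity between two vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Naive top-K search over in-memory entries of { text, vector, metadata }.
function search(entries, queryVector, topK = 5) {
  return entries
    .map((e) => ({ text: e.text, metadata: e.metadata, score: cosine(e.vector, queryVector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, topK);
}
```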
store.toJSON() / VectorStore.fromJSON(json)
Serialize and deserialize the vector store for persistence. This lets you build the store once and reuse it later without re-embedding all your documents.
-- Save the vector store to a file
write("knowledge.json", store.toJSON())
-- Load it later in another script
stored is read("knowledge.json")
store is VectorStore.fromJSON(stored)
-- Search immediately, no re-embedding needed
results is await store.search("my question", 3)
Document Processing / RAG
Quill provides built-in functions to extract text from files, split it into chunks, and prepare it for embedding. These are the building blocks of any RAG (Retrieval-Augmented Generation) pipeline.
extract(filePath)
Reads a file and returns its text content as a string. Supports multiple file formats:
| Format | Extensions | Notes |
|---|---|---|
| Plain text | .txt | Read as-is |
| Markdown | .md | Read as-is (raw markdown text) |
| CSV | .csv | Read as-is (raw CSV text) |
| JSON | .json | Parsed and pretty-printed with 2-space indentation |
| HTML | .html, .htm | HTML tags are stripped, leaving only text content |
| PDF | .pdf | Requires npm install pdf-parse |
-- Extract text from different file types
text is await extract("document.pdf")
text is await extract("page.html")
text is await extract("data.json")
text is await extract("notes.md")
Run npm install pdf-parse before using extract with PDF files. If the package is missing, you will get a clear error message telling you to install it.
chunk(text, size, overlap)
Splits a long text into overlapping chunks suitable for embedding. The chunker tries to break at sentence boundaries (periods and newlines) rather than mid-word, so chunks read naturally.
| Parameter | Type | Default | Description |
|---|---|---|---|
| text | string | (required) | The text to split into chunks |
| size | number | 500 | Maximum characters per chunk |
| overlap | number | 50 | Character overlap between consecutive chunks to preserve context at boundaries |
-- Split text into overlapping chunks
chunks are chunk(text, 500, 50)
say "Split into " + chunks.length + " chunks"
-- Smaller chunks with more overlap for higher precision
chunks are chunk(text, 300, 100)
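A minimal version of this kind of chunker (illustrative, not Quill's implementation) slides a window of size characters, backs up to the last sentence boundary inside the window, then restarts overlap characters before the cut:

```javascript
// Illustrative overlapping chunker. Slides a window of `size` characters
// and prefers to cut just after the last period or newline in the window.
function chunkText(text, size = 500, overlap = 50) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + size, text.length);
    if (end < text.length) {
      const window = text.slice(start, end);
      const cut = Math.max(window.lastIndexOf("."), window.lastIndexOf("\n"));
      if (cut > 0) end = start + cut + 1; // break after the sentence
    }
    chunks.push(text.slice(start, end));
    if (end >= text.length) break;
    // Step back by `overlap` characters, but always make progress.
    start = Math.max(end - overlap, start + 1);
  }
  return chunks;
}
```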
splitSentences(text)
Splits text into an array of individual sentences. Sentences are delimited by periods, exclamation marks, and question marks.
sentences are splitSentences("Hello world. How are you? I am fine!")
-- ["Hello world.", " How are you?", " I am fine!"]
say "Found " + sentences.length + " sentences"
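A simple way to get this behavior (a sketch of the idea, not Quill's implementation) is a regex that matches runs of characters ending in one or more terminators:

```javascript
// Sketch of sentence splitting: each match is a run of non-terminator
// characters followed by . ! or ?. Leading whitespace stays attached to
// the following sentence, matching the example output above.
function splitSentences(text) {
  return text.match(/[^.!?]+[.!?]+/g) || [];
}
```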
splitParagraphs(text)
Splits text at double newlines (blank lines) into an array of paragraphs. Empty paragraphs are filtered out.
paragraphs are splitParagraphs(text)
for each p in paragraphs:
  say "---"
  say p
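The equivalent logic is a short sketch (illustrative, not Quill's implementation): split on blank lines, trim, and drop empties:

```javascript
// Sketch of paragraph splitting: break on one or more blank lines,
// trim each piece, and filter out empty paragraphs.
function splitParagraphs(text) {
  return text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
}
```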
Practical example: PDF Q&A
Extract a PDF, chunk it, embed the chunks, and answer questions about the document:
-- Extract and process a PDF for question answering
text is await extract("research-paper.pdf")
say "Extracted " + text.length + " characters"
-- Chunk the text
chunks are chunk(text, 400, 80)
say "Created " + chunks.length + " chunks"
-- Build a vector store from the chunks
store is createVectorStore()
for each c in chunks:
  await store.add(c, {source: "research-paper.pdf"})
-- Ask a question
question is "What methodology did the authors use?"
results is await store.search(question, 3)
context is join(map_list(results, with r: r.text), "\n\n")
answer is ask claude question with system "Answer based only on this context:\n\n" + context
say answer
Full RAG pipeline
Here is a complete RAG pipeline with error handling and persistence:
-- Full RAG pipeline: extract -> chunk -> embed -> search -> ask
try:
  -- 1. Extract text from a PDF
  text is await extract("knowledge-base.pdf")
  say "Extracted " + text.length + " characters"

  -- 2. Chunk into manageable pieces
  chunks are chunk(text, 500, 50)
  say "Created " + chunks.length + " chunks"

  -- 3. Build the vector store
  store is createVectorStore()
  for each c in chunks:
    await store.add(c)

  -- 4. Save the vector store for reuse
  write("knowledge-store.json", store.toJSON())
  say "Vector store saved to knowledge-store.json"

  -- 5. Search for relevant context
  question is "What are the main features?"
  results is await store.search(question, 3)

  -- 6. Build context string
  context is join(map_list(results, with r: r.text), "\n\n")

  -- 7. Ask the LLM with retrieved context
  answer is ask claude question with system "Answer based on this context:\n\n" + context
  say answer
catch err:
  say "Error in RAG pipeline: " + err.message
Loading a saved vector store
Once you have saved a vector store, you can load it in subsequent runs without re-embedding:
-- Load a previously saved vector store
stored is read("knowledge-store.json")
store is VectorStore.fromJSON(stored)
-- Search and ask immediately
question is "How does authentication work?"
results is await store.search(question, 3)
context is join(map_list(results, with r: r.text), "\n\n")
answer is ask claude question with system "Answer based on this context:\n\n" + context
say answer
Next steps
- Explore the Examples page for more AI patterns
- Combine AI with web servers to build AI APIs
- Add AI to your Discord bot