AI with Claude

Quill has AI syntax built into the language, making it one of the simplest ways to build AI-powered apps. Just write ask claude and you're done.

Getting started

The fastest way to start is with the scaffolding command:

quill ai my-app
cd my-app
cp .env.example .env    # add your API key
npm install
quill run app.quill

This creates a project with the Anthropic SDK already configured.

Manual setup

  1. Install the Anthropic SDK: npm install @anthropic-ai/sdk
  2. Set your API key: export ANTHROPIC_API_KEY=your-key (or add it to a .env file)
  3. Write your Quill code and run it

Quill automatically loads .env files, so just create a .env file with:

ANTHROPIC_API_KEY=sk-ant-...

ask claude

The ask claude expression sends a message to Claude and returns the response text. It's a single line of code:

answer is ask claude "What is the capital of France?"
say answer

That's it. No imports, no client setup, no callbacks. Quill handles everything.

Under the hood, Quill automatically:

  1. Loads your API key from the environment (or a .env file)
  2. Creates an Anthropic SDK client
  3. Sends your prompt as a user message
  4. Extracts and returns the response text

Options

You can customize the request by adding with followed by options:

answer is ask claude "Summarize this article" with model "claude-sonnet-4-20250514" max_tokens 500
say answer

Available options

  Option        Type     Default                      Description
  model         string   "claude-sonnet-4-20250514"   Which Claude model to use
  max_tokens    number   1024                         Maximum tokens in the response
  system        string   (none)                       System prompt to set Claude's behavior
  temperature   number   (API default)                Randomness (0 = deterministic, 1 = creative)
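
These options combine freely after a single with. For example, setting temperature to 0 makes answers as repeatable as the model allows. A quick sketch (the prompt and exact output are illustrative):

-- Deterministic output: temperature 0 keeps answers repeatable
fact is ask claude "What year was the first email sent?" with temperature 0 max_tokens 100
say fact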

System prompts

Use the system option to shape the model's persona and tone:

answer is ask claude "Explain quantum computing" with system "You are a patient teacher who explains things simply" max_tokens 2000
say answer

Conversation history

For multi-turn conversations, pass a messages variable instead of a string:

messages are [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello! How can I help?" },
  { role: "user", content: "What did I just say?" }
]
answer is ask claude messages
say answer

When you pass a variable (not a string literal), Quill treats it as a messages array and sends it directly to the API.
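
You can extend the array after each reply to keep a conversation going. A sketch (assumes arrays support JavaScript-style push, as the chatbot example in this guide uses):

-- Keep the conversation going by appending each turn
messages are [{ role: "user", content: "Recommend a sci-fi novel" }]
answer is ask claude messages
messages.push({ role: "assistant", content: answer })
messages.push({ role: "user", content: "Why did you pick that one?" })
followup is ask claude messages
say followup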

Streaming responses

For long responses, stream them token-by-token with stream claude:

stream claude "Write a poem about coding":
  say chunk

The chunk variable is automatically available inside the stream block. Each chunk contains a piece of the response as it arrives.

You can also use options with streaming:

stream claude "Tell me a long story" with max_tokens 4000:
  say chunk

Example: building a chatbot

Here's a complete interactive chatbot in Quill:

-- AI Chatbot built with Quill

use "readline" as readline

rl is readline.createInterface({ input: process.stdin, output: process.stdout })

messages are []

to chat prompt:
  messages.push({ role: "user", content: prompt })
  answer is ask claude messages with system "You are a helpful assistant"
  messages.push({ role: "assistant", content: answer })
  say answer

to askQuestion:
  rl.question("You: ", with input:
    if input is "exit":
      rl.close()
      give back nothing
    chat(input)
    askQuestion()
  )

say "Chatbot ready! Type 'exit' to quit."
askQuestion()

Example: AI-powered CLI tool

-- Code explainer CLI tool

code is process.argv[2]

if code is nothing:
  say "Usage: quill run explain.quill 'your code here'"
  process.exit(1)

explanation is ask claude code with system "Explain the following code in simple terms. Be concise." max_tokens 500

say "Explanation:"
say explanation

Run it: quill run explain.quill "for (let i = 0; i < 10; i++) { console.log(i); }"

Multi-Provider Support

Quill supports multiple LLM providers with the same simple ask syntax. Just swap the provider name. All four providers — Claude, OpenAI, Gemini, and Ollama — share identical syntax for options, streaming, structured output, and agents.

OpenAI

OpenAI provides GPT-4o and other models via their cloud API.

  1. Install the SDK: npm install openai
  2. Get an API key from platform.openai.com
  3. Add it to your .env file:
OPENAI_API_KEY=sk-proj-...

Then use it in your code:

answer is ask openai "Explain quantum computing"
say answer

-- With options
answer is ask openai "Translate to French" with model "gpt-4o" system "You are a translator"
say answer

Google Gemini

Google Gemini provides fast, capable models through the Generative AI SDK.

  1. Install the SDK: npm install @google/generative-ai
  2. Get an API key from Google AI Studio
  3. Add it to your .env file (either name works):
GEMINI_API_KEY=AIza...
# or alternatively:
GOOGLE_API_KEY=AIza...

Then use it in your code:

answer is ask gemini "Write a haiku about coding"
say answer

-- With model option
answer is ask gemini "Summarize this article" with model "gemini-pro"
say answer

Ollama (local models)

Run models locally for free with Ollama. No API key needed — everything runs on your machine. Perfect for privacy-sensitive applications or offline development.

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3
  3. Make sure Ollama is running (it serves on http://localhost:11434)

Then use it in your code:

-- Uses your local Ollama instance (no API key needed)
answer is ask ollama "Summarize this text"
say answer

-- Specify a model
answer is ask ollama "Explain recursion" with model "llama3"
say answer

Default models

Each provider uses a sensible default model when you don't specify one:

  Provider   Default Model              API Key Env Variable
  claude     claude-sonnet-4-20250514   ANTHROPIC_API_KEY
  openai     gpt-4o                     OPENAI_API_KEY
  gemini     gemini-2.0-flash           GEMINI_API_KEY or GOOGLE_API_KEY
  ollama     llama3                     (none — local)

All provider options

These options work with every provider via the with keyword:

  Option        Type     Default             Description
  model         string   (see table above)   Which model to use for this provider
  system        string   (none)              System prompt to set the LLM's behavior
  max_tokens    number   1024                Maximum tokens in the response (OpenAI, Claude, Ollama)
  temperature   number   (API default)       Randomness: 0 = deterministic, 1 = creative (OpenAI, Claude, Ollama)

Comparing providers

You can send the same prompt to multiple providers and compare the results:

-- Compare responses from all 4 providers
prompt is "What are the benefits of functional programming? Answer in 2 sentences."

say "--- Claude ---"
say ask claude prompt

say "--- OpenAI ---"
say ask openai prompt

say "--- Gemini ---"
say ask gemini prompt

say "--- Ollama ---"
say ask ollama prompt

Error handling

If an API key is missing or invalid, you will get a runtime error. Wrap your calls in a try block to handle failures gracefully:

try:
  answer is ask openai "Hello"
  say answer
catch err:
  say "Error: " + err.message
  -- Common: "Incorrect API key provided" or module not found

All providers share the same syntax. The with keyword works with every provider for options like model, system, max_tokens, and temperature. You can switch providers by changing a single word — no other code changes needed.
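
Because only the provider name changes, a fallback chain is a natural pattern. Here is a sketch that tries a cloud provider first and falls back to a local Ollama model:

-- Fall back to a local model if the cloud call fails
try:
  answer is ask openai "Summarize this text"
catch err:
  say "OpenAI failed (" + err.message + "), falling back to Ollama"
  answer is ask ollama "Summarize this text"
say answer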

Streaming

Streaming delivers the LLM's response piece by piece as it is generated, rather than waiting for the entire response to complete. This is ideal for long responses, real-time UI updates, and chatbot interfaces where you want the user to see output immediately.

When to use streaming vs regular ask

Use regular ask when you need the complete response before doing anything with it, such as parsing structured output or saving the answer to a variable. Use stream when the user should see output as soon as it arrives, such as in a chat interface or a long-form generation.

Inside the stream block, the chunk variable contains a piece of plain text (not JSON) — typically a few characters or a word at a time. Each chunk arrives as soon as the LLM generates it.

Streaming from each provider

-- Stream from Claude
stream claude "Write a poem about coding":
  say chunk

-- Stream from OpenAI
stream openai "Write a short story":
  say chunk

-- Stream from Gemini
stream gemini "Explain machine learning":
  say chunk

-- Stream from Ollama (local)
stream ollama "Describe the solar system":
  say chunk

Streaming with options

All with options work with streaming, just like regular ask:

stream openai "Tell me a long story" with model "gpt-4o" max_tokens 4000 temperature 0.9:
  say chunk

Building a progress indicator

You can collect chunks and track progress as the response streams in:

fullResponse is ""
chunkCount is 0

stream claude "Write a detailed essay about climate change" with max_tokens 2000:
  fullResponse is fullResponse + chunk
  chunkCount is chunkCount + 1

say "Received " + chunkCount + " chunks"
say "Total length: " + fullResponse.length + " characters"

Streaming works with all 4 providers using the same syntax. Just swap claude for openai, gemini, or ollama. The chunk variable and block syntax are identical across providers.

Structured Output

Structured output lets you get typed, parsed data back from any LLM instead of free-form text. Add as followed by a type shape to your ask call, and Quill will instruct the LLM to return JSON and automatically parse it into a typed object.

Supported types

  Type     Description                                              Example value
  text     A string value, converted with String()                  "John Smith"
  number   A numeric value, converted with Number()                 42
  bool     A boolean value, converted with Boolean()                true
  list     An array of values; non-array values are wrapped in []   ["red", "blue", "green"]

Example 1: Simple extraction

-- Extract name and age from natural language
person is ask claude "Extract: John Smith is 30 years old" as {name: text, age: number}
say person.name    -- "John Smith"
say person.age     -- 30

Example 2: Product information

-- Extract multiple types including booleans and lists
product is ask openai "Extract product info: The Nike Air Max 90 costs $120, is currently in stock, and comes in white, black, and red" as {name: text, price: number, inStock: bool, colors: list}
say product.name       -- "Nike Air Max 90"
say product.price      -- 120
say product.inStock    -- true
say product.colors     -- ["white", "black", "red"]

Example 3: Extracting a list of items

-- Extract structured data from a paragraph
paragraph is "Our team includes Alice (engineer, 5 years), Bob (designer, 3 years), and Carol (manager, 8 years)."
team is ask claude paragraph with system "Extract the team members" as {names: list, roles: list, years: list}
say team.names    -- ["Alice", "Bob", "Carol"]
say team.roles    -- ["engineer", "designer", "manager"]
say team.years    -- [5, 3, 8]

How it works

Under the hood, Quill modifies your prompt to ask the LLM to return JSON matching your schema, then parses the response with __parse_structured. It handles JSON wrapped in markdown code fences or embedded in text.

Error handling

If the LLM returns text that cannot be parsed as JSON, the result will contain an error field and the raw response:

result is ask claude "Tell me a joke" as {setup: text, punchline: text}

if result.error:
  say "Parsing failed: " + result.error
  say "Raw response: " + result.raw
otherwise:
  say result.setup
  say result.punchline

Tip: For best results, be specific in your prompt about the expected format. Structured output works with all 4 providers (Claude, OpenAI, Gemini, Ollama) — just swap the provider name.
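
Structured output pairs well with loops for batch extraction. A sketch that classifies a few reviews (the reviews array and field names are illustrative):

-- Classify several reviews into typed results
reviews are ["Great product, fast shipping!", "Broke after two days.", "Does the job."]

for each review in reviews:
  result is ask claude "Classify this review: " + review as {sentiment: text, score: number}
  if result.error:
    say "Could not parse: " + result.raw
  otherwise:
    say result.sentiment + " (" + result.score + ")"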

AI Agents

An AI agent is an LLM combined with tools and a loop. You give the agent a goal, and it autonomously decides which tools to call, processes the results, and repeats until the task is complete. Think of it as giving the LLM hands and eyes — it can take actions, not just generate text.

How the agent loop works:
  1. You call agent.run(goal) with a task description
  2. The LLM sees the goal and available tools, then decides which tool to call
  3. The tool runs and returns a result
  4. The LLM sees the tool result and decides the next action
  5. This repeats until the LLM signals it is done, or maxTurns is reached

API Reference

createAgent(name, options)

Creates a new agent with the given name and configuration.

  Option     Type     Default              Description
  provider   string   "claude"             Which LLM provider to use ("claude", "openai", "gemini", "ollama")
  model      string   (provider default)   Which model to use; if not set, uses the provider's default model
  system     string   (auto-generated)     Custom system prompt; the default instructs the agent to use JSON for tool calls
  maxTurns   number   10                   Maximum number of LLM calls before the agent stops; prevents runaway loops

agent.addTool(name, description, function)

Registers a tool that the agent can call. The description is shown to the LLM so it knows when to use this tool.

agent.run(goal)

Starts the agent loop with the given goal. Returns the final result when the agent signals it is done, or a text response if maxTurns is reached.

Basic example

-- Create a simple math agent
-- Note: eval runs arbitrary code; fine for a demo, unsafe for untrusted input
to calculate expression:
  give back eval(expression)

myAgent is createAgent("math-helper", {provider: "claude", maxTurns: 5})
myAgent.addTool("calculate", "Evaluate a math expression", calculate)

result is await myAgent.run("What is 15% of 280, then add 42?")
say result

Real-world example: Research agent

Here is a more complete agent that can search the web and summarize findings:

-- Research agent with search and summarize tools

to searchWeb query:
  data is await fetchJSON("https://api.duckduckgo.com/?q=" + query + "&format=json")
  give back data.AbstractText or "No results found"

to summarizeText text:
  give back ask claude "Summarize in 3 bullets: " + text

myAgent is createAgent("researcher", {provider: "claude", maxTurns: 5})
myAgent.addTool("search", "Search the web for information", searchWeb)
myAgent.addTool("summarize", "Summarize a piece of text", summarizeText)

result is await myAgent.run("Find and summarize the latest news about AI safety")
say result

Using agents with different providers

Agents work with any provider. Just change the provider option:

-- Use OpenAI as the agent's brain
agent is createAgent("helper", {provider: "openai", model: "gpt-4o"})

-- Use a local Ollama model (free, private)
agent is createAgent("local-agent", {provider: "ollama", model: "llama3", maxTurns: 3})

How the agent communicates

The agent uses JSON messages to interact with tools. When it wants to call a tool, it responds with:

{"tool": "search", "args": {"query": "AI safety news"}}

When it has finished the task, it responds with:

{"done": true, "result": "Here are the findings..."}

If the LLM responds with plain text (not JSON), the agent treats it as the final answer and returns it directly.

Watch your token costs. Each agent turn uses tokens from your provider. Set maxTurns to a reasonable limit (3–10) to prevent runaway costs. The default is 10 turns. If the agent reaches maxTurns without completing, it returns "Agent reached max turns without completing."

Embeddings & Vector Search

Embeddings convert text into arrays of numbers (vectors) that capture semantic meaning. Similar text produces similar vectors, which lets you search by meaning rather than keywords. This is the foundation of RAG (Retrieval-Augmented Generation) and semantic search.

embed(text, provider, model)

Generates an embedding vector for a piece of text. Returns an array of numbers.

  Parameter   Type     Default       Description
  text        string   (required)    The text to convert into a vector
  provider    string   "openai"      "openai" or "ollama"
  model       string   (see below)   Embedding model name. Defaults: "text-embedding-3-small" (OpenAI), "nomic-embed-text" (Ollama)

-- Generate an embedding with OpenAI (default)
vec is await embed("Hello world")
say "Vector has " + vec.length + " dimensions"

-- Generate an embedding with Ollama (local, free)
vec is await embed("Hello world", "ollama", "nomic-embed-text")

cosineSimilarity(a, b)

Compares two embedding vectors and returns a similarity score. Use this when you want to compare two specific texts without a full vector store.

vec1 is await embed("I love programming")
vec2 is await embed("Coding is my passion")
vec3 is await embed("The weather is nice today")

say cosineSimilarity(vec1, vec2)  -- ~0.85 (very similar)
say cosineSimilarity(vec1, vec3)  -- ~0.15 (unrelated)

Understanding cosine similarity scores. Cosine similarity ranges from -1 to 1, but text embeddings almost always score between 0 and 1. A score near 1.0 means the texts are nearly identical in meaning. Scores above 0.7 indicate strong similarity. Scores below 0.3 suggest the texts are unrelated.
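
For a handful of candidates, cosineSimilarity alone is enough for a best-match search without a vector store. A sketch (assumes numeric comparison with > behaves as in JavaScript):

-- Find which candidate is closest in meaning to the query
query is await embed("How do I reset my password?")
candidates are ["Account recovery steps", "Shipping and delivery times", "Password reset guide"]

bestText is ""
bestScore is -1
for each candidate in candidates:
  vec is await embed(candidate)
  score is cosineSimilarity(query, vec)
  if score > bestScore:
    bestScore is score
    bestText is candidate

say "Best match: " + bestText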

VectorStore API

The VectorStore is an in-memory database for embedding vectors. It handles embedding, storage, and similarity search in one convenient interface.

createVectorStore()

Creates an empty vector store.

store is createVectorStore()

store.add(text, metadata)

Adds a single text to the store. Automatically generates the embedding. The optional metadata object lets you attach extra information (source, date, category, etc.).

await store.add("Quill compiles to JavaScript", {source: "docs", page: 1})
await store.add("Quill has built-in AI support", {source: "readme"})

store.addMany(texts, metadatas)

Batch-add multiple texts at once. The metadatas array is optional and should match the length of texts.

texts are ["First document", "Second document", "Third document"]
metas are [{id: 1}, {id: 2}, {id: 3}]
await store.addMany(texts, metas)

store.search(query, topK)

Searches the store for the most similar texts to the query. Returns an array of results, each with text, metadata, and score fields. The topK parameter controls how many results to return (default: 5).

-- Search for the 3 most relevant documents
results is await store.search("How does Quill work?", 3)
for each result in results:
  say result.text + " (score: " + result.score + ")"
  say "  Source: " + result.metadata.source

store.toJSON() / VectorStore.fromJSON(json)

Serialize and deserialize the vector store for persistence. This lets you build the store once and reuse it later without re-embedding all your documents.

-- Save the vector store to a file
write("knowledge.json", store.toJSON())

-- Load it later in another script
stored is read("knowledge.json")
store is VectorStore.fromJSON(stored)

-- Search immediately, no re-embedding needed
results is await store.search("my question", 3)

Tip: For large knowledge bases, save the vector store to a file after building it. Loading from JSON is instant, while re-embedding hundreds of documents can take minutes and use API credits.

Document Processing / RAG

Quill provides built-in functions to extract text from files, split it into chunks, and prepare it for embedding. These are the building blocks of any RAG (Retrieval-Augmented Generation) pipeline.

extract(filePath)

Reads a file and returns its text content as a string. Supports multiple file formats:

  Format       Extensions    Notes
  Plain text   .txt          Read as-is
  Markdown     .md           Read as-is (raw markdown text)
  CSV          .csv          Read as-is (raw CSV text)
  JSON         .json         Parsed and pretty-printed with 2-space indentation
  HTML         .html, .htm   HTML tags are stripped, leaving only text content
  PDF          .pdf          Requires npm install pdf-parse

-- Extract text from different file types
text is await extract("document.pdf")
text is await extract("page.html")
text is await extract("data.json")
text is await extract("notes.md")

PDF support requires an extra package. Run npm install pdf-parse before using extract with PDF files. If the package is missing, you will get a clear error message telling you to install it.

chunk(text, size, overlap)

Splits a long text into overlapping chunks suitable for embedding. The chunker tries to break at sentence boundaries (periods and newlines) rather than mid-word, so chunks read naturally.

  Parameter   Type     Default      Description
  text        string   (required)   The text to split into chunks
  size        number   500          Maximum characters per chunk
  overlap     number   50           Character overlap between consecutive chunks to preserve context at boundaries

-- Split text into overlapping chunks
chunks are chunk(text, 500, 50)
say "Split into " + chunks.length + " chunks"

-- Smaller chunks with more overlap for higher precision
chunks are chunk(text, 300, 100)

Chunk size guidance. For RAG, chunk sizes of 300–500 characters work best. Use 50–100 character overlap to avoid losing context at boundaries. Smaller chunks give more precise search results; larger chunks give more context per result.

splitSentences(text)

Splits text into an array of individual sentences. Sentences are delimited by periods, exclamation marks, and question marks.

sentences are splitSentences("Hello world. How are you? I am fine!")
-- ["Hello world.", " How are you?", " I am fine!"]
say "Found " + sentences.length + " sentences"

splitParagraphs(text)

Splits text at double newlines (blank lines) into an array of paragraphs. Empty paragraphs are filtered out.

paragraphs are splitParagraphs(text)
for each p in paragraphs:
  say "---"
  say p
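
Paragraphs often make natural embedding units. This sketch feeds them straight into a vector store with addMany (the search query is illustrative):

-- Embed whole paragraphs instead of character-based chunks
paragraphs are splitParagraphs(text)
store is createVectorStore()
await store.addMany(paragraphs)

results is await store.search("refund policy", 3)
for each result in results:
  say result.text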

Practical example: PDF Q&A

Extract a PDF, chunk it, embed the chunks, and answer questions about the document:

-- Extract and process a PDF for question answering
text is await extract("research-paper.pdf")
say "Extracted " + text.length + " characters"

-- Chunk the text
chunks are chunk(text, 400, 80)
say "Created " + chunks.length + " chunks"

-- Build a vector store from the chunks
store is createVectorStore()
for each c in chunks:
  await store.add(c, {source: "research-paper.pdf"})

-- Ask a question
question is "What methodology did the authors use?"
results is await store.search(question, 3)
context is join(map_list(results, with r: r.text), "\n\n")

answer is ask claude question with system "Answer based only on this context:\n\n" + context
say answer

Full RAG pipeline

What is RAG? Retrieval-Augmented Generation (RAG) is a pattern where you first retrieve relevant documents from a knowledge base, then pass them as context to the LLM. This grounds the LLM's response in your actual data, reducing hallucinations and enabling answers about private or recent information the LLM was not trained on.

Here is a complete RAG pipeline with error handling and persistence:

-- Full RAG pipeline: extract -> chunk -> embed -> search -> ask

try:
  -- 1. Extract text from a PDF
  text is await extract("knowledge-base.pdf")
  say "Extracted " + text.length + " characters"

  -- 2. Chunk into manageable pieces
  chunks are chunk(text, 500, 50)
  say "Created " + chunks.length + " chunks"

  -- 3. Build the vector store
  store is createVectorStore()
  for each c in chunks:
    await store.add(c)

  -- 4. Save the vector store for reuse
  write("knowledge-store.json", store.toJSON())
  say "Vector store saved to knowledge-store.json"

  -- 5. Search for relevant context
  question is "What are the main features?"
  results is await store.search(question, 3)

  -- 6. Build context string
  context is join(map_list(results, with r: r.text), "\n\n")

  -- 7. Ask the LLM with retrieved context
  answer is ask claude question with system "Answer based on this context:\n\n" + context
  say answer

catch err:
  say "Error in RAG pipeline: " + err.message

Loading a saved vector store

Once you have saved a vector store, you can load it in subsequent runs without re-embedding:

-- Load a previously saved vector store
stored is read("knowledge-store.json")
store is VectorStore.fromJSON(stored)

-- Search and ask immediately
question is "How does authentication work?"
results is await store.search(question, 3)
context is join(map_list(results, with r: r.text), "\n\n")
answer is ask claude question with system "Answer based on this context:\n\n" + context
say answer

Next steps