AI with Claude

Quill has AI syntax built into the language, making it one of the simplest ways to build AI-powered apps. Just write ask claude and you're done.

Getting started

The fastest way to start is with the scaffolding command:

quill ai my-app
cd my-app
cp .env.example .env    # add your API key
npm install
quill run app.quill

This creates a project with the Anthropic SDK already configured.

Manual setup

  1. Install the Anthropic SDK: npm install @anthropic-ai/sdk
  2. Set your API key: export ANTHROPIC_API_KEY=your-key (or add it to a .env file)
  3. Write your Quill code and run it

Quill automatically loads .env files, so just create a .env file with:

ANTHROPIC_API_KEY=sk-ant-...

ask claude

The ask claude expression sends a message to Claude and returns the response text. It's a single line of code:

answer is ask claude "What is the capital of France?"
say answer

That's it. No imports, no client setup, no callbacks. Quill handles everything.

Under the hood, Quill automatically:

  1. Loads your API key from the environment (or a .env file)
  2. Creates an Anthropic SDK client
  3. Sends your prompt as a user message
  4. Extracts and returns the response text

Options

You can customize the request by adding with followed by options:

answer is ask claude "Summarize this article" with model "claude-sonnet-4-20250514" max_tokens 500
say answer

Available options

  Option        Type     Default                      Description
  model         string   "claude-sonnet-4-20250514"   Which Claude model to use
  max_tokens    number   1024                         Maximum tokens in the response
  system        string   (none)                       System prompt to set Claude's behavior
  temperature   number   (API default)                Randomness (0 = deterministic, 1 = creative)
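
These options combine freely after a single with. For example, setting temperature to 0 makes answers as repeatable as the model allows. A quick sketch (the prompt and exact output are illustrative):

-- Deterministic output: temperature 0 keeps answers repeatable
fact is ask claude "What year was the first email sent?" with temperature 0 max_tokens 100
say fact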

System prompts

Use the system option to shape the model's persona and tone:

answer is ask claude "Explain quantum computing" with system "You are a patient teacher who explains things simply" max_tokens 2000
say answer

Conversation history

For multi-turn conversations, pass a messages variable instead of a string:

messages are [
  { role: "user", content: "Hi" },
  { role: "assistant", content: "Hello! How can I help?" },
  { role: "user", content: "What did I just say?" }
]
answer is ask claude messages
say answer

When you pass a variable (not a string literal), Quill treats it as a messages array and sends it directly to the API.
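
You can extend the array after each reply to keep a conversation going. A sketch (assumes arrays support JavaScript-style push, as the chatbot example in this guide uses):

-- Keep the conversation going by appending each turn
messages are [{ role: "user", content: "Recommend a sci-fi novel" }]
answer is ask claude messages
messages.push({ role: "assistant", content: answer })
messages.push({ role: "user", content: "Why did you pick that one?" })
followup is ask claude messages
say followup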

Streaming responses

For long responses, stream them token-by-token with stream claude:

stream claude "Write a poem about coding":
  say chunk

The chunk variable is automatically available inside the stream block. Each chunk contains a piece of the response as it arrives.

You can also use options with streaming:

stream claude "Tell me a long story" with max_tokens 4000:
  say chunk

Example: building a chatbot

Here's a complete interactive chatbot in Quill:

-- AI Chatbot built with Quill

use "readline" as readline

rl is readline.createInterface({ input: process.stdin, output: process.stdout })

messages are []

to chat prompt:
  messages.push({ role: "user", content: prompt })
  answer is ask claude messages with system "You are a helpful assistant"
  messages.push({ role: "assistant", content: answer })
  say answer

to askQuestion:
  rl.question("You: ", with input:
    if input is "exit":
      rl.close()
      give back nothing
    chat(input)
    askQuestion()
  )

say "Chatbot ready! Type 'exit' to quit."
askQuestion()

Example: AI-powered CLI tool

-- Code explainer CLI tool

code is process.argv[2]

if code is nothing:
  say "Usage: quill run explain.quill 'your code here'"
  process.exit(1)

explanation is ask claude code with system "Explain the following code in simple terms. Be concise." max_tokens 500

say "Explanation:"
say explanation

Run it: quill run explain.quill "for (let i = 0; i < 10; i++) { console.log(i); }"

Multi-Provider Support

Quill supports multiple LLM providers with the same simple ask syntax. Just swap the provider name. All four providers — Claude, OpenAI, Gemini, and Ollama — share identical syntax for options, streaming, structured output, and agents.

OpenAI

OpenAI provides GPT-4o and other models via their cloud API.

  1. Install the SDK: npm install openai
  2. Get an API key from platform.openai.com
  3. Add it to your .env file:
OPENAI_API_KEY=sk-proj-...

Then use it in your code:

answer is ask openai "Explain quantum computing"
say answer

-- With options
answer is ask openai "Translate to French" with model "gpt-4o" system "You are a translator"
say answer

Google Gemini

Google Gemini provides fast, capable models through the Generative AI SDK.

  1. Install the SDK: npm install @google/generative-ai
  2. Get an API key from Google AI Studio
  3. Add it to your .env file (either name works):
GEMINI_API_KEY=AIza...
# or alternatively:
GOOGLE_API_KEY=AIza...

Then use it in your code:

answer is ask gemini "Write a haiku about coding"
say answer

-- With model option
answer is ask gemini "Summarize this article" with model "gemini-pro"
say answer

Ollama (local models)

Run models locally for free with Ollama. No API key needed — everything runs on your machine. Perfect for privacy-sensitive applications or offline development.

  1. Install Ollama from ollama.com
  2. Pull a model: ollama pull llama3
  3. Make sure Ollama is running (it serves on http://localhost:11434)

Then use it in your code:

-- Uses your local Ollama instance (no API key needed)
answer is ask ollama "Summarize this text"
say answer

-- Specify a model
answer is ask ollama "Explain recursion" with model "llama3"
say answer

Default models

Each provider uses a sensible default model when you don't specify one:

  Provider   Default Model              API Key Env Variable
  claude     claude-sonnet-4-20250514   ANTHROPIC_API_KEY
  openai     gpt-4o                     OPENAI_API_KEY
  gemini     gemini-2.0-flash           GEMINI_API_KEY or GOOGLE_API_KEY
  ollama     llama3                     (none — local)

All provider options

These options work with every provider via the with keyword:

  Option        Type     Default             Description
  model         string   (see table above)   Which model to use for this provider
  system        string   (none)              System prompt to set the LLM's behavior
  max_tokens    number   1024                Maximum tokens in the response (OpenAI, Claude, Ollama)
  temperature   number   (API default)       Randomness: 0 = deterministic, 1 = creative (OpenAI, Claude, Ollama)

Comparing providers

You can send the same prompt to multiple providers and compare the results:

-- Compare responses from all 4 providers
prompt is "What are the benefits of functional programming? Answer in 2 sentences."

say "--- Claude ---"
say ask claude prompt

say "--- OpenAI ---"
say ask openai prompt

say "--- Gemini ---"
say ask gemini prompt

say "--- Ollama ---"
say ask ollama prompt

Error handling

If an API key is missing or invalid, you will get a runtime error. Wrap your calls in a try block to handle failures gracefully:

try:
  answer is ask openai "Hello"
  say answer
catch err:
  say "Error: " + err.message
  -- Common: "Incorrect API key provided" or module not found

All providers share the same syntax. The with keyword works with every provider for options like model, system, max_tokens, and temperature. You can switch providers by changing a single word — no other code changes needed.
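
Because only the provider name changes, a fallback chain is a natural pattern. Here is a sketch that tries a cloud provider first and falls back to a local Ollama model:

-- Fall back to a local model if the cloud call fails
try:
  answer is ask openai "Summarize this text"
catch err:
  say "OpenAI failed (" + err.message + "), falling back to Ollama"
  answer is ask ollama "Summarize this text"
say answer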

Streaming

Streaming delivers the LLM's response piece by piece as it is generated, rather than waiting for the entire response to complete. This is ideal for long responses, real-time UI updates, and chatbot interfaces where you want the user to see output immediately.

When to use streaming vs regular ask

Use regular ask when you need the complete response before doing anything with it, such as parsing structured output or saving the answer to a variable. Use stream when the user should see output as soon as it arrives, such as in a chat interface or a long-form generation.

Inside the stream block, the chunk variable contains a piece of plain text (not JSON) — typically a few characters or a word at a time. Each chunk arrives as soon as the LLM generates it.

Streaming from each provider

-- Stream from Claude
stream claude "Write a poem about coding":
  say chunk

-- Stream from OpenAI
stream openai "Write a short story":
  say chunk

-- Stream from Gemini
stream gemini "Explain machine learning":
  say chunk

-- Stream from Ollama (local)
stream ollama "Describe the solar system":
  say chunk

Streaming with options

All with options work with streaming, just like regular ask:

stream openai "Tell me a long story" with model "gpt-4o" max_tokens 4000 temperature 0.9:
  say chunk

Building a progress indicator

You can collect chunks and track progress as the response streams in:

fullResponse is ""
chunkCount is 0

stream claude "Write a detailed essay about climate change" with max_tokens 2000:
  fullResponse is fullResponse + chunk
  chunkCount is chunkCount + 1

say "Received " + chunkCount + " chunks"
say "Total length: " + fullResponse.length + " characters"

Streaming works with all 4 providers using the same syntax. Just swap claude for openai, gemini, or ollama. The chunk variable and block syntax are identical across providers.

Structured Output

Structured output lets you get typed, parsed data back from any LLM instead of free-form text. Add as followed by a type shape to your ask call, and Quill will instruct the LLM to return JSON and automatically parse it into a typed object.

Supported types

  Type     Description                                              Example value
  text     A string value, converted with String()                  "John Smith"
  number   A numeric value, converted with Number()                 42
  bool     A boolean value, converted with Boolean()                true
  list     An array of values; non-array values are wrapped in []   ["red", "blue", "green"]

Example 1: Simple extraction

-- Extract name and age from natural language
person is ask claude "Extract: John Smith is 30 years old" as {name: text, age: number}
say person.name    -- "John Smith"
say person.age     -- 30

Example 2: Product information

-- Extract multiple types including booleans and lists
product is ask openai "Extract product info: The Nike Air Max 90 costs $120, is currently in stock, and comes in white, black, and red" as {name: text, price: number, inStock: bool, colors: list}
say product.name       -- "Nike Air Max 90"
say product.price      -- 120
say product.inStock    -- true
say product.colors     -- ["white", "black", "red"]

Example 3: Extracting a list of items

-- Extract structured data from a paragraph
paragraph is "Our team includes Alice (engineer, 5 years), Bob (designer, 3 years), and Carol (manager, 8 years)."
team is ask claude paragraph with system "Extract the team members" as {names: list, roles: list, years: list}
say team.names    -- ["Alice", "Bob", "Carol"]
say team.roles    -- ["engineer", "designer", "manager"]
say team.years    -- [5, 3, 8]

How it works

Under the hood, Quill modifies your prompt to ask the LLM to return JSON matching your schema, then parses the response with __parse_structured. It handles JSON wrapped in markdown code fences or embedded in text.

Error handling

If the LLM returns text that cannot be parsed as JSON, the result will contain an error field and the raw response:

result is ask claude "Tell me a joke" as {setup: text, punchline: text}

if result.error:
  say "Parsing failed: " + result.error
  say "Raw response: " + result.raw
otherwise:
  say result.setup
  say result.punchline

Tip: For best results, be specific in your prompt about the expected format. Structured output works with all 4 providers (Claude, OpenAI, Gemini, Ollama) — just swap the provider name.
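
Structured output pairs well with loops for batch extraction. A sketch that classifies a few reviews (the reviews array and field names are illustrative):

-- Classify several reviews into typed results
reviews are ["Great product, fast shipping!", "Broke after two days.", "Does the job."]

for each review in reviews:
  result is ask claude "Classify this review: " + review as {sentiment: text, score: number}
  if result.error:
    say "Could not parse: " + result.raw
  otherwise:
    say result.sentiment + " (" + result.score + ")"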

AI Agents

An AI agent is an LLM combined with tools and a loop. You give the agent a goal, and it autonomously decides which tools to call, processes the results, and repeats until the task is complete. Think of it as giving the LLM hands and eyes — it can take actions, not just generate text.

How the agent loop works:
  1. You call agent.run(goal) with a task description
  2. The LLM sees the goal and available tools, then decides which tool to call
  3. The tool runs and returns a result
  4. The LLM sees the tool result and decides the next action
  5. This repeats until the LLM signals it is done, or maxTurns is reached

API Reference

createAgent(name, options)

Creates a new agent with the given name and configuration.

  Option     Type     Default              Description
  provider   string   "claude"             Which LLM provider to use ("claude", "openai", "gemini", "ollama")
  model      string   (provider default)   Which model to use; if not set, uses the provider's default model
  system     string   (auto-generated)     Custom system prompt; the default instructs the agent to use JSON for tool calls
  maxTurns   number   10                   Maximum number of LLM calls before the agent stops; prevents runaway loops

agent.addTool(name, description, function)

Registers a tool that the agent can call. The description is shown to the LLM so it knows when to use this tool.

agent.run(goal)

Starts the agent loop with the given goal. Returns the final result when the agent signals it is done, or a text response if maxTurns is reached.

Basic example

-- Create a simple math agent
-- Note: eval runs arbitrary code; fine for a demo, unsafe for untrusted input
to calculate expression:
  give back eval(expression)

myAgent is createAgent("math-helper", {provider: "claude", maxTurns: 5})
myAgent.addTool("calculate", "Evaluate a math expression", calculate)

result is await myAgent.run("What is 15% of 280, then add 42?")
say result

Real-world example: Research agent

Here is a more complete agent that can search the web and summarize findings:

-- Research agent with search and summarize tools

to searchWeb query:
  data is await fetchJSON("https://api.duckduckgo.com/?q=" + query + "&format=json")
  give back data.AbstractText or "No results found"

to summarizeText text:
  give back ask claude "Summarize in 3 bullets: " + text

myAgent is createAgent("researcher", {provider: "claude", maxTurns: 5})
myAgent.addTool("search", "Search the web for information", searchWeb)
myAgent.addTool("summarize", "Summarize a piece of text", summarizeText)

result is await myAgent.run("Find and summarize the latest news about AI safety")
say result

Using agents with different providers

Agents work with any provider. Just change the provider option:

-- Use OpenAI as the agent's brain
agent is createAgent("helper", {provider: "openai", model: "gpt-4o"})

-- Use a local Ollama model (free, private)
agent is createAgent("local-agent", {provider: "ollama", model: "llama3", maxTurns: 3})

How the agent communicates

The agent uses JSON messages to interact with tools. When it wants to call a tool, it responds with:

{"tool": "search", "args": {"query": "AI safety news"}}

When it has finished the task, it responds with:

{"done": true, "result": "Here are the findings..."}

If the LLM responds with plain text (not JSON), the agent treats it as the final answer and returns it directly.

Watch your token costs. Each agent turn uses tokens from your provider. Set maxTurns to a reasonable limit (3–10) to prevent runaway costs. The default is 10 turns. If the agent reaches maxTurns without completing, it returns "Agent reached max turns without completing."

Embeddings & Vector Search

Embeddings convert text into arrays of numbers (vectors) that capture semantic meaning. Similar text produces similar vectors, which lets you search by meaning rather than keywords. This is the foundation of RAG (Retrieval-Augmented Generation) and semantic search.

embed(text, provider, model)

Generates an embedding vector for a piece of text. Returns an array of numbers.

  Parameter   Type     Default       Description
  text        string   (required)    The text to convert into a vector
  provider    string   "openai"      "openai" or "ollama"
  model       string   (see below)   Embedding model name. Defaults: "text-embedding-3-small" (OpenAI), "nomic-embed-text" (Ollama)

-- Generate an embedding with OpenAI (default)
vec is await embed("Hello world")
say "Vector has " + vec.length + " dimensions"

-- Generate an embedding with Ollama (local, free)
vec is await embed("Hello world", "ollama", "nomic-embed-text")

cosineSimilarity(a, b)

Compares two embedding vectors and returns a similarity score. Use this when you want to compare two specific texts without a full vector store.

vec1 is await embed("I love programming")
vec2 is await embed("Coding is my passion")
vec3 is await embed("The weather is nice today")

say cosineSimilarity(vec1, vec2)  -- ~0.85 (very similar)
say cosineSimilarity(vec1, vec3)  -- ~0.15 (unrelated)

Understanding cosine similarity scores. Cosine similarity ranges from -1 to 1, but text embeddings almost always score between 0 and 1. A score near 1.0 means the texts are nearly identical in meaning. Scores above 0.7 indicate strong similarity. Scores below 0.3 suggest the texts are unrelated.
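
For a handful of candidates, cosineSimilarity alone is enough for a best-match search without a vector store. A sketch (assumes numeric comparison with > behaves as in JavaScript):

-- Find which candidate is closest in meaning to the query
query is await embed("How do I reset my password?")
candidates are ["Account recovery steps", "Shipping and delivery times", "Password reset guide"]

bestText is ""
bestScore is -1
for each candidate in candidates:
  vec is await embed(candidate)
  score is cosineSimilarity(query, vec)
  if score > bestScore:
    bestScore is score
    bestText is candidate

say "Best match: " + bestText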

VectorStore API

The VectorStore is an in-memory database for embedding vectors. It handles embedding, storage, and similarity search in one convenient interface.

createVectorStore()

Creates an empty vector store.

store is createVectorStore()

store.add(text, metadata)

Adds a single text to the store. Automatically generates the embedding. The optional metadata object lets you attach extra information (source, date, category, etc.).

await store.add("Quill compiles to JavaScript", {source: "docs", page: 1})
await store.add("Quill has built-in AI support", {source: "readme"})

store.addMany(texts, metadatas)

Batch-add multiple texts at once. The metadatas array is optional and should match the length of texts.

texts are ["First document", "Second document", "Third document"]
metas are [{id: 1}, {id: 2}, {id: 3}]
await store.addMany(texts, metas)

store.search(query, topK)

Searches the store for the most similar texts to the query. Returns an array of results, each with text, metadata, and score fields. The topK parameter controls how many results to return (default: 5).

-- Search for the 3 most relevant documents
results is await store.search("How does Quill work?", 3)
for each result in results:
  say result.text + " (score: " + result.score + ")"
  say "  Source: " + result.metadata.source

store.toJSON() / VectorStore.fromJSON(json)

Serialize and deserialize the vector store for persistence. This lets you build the store once and reuse it later without re-embedding all your documents.

-- Save the vector store to a file
write("knowledge.json", store.toJSON())

-- Load it later in another script
stored is read("knowledge.json")
store is VectorStore.fromJSON(stored)

-- Search immediately, no re-embedding needed
results is await store.search("my question", 3)

Tip: For large knowledge bases, save the vector store to a file after building it. Loading from JSON is instant, while re-embedding hundreds of documents can take minutes and use API credits.

Document Processing / RAG

Quill provides built-in functions to extract text from files, split it into chunks, and prepare it for embedding. These are the building blocks of any RAG (Retrieval-Augmented Generation) pipeline.

extract(filePath)

Reads a file and returns its text content as a string. Supports multiple file formats:

  Format       Extensions    Notes
  Plain text   .txt          Read as-is
  Markdown     .md           Read as-is (raw markdown text)
  CSV          .csv          Read as-is (raw CSV text)
  JSON         .json         Parsed and pretty-printed with 2-space indentation
  HTML         .html, .htm   HTML tags are stripped, leaving only text content
  PDF          .pdf          Requires npm install pdf-parse

-- Extract text from different file types
text is await extract("document.pdf")
text is await extract("page.html")
text is await extract("data.json")
text is await extract("notes.md")

PDF support requires an extra package. Run npm install pdf-parse before using extract with PDF files. If the package is missing, you will get a clear error message telling you to install it.

chunk(text, size, overlap)

Splits a long text into overlapping chunks suitable for embedding. The chunker tries to break at sentence boundaries (periods and newlines) rather than mid-word, so chunks read naturally.

  Parameter   Type     Default      Description
  text        string   (required)   The text to split into chunks
  size        number   500          Maximum characters per chunk
  overlap     number   50           Character overlap between consecutive chunks to preserve context at boundaries

-- Split text into overlapping chunks
chunks are chunk(text, 500, 50)
say "Split into " + chunks.length + " chunks"

-- Smaller chunks with more overlap for higher precision
chunks are chunk(text, 300, 100)

Chunk size guidance. For RAG, chunk sizes of 300–500 characters work best. Use 50–100 character overlap to avoid losing context at boundaries. Smaller chunks give more precise search results; larger chunks give more context per result.

splitSentences(text)

Splits text into an array of individual sentences. Sentences are delimited by periods, exclamation marks, and question marks.

sentences are splitSentences("Hello world. How are you? I am fine!")
-- ["Hello world.", " How are you?", " I am fine!"]
say "Found " + sentences.length + " sentences"

splitParagraphs(text)

Splits text at double newlines (blank lines) into an array of paragraphs. Empty paragraphs are filtered out.

paragraphs are splitParagraphs(text)
for each p in paragraphs:
  say "---"
  say p
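
Paragraphs often make natural embedding units. This sketch feeds them straight into a vector store with addMany (the search query is illustrative):

-- Embed whole paragraphs instead of character-based chunks
paragraphs are splitParagraphs(text)
store is createVectorStore()
await store.addMany(paragraphs)

results is await store.search("refund policy", 3)
for each result in results:
  say result.text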

Practical example: PDF Q&A

Extract a PDF, chunk it, embed the chunks, and answer questions about the document:

-- Extract and process a PDF for question answering
text is await extract("research-paper.pdf")
say "Extracted " + text.length + " characters"

-- Chunk the text
chunks are chunk(text, 400, 80)
say "Created " + chunks.length + " chunks"

-- Build a vector store from the chunks
store is createVectorStore()
for each c in chunks:
  await store.add(c, {source: "research-paper.pdf"})

-- Ask a question
question is "What methodology did the authors use?"
results is await store.search(question, 3)
context is join(map_list(results, with r: r.text), "\n\n")

answer is ask claude question with system "Answer based only on this context:\n\n" + context
say answer

Full RAG pipeline

What is RAG? Retrieval-Augmented Generation (RAG) is a pattern where you first retrieve relevant documents from a knowledge base, then pass them as context to the LLM. This grounds the LLM's response in your actual data, reducing hallucinations and enabling answers about private or recent information the LLM was not trained on.

Here is a complete RAG pipeline with error handling and persistence:

-- Full RAG pipeline: extract -> chunk -> embed -> search -> ask

try:
  -- 1. Extract text from a PDF
  text is await extract("knowledge-base.pdf")
  say "Extracted " + text.length + " characters"

  -- 2. Chunk into manageable pieces
  chunks are chunk(text, 500, 50)
  say "Created " + chunks.length + " chunks"

  -- 3. Build the vector store
  store is createVectorStore()
  for each c in chunks:
    await store.add(c)

  -- 4. Save the vector store for reuse
  write("knowledge-store.json", store.toJSON())
  say "Vector store saved to knowledge-store.json"

  -- 5. Search for relevant context
  question is "What are the main features?"
  results is await store.search(question, 3)

  -- 6. Build context string
  context is join(map_list(results, with r: r.text), "\n\n")

  -- 7. Ask the LLM with retrieved context
  answer is ask claude question with system "Answer based on this context:\n\n" + context
  say answer

catch err:
  say "Error in RAG pipeline: " + err.message

Loading a saved vector store

Once you have saved a vector store, you can load it in subsequent runs without re-embedding:

-- Load a previously saved vector store
stored is read("knowledge-store.json")
store is VectorStore.fromJSON(stored)

-- Search and ask immediately
question is "How does authentication work?"
results is await store.search(question, 3)
context is join(map_list(results, with r: r.text), "\n\n")
answer is ask claude question with system "Answer based on this context:\n\n" + context
say answer

Next steps