Knowledge

Three ways to ground your agent in real information — uploaded PDFs (Documents), short Q&A snippets (Knowledge), and crawled public pages (Web Sources). Each surface lives as a tab on the agent detail page and feeds the same retrieval layer that runs on every request.

When to use which

  • Documents: Long-form PDFs such as manuals, handbooks, contracts, and product spec sheets. We chunk, embed, and retrieve the matching passages on every request. Best for content you already have as a file.
  • Knowledge (Q&A): Short, opinionated snippets you want the agent to recite verbatim, such as pricing rules, brand voice guidelines, and frequently asked questions. Max 300 characters per snippet. Best for content you can write in a single sentence.
  • Web Sources: A public URL (often your help centre, docs site, or product pages). We crawl it, chunk every page, and retrieve passages just like Documents. Best when the content already lives on the web and you want the index to track it through re-crawls.

Documents

Upload a PDF. We extract pages, embed chunks, and the agent retrieves the top-matching passages alongside the user message on every request — automatic RAG, no setup.
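
To picture what "chunking" means here, the following is a minimal sketch that splits extracted text into overlapping word windows. The function name, the window size, and the overlap are illustrative assumptions; the platform's actual chunking strategy and sizes are not documented on this page.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words.

    `size` and `overlap` are hypothetical values chosen for illustration.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, size=4, overlap=1):
        print(chunk)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, which is one common reason retrieval systems use it.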

Agent → Documents
Documents · Website Chatbot · Search documents · + Add Document

File | Pages | Status | Uploaded
acme-product-guide.pdf | 42 | Ready | 2 days ago
refund-policy.pdf | 8 | Ready | 5 days ago
release-notes-v3.pdf | 3 | Ready | 1 week ago
draft-launch.pdf | 16 | Processing 64% | Just uploaded
broken-scan.pdf | - | Error · OCR failed | 10 minutes ago

5 documents · 69 pages indexed · 12 credits used this month
  • + Add Document: Opens a file picker. PDFs only. The file is uploaded, queued for OCR/extraction, then chunked and embedded. Status flips from Processing to Ready once indexing finishes.
  • Status pill: Processing N%, Ready, or Error. Errors show the reason on hover. Common causes are scanned PDFs with no OCR layer and password-protected files.
  • Search: Filters the table by filename. Doesn't search inside document content (the agent does that at request time).
  • Delete: Removes the document and all its embeddings. The next request to the agent no longer has access to it.
  • Credit indicator: Document uploads cost credits proportional to total page count. The chip at the bottom shows your running usage. Low-credit accounts see an inline warning above the table.

How to upload a PDF

  1. Open the Documents tab

    From an agent's detail page, click the Documents tab.
  2. Click + Add Document

    The file picker opens. Pick a PDF; only one file can be uploaded at a time.
  3. Wait for processing

    The row appears immediately with status Processing. Page count fills in as we extract. A typical 50-page PDF takes 10–30 seconds.
  4. Test in Playground

    Open Playground and ask a question whose answer is in the document. The reply should cite the matching passage.

Gotchas

  • Scanned PDFs without an OCR layer will fail with Error · OCR failed. Run the PDF through your preferred OCR tool first, then re-upload.
  • Very large PDFs (>500 pages) hit the per-document chunk limit and may take minutes to index. Split them into logical sections instead.
  • Updating a document means delete + re-upload. There's no in-place re-index yet — old embeddings stay associated with the old file until you remove it.
  • Password-protected files are rejected at upload time. Decrypt before uploading.
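
For the large-PDF case above, a small planning helper can work out page ranges before you split the file with your PDF tool of choice. This is only a sketch: the function name is hypothetical, the 500-page ceiling is taken from the gotcha above, and fixed-size ranges are a fallback for when the document has no natural chapter boundaries.

```python
def split_ranges(total_pages: int, max_pages: int = 500) -> list[tuple[int, int]]:
    """Return 1-based inclusive page ranges, each at most `max_pages` long."""
    if total_pages < 0 or max_pages <= 0:
        raise ValueError("total_pages must be >= 0 and max_pages must be > 0")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


if __name__ == "__main__":
    # A 1,200-page manual needs three uploads under a 500-page ceiling.
    print(split_ranges(1200))  # [(1, 500), (501, 1000), (1001, 1200)]
```

Where the PDF has clear sections, splitting at those boundaries is usually better than the mechanical ranges this produces, since each upload then covers one topic.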

Knowledge (Q&A)

Short snippets the agent should treat as authoritative. Unlike Documents, snippets aren't chunked — each one is stored as a complete unit and retrieved when its embedding is most similar to the user's question.

Agent → Knowledge
Knowledge · Website Chatbot · Search snippets · + Add Knowledge

New snippet
Snippet: Add a short knowledge snippet (max 300 chars)…
0 / 300 · Save

Snippet | Hits · 7d | Added
Our standard refund window is 30 days from the delivery date. | 184 | 2 weeks ago
Free shipping kicks in at $75; applies in both US and Canada. | 92 | 1 week ago
Business hours: Mon–Fri 9 AM to 6 PM Pacific. Closed weekends and US holidays. | 61 | 3 days ago
Loyalty points expire 12 months after the date they were earned. | 12 | Yesterday

4 snippets · 349 hits this week
  • + Add Knowledge: Opens the inline composer. Type up to 300 characters and save. Saved snippets start retrieving on the next request.
  • Character counter: Live counter under the textarea. It turns red at 280 characters and blocks save at the 300-character limit. Keeping snippets short forces precise wording, which retrieves better.
  • Hits column: How many times this snippet was retrieved into a model context in the last 7 days. Low-hit snippets are dead weight; high-hit snippets are your most valuable.
  • Search: Filters the table by snippet text. Doesn't affect retrieval; it is purely a UI filter.
  • Delete: Removes the snippet. Reversible only by re-typing.
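
If you draft snippets outside the UI, the length rules above are easy to check locally. The sketch below mirrors the 280 and 300 thresholds from the character counter and splits a longer fact at sentence boundaries. The function names are hypothetical, and it assumes that exactly 300 characters is still allowed and that length is counted in Python characters; the platform may count differently.

```python
import re

MAX_CHARS = 300   # hard limit stated for the composer
WARN_CHARS = 280  # point at which the counter turns red


def counter_state(snippet: str) -> str:
    """Return 'ok', 'warning', or 'blocked' for a draft snippet."""
    n = len(snippet)
    if n > MAX_CHARS:
        return "blocked"  # assumption: only text past 300 is rejected
    if n >= WARN_CHARS:
        return "warning"
    return "ok"


def split_fact(text: str) -> list[str]:
    """Split a long fact into one snippet per sentence.

    Raises ValueError if a single sentence still exceeds the limit.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip())]
    sentences = [s for s in sentences if s]
    for sentence in sentences:
        if counter_state(sentence) == "blocked":
            raise ValueError(f"sentence too long for one snippet: {sentence[:40]}...")
    return sentences


if __name__ == "__main__":
    draft = "Refunds are issued within 30 days. Free shipping starts at $75."
    for snippet in split_fact(draft):
        print(counter_state(snippet), "|", snippet)
```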

How to add a snippet

  1. Click + Add Knowledge

    The composer appears at the top of the panel.
  2. Write one sentence

    Be specific. "Refunds are issued within 30 days" is good. "We have a flexible refund policy" is bad: it is too vague to retrieve well.
  3. Save

    The snippet is embedded and ready to retrieve on the very next request.
  4. Watch the hits column

    After a day or two, sort by Hits to see which snippets the agent actually uses. Tune wording or delete dead snippets.

Gotchas

  • Snippets aren't chunked. If your fact is longer than 300 characters, split it into two snippets — don't try to cram everything into one sentence at the cost of specificity.
  • Generic wording retrieves poorly. Embeddings reward concrete, distinctive phrases. "Free shipping over $75" retrieves better than "Shipping policies are generous".
  • Duplicate snippets dilute hits. Two near-identical snippets split traffic between themselves and neither shows as high-hit. Consolidate.
  • Snippets don't replace the system prompt. Use the prompt for tone, persona, and behaviour; use snippets for facts.
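
To spot the near-duplicates mentioned above before they split hits, you can compare snippet texts pairwise. This sketch uses Python's standard `difflib` as a rough lexical stand-in; the platform ranks by embedding similarity, which this does not reproduce, and the 0.8 threshold and function name are assumptions to tune for your own data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(
    snippets: list[str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """Return snippet pairs whose character-level similarity meets the threshold."""
    pairs = []
    for a, b in combinations(snippets, 2):
        # Lexical similarity only; embeddings may also treat paraphrases as close.
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs


if __name__ == "__main__":
    drafts = [
        "Free shipping kicks in at $75.",
        "Free shipping kicks in at $75!",
        "Loyalty points expire after 12 months.",
    ]
    for a, b, score in near_duplicates(drafts):
        print(f"{score}: {a!r} ~ {b!r}")
```

A lexical check will miss paraphrases such as "Orders over $75 ship free", so treat it as a first pass and use the Hits column to confirm.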

Web Sources

Point the crawler at a public URL (typically your help centre, docs site, or product catalogue) and we'll fetch the pages, extract clean text, chunk and embed each page, and serve the passages at retrieval time. Re-crawl on demand to pick up changes.

Agent → Web Sources
Web Sources · Website Chatbot · Search sources · + Add Web Source

New web source
Domain: example.com
Include paths: /docs, /help
Exclude paths: /admin, /internal
Max depth: 3
Max pages: 100
Concurrency: 4
Start crawl · Estimate: ~32 pages, ~14 credits

Source | Pages | Status | Last crawl
docs.acme.com | 84 / 84 | Ready | Yesterday
help.acme.com | 12 / 32 | Crawling 38% | Just now
shop.acme.com | 210 / 300 | Partial · 12 errors | 2 days ago
legacy.acme.com | 0 | Error · 403 Forbidden | 1 hour ago

4 sources · 306 pages indexed · Re-crawl all
  • + Add Web Source: Opens the form. Domain is required; everything else has sensible defaults.
  • Domain: The root the crawler starts at (example.com). Use the bare domain; the protocol and www prefix are normalised automatically.
  • Include paths: Comma-separated path prefixes the crawler is allowed to follow. Leave blank to crawl everything under the domain.
  • Exclude paths: Comma-separated path prefixes to skip. Useful for keeping /admin, /account, or noisy archive sections out of the index.
  • Max depth / Max pages / Concurrency: Bounds the crawl. Defaults are 3 / 100 / 4, enough for most product help centres without burning credits.
  • Page status: Per-page sub-status visible when you expand a source: ready, processing, error, skipped (robots.txt or non-HTML).
  • Re-crawl: Per-source action. Re-fetches every page, updates embeddings, and replaces stale content.
  • Delete: Removes the source and all its indexed pages. Reversible only by re-adding and re-crawling.
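
The Domain, Include paths, and Exclude paths fields can be reasoned about with a few lines of code. The sketch below is one plausible reading of the rules above, not the crawler's implementation: all function names are hypothetical, prefixes are assumed to match whole path segments (so /docs does not match /docs-old), and exclude is assumed to win over include when both match.

```python
from urllib.parse import urlsplit


def normalise_domain(raw: str) -> str:
    """Reduce user input to a bare host: no protocol, no leading www, no path."""
    raw = raw.strip().lower()
    if "://" not in raw:
        raw = "//" + raw  # lets urlsplit read the input as a host, not a path
    host = urlsplit(raw).hostname or ""
    return host[4:] if host.startswith("www.") else host


def parse_prefixes(field: str) -> list[str]:
    """Turn a comma-separated form field into a list of path prefixes."""
    return [p.strip() for p in field.split(",") if p.strip()]


def has_prefix(path: str, prefix: str) -> bool:
    """Segment-aware prefix match (an assumption about the matching rule)."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def is_allowed(path: str, include: list[str], exclude: list[str]) -> bool:
    """Exclude wins; an empty include list means everything is allowed."""
    if any(has_prefix(path, p) for p in exclude):
        return False
    return not include or any(has_prefix(path, p) for p in include)


if __name__ == "__main__":
    include = parse_prefixes("/docs, /help")
    exclude = parse_prefixes("/admin, /internal")
    print(normalise_domain("https://www.example.com/"))
    for path in ["/docs/intro", "/admin/users", "/blog/launch"]:
        print(path, is_allowed(path, include, exclude))
```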

How to add and crawl a site

  1. Click + Add Web Source

    The form opens. Type your domain.
  2. Set include + exclude paths

    Narrow the crawl. For a help centre, /docs, /help is usually enough. Exclude /admin and anything that requires auth.
  3. Click Start crawl

    The source row appears with status Crawling N%. The estimate chip tells you how many credits it will likely cost.
  4. Wait for Ready

    For 100 pages, expect ~1–3 minutes. You can keep using the platform; crawls run in the background.
  5. Test in Playground

    Ask a question whose answer is on the crawled site. The reply should cite the matching URL.
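
When choosing Max depth and Max pages for the form in step 1, it can help to see how the two bounds interact. The sketch below walks an in-memory link graph breadth-first and stops at either limit. It is a toy model under stated assumptions: the function name is hypothetical, the start page is assumed to count as depth 0, and Concurrency, fetching, and the crawler's real traversal order are left out.

```python
from collections import deque


def bounded_crawl(
    links: dict[str, list[str]],
    start: str,
    max_depth: int = 3,
    max_pages: int = 100,
) -> list[str]:
    """Breadth-first walk that stops at max_depth links from start or max_pages pages."""
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited


if __name__ == "__main__":
    site = {
        "/": ["/docs", "/help"],
        "/docs": ["/docs/a"],
        "/docs/a": ["/docs/a/b"],
        "/docs/a/b": ["/docs/a/b/c"],
    }
    print(bounded_crawl(site, "/", max_depth=2))
    print(bounded_crawl(site, "/", max_pages=2))
```

In this model a deep page is missed by a low Max depth even when Max pages has room to spare, which is worth checking if a source finishes with fewer pages than you expected.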

Gotchas

  • Authentication walls block the crawler. Pages behind login render as the sign-in page. Either expose a public alternative or skip them with an exclude path.
  • JavaScript-rendered SPAs with no server-rendered HTML extract very little. Use SSR / SSG, or pre-render the pages you want indexed.
  • robots.txt is respected. Pages disallowed by your robots file are marked skipped. If you own the site and want them crawled, update robots.
  • Rate limits apply per domain. The crawler honours Retry-After and backs off. Huge sites (10k+ pages) take time — raise Max pages in steps, not in one jump.
  • Re-crawls cost credits. The page-count estimate shows up front. Re-crawl on a cadence that matches how often your source actually changes (weekly is usually plenty).
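
Before adding a source, you can check which paths your own robots.txt would cause to be skipped. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text you paste in. The function name is hypothetical, and the user agent defaults to the wildcard because this page does not state the name the crawler identifies itself with.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt: str, paths: list[str], user_agent: str = "*") -> list[str]:
    """Return the paths that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    robots = """\
User-agent: *
Disallow: /admin
Disallow: /internal
"""
    candidates = ["/docs/start", "/admin/users", "/internal/wiki"]
    print(blocked_paths(robots, candidates))
```

If a path you want indexed shows up in the result, the fix belongs in robots.txt on your site rather than in the Include paths field.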

How retrieval works under the hood

On every API call, the agent embeds the user's message and looks up the top-K most similar chunks across Documents, Knowledge snippets, and Web Source pages combined. Matching passages are injected into the prompt before the LLM runs. You don't configure any of this — it's automatic and tuned per agent.

Open Tracing on any request to see exactly which chunks were retrieved, their similarity scores, and which source they came from. That's the fastest way to debug "why didn't the agent know X?".
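
The lookup described in this section can be illustrated with a toy model. The sketch below ranks passages from all three surfaces in one pool by cosine similarity and keeps the top K. It is not the platform's implementation: the bag-of-words "embedding" stands in for a learned embedding model, and the function names, the sample corpus, and the value of K are assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Real systems use a learned vector model."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, corpus: list[tuple[str, str]], k: int = 2) -> list[tuple[float, str, str]]:
    """Score every (source, text) pair against the query and keep the best k."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), source, text) for source, text in corpus]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        ("knowledge", "Our standard refund window is 30 days from the delivery date."),
        ("document", "The device ships with a USB-C charging cable."),
        ("web", "Free shipping kicks in at $75 in the US and Canada."),
    ]
    for score, source, text in top_k("What is the refund window?", corpus):
        print(f"{score:.2f} [{source}] {text}")
```

The printed rows mirror what Tracing exposes per request: a score, the source surface, and the retrieved text.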

Mix all three

Most production agents combine all three surfaces: Documents for long-form reference, Knowledge for crisp facts and tone enforcement, and Web Sources for content that's already maintained on the public web. The retrieval layer ranks across all of them, so you don't need to pick.