
AI for Internal Tools — RAG over your docs

AI chat and search across your internal documents — wiki, PDFs, SharePoint, Notion, Google Drive, Confluence — with citations on every answer and guardrails against hallucinated facts.

Talk to Vatsal

The most common request we get is some version of: build me a chatbot over our docs. The hard part is not the chatbot — it's the retrieval, the citation accuracy, the freshness, the access permissions, and the cost predictability after the first month. We build retrieval-augmented systems that answer questions from your real document corpus, cite their sources, and stay honest about what they don't know. Most engagements ship a working pilot by week four.

Fit for

  • Canadian businesses with at least a few hundred internal documents that are referenced regularly and where finding the right answer takes meaningful time today
  • Teams whose documents live in a small number of known sources (SharePoint, Notion, Google Drive, Confluence, a help center, a shared file system) with stable access
  • Operators who want a system that cites sources and admits when it doesn't know, not one that always sounds confident

Not a fit for

  • Companies whose 'docs' are mostly tribal knowledge that lives in heads and Slack threads — there's nothing to retrieve
  • Teams that need an answer engine for highly regulated decisions without any human review — RAG is a tool, not a decision-maker
  • Use cases where the underlying documents are so out of date that the AI would mostly cite stale information

What you walk away with

  • A chat or search interface deployed where your team will actually use it — in Slack, Teams, your intranet, or as a standalone web app
  • Ingestion pipeline for your source systems — SharePoint, Notion, Google Drive, Confluence, a help center, a file share, a PDF folder — with incremental sync so new and edited documents flow in automatically
  • Vector storage in your environment using Pinecone, Weaviate, Qdrant, or pgvector on Postgres, chosen with your team based on scale and cost
  • Embeddings generated from your documents with an embedding model chosen for their language and domain, re-embedded automatically when documents change
  • Citation on every answer — the user sees which document chunks the answer came from and can click through to the source
  • Hallucination guardrails: the system refuses to answer when retrieval confidence is low, offers an exact-quote mode for compliance-sensitive answers, and is scored against an eval set for answer faithfulness
  • Cost monitoring across the three cost drivers — vector storage, embeddings, and inference — with alerts when usage drifts from forecast
  • Access permission inheritance so a user only sees answers from documents they have access to in the source system
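Permission inheritance comes down to one rule at query time: a retrieved chunk is only usable if the asking user could open its source document. The sketch below assumes each chunk carries a hypothetical `allowed_groups` field copied from the source system's ACL during ingestion; real connectors would map SharePoint, Drive, or Confluence permissions into that field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    # Groups allowed to read the source document, copied from the
    # source system's ACL at ingestion time (hypothetical field).
    allowed_groups: frozenset[str]


def filter_by_permission(chunks: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks whose source document the user can open.

    A chunk is visible when the user belongs to at least one of its allowed groups.
    """
    return [c for c in chunks if c.allowed_groups & user_groups]


chunks = [
    Chunk("hr-leave-policy", "Parental leave is...", frozenset({"hr", "all-staff"})),
    Chunk("exec-comp-2024", "The bonus pool is...", frozenset({"executives"})),
]
visible = filter_by_permission(chunks, {"all-staff", "support"})
print([c.doc_id for c in visible])  # ['hr-leave-policy']
```

Filtering after retrieval is the simplest version, but it can leave fewer than the top-k results the model expects. Pushing the same check into the vector query as a metadata filter (or a `WHERE` clause with pgvector) avoids that.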

How the engagement runs

  1. Discovery · Week 1

    We map the document corpus, the source systems, the access model, and the questions users actually ask today. We pull a sample of 30–50 real questions to use as the eval set.

  2. Architecture · Week 2

    Chunking strategy, embedding model, vector database, retrieval approach (semantic, hybrid, reranked), citation format, refusal policy. You see the architecture and the cost-per-query estimate before we build.

  3. Pilot build · Weeks 3–4

    Ingestion pipeline running against your real documents, retrieval and answer generation tested against the eval set. Weekly demo, weekly eval results with the questions the system got wrong called out explicitly.

  4. Production deployment · Weeks 5–6

    Phased rollout to a single team or department with monitoring already wired — query volume, retrieval accuracy, answer faithfulness, cost per query, user feedback. Expand once the metrics hold.

  5. Optimization · Weeks 7–10

    Tune against real user queries, expand the eval set, add new source systems if needed, hand off the runbook for re-indexing, prompt tuning, and cost management.
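The hybrid retrieval option chosen during the architecture phase has to merge a semantic (vector) ranking with a keyword ranking. One common way to do that is reciprocal rank fusion (RRF), sketched below; it is shown as an illustration, not as the fusion method every engagement uses. `k = 60` is the constant usually quoted for RRF, and the chunk ids are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk ids into one ranking.

    Each chunk scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so results are deterministic.
    return sorted(scores, key=lambda c: (-scores[c], c))


semantic = ["c3", "c1", "c7"]  # from the vector store
keyword = ["c1", "c9", "c3"]   # from a keyword / BM25 index
print(reciprocal_rank_fusion([semantic, keyword]))  # ['c1', 'c3', 'c9', 'c7']
```

RRF only needs ranks, not raw scores, so it sidesteps the problem that vector similarities and keyword scores live on different scales. A reranker, if used, would then reorder the fused top results.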

By industry

For Canadian Financial Services

Internal RAG for credit unions, MGAs, and brokerages — policy and procedure search for front-line staff, product-and-rate lookup for member service teams, compliance-document search for back-office. Citations on every answer, access permissions inherited from your existing document store, hosted in Canadian regions.

For Canadian Healthcare

Internal RAG for multi-site clinics, dental groups, and physiotherapy networks — clinical protocol search for providers, billing-code lookup for admin staff, provincial-program-rule search for intake. Tuned for your provincial framework and grounded in your own documents rather than a generic medical corpus.

For Canadian Customer Operations

Internal RAG for support, success, and operations teams — help-center search for agents, internal-runbook lookup for engineers, policy-and-process search for back-office. Surfaces in Slack, Teams, your helpdesk side panel, or wherever the team already works.

Selected work

Recent RAG and internal-tools work.

Vatsal is an excellent full stack developer and highly skilled project manager. He identified our business needs quickly and established a very strong framework. His incredible speed should be noted; this is a developer who doesn't waste time and hit every target date we threw at him.
Josiah Liesemer
IT Specialist and Developer, Zucora Home
Read more in our engineering log

Frequently asked

What data sources can you integrate with?
SharePoint, OneDrive, Notion, Google Drive, Confluence, GitHub wikis, Zendesk and HubSpot help centers, Intercom articles, S3 or Azure Blob document stores, Dropbox, Box, and most file systems with an API. PDFs, Word docs, Markdown, HTML, and slide decks are all handled. For sources without an API we build a sync adapter.
Where do the vectors live?
In a vector database we deploy with you — Pinecone, Weaviate, Qdrant, or pgvector running on your existing Postgres. Choice depends on scale, cost, and whether you want a managed service or to run it yourself. For Canadian data residency, we use Canadian regions on the managed services or self-host in AWS ca-central-1 or Azure Canada Central.
How often is the index updated?
Incrementally, in near real time. When a document changes in the source system, the ingestion pipeline detects it, re-chunks and re-embeds the affected chunks, and updates the vector store. A full re-index is only needed for changes that affect every chunk, such as switching embedding models.
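A minimal sketch of the change-detection step, assuming the pipeline stores a content hash per document at each sync. It works at the document level for brevity; the same comparison can be done per chunk. Deletion handling is included as an assumption, and in practice many source APIs expose change feeds or modified timestamps that avoid fetching every document to hash it.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(
    source_docs: dict[str, str], indexed_hashes: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare the source system against what is already indexed.

    source_docs maps doc_id -> current text; indexed_hashes maps doc_id -> hash
    stored at the last sync. Returns (ids to re-chunk and re-embed, ids to delete).
    """
    to_upsert = [
        doc_id
        for doc_id, text in source_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)  # new or edited
    ]
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in source_docs]
    return sorted(to_upsert), sorted(to_delete)


indexed = {
    "faq": content_hash("v1"),
    "runbook": content_hash("old steps"),
    "retired": content_hash("x"),
}
source = {"faq": "v1", "runbook": "new steps", "onboarding": "welcome"}
print(plan_sync(source, indexed))  # (['onboarding', 'runbook'], ['retired'])
```

Only the ids in the first list go back through chunking and embedding, which is why embedding cost after the initial load tracks document churn rather than corpus size.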
How predictable is the cost?
Three cost drivers: vector storage (typically the smallest and the steadiest), embeddings (one-time per document plus deltas), and inference (per query, the largest at scale). We share a forecast in the architecture phase and monitor against it monthly. Most engagements come in within 15% of the original forecast in steady state.
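The forecast itself is simple arithmetic over the three drivers. The sketch below shows the shape of it; every rate and volume is a hypothetical placeholder, not a vendor price or a real engagement figure. The 15% drift tolerance reuses the steady-state figure above purely as an illustrative alert threshold.

```python
from dataclasses import dataclass


@dataclass
class CostInputs:
    # Every rate below is a hypothetical placeholder, not a vendor price.
    stored_vectors: int
    storage_per_million_vectors: float  # per month
    tokens_embedded: int                # new + changed document tokens this month
    embedding_per_million_tokens: float
    queries: int
    tokens_per_query: int               # prompt + retrieved chunks + answer
    inference_per_million_tokens: float


def monthly_cost(c: CostInputs) -> dict[str, float]:
    """Forecast one month of spend, split by the three cost drivers."""
    storage = c.stored_vectors / 1e6 * c.storage_per_million_vectors
    embeddings = c.tokens_embedded / 1e6 * c.embedding_per_million_tokens
    inference = c.queries * c.tokens_per_query / 1e6 * c.inference_per_million_tokens
    return {
        "storage": storage,
        "embeddings": embeddings,
        "inference": inference,
        "total": storage + embeddings + inference,
    }


def drifted(actual: float, forecast: float, tolerance: float = 0.15) -> bool:
    """True when actual spend is more than `tolerance` away from the forecast."""
    return abs(actual - forecast) > tolerance * forecast


forecast = monthly_cost(CostInputs(
    stored_vectors=500_000, storage_per_million_vectors=20.0,
    tokens_embedded=2_000_000, embedding_per_million_tokens=0.10,
    queries=10_000, tokens_per_query=4_000, inference_per_million_tokens=3.0,
))
print(round(forecast["total"], 2))  # 130.2
```

With these made-up inputs, inference is about 92% of the total, which is the pattern the answer above describes: query volume and tokens per query are the levers that matter most at scale.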
How do you prevent hallucinated answers?
Three layers. Retrieval confidence thresholds — below a cutoff the system refuses to answer rather than guess. Answer-faithfulness checks — the answer is verified against the retrieved chunks before it's returned. Citation on every claim — the user sees the source and can verify. For compliance-sensitive answers, we offer an exact-quote mode that returns retrieved passages verbatim.
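The first layer and the exact-quote mode can be sketched in a few lines. This assumes the vector store returns a similarity score where higher means closer; the 0.75 threshold and the `generate` callable (standing in for the LLM call) are hypothetical, and a real cutoff would be calibrated against the eval set. The faithfulness check is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    chunk_id: str
    text: str
    score: float  # similarity from the vector store; higher means closer (assumed)


REFUSAL = "I couldn't find an answer to that in the indexed documents."


def answer_or_refuse(
    hits: list[Hit],
    generate: Callable[[list[Hit]], str],
    threshold: float = 0.75,
    exact_quote: bool = False,
) -> tuple[str, list[str]]:
    """Return (answer, cited chunk ids), refusing when retrieval is weak."""
    confident = [h for h in hits if h.score >= threshold]
    if not confident:
        # Below the cutoff: refuse rather than let the model guess.
        return REFUSAL, []
    citations = [h.chunk_id for h in confident]
    if exact_quote:
        # Compliance mode: return the retrieved passages verbatim, no generation.
        return "\n\n".join(h.text for h in confident), citations
    # Normal mode: the hypothetical generate callable writes an answer from the chunks.
    return generate(confident), citations


def fake_llm(chunks: list[Hit]) -> str:
    return f"Summary of {len(chunks)} passage(s)."


weak = [Hit("c1", "Unrelated text.", 0.41)]
strong = [Hit("c7", "Refunds are issued within 14 days.", 0.88)]
print(answer_or_refuse(strong, fake_llm, exact_quote=True))
# ('Refunds are issued within 14 days.', ['c7'])
```

Citations are derived from the same filtered list the answer is built from, so the user never sees a source that was discarded before generation.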
How accurate are the citations?
Citations link back to the specific document chunks used to generate the answer. We measure citation accuracy as part of the eval, typically 90% or higher on the check that each cited source was actually used in the answer. The eval surfaces the cases where the system cited a source it didn't use, and we tune retrieval to close those gaps.
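One way to compute that measure is citation precision over the eval set: of all the chunks the system cited, what share were actually used in the answer. The sketch assumes each eval result carries reviewer labels (here a hypothetical `supporting` set, produced by a human or a judge model); the questions and chunk ids are made up.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    question: str
    cited: set[str]       # chunk ids the system cited in its answer
    supporting: set[str]  # chunk ids a reviewer judged as actually used (assumed labels)


def citation_precision(results: list[EvalResult]) -> float:
    """Share of all citations that point at a chunk actually used in the answer."""
    total_cited = sum(len(r.cited) for r in results)
    correct = sum(len(r.cited & r.supporting) for r in results)
    return correct / total_cited if total_cited else 0.0


def misattributed(results: list[EvalResult]) -> list[str]:
    """Questions where at least one cited chunk was not actually used."""
    return [r.question for r in results if r.cited - r.supporting]


results = [
    EvalResult("What is the leave policy?", {"a", "b"}, {"a", "b"}),
    EvalResult("Who approves refunds?", {"c"}, {"c"}),
    EvalResult("How do I rotate API keys?", {"d", "e"}, {"d"}),
]
print(citation_precision(results))  # 0.8
print(misattributed(results))       # ['How do I rotate API keys?']
```

The `misattributed` list is the part that feeds the weekly eval review: it names the specific questions where retrieval pulled in a chunk the answer didn't rely on.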
What does it cost to build?
Pricing is custom per engagement and depends on the number of source systems, document volume, language and domain complexity, and the surface where the chat or search lives. We share pricing on the first discovery call. Inference and vector-database costs are passed through transparently — no markup.
Who does the work?
Two to three engineers from our Toronto-based team, led by Vatsal. The people who scope the engagement are the people who write the code. The full team is named on our team page — you can see and talk to them before we start.