Knowledge Base

How to build and maintain your chatbot's knowledge base.

What is a Knowledge Base

The Knowledge Base (KB) is the collection of all the information your chatbot uses to answer visitor questions. Think of it as the bot's "memory": everything it knows, it knows because you taught it through the KB.

If a visitor asks a question about a topic that is not in the KB, the bot won't have the information needed to answer accurately. Conversely, if the KB contains clear, complete, and well-organized information, the bot will be able to provide precise and helpful answers.

In short: the quality of the bot's answers depends directly on the quality of your Knowledge Base. A bot with a well-curated KB answers well; a bot with a messy or incomplete KB answers poorly.

How search works

When a visitor asks a question, the bot doesn't search for exact words like a traditional search engine would. Instead, it uses a technology called semantic search: it understands the meaning of the question and looks for the most relevant passages in the KB.

This means that:

  • If a visitor asks "What time do you open?" and the KB says "Opening hours are 9 AM to 6 PM", the bot will find the information even though the words are different
  • You don't need to predict every possible way a question might be phrased — the bot understands variations
  • However, the information must exist somewhere in the KB: the bot cannot make up answers

Once the most relevant passages are found, the bot uses them as the basis to compose a natural-language response, adapting it to the tone and style you've configured.
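The sketch below shows how semantic retrieval is commonly done. The question and each passage are turned into vectors, and passages are ranked by cosine similarity. This is not IKIbrain's actual implementation. The three-number vectors and the shipping and returns passages are invented for illustration; a real system gets much longer vectors from an embedding model. The names `top_passages` and `min_score` and the 0.5 cut-off are also hypothetical. The cut-off reflects the point above: if nothing in the KB is close enough to the question, nothing is returned, and the bot has no material to answer from.

```python
import math

# Invented 3-dimensional vectors, for illustration only. A real system would
# get much longer vectors from an embedding model. These numbers are chosen so
# that texts on the same topic point in similar directions even when they
# share no words with the question.
KB_PASSAGES = {
    "Opening hours are 9 AM to 6 PM": [0.9, 0.1, 0.0],
    "Shipping within the EU takes 3 to 5 days": [0.1, 0.9, 0.1],
    "Returns are accepted within 30 days": [0.0, 0.3, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_passages(query_vector, k=1, min_score=0.5):
    """Return up to k passages ranked by similarity, dropping weak matches."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in KB_PASSAGES.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [text for score, text in scored[:k] if score >= min_score]


# Invented vector standing in for the question "What time do you open?"
question = [0.85, 0.15, 0.05]
print(top_passages(question))  # → ['Opening hours are 9 AM to 6 PM']
```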

Content types

IKIbrain's KB supports six content types, each suited to different situations:

Web Pages

Pages imported directly from your website. Ideal for product pages, site FAQs, blog posts, "About Us" pages. You can import a single page or scan an entire section of your site.

Files

Uploaded documents (PDF, DOCX, TXT, PPTX, CSV). Ideal for manuals, catalogs, brochures, price lists — content that already exists as a document. The system automatically splits them into text blocks the bot can search.
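This article does not document IKIbrain's exact splitting rules. As a rough illustration under stated assumptions, one simple approach groups paragraphs into blocks up to a maximum size, set by the hypothetical `max_chars` parameter, and cuts any paragraph that is too long on its own into pieces. In this sketch, block boundaries follow the document's own paragraphs, which is one reason clearly structured documents tend to be split into more meaningful blocks.

```python
def split_into_blocks(text, max_chars=500):
    """Group paragraphs into blocks of at most max_chars characters.

    Paragraphs are separated by blank lines. A paragraph longer than
    max_chars is first cut into fixed-size pieces.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    blocks = []
    current = ""
    for para in paragraphs:
        # Cut oversized paragraphs into max_chars pieces.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            if not current:
                current = piece
            elif len(current) + 2 + len(piece) <= max_chars:
                # Fits: keep neighbouring paragraphs together in one block.
                current += "\n\n" + piece
            else:
                # Doesn't fit: close the current block and start a new one.
                blocks.append(current)
                current = piece
    if current:
        blocks.append(current)
    return blocks
```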

File privacy: Uploaded files are stored privately in secure cloud storage and are never publicly accessible. Only users with dashboard access to that bot can download the original file, via a temporary link that expires after 10 minutes. Chat widget visitors cannot access uploaded files — they only receive the text the bot extracts from them.
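Expiring download links are often built as signed tokens. The sketch below is a generic HMAC-based example, not IKIbrain's documented mechanism. `SECRET_KEY`, `make_download_link`, and `is_link_valid` are hypothetical names. The server signs the file ID together with an expiry time 10 minutes in the future. The link stops working once that time passes, and any change to the file ID or the expiry breaks the signature.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"hypothetical-server-secret"  # assumption: never leaves the server
LINK_TTL_SECONDS = 10 * 60  # 10 minutes, as described above


def _sign(file_id, expires):
    payload = f"{file_id}:{expires}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_download_link(file_id, now=None):
    """Create a link token for file_id that expires after LINK_TTL_SECONDS."""
    now = int(time.time()) if now is None else now
    expires = now + LINK_TTL_SECONDS
    return {"file_id": file_id, "expires": expires, "signature": _sign(file_id, expires)}


def is_link_valid(link, now=None):
    """Accept the link only if the signature matches and it has not expired."""
    now = int(time.time()) if now is None else now
    expected = _sign(link["file_id"], link["expires"])
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, link["signature"]) and now < link["expires"]
```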

Snippets

Text notes written directly in the panel. Ideal for information that doesn't exist in any document or web page: opening hours, contacts, specific instructions, company policies, answers to particular questions.

Q&A

Question-answer pairs. Unlike other content, when the bot matches a Q&A it returns the exact answer you wrote, without rephrasing it. Ideal for information that must be 100% accurate: prices, legal terms, contact details.
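You can think of the difference between Q&A and other content as a routing decision. In this minimal sketch, a close enough Q&A match is returned verbatim, and anything else falls through to normal answer composition. The sketch assumes a similarity score between 0 and 1 and uses a hypothetical cut-off of 0.8. `similarity` and `compose` are placeholders for the matcher and the language model. This article does not describe IKIbrain's real matching rules or threshold.

```python
from dataclasses import dataclass


@dataclass
class QA:
    question: str
    answer: str


def answer_visitor(question, qa_pairs, similarity, compose, threshold=0.8):
    """Return a Q&A answer verbatim if one matches closely enough.

    similarity(a, b) must return a score between 0 and 1; compose(question)
    produces a generated reply. Both are hypothetical placeholders.
    """
    best_score, best_pair = 0.0, None
    for pair in qa_pairs:
        score = similarity(question, pair.question)
        if score > best_score:
            best_score, best_pair = score, pair
    if best_pair is not None and best_score >= threshold:
        return best_pair.answer  # exact text as written, never rephrased
    return compose(question)  # other content: the bot writes its own answer
```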

Media

Images and videos (YouTube/Vimeo) that the bot can show during conversations. The bot decides when to display them based on the usage context you associate with each media item.

Products

Product catalog imported automatically from your store (WooCommerce, Shopify, or IKIshop). Each product is indexed so the bot can answer about name, price, availability, and details. Unlike the other content types, products are read-only: they are managed in your store and only synchronized here. This content type is available only for e-commerce chatbots connected to a store.

Which type to use? When the information already exists in a document, use Files. When it's on a web page, use Web Pages. When it doesn't exist anywhere, create a Snippet. When the answer must be reproduced word for word, use a Q&A. When the bot should show an image or video, use Media.

Variants and the Knowledge Base

If your bot uses variants, the Knowledge Base is shared between the main bot and all its variants. However, for Files, Snippets, Q&A, and Media you can create content exclusive to a variant using the "Scope" selector at the top of the page.

  • Web Pages are always shared across all variants
  • Files, Snippets, Q&A, and Media can be shared or exclusive to a variant

The "Clear all" buttons delete all content of that type, including content exclusive to any variant, not just the items currently visible.
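The scoping rules above can be modelled as a small filter. This is an illustrative sketch with hypothetical names (`KBItem`, `items_for_variant`, `clear_all`). It assumes the main bot is represented as `variant=None` and therefore sees only shared content. It encodes three rules from this section: web pages can't be exclusive to a variant, a variant sees shared items plus its own, and "Clear all" ignores scope.

```python
from dataclasses import dataclass
from typing import Optional

ALWAYS_SHARED = {"web_page"}  # web pages are always shared across variants


@dataclass
class KBItem:
    kind: str  # e.g. "web_page", "file", "snippet", "qa", "media"
    title: str
    scope: Optional[str] = None  # None = shared; otherwise a variant name

    def __post_init__(self):
        if self.kind in ALWAYS_SHARED and self.scope is not None:
            raise ValueError("web pages cannot be exclusive to a variant")


def items_for_variant(items, variant=None):
    """Shared items plus those exclusive to this variant (None = main bot)."""
    return [item for item in items if item.scope is None or item.scope == variant]


def clear_all(items, kind):
    """Like the "Clear all" button: drop every item of that kind in every scope."""
    return [item for item in items if item.kind != kind]
```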

Preparing documents

How you prepare content before uploading it has a huge impact on response quality. Here are the fundamental rules:

Well-focused documents

Prefer small, specific documents rather than one enormous PDF with everything in it. A 5-page document on a single topic produces better results than a 200-page document covering everything about your business.

Clear, well-structured text

Documents with clear headings, subheadings, and well-defined paragraphs produce better results. Avoid blocks of unformatted text: the AI that powers the system works better when it can identify logical blocks of information.

Add descriptive titles and notes

When you upload a file or import a web page, it can be useful to add a descriptive title and notes explaining the content or providing additional information about the source. This helps the bot understand the context and quickly find the right information at the right time.

PDFs: tips for best results

The OCR technology built into IKIbrain can understand the visual layout of pages, including tables and multi-column formats. However, some types of content may produce imperfect results:

  • Highly complex tables with merged, nested, or irregularly structured cells may not be reconstructed correctly
  • Diagrams, arrows, and graphic references — the system extracts any text present, but does not reconstruct visual relationships (e.g., “A points to B” or the meaning of arrows and legends)
  • Scanned PDFs with low quality or handwritten text may produce inaccurate results — whenever possible, prefer PDFs with selectable text

For best results, prefer documents with a clear structure and predominantly textual content.

Remove the unnecessary

Before uploading a document, remove the parts that aren't useful: covers, tables of contents, blank pages, and irrelevant sections (e.g., credits, forewords). The less "noise" there is, the more accurate the answers will be.

Avoiding contradictions

This is one of the most important rules: KB content must not contradict itself.

If one document says a service costs €100 and another says the same service costs €150, the bot will face conflicting information and may give a wrong or confusing answer.

The most common situations

  • Outdated versions — you uploaded a 2024 price list and then the 2025 one, but didn't remove the old one. The bot might quote the wrong prices
  • Duplicate content — the same information exists in a PDF file, a web page, and a snippet, but with slightly different details
  • Partial information — one document talks about a product generically, another describes it in detail with different specifications

How to prevent contradictions

  • When you update a document, always delete the previous version before uploading the new one
  • Avoid putting the same information in multiple places — choose one authoritative source for each piece of data
  • If in doubt, ask the bot a question and check which source it cites in the response
  • Use the Performance section to spot inconsistent answers

Golden rule: for every piece of information, there should be one single version in the KB. If data changes, update the original source and remove outdated versions.
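If you can list the key facts your sources state, a quick consistency check is easy to script. The sketch below is hypothetical and is not a dashboard feature. It takes (source, fact, value) triples that you collect yourself and reports every fact that appears with more than one value, along with the sources responsible. The example data reuses the €100 vs €150 case from above; "Service X" is a placeholder name.

```python
from collections import defaultdict


def find_conflicts(facts):
    """Report facts that appear with more than one value.

    facts: iterable of (source, fact, value) triples.
    Returns {fact: {value: [sources]}} for conflicting facts only.
    """
    values_by_fact = defaultdict(lambda: defaultdict(list))
    for source, fact, value in facts:
        values_by_fact[fact][value].append(source)
    return {
        fact: dict(values)
        for fact, values in values_by_fact.items()
        if len(values) > 1
    }


facts = [
    ("price-list-2024.pdf", "Service X price", "€100"),
    ("price-list-2025.pdf", "Service X price", "€150"),
    ("Opening hours snippet", "Opening hours", "9 AM to 6 PM"),
    ("About Us page", "Opening hours", "9 AM to 6 PM"),
]
print(find_conflicts(facts))
# → {'Service X price': {'€100': ['price-list-2024.pdf'], '€150': ['price-list-2025.pdf']}}
```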

What NOT to add to the KB

Everything you upload to the Knowledge Base becomes the bot's knowledge: it can be used to answer any visitor. This means you need to think carefully about what you include, because the bot does not distinguish between information you want to share and information you'd prefer to keep private.

Always remember: never upload any information to the KB that you would not want communicated to a customer or a visitor on your website.

Confidential or internal documents

Supplier contracts, purchase price lists, meeting minutes, org charts, internal notes, drafts. If a customer asks "how much do you pay your supplier?" or "who is responsible for...", the bot could answer using this information.

Personal data

If a document contains names, email addresses, phone numbers, or other personal data (e.g., a customer list or attendance sheet), the bot could reveal them in its responses. Beyond reputational damage, this can create GDPR compliance issues.

Scans and low-quality files

Do not upload low-quality scans or poorly legible files. The processing result is unpredictable: the bot could quote corrupted, partial, or incomprehensible text in its responses.

Content with temporary information

Be careful with documents containing phrases like "the promotion is active this month" or "until the end of the year" without specifying exact dates. When the promotion expires, the bot will continue to offer it as if it were still valid. If you upload content with expiration dates, remember to update or remove it at the appropriate time.
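One simple way to keep track of time-limited content is to record an explicit end date for each item and review the list regularly. This is a minimal sketch that assumes you keep your own list of (title, end date) pairs outside IKIbrain. `content_to_review`, the 7-day warning window, and the sample items are hypothetical.

```python
from datetime import date, timedelta


def content_to_review(items, today, warn_days=7):
    """Return (expired, expiring_soon) titles from (title, end_date) pairs.

    An end date of None means the item has no expiry. An item counts as
    valid through its end date and as expired from the following day.
    """
    expired, expiring_soon = [], []
    for title, end_date in items:
        if end_date is None:
            continue
        if end_date < today:
            expired.append(title)
        elif end_date <= today + timedelta(days=warn_days):
            expiring_soon.append(title)
    return expired, expiring_soon


items = [
    ("Spring promotion", date(2025, 5, 31)),
    ("June sale", date(2025, 6, 15)),
    ("Opening hours", None),
    ("Autumn promotion", date(2025, 9, 30)),
]
print(content_to_review(items, today=date(2025, 6, 10)))
# → (['Spring promotion'], ['June sale'])
```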

Content in multiple languages

Do not upload the same content translated into several languages. The bot can respond in any language, regardless of the language of the KB content, so translated duplicates only add noise and increase the risk of contradictions between versions.

Upload content once, in the original language. The bot will take care of responding in the visitor's language.

Quality vs quantity

A common mistake is thinking that more content = better bot. In reality, uploading large amounts of irrelevant or low-quality content significantly worsens the answers.

Why is too much content a problem?

When the bot searches the KB, it has to find the most relevant passages among all available items. It doesn't "understand" content the way a person does: it selects the passages that seem most relevant to the question. If the KB contains a lot of "noise" (vague, duplicate, or irrelevant content), the risk of picking the wrong passage increases.

What to upload

  • Information your customers actually ask about — check the Performance and Topics sections to understand what they're looking for
  • Up-to-date, accurate content — a price list that's 5 years old does more harm than good
  • Clear, direct text — marketing language with little substance doesn't help the bot provide useful answers

Keeping the KB updated

A KB is not something you set up once and forget about. Information changes: prices, hours, products, policies. A bot that gives outdated information is worse than a bot that says "I don't know".

Maintenance routine

  • Check the Performance section at least once a week: it shows questions the bot struggles with and the weakest sources
  • Update content when information changes (prices, hours, services, etc.)
  • Remove outdated content — an old catalog with wrong prices can cause harm
  • Check Topics to discover new topics users are asking about that the bot isn't yet prepared for

Signs that the KB needs attention

  • Many low-confidence interactions in the Performance section
  • The same question appears repeatedly among interactions needing improvement
  • Customers leave negative feedback (thumbs down)
  • The bot responds with outdated information

Need support?

The IKIweb team configures and maintains IKIbrain for you. Tell us about your project.
