Knowledge Discovery: Best practices for knowledge sources
Learn the best practices for sourcing knowledge in the field of AI knowledge discovery to enhance your research and analysis capabilities.
How to write effective knowledge sources for RAG-based systems (Knowledge Discovery)
This guide provides practical recommendations for writing or improving articles, manuals, and guides that will be used as source material in a RAG (Retrieval-Augmented Generation) AI system. The goal is to make it as easy as possible for the model to understand and retrieve relevant, precise information.
1. 🎯 Define a clear purpose for each document
Each document should answer one specific question or use case.
Clearly state what the document is about in the title and headings (e.g., “How to reset your password in ESM”, “VPN setup for remote employees”).
Avoid mixing topics — it's better to split long documents into smaller, focused ones.
2. 🧱 Use structured sections and clear headings
Organize content into logical sections: Introduction, Step-by-step, Notes, Common Issues, FAQ.
Use H2/H3 headings — this helps the model understand the structure and context of the content.
Each action or instruction should be in its own step or paragraph.
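To illustrate why headings matter, here is a minimal sketch of how a retrieval pipeline might split a Markdown document into chunks at H2/H3 headings. It is an assumption about a typical chunking step, not a description of how any particular RAG system works; the function name `split_by_headings` and the sample document are hypothetical.

```python
import re

# Matches Markdown H2/H3 headings such as "## Title" or "### Title".
HEADING = re.compile(r"^(#{2,3})\s+(.*\S)\s*$")


def split_by_headings(text):
    """Split Markdown text into (heading, body) chunks at H2/H3 headings."""
    chunks = []
    heading, body = None, []

    def flush():
        # Store the current chunk unless it is completely empty.
        if heading is not None or any(line.strip() for line in body):
            chunks.append((heading, "\n".join(body).strip()))

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous chunk before starting a new one
            heading, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return chunks


# Hypothetical sample document, for illustration only.
doc = """## Introduction
This article explains how to reset your password.

## Step-by-step
1. Go to the login page.
2. Click on "Forgot password".

### Notes
After resetting your password, log in to the VPN again.
"""

for heading, body in split_by_headings(doc):
    print(f"[{heading}] {body!r}")
```

Each chunk carries its heading with it, so a section titled "Step-by-step" stays attached to its steps when it is indexed. A document without headings would come back as a single undifferentiated chunk.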
3. 🔢 Use numbered steps and bullet points
Instead of long paragraphs, use:
## How to reset your password
1. Go to the login page.
2. Click on “Forgot password”.
3. Enter your company email address.
4. Check your inbox and click the reset link.
Numbered steps give the model an explicit order and clear boundaries, so it handles this format more reliably than unstructured text.
4. 📌 Use consistent terminology
Use clear and consistent names for systems, teams, functions, and acronyms (e.g., “ESM Platform”, “IT Support Team”, “FortiClient VPN”).
Avoid abbreviations and internal jargon unless they are defined in the document.
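A quick way to catch undefined abbreviations before publishing is a small script like the sketch below. It relies on a simple heuristic that is an assumption, not a standard: an acronym counts as "defined" if it appears at least once in parentheses, as in "multi-factor authentication (MFA)". The function name `undefined_acronyms` and the sample text are made up for illustration.

```python
import re


def undefined_acronyms(text):
    """Return acronyms that never appear in a "Full Name (ACRONYM)" definition."""
    # Heuristic: an acronym is a standalone run of two or more
    # capital letters or digits that starts with a letter.
    acronyms = set(re.findall(r"\b[A-Z][A-Z0-9]+\b", text))
    # An acronym counts as defined if it appears inside parentheses once.
    defined = set(re.findall(r"\(([A-Z][A-Z0-9]+)\)", text))
    return sorted(acronyms - defined)


# Hypothetical sample text, for illustration only.
sample = (
    "Enable multi-factor authentication (MFA) for your company account. "
    "MFA is required before you connect to the VPN."
)

print(undefined_acronyms(sample))  # ['VPN']
```

The check will produce false positives (for example, words written in all capitals for emphasis), so treat its output as a review list rather than a verdict.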
5. 🧠 Provide context within the document
Add a brief explanation of why something is done. For example:
“Password reset is required every 90 days to comply with company security policy.”
The model understands purpose and intent better when context is available.
6. 🔍 Highlight key information
Emphasize important points (like paths, settings, warnings, tips) with bold text or visual formatting:
**Note:** After resetting your password, you must log in to the VPN again.
This helps the model recognize which details are important to include in its response.
7. 📂 One document = one topic
❌ Poor practice: “Everything you need to know about ESM”
✅ Good practice:
“How to submit a leave request in ESS2”
“How to check ticket status in ESM”
“How to enable MFA for your company account”
8. 💬 Include FAQs and typical user questions
Users usually ask the AI questions in natural language. Include a section that mirrors those questions, like:
## Frequently Asked Questions
- What if I didn’t receive the reset email?
- Can I reset my password outside the office network?
9. 🧪 Test before publishing
Try real user-like questions and see if the answers are easily found in the text.
Optionally, test the document through the AI system and evaluate the quality of generated responses.
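The first check can be roughed out with a short script. The sketch below measures how many of a question's keywords also appear in the document. This keyword overlap is only a crude proxy chosen for illustration; it is not how a real RAG retriever ranks content, and the stopword list, the 0.5 threshold, and the function names are all assumptions.

```python
import re

# Small, hand-picked stopword list (an assumption; extend as needed).
STOPWORDS = {
    "a", "an", "the", "i", "my", "to", "do", "how", "what",
    "if", "can", "is", "in", "of", "for",
}


def keywords(text):
    """Lowercase the text and return its words, minus the stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def coverage(question, document):
    """Fraction of the question's keywords that also appear in the document."""
    q = keywords(question)
    if not q:
        return 0.0
    return len(q & keywords(document)) / len(q)


document = """How to reset your password
1. Go to the login page.
2. Click on "Forgot password".
3. Enter your company email address.
4. Check your inbox and click the reset link.
"""

questions = [
    "How do I reset my password?",
    "How do I submit a leave request?",
]

for q in questions:
    score = coverage(q, document)
    flag = "ok" if score >= 0.5 else "REVIEW"
    print(f"{flag:6} {score:.2f}  {q}")
```

Questions flagged for review are ones the document probably does not answer in the user's own words. Either the topic belongs in a different document, or the wording should be adjusted to match how users ask.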
10. 🧼 Use simple, correct language
AI performs best with short, clear, and grammatically correct sentences.
Avoid overly technical or complex phrases.
Use direct instructions rather than conditional or suggestive language.
✍️ Example: Bad vs Good
❌ Bad:
If the user is having login issues, they might try to remember the password, or go to the ESM login page and maybe use the reset option.
✅ Good:
How to reset your password:
1. Go to the ESM login page.
2. Click “Forgot password”.
3. Enter your company email.
4. Check your inbox and click the reset link.
⚠️ Important: Use a single language across all documents
To maintain a high level of accuracy and response quality in the AI system, all source documents must be written in the same language. Mixing languages within the knowledge base significantly reduces model efficiency and answer clarity.
Decide upfront which primary language will be used (e.g., English, Finnish, German, Swedish, or Polish), and ensure that all articles, instructions, and documentation follow that choice consistently.