AI EngineeringWiki

RAG Guide

Tools · 8 min

RAG (Retrieval-Augmented Generation) combines an LLM with your own documents: the model answers questions about your data without any fine-tuning, by retrieving relevant passages at query time and passing them in as context.

How RAG Works

1. User asks question
2. Question → Embedding Model
3. Embedding → Vector Database
4. Find similar documents
5. Documents + Question → LLM
6. LLM generates answer
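The retrieval core of steps 2–4 can be sketched in plain Python: embed the question and the documents as vectors, then rank documents by cosine similarity. The tiny hard-coded vectors below are stand-ins for what a real embedding model would produce.

```python
import math

def cosine_similarity(a, b):
    # Similarity of two embedding vectors (1.0 = same direction)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" -- in a real pipeline these come from an embedding model
doc_vectors = {
    "returns-policy.md":  [0.9, 0.1, 0.0],
    "shipping-info.md":   [0.2, 0.8, 0.1],
}
query_vec = [0.8, 0.2, 0.1]  # embedding of the user's question

# Step 4: pick the most similar document
best = max(doc_vectors, key=lambda name: cosine_similarity(query_vec, doc_vectors[name]))
print(best)  # → returns-policy.md
```

A vector database does essentially this comparison, but over millions of vectors with an approximate nearest-neighbor index instead of a linear scan.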

Components

  • Document Loader: PDF, Markdown, HTML, Text
  • Text Splitter: Split into chunks
  • Embedding Model: Convert to vectors
  • Vector Database: Store and search
  • LLM: Generate answer from context
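To make the Text Splitter component concrete, here is a deliberately simplified fixed-size splitter with overlap. Real splitters such as LangChain's RecursiveCharacterTextSplitter additionally try to break on paragraph and sentence boundaries.

```python
def split_text(text, chunk_size=100, overlap=20):
    # Simplified fixed-size splitter: each chunk shares `overlap`
    # characters with the previous one, so context at chunk borders
    # is not lost.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_text("abcdefghij" * 25, chunk_size=100, overlap=20)
print(len(chunks))  # 3 overlapping chunks from 250 characters
```

The overlap matters: without it, a sentence cut in half at a chunk boundary may not be retrievable from either chunk.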

Popular Tools

Tool       Type        Best For
ChromaDB   Vector DB   Simple setups
Qdrant     Vector DB   Production
Neo4j      Graph DB    Knowledge graphs
pgvector   Vector DB   PostgreSQL users

Basic RAG Pipeline

# 1. Load and split documents
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("my-docs.txt")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)

# 2. Create embeddings and store
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="nomic-embed-text")
db = Chroma.from_documents(chunks, embeddings)

# 3. Query: retrieve the most similar chunks
query = "What is our return policy?"
results = db.similarity_search(query, k=4)

# 4. Get answer from LLM, passing retrieved text plus the question
from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama3:8b")
context = "\n\n".join(doc.page_content for doc in results)
result = llm.invoke(f"Answer based on this context:\n{context}\n\nQuestion: {query}")
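The prompt string in step 4 is worth controlling carefully. A small helper (illustrative, the name and wording are my own) keeps context and question separate and tells the model to refuse when the context does not contain the answer, which reduces hallucinated responses:

```python
def build_rag_prompt(question, chunks):
    # Join retrieved chunks into one context block and instruct the
    # model to answer only from that context.
    context = "\n\n".join(chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_rag_prompt(
    "What is our return policy?",
    ["Returns are accepted within 30 days.", "Refunds take 5 business days."],
)
print(prompt)
```

The same string can then be passed to `llm.invoke(prompt)` in place of the inline f-string above.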
