Building a RAG Pipeline to Reduce LLM Hallucination
Problem
I participated in the “PNU x Upstage Document AI Challenge 2025” and developed “DocDoc,” a medical AI assistant for overseas medical teams.
The core feature was paper search and summarization. Initially, I just passed user questions directly to the LLM. The problem was that the LLM kept hallucinating: it made up research results that didn’t exist.
Solution: RAG (Retrieval-Augmented Generation)
Instead of relying solely on the LLM’s knowledge, I made it answer based on real documents.
Pipeline:
- Pre-process medical papers by splitting the text into chunks
- Convert each chunk to a vector embedding using the Upstage Embedding API
- Store the embeddings in Pinecone (a vector database); see the ingestion sketch after this list
- When a user asks a question, convert it to an embedding and search for similar chunks
- Pass the retrieved chunks plus the question to the LLM to generate an answer; see the query sketch after this list
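Here is a minimal sketch of the ingestion side in TypeScript, assuming Upstage’s OpenAI-compatible Solar endpoint, the official Pinecone Node client, and LangChain’s text splitter. The index name, embedding model name, and chunk sizes are illustrative choices, not the exact values from DocDoc.

```typescript
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

// Upstage exposes an OpenAI-compatible API, so the OpenAI SDK works with a base URL override.
const upstage = new OpenAI({
  apiKey: process.env.UPSTAGE_API_KEY,
  baseURL: "https://api.upstage.ai/v1/solar",
});
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("docdoc-papers"); // hypothetical index name

export async function ingestPaper(paperId: string, title: string, text: string) {
  // 1. Split the paper into overlapping chunks so each embedding stays focused.
  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,
    chunkOverlap: 200,
  });
  const chunks = await splitter.splitText(text);

  // 2. Embed every chunk with Upstage's passage-side embedding model.
  const { data } = await upstage.embeddings.create({
    model: "solar-embedding-1-large-passage",
    input: chunks,
  });

  // 3. Upsert the vectors into Pinecone, keeping the chunk text and paper title
  //    as metadata so answers can later cite the paper they came from.
  await index.upsert(
    data.map((d, i) => ({
      id: `${paperId}-${i}`,
      values: d.embedding,
      metadata: { title, text: chunks[i] },
    }))
  );
}
```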
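The query side mirrors it: embed the question with the query-side model, pull the most similar chunks from Pinecone, and pass them to Solar with instructions to answer only from the excerpts. Again a sketch: the prompt wording, `topK`, and chat model name are assumptions for illustration.

```typescript
// Reuses the `upstage` client and Pinecone `index` from the ingestion sketch above.
export async function answerQuestion(question: string): Promise<string> {
  // 1. Embed the user's question with the query-side embedding model.
  const { data } = await upstage.embeddings.create({
    model: "solar-embedding-1-large-query",
    input: question,
  });

  // 2. Retrieve the most similar chunks from Pinecone, with their metadata.
  const results = await index.query({
    vector: data[0].embedding,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Build a grounded prompt: the model must answer from the retrieved
  //    excerpts and name the paper each claim came from.
  const context = results.matches
    .map((m) => `[${m.metadata?.title}]\n${m.metadata?.text}`)
    .join("\n\n");

  const completion = await upstage.chat.completions.create({
    model: "solar-1-mini-chat", // illustrative Solar model name
    messages: [
      {
        role: "system",
        content:
          "Answer only from the provided excerpts and cite the paper title for each claim. " +
          "If the excerpts do not contain the answer, say so instead of guessing.",
      },
      { role: "user", content: `Excerpts:\n${context}\n\nQuestion: ${question}` },
    ],
  });

  return completion.choices[0].message.content ?? "";
}
```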
Tech Stack
- Vector DB: Pinecone
- Embedding: Upstage Embedding API
- LLM: Upstage Solar
- Framework: LangChain
- Backend: Node.js + Express
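On the backend, the whole pipeline sits behind a single Express route. A sketch, where `answerQuestion` is the hypothetical query function from the sketch above:

```typescript
import express from "express";

const app = express();
app.use(express.json());

// POST /api/ask  { "question": "..." }  ->  { "answer": "..." }
app.post("/api/ask", async (req, res) => {
  const { question } = req.body;
  if (!question) {
    return res.status(400).json({ error: "question is required" });
  }
  try {
    const answer = await answerQuestion(question); // RAG query function from the sketch above
    res.json({ answer });
  } catch (err) {
    res.status(500).json({ error: "failed to generate answer" });
  }
});

app.listen(3000, () => console.log("DocDoc API listening on :3000"));
```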
Results
After applying RAG:
- Answers were grounded in actual papers instead of hallucinated content
- The LLM could cite which paper each piece of information came from
- Overall response quality improved noticeably
Lessons Learned
- An LLM by itself has knowledge limits and hallucination risks
- Combining an LLM with document retrieval (RAG) keeps answers grounded and reliable
- A vector DB is a key component in AI applications
These lessons come from developing “DocDoc,” which advanced to the finals of the PNU x Upstage Document AI Challenge 2025.