
Building a RAG Pipeline to Reduce LLM Hallucination

Problem

I participated in the “PNU x Upstage Document AI Challenge 2025” and developed “DocDoc,” a medical AI assistant for overseas medical teams.

The core feature was “paper search and summarization.” Initially, I passed user questions directly to the LLM, but it kept hallucinating: it made up research results that didn’t exist.


Solution: RAG (Retrieval-Augmented Generation)

Instead of relying solely on the LLM’s knowledge, I made it answer based on real documents.

Pipeline (a code sketch follows the list):

  1. Pre-process medical papers (split the text into chunks)
  2. Convert each chunk to a vector embedding using the Upstage Embedding API
  3. Store the embeddings in Pinecone (a vector database)
  4. When a user asks a question, convert it to an embedding and search for similar chunks
  5. Pass the retrieved chunks plus the question to the LLM to generate an answer
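
Below is a minimal TypeScript sketch of steps 1–5. It is illustrative rather than the exact DocDoc implementation: it assumes the Upstage API is OpenAI-compatible (so the openai client can be pointed at it), and the model names, index name, and chunking parameters are placeholder assumptions.

```typescript
// rag.ts — minimal sketch of the ingest + retrieve + generate flow.
// Assumptions (not from the post): OpenAI-compatible Upstage endpoint,
// placeholder model/index names, naive fixed-size chunking.
import OpenAI from "openai";
import { Pinecone } from "@pinecone-database/pinecone";

const upstage = new OpenAI({
  apiKey: process.env.UPSTAGE_API_KEY,
  baseURL: "https://api.upstage.ai/v1/solar", // assumed OpenAI-compatible endpoint
});
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const index = pinecone.index("medical-papers"); // hypothetical index name

// Step 1: split text into overlapping fixed-size chunks
// (the post doesn't specify a chunking strategy).
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
  }
  return chunks;
}

// Steps 2 + 3: embed each chunk and upsert into Pinecone with source metadata.
async function ingestPaper(paperId: string, title: string, text: string) {
  const chunks = chunkText(text);
  const res = await upstage.embeddings.create({
    model: "solar-embedding-1-large-passage", // assumed model name
    input: chunks,
  });
  await index.upsert(
    res.data.map((e, i) => ({
      id: `${paperId}-${i}`,
      values: e.embedding,
      metadata: { title, text: chunks[i] },
    }))
  );
}

// Step 4: embed the question and retrieve the most similar chunks.
async function retrieve(question: string, topK = 5) {
  const res = await upstage.embeddings.create({
    model: "solar-embedding-1-large-query", // assumed model name
    input: question,
  });
  const results = await index.query({
    vector: res.data[0].embedding,
    topK,
    includeMetadata: true,
  });
  return results.matches ?? [];
}

// Step 5: answer using only the retrieved context.
export async function answer(question: string): Promise<string> {
  const matches = await retrieve(question);
  const context = matches
    .map((m) => `[${m.metadata?.title}] ${m.metadata?.text}`)
    .join("\n\n");
  const completion = await upstage.chat.completions.create({
    model: "solar-1-mini-chat", // assumed model name
    messages: [
      {
        role: "system",
        content:
          "Answer using ONLY the context below. Cite the paper title in brackets. " +
          "If the context doesn't contain the answer, say so.\n\n" + context,
      },
      { role: "user", content: question },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```

Storing the chunk text and paper title as Pinecone metadata is what later lets the model cite which paper an answer came from.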

Tech Stack

  • Vector DB: Pinecone
  • Embedding: Upstage Embedding API
  • LLM: Upstage Solar
  • Framework: LangChain
  • Backend: Node.js + Express
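
To show how this stack hangs together in the Node.js + Express backend, here is a hypothetical route exposing the flow above. The answer() helper is the one from the earlier sketch; the route path and port are assumptions, not DocDoc's actual API.

```typescript
// server.ts — hypothetical Express endpoint for the RAG flow.
import express from "express";
import { answer } from "./rag"; // the helper sketched above

const app = express();
app.use(express.json());

// POST /ask { "question": "..." } -> { "answer": "..." }
app.post("/ask", async (req, res) => {
  const { question } = req.body;
  if (!question) {
    return res.status(400).json({ error: "question is required" });
  }
  try {
    res.json({ answer: await answer(question) });
  } catch {
    res.status(500).json({ error: "failed to generate an answer" });
  }
});

app.listen(3000, () => console.log("DocDoc backend listening on :3000"));
```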

Results

After applying RAG:

  • Answers were grounded in actual papers rather than hallucinated content
  • The LLM could cite which paper each piece of information came from
  • Overall response quality improved noticeably

Lessons Learned

  • An LLM on its own has knowledge limits and hallucination risks
  • Combining an LLM with document retrieval (RAG) keeps answers grounded and reliable
  • A vector DB is a key building block of AI applications

These lessons come from developing “DocDoc,” which advanced to the finals of the PNU x Upstage Document AI Challenge 2025.

This post is licensed under CC BY 4.0 by the author.