[LLM] 자동차 메뉴얼 질의응답 RAG W5-8

2025-07-25 11 분 소요

W5

4/7

목적

한국어 기반으로 구축된 RAG 시스템에 영어 모델을 도입하여 성능 및 활용성 확장
질의응답, 문서 청크, 검색 문맥을 번역하는 한-영, 영-한 번역 로직

번역 로직 추가

영어 모델로 실험하기 위해 오픈소스 번역 모델인 facebook/m2m100_418M 모델을 사용해 영-한, 한-영 번역 로직을 추가했으나, 번역 성능이 매우 떨어짐

주요 이슈

모델 추론 시 logits에 NaN 또는 음수 확률이 포함
해당 오류는 잘못된 입력 시퀀스, 학습되지 않은 특수 토큰, 비정상 토큰화로 발생

RuntimeError: probability tensor contains either inf, nan or element < 0

logits이 NaN값 반환

NaN in logits: True  Inf in logits: False

→ 확률 분포 계산 중 NaN값 포함, softmax 불가능

모델에 , 와 같은 특수토큰을 학습하지 않아서 발생하는 문제
특수토큰 직접 등록

SPECIAL_TOKENS = {
        "bos_token": "<bos>",
        "eos_token": "<eos>",
        "pad_token": "[PAD]",
        "additional_special_tokens": [
            "<start_of_turn>", "<end_of_turn>", "<start_of_turn>user", "<start_of_turn>model"
        ]
    }
        _tokenizer.add_special_tokens(SPECIAL_TOKENS)

토크나이저가 내부에서 사용하는 sacremoses 라이브러리 설치치

pip install sacremoses

청크 결과 이상

오류는 해결했으나, 저장된 한국어 청크와 영어로 번역된 청크가 이상하게 저장

원래는 이렇게 저장돼야 정상

4/8

목적

청크가 이상하게 저장되는 문제 해결
다양한 플랫폼(로컬모델, Ollama, OpenAI API 등)의 모델을 유연하게 교체하면서 사용할 수 있도록 하기 위해 모델 로더 추상화

주요 이슈

청크 저장 이상 : 청크 저장 시 정보가 누락되는 문제 발생
번역 품질 : 로컬 모델을 사용했을 때, 짧은 문장은 해석을 잘 하지만 비격식체나 길이가 긴 문장에 치명적

Model Loader 인터페이스

get_model_loader(provider=”local”) 형식으로 모델 로드하는 구조 통일
다양한 환경의 모델을 손쉽게 교체 및 테스트 가능

청크 띄어쓰기 누락

모듈화를 한 이후 답변 품질이 저하되는 현상 발생
검색된 문서들을 확인한 결과 모듈화 이후 띄어쓰기가 누락되는 현상 발견
띄어쓰기 누락으로 인해 답변의 품질이 낮아진 것으로 추정
전처리 모듈에서 정규표현식에 모든 공백을 제거하도록 하는 로직 있었음

4/9

번역 도구

로컬 → 성능 안좋음
deepL API → 테스트 결과, 길이가 길어지거나 비격식체를 사용하더라도 번역 성능 좋음. 50만단어 무료 제공
파파고 API

실험 결과

로컬

로컬모델로 번역하는 과정에서 토큰의 길이가 길어졌을 때, 성능이 현저히 떨어지는 문제 발생
전체 청크를 집어넣는게 아닌 문장단위로 해석한 뒤 청크로 나누도록 수정
품질이 조금 나아졌지만 로컬 모델로는 한계

deepL API

마찬가지로 한문장씩 번역 후 번역된 결과를 청킹하는 방식
번역 품질 좋음

청크 유사도 임계치 조정

청킹하는 과정에서 유사도 임계치가 0.8 → 청킹 기준이 엄격해 검색된 문서가 많이 안나와 성능 오히려 저하
0.6으로 조정 → 너무 많은 정보가 검색
0.7로 조정 → 적당한 양의 정보를 검색

4/11

모듈화

추가된 코드에 대해서 모듈화 진행

질의응답 내역 벡터DB 저장

질의응답을 벡터DB에 저장한 뒤, 모델이 과거 답변 내역을 함께 참고할 수 있도록 기능 추가
사용자 피드백이 good일 경우 해당 답변은 벡터DB에 저장되는 방식

질문 분할

사용자의 질문의 길이가 길어졌을 때, 성능이 떨어질 가능성 존재
질문의 길이가 일정 길이 이상 길어졌을 때 문장을 연결어를 기준으로 분할하도록 구현

['How can I properly prepare a vehicle for long-term storage to avoid battery drain and tire issues, and what maintenance should be performed before restarting it after being idle for an extended time?'] ['How can I properly prepare a vehicle for long-term storage to avoid battery drain and tire issues,',  'what maintenance should be performed before restarting it after being idle for an extended time?']

W6

요약

날짜	내용
4/14	MCP 서버 구축 및 파일 변환 자동화
4/15	LLaMA4, Gemma3 모델 오류 해결 시도
4/16	문서 cleaning 작업, 문서 주제별 분할
4/17	Dockerfile 제작, 문장 반복문제 해결
4/18	이미지 삽입, PDF 출력 자동화

Dockerfile

# CUDA 12.2 + cuDNN 8 runtime + Ubuntu 22.04 FROM nvidia/cuda:12.8.1-cudnn-runtime-ubuntu22.04 WORKDIR /workspace # 기본 패키지 설치 RUN apt-get update && apt-get install -y \    wget \    curl \    gnupg \    libxml2 \    build-essential \    nodejs \    npm \    && rm -rf /var/lib/apt/lists/* # CUDA 환경 변수 (필요시) ENV PATH=/usr/local/cuda/bin:$PATH ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH ENV NVIDIA_VISIBLE_DEVICES=all ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility ENV DEBIAN_FRONTEND=noninteractive ENV TZ=Asia/Seoul ENV TRANSFORMERS_CACHE=/workspace/.cache/huggingface/transformers ENV HF_HOME=/workspace/.cache/huggingface # Miniconda 설치 RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /miniconda.sh && \    bash /miniconda.sh -b -p /opt/conda && \    rm /miniconda.sh ENV PATH=/opt/conda/bin:$PATH RUN conda update -n base -c defaults conda && \    conda init bash # VSCode Server 설치 RUN wget https://github.com/coder/code-server/releases/download/v4.10.1/code-server-4.10.1-linux-amd64.tar.gz && \    tar -xvzf code-server-4.10.1-linux-amd64.tar.gz && \    mv code-server-4.10.1-linux-amd64 /usr/local/lib/code-server && \    ln -s /usr/local/lib/code-server/bin/code-server /usr/local/bin/code-server && \    rm code-server-4.10.1-linux-amd64.tar.gz # Conda 환경 구성 COPY environment.yml . RUN conda env create -f environment.yml # Conda 환경을 Jupyter Kernel로 등록 RUN /bin/bash -c "source /opt/conda/bin/activate sangwon && \    python -m ipykernel install --user --name sangwon --display-name 'sangwon'" # PyTorch + TensorFlow 최신 GPU 버전 설치 (CUDA 12.2 지원) RUN /bin/bash -c "source /opt/conda/bin/activate sangwon && \    pip install --upgrade pip && \    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 && \    pip install xformers flash-attn --no-build-isolation" SHELL ["/bin/bash", "-c"] CMD ["source /opt/conda/bin/activate sangwon && /bin/bash"]
name: sangwon channels:  - defaults  - conda-forge dependencies:  - python=3.11.11  - pip  - pip:      - sentence-transformers==3.4.1      - transformers==4.52.0.dev0      - torch==2.2.2      - requests==2.32.3      - openai==1.66.3      - numpy==1.26.4      - spacy==3.8.4      - scikit-learn==1.3.2 

데이터 전처리 및 문서 분할

목표
- 긴 설명서 PDF를 주제별로 나누어 모델이 집중된 컨텍스트를 이해할 수 있도록 유도
- 기존 벡터 기반 유사도 검색은 정보의 손실 발생 → 높은 정답률 도출
PDF → txt 변환
- 단어 중간에 개행문자가 삽입되어 의미 단위가 깨짐
- 문단 순서가 섞임
- 표가 깨진채 추출
해결 방법
- 수작업으로 cleaning 진행
- 문서 내용을 12개의 주제로 분할

모델 실험 및 정답률

분할 수준	사용 모델	정답률
3개	Gemma3	0
3개	Cogito-Qwen
12개	Gemma3	13
12개	Cogito-Qwen	19

반복 응답 문제 및 후처리 전략

문제점
- Cogito 모델이 답변 내에서 같은 문장을 여러번 반복하여 출력
초기 시도
- temperature, top_p, repetition_penalty, no_repeat_ngram_size 등 하이퍼파라미터 조정
- 프롬프트에 반복 금지 명시

해결 전략

프롬프트에서 <

answer

>, <

endanswer

> 토큰 사이에 답변을 하도록 강제

모델 출력 결과에서 해당 토큰 사이만 추출
정규표현식 기반 후처리 적용

결과

 <|answer|> 참고 이미지: ![시트 높이 조절 이미지](./image/image_52_2.png) 52페이지에 따르면, 시트 높이는 레버를 사용하여 조절할 수 있으며 "레버를 위로 반복하여 올릴수록 시트가 높아집니다." 또한 51페이지에 따르면, "모든 방향 및 디스플레이 계기에 대하여 명료한 시야를 확보할 수 있도록 시트 높이를 충분히 높게 맞추십시오." <|endanswer|> Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting guidelines. No external knowledge was used beyond what's documented in these specific pages about seat adjustment procedures. Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting guidelines. No external knowledge was used beyond what's documented in these specific pages about seat adjustment procedures. Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting guidelines. No external knowledge was used beyond what's documented in these specific pages about seat adjustment procedures. Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting guidelines. No external knowledge was used beyond what's documented in these specific pages about seat adjustment procedures. Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting guidelines. No external knowledge was used beyond what's documented in these specific pages about seat adjustment procedures. Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting guidelines. No external knowledge was used beyond what's documented in these specific pages about seat adjustment procedures. Here's my response in English based on the provided Korean documentation and instructions: The user asked how to adjust a seat that is too low. According to pages 51 and 52 of the manual, you can raise the seat height by using the lever - pulling it upward repeatedly will increase the seat height until desired position is reached. The manual also recommends adjusting the seat high enough for clear visibility of all directions and displays. I've included the relevant image reference from page 52 as specified in the formatting

후처리 답변

<후처리 답변> 참고 이미지: ![시트 높이 조절 이미지](./image/image_52_2.png) 52페이지에 따르면, 시트 높이는 레버를 사용하여 조절할 수 있으며 "레버를 위로 반복하여 올릴수록 시트가 높아집니다." 또한 51페이지에 따르면, "모든 방향 및 디스플레이 계기에 대하여 명료한 시야를 확보할 수 있도록 시트 높이를 충분히 높게 맞추십시오."

이미지 삽입 실험

목표
- 답변 내에서 이미지 정보를 함께 제공하여 이해도 향상
수행 방법
- 수동으로 문서 내 이미지 위치에 ![이미지](./image/image_24_1) 이미지 경로 입력
- 모델이 문서 내용을 읽고 해당 이미지를 포함하여 응답 생성
- 후처리 시 정규표현식으로 ![]() 태그를 추출
- PIL.Image.open()을 이용해 실제 이미지 파일 불러오기
- FPDF로 질문-답변-이미지를 하나의 PDF에 저장
결과

W7

날짜 : 4/21~4/25

빠른 모델 탐색

기존 사용중이던 cogito 모델은 추론 시 시간이 오래 걸리는 문제 발생→ 20문항에서 평균 55.51초 소요됨
빠른 추론 속도와 적정 수준의 응답 품질을 모두 확보할 수 있는 모델 탐색

모델명	파라미터수	용량	답변 성능
nikravan/glm-4vq	9B	7GB	X
mistralai/Mistral-Small-3.1-24B-Instruct-2503	24B	40GB	X
dnotitia/Llama-DNA-1.0-8B-Instruct	8B	15GB	X
MLP-KTLim/llama-3-Korean-Bllossom-8B	8B	16GB	X
ollama cogito:27b	27B	20GB	13/20
beomi/Yi-Ko-34B	34B	69.21GB	X
CohereLabs/aya-expanse-32b	32B	64.09GB	13/20
nbeerbower/gemma2-gutenberg-27B	27B	49GB
CohereLabs/aya-101	13B	51GB
HumanF-MarkrAI/Gukbap-Ovis2-34B-VL	34B	70GB

대형 모델 리스트

모델 이름	파라미터수	용량
deepcogito/cogito-v1-preview-llama-70B	70B	140GB
CohereLabs/c4ai-command-r-plus	104B	212GB
deepseek-ai/DeepSeek-R1	671B	855GB
deepseek-ai/DeepSeek-V3	671B	855GB

Ollama와 HuggingFace 품질 다른 이유

두가지 플랫폼에서 똑같은 크기의 똑같은 모델을 사용했지만, 답변 속도가 달랐고, 품질에도 차이가 났음.

모델의 용량 자체도 달랐음. → cogito-qwen 모델의 경우 huggingface는 60~70GB, Ollama에서는 20GB

참고 링크 : HuggingFace vs Ollama

플랫폼	내부 모델 형식	실행 방식
Ollama	GGUF(llama.cpp 기반)	CPU/GPU 없는 로컬 환경에서 빠르게 실행 가능
Hugging Face Transformers	Pytorch/Tensorflow 기반	일반적인 딥러닝 환경(GPU 필요, 정밀도 높음)

양자화 방식

ollama 모델은 GGUF 포맷 적용(Q4_0, Q4_K, Q6_K …)
HuggingFace 모델은 BitsAndBytes 양자화 적용(nf4, fp16, q4)
양자화 방법이 달라서 이론적으로 동일한 결과가 불가능함

GGUF 포맷

llama.cpp 계열 LLM을 위한 경량 통합 모델 포맷
llama.cpp 프로젝트 기반으로 만들어짐
CPU, GPU 없이도 로컬에서 빠르고 가볍게 LLM을 실행

문서 페이지 이동 기능

모델이 생성한 답변 내에서 참조된 페이지를 기반으로 브라우저에서 해당 위치를 직접 열 수 있도록 함
방식
- 정규표현식 기반 페이지 번호 추출(x페이지에 따르면…, x~y페이지에 따르면…)
- 하이퍼링크 생성
- 자동 크롬 실행

답변 토큰 일치율 계산

답변-문서 정합성 평가를 위한 지표 구현
키워드 추출
- 답변 내 명사 추출
- 문서 내 단어 집합과 겹치는 단어 개수 계산
- 일치한 단어 수 / 총 추출 키워드수 * 100

예상 질문 데이터 구축

질문, 문서 클래스, 질문에 해당되는 문서 내용 일부, 페이지, 유사 질문 데이터 리스트를 json 형태로 구축
{ "question": "차량이 자동으로 멈출 수 있나요?", "label": "4", "context": "#### 190페이지\n일부 차량에는 전방 충돌 방지 보조 시스템이 탑재되어 있으며, 긴급 상황 시 자동으로 브레이크를 작동시켜 사고를 방지합니다.", "page": "190", "similar_questions": [ "운전자가 반응 못 해도 브레이크 작동하나요?", "자동으로 차가 멈추는 기능 있어요?" ] }
목표 : 클래스 당 1,000개 이상 확보

분류기 모델

분할된 문서에 관한 질문을 입력했을 때, 어느 문서와 가장 관련이 있는지 분류해주는 모델
사용 모델 : monologg/kobert
input : 질문 텍스트
output : 문서 클래스(1~12번 문서의 다중 클래스 분류)

벤치마크

벤치마크	설명
Ko-GPQA	General-Purpose Question Answering. 한국어로 된 광범위한 일반 지식 질문을 통해 백그라운드 지식 수준을 평가.
Ko-Winogrande	Winograd Schema 기반으로, 문장 내 대명사의 지시 대상 판단 등 상식 기반 추론 능력을 테스트.
Ko-GSM8k	Grade School Math 8k의 한국어 버전. 초등 수준의 수학 문제 풀이 능력 평가. 다단계 추론 필요.
Ko-EQ Bench	Equivalent Question Benchmark. 다양한 방식으로 표현된 질문들 간의 의미적 등가성 판단. 의미 일반화 능력 평가.
Ko-IFEval	Inference Evaluation. 복잡한 조건, 전제, 결론 간의 관계를 이해하고 논리적 추론 정확도를 평가하는 벤치마크.
KorNAT-CKA	Causal Knowledge Assessment. 한국어 자연어 명령 수행 관련 문제에서 인과관계 추론 능력을 평가.
KorNAT-SVA	Situated Value Alignment. 특정 상황에서 명령에 따라 올바른 행동을 선택할 수 있는지, 즉 가치 정렬 능력을 테스트.
Ko-Harmlessness	유해한 발언이나 위험한 조언을 회피할 수 있는지 측정. 안전한 응답 생성 능력 평가.
Ko-Helpfulness	사용자의 질문에 대해 얼마나 유익하고 구체적인 정보를 제공하는지 평가. 실제 사용 유용성 지표.

W8

데이터 증강 및 정제작업

클래스 당 1,000개(총 12개 클래스)
ChatGPT를 활용한 데이터 증강
품질이 낮은 질문 삭제 및 수정

Twitter Facebook LinkedIn

W5

4/7

목적

번역 로직 추가

주요 이슈

청크 결과 이상

4/8

목적

주요 이슈

Model Loader 인터페이스

청크 띄어쓰기 누락

4/9

번역 도구

실험 결과

로컬

deepL API

청크 유사도 임계치 조정

4/11

모듈화

질의응답 내역 벡터DB 저장

질문 분할

W6

요약

Dockerfile

데이터 전처리 및 문서 분할

모델 실험 및 정답률

반복 응답 문제 및 후처리 전략

이미지 삽입 실험

W7

빠른 모델 탐색

대형 모델 리스트

Ollama와 HuggingFace 품질 다른 이유

양자화 방식

GGUF 포맷

문서 페이지 이동 기능

답변 토큰 일치율 계산

예상 질문 데이터 구축

분류기 모델

벤치마크

W8

데이터 증강 및 정제작업

공유하기

댓글남기기

참고

[LLM] Mercury Coder

[LLM] Self-RAG : Learning To Retrieve, Generate, and Critique Through Self-Reflection

[llm] 파인튜닝 기법

[LLM] MCP(Model Context Protocol)