Debugging Claude MCP Agents: From Tool-Call Tracing to Log Visualization

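The Structure of the MCP Agent Transparency Problem
Implementing JSON-RPC Message Logging
- Server-side tool-call logging
- Checking client-side debug logs
Tracing Tool-Call Chains
- A session-based chain tracker
- Integrating the tracker with the logging server
Parsing and Structuring Logs
Practical Debugging Patterns for Claude MCP Agents
A Log Visualization Pipeline
- A lightweight Streamlit dashboard
- Grafana + Loki
An Automated Debugging Pipeline
- Error-rate alerts
- Automatic dumps of failed chains
Comparing MCP Server Testing and Debugging Tools
The Full Pipeline and Next Steps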

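Debugging Claude MCP agents is hard for structural reasons. MCP (Model Context Protocol) is built on JSON-RPC 2.0, so every message between client and server is structured. However, nothing ships out of the box for collecting and reading those messages. An agent may chain 5 to 20 tool calls for a single task, and if one call in the middle returns something unexpected, the final result goes wrong. Without a way to trace where things diverged, the time needed to find the cause grows with the number of calls.

This article walks through the full pipeline for debugging Claude MCP agents, with code: tool-call logging on the MCP server → chain tracing → log structuring → visualization → automation. It targets the Python mcp package v1.27.0 and mainly uses the high-level FastMCP API. The examples import `FastMCP` and `Context` from the standalone `fastmcp` package (`from fastmcp import FastMCP, Context`). If you use the FastMCP class bundled with the official SDK instead, the import is `from mcp.server.fastmcp import FastMCP, Context`, and some `Context` attributes used here, such as `session_id`, may not be available.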

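## The Structure of the MCP Agent Transparency Problem

In the MCP architecture, Claude (the client) sends a tools/call request to the MCP server, and the server returns the execution result. A single call is simple. An agent, however, looks at the result of the previous call before deciding which tool to call next, so the output of each call becomes the context for the next one.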

sequenceDiagram
    participant C as Claude (Client)
    participant S as MCP Server
    participant T as Tool (DB/API/FS)
    C->>S: tools/call {name: "query_db", arguments: {...}}
    S->>T: Execute SQL
    T-->>S: Return result
    S-->>C: result: {content: [...], isError: false}
    Note over C: Decide next tool based on the result
    C->>S: tools/call {name: "write_file", arguments: {...}}
    S->>T: Write file
    T-->>S: Success/failure
    S-->>C: result: {content: [...], isError: false}

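The transparency problem shows up in three layers at once.

The client-side black box

Why Claude picked a particular tool is part of the model's internal reasoning and cannot be observed directly. You can infer intent from the arguments of a tools/call request, but the reason for the choice itself is never logged. When the tool list has 10 or more entries and the agent picks the wrong one, the arguments log is your only clue.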

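How messages get lost at the transport layer

With the stdio transport, the MCP server sends JSON-RPC responses on stdout and writes logs to stderr. The problem is that a print() call in server code, or any library that writes to stdout, mixes its output into the JSON-RPC stream and breaks parsing. The official MCP debugging guide states this explicitly: “local MCP servers should not log messages to stdout.” The Streamable HTTP transport avoids this problem. However, you then need to capture requests and responses at the HTTP level, which requires separate middleware. With either transport, the default settings do not store the raw JSON-RPC messages anywhere.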
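Any stray write to stdout corrupts the stdio stream. It therefore helps to route logging to stderr explicitly and to wrap chatty third-party calls. Below is a minimal sketch using only the standard library. `configure_stderr_logging` and `stdout_to_stderr` are illustrative names, not part of the MCP SDK. Note that `contextlib.redirect_stdout` only swaps Python's `sys.stdout`. Output written directly to file descriptor 1 by C extensions or subprocesses is not redirected.

```python
import contextlib
import logging
import sys
from collections.abc import Iterator


def configure_stderr_logging(level: int = logging.INFO) -> logging.Logger:
    """Send this server's log records to stderr, never stdout."""
    logger = logging.getLogger("mcp_server")
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    logger.propagate = False  # keep records away from any root handler bound to stdout
    return logger


@contextlib.contextmanager
def stdout_to_stderr() -> Iterator[None]:
    """Temporarily send print()/sys.stdout writes to stderr.

    Wrap calls into libraries that print to stdout, so their output
    cannot interleave with JSON-RPC messages on the stdio transport.
    """
    with contextlib.redirect_stdout(sys.stderr):
        yield


if __name__ == "__main__":
    log = configure_stderr_logging()
    log.info("server starting")        # written to stderr
    with stdout_to_stderr():
        print("noisy library output")  # also written to stderr
```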

Two layers of server-side errors

The MCP specification distinguishes between two layers of errors.

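| Layer | When it occurs | JSON-RPC response shape | Example |
|---|---|---|---|
| Protocol error | Call to a nonexistent tool, malformed request | `"error": {"code": -32602, "message": "..."}` | `Unknown tool: invalid_name` |
| Tool execution error | Tool was invoked correctly but failed internally | `"result": {"content": [...], "isError": true}` | Expired API key, input out of range |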

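A protocol error means the request itself is malformed. A tool execution error means the tool was invoked but its internal logic failed. Keep these two layers separate when logging and debugging, or you cannot classify root causes.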

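Don't ignore the isError flag
At the JSON-RPC level, a tool execution error (`isError: true`) is a “successful” response. The JSON-RPC result field is present, and over an HTTP transport the status is 200. If you don't check the `isError` flag, failed tool calls get counted as successes.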
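The two layers can be told apart mechanically. Below is a minimal sketch that assumes responses have already been parsed into Python dicts shaped like the table above. `classify_response` is a hypothetical helper, not an SDK function.

```python
from typing import Any, Literal

Outcome = Literal["protocol_error", "tool_error", "success", "unknown"]


def classify_response(message: dict[str, Any]) -> Outcome:
    """Classify a JSON-RPC response to tools/call into the two MCP error layers."""
    if "error" in message:
        # JSON-RPC error object: the request itself was rejected
        return "protocol_error"
    result = message.get("result")
    if isinstance(result, dict):
        # A result exists, so this is JSON-RPC "success"; isError decides the rest
        # (isError is optional and treated as false when absent)
        return "tool_error" if result.get("isError") is True else "success"
    return "unknown"


protocol = {"jsonrpc": "2.0", "id": 1,
            "error": {"code": -32602, "message": "Unknown tool: invalid_name"}}
tool_fail = {"jsonrpc": "2.0", "id": 2,
             "result": {"content": [{"type": "text", "text": "API key expired"}], "isError": True}}
ok = {"jsonrpc": "2.0", "id": 3,
      "result": {"content": [{"type": "text", "text": "done"}], "isError": False}}

for msg in (protocol, tool_fail, ok):
    print(msg["id"], classify_response(msg))
# 1 protocol_error
# 2 tool_error
# 3 success
```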

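## Implementing JSON-RPC Message Logging

The first step in debugging a Claude MCP agent is to capture every JSON-RPC message exchanged between client and server. The request_id and session_id exposed by FastMCP's Context object let you identify each request.

Server-side tool-call logging

Register tools with FastMCP's @mcp.tool() decorator, inject a Context into the handler, and record inputs and outputs. The key requirement is that errors must reach the client as isError: true. The code below achieves this by re-raising the exception.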

import json
import time
from datetime import datetime, timezone
from pathlib import Path
from fastmcp import FastMCP, Context

LOG_DIR = Path("./mcp_logs")
LOG_DIR.mkdir(exist_ok=True)

mcp = FastMCP(name="debug-server")

def _append_log(path: Path, entry: dict):
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False, default=str) + "\n")

def _get_log_file() -> Path:
    return LOG_DIR / f"tools_{datetime.now(timezone.utc).strftime('%Y%m%d')}.jsonl"

@mcp.tool()
async def query_db(ctx: Context, sql: str, params: dict | None = None) -> str:
    """Run a database query."""
    log_file = _get_log_file()
    start = time.monotonic()

    entry_req = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "direction": "request",
        "request_id": ctx.request_id,
        "session_id": ctx.session_id,
        "tool_name": "query_db",
        "arguments": {"sql": sql, "params": params},
    }
    _append_log(log_file, entry_req)
    await ctx.info(f"query_db called: {sql[:80]}")

    try:
        # Actual DB query logic (replace with your own)
        result = json.dumps({"rows": [], "count": 0})
        duration_ms = (time.monotonic() - start) * 1000

        entry_resp = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "direction": "response",
            "request_id": ctx.request_id,
            "tool_name": "query_db",
            "is_error": False,
            "duration_ms": round(duration_ms, 1),
            "content_length": len(result),
        }
        _append_log(log_file, entry_resp)
        return result

    except Exception as e:
        duration_ms = (time.monotonic() - start) * 1000
        entry_err = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "direction": "error",
            "request_id": ctx.request_id,
            "tool_name": "query_db",
            "error_type": type(e).__name__,
            "error_message": str(e),
            "duration_ms": round(duration_ms, 1),
        }
        _append_log(log_file, entry_err)
        await ctx.error(f"query_db failed: {e}")
        raise  # FastMCP turns the exception into an isError: true result

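The final raise matters. When a tool handler raises an exception, FastMCP automatically produces an isError: true response. The previous version of this article used return [TextContent(type="text", text=f"Error: {e}")], which made error responses indistinguishable from normal ones. Re-raising the exception lets the MCP client (Claude) recognize the response as an error and either retry on its own or choose a different tool.

The article uses the JSONL (JSON Lines) format for a reason. A plain JSON array ([{}, {}]) needs a closing ] at the end of the file, so an abnormal shutdown makes the whole file unparseable. In JSONL each line is an independent JSON object, so if one line is corrupted you can still read the rest.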

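Checking client-side debug logs

The MCP server's logs alone are not enough; you also need the client-side logs to see the full picture. Claude Desktop automatically records its communication with MCP servers.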

# macOS: view Claude Desktop logs
tail -n 50 -F ~/Library/Logs/Claude/mcp*.log

# Windows (PowerShell)
Get-Content "$env:AppData\Claude\logs\mcp*.log" -Tail 50 -Wait

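These logs record server connection events, configuration problems, runtime errors, and message exchanges. You can also use Chrome DevTools. Save {"allowDevTools": true} to ~/Library/Application Support/Claude/developer_settings.json, then press Command-Option-Shift-i to open DevTools and watch messages in real time in the Network panel.

Testing a server on its own with MCP Inspector
To test the MCP server independently, without going through a client, use [MCP Inspector](https://github.com/modelcontextprotocol/inspector). Running the single command `npx @modelcontextprotocol/inspector uv --directory ./my-server run my-server` opens a browser UI for calling tools, inspecting responses, and monitoring the notification stream.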

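## Tracing Tool-Call Chains

Individual log lines are not enough for debugging. You also need to know which conversation a call belongs to and which step of it the call is. If the agent called tools eight times for one task, you need to identify those eight calls as a single chain.

The MCP protocol itself has no notion of a “chain ID.” The JSON-RPC id field matches a response to its request; it is not meant for tracing chains. FastMCP's Context.session_id identifies an MCP session and can group multiple tools/call requests within that session. Depending on your FastMCP version and transport it may be None, so guard against that. An agent can also perform several tasks one after another within a single session. If you need per-task tracing, combine session_id with a time-window approach.

A session-based chain tracker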

import time
from dataclasses import dataclass, field

@dataclass
class ChainStep:
    step: int
    tool: str
    request_id: str
    args_summary: dict
    result_size: int
    duration_ms: float
    is_error: bool
    timestamp: str

@dataclass
class ChainTracker:
    """Track tool-call chains keyed by session_id."""
    chains: dict[str, list[ChainStep]] = field(default_factory=dict)
    _last_call_time: dict[str, float] = field(default_factory=dict)
    chain_timeout_sec: float = 30.0  # a gap longer than this starts a new chain

    def record_step(
        self,
        session_id: str,
        request_id: str,
        tool_name: str,
        args: dict,
        result_size: int,
        duration_ms: float,
        is_error: bool,
        timestamp: str,
    ) -> str:
        """Record a call and return the chain key."""
        now = time.monotonic()
        last = self._last_call_time.get(session_id, 0)

        # Gap exceeded the timeout: archive the current chain and start a new one
        if now - last > self.chain_timeout_sec and session_id in self.chains:
            # Move the old chain under a new key so session_id can start fresh
            archive_key = f"{session_id}_{int(last)}"
            self.chains[archive_key] = self.chains.pop(session_id)

        if session_id not in self.chains:
            self.chains[session_id] = []

        self._last_call_time[session_id] = now
        step_num = len(self.chains[session_id]) + 1

        self.chains[session_id].append(
            ChainStep(
                step=step_num,
                tool=tool_name,
                request_id=request_id,
                args_summary={k: type(v).__name__ for k, v in args.items()},
                result_size=result_size,
                duration_ms=round(duration_ms, 1),
                is_error=is_error,
                timestamp=timestamp,
            )
        )
        return session_id

    def get_chain_summary(self, session_id: str) -> dict:
        steps = self.chains.get(session_id, [])
        if not steps:
            return {"chain_id": session_id, "total_steps": 0}
        return {
            "chain_id": session_id,
            "total_steps": len(steps),
            "tools_used": [s.tool for s in steps],
            "total_duration_ms": round(sum(s.duration_ms for s in steps), 1),
            "has_error": any(s.is_error for s in steps),
            "error_steps": [s.step for s in steps if s.is_error],
        }

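The chain tracker in the previous article generated a fresh ID with uuid inside the server. As a result, when several tools/call requests came in, there was no way to tell whether they belonged to the same agent task. The implementation above uses session_id (the MCP session identifier) as the chain key. It starts a new chain when the gap between calls exceeds chain_timeout_sec (30 seconds by default). An agent's chained calls are usually only a few seconds apart, so 30 seconds is generally enough. If the model spends longer reasoning between calls in your workload, raise the value.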
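ChainTracker lives in server memory, so its chains disappear when the process restarts. The same session_id + gap rule can be applied after the fact to the JSONL request entries. Below is a minimal sketch. It assumes each entry has `direction`, `session_id`, and an ISO 8601 `timestamp` with a UTC offset, as the logging code above writes for request entries. `split_into_chains` is a hypothetical helper.

```python
from datetime import datetime
from itertools import groupby


def split_into_chains(entries: list[dict], timeout_sec: float = 30.0) -> list[list[dict]]:
    """Group request entries into chains: same session_id, gaps of at most timeout_sec."""
    requests = [e for e in entries if e.get("direction") == "request"]

    def session_of(e: dict) -> str:
        return str(e.get("session_id"))  # None becomes "None" and is grouped together

    def ts_of(e: dict) -> datetime:
        return datetime.fromisoformat(e["timestamp"])

    # Sort by session, then time, so each session's calls are contiguous and ordered
    requests.sort(key=lambda e: (session_of(e), ts_of(e)))

    chains: list[list[dict]] = []
    for _, calls in groupby(requests, key=session_of):
        current: list[dict] = []
        last_ts: datetime | None = None
        for call in calls:
            ts = ts_of(call)
            if last_ts is not None and (ts - last_ts).total_seconds() > timeout_sec:
                chains.append(current)  # gap too long: close the current chain
                current = []
            current.append(call)
            last_ts = ts
        if current:
            chains.append(current)
    return chains


logs = [
    {"direction": "request", "session_id": "s1", "tool_name": "search_docs", "timestamp": "2025-01-01T10:00:00+00:00"},
    {"direction": "response", "session_id": "s1", "tool_name": "search_docs", "timestamp": "2025-01-01T10:00:01+00:00"},
    {"direction": "request", "session_id": "s1", "tool_name": "query_db", "timestamp": "2025-01-01T10:00:05+00:00"},
    {"direction": "request", "session_id": "s1", "tool_name": "search_docs", "timestamp": "2025-01-01T10:05:00+00:00"},
    {"direction": "request", "session_id": "s2", "tool_name": "query_db", "timestamp": "2025-01-01T10:00:02+00:00"},
]
for chain in split_into_chains(logs):
    print(chain[0]["session_id"], [e["tool_name"] for e in chain])
# s1 ['search_docs', 'query_db']
# s1 ['search_docs']
# s2 ['query_db']
```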

Integrating the tracker with the logging server

Connecting ChainTracker to the logging server built earlier records every tool call into a chain automatically.

from datetime import datetime, timezone

tracker = ChainTracker()

@mcp.tool()
async def search_docs(ctx: Context, query: str, limit: int = 10) -> str:
    """Document search tool."""
    log_file = _get_log_file()
    start = time.monotonic()
    ts = datetime.now(timezone.utc).isoformat()

    _append_log(log_file, {
        "timestamp": ts,
        "direction": "request",
        "session_id": ctx.session_id,
        "request_id": ctx.request_id,
        "tool_name": "search_docs",
        "arguments": {"query": query, "limit": limit},
    })

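    try:
        # Actual search logic (replace with your own)
        result = json.dumps({"matches": [], "total": 0})
        duration_ms = (time.monotonic() - start) * 1000

        # Also log the response so compute_tool_latency can see this tool
        _append_log(log_file, {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "direction": "response",
            "session_id": ctx.session_id,
            "request_id": ctx.request_id,
            "tool_name": "search_docs",
            "is_error": False,
            "duration_ms": round(duration_ms, 1),
            "content_length": len(result),
        })
        chain_key = tracker.record_step(
            session_id=ctx.session_id,
            request_id=ctx.request_id,
            tool_name="search_docs",
            args={"query": query, "limit": limit},
            result_size=len(result),
            duration_ms=duration_ms,
            is_error=False,
            timestamp=ts,
        )
        await ctx.debug(f"Chain {chain_key}: step {len(tracker.chains[chain_key])}")
        return result

    except Exception as e:
        duration_ms = (time.monotonic() - start) * 1000
        # Also log the error so build_error_report and check_error_rate count it
        _append_log(log_file, {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "direction": "error",
            "session_id": ctx.session_id,
            "request_id": ctx.request_id,
            "tool_name": "search_docs",
            "error_type": type(e).__name__,
            "error_message": str(e),
            "duration_ms": round(duration_ms, 1),
        })
        tracker.record_step(
            session_id=ctx.session_id,
            request_id=ctx.request_id,
            tool_name="search_docs",
            args={"query": query, "limit": limit},
            result_size=0,
            duration_ms=duration_ms,
            is_error=True,
            timestamp=ts,
        )
        await ctx.error(f"search_docs failed: {e}")
        raise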

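Reducing boilerplate with a logging decorator
If the same logging code repeats in every tool, you can extract it into a decorator. The decorator wraps the handler, records start/end/error logs, and calls `ChainTracker.record_step` automatically. Injecting the `Context` object through a decorator can be tricky, though. With five or fewer tools, inline logging is easier to maintain.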
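Below is a sketch of one such decorator. It rests on two assumptions worth verifying against your FastMCP version. First, FastMCP passes the context as a keyword argument named after the `Context`-annotated parameter (here `ctx`). Second, FastMCP reads the tool's signature in a way that follows the `__wrapped__` attribute that `functools.wraps` sets, as `inspect.signature` does. `logged_tool` and its `sink` callback are hypothetical names. The sink could be `_append_log` bound to a file, or `list.append` in tests.

```python
import asyncio
import functools
import inspect
import time
from collections.abc import Awaitable, Callable
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Any


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def logged_tool(sink: Callable[[dict], None]):
    """Decorator factory: log request/response/error entries for an async tool.

    Assumption: arguments (including the Context) arrive as keyword
    arguments, with the Context under the name `ctx`.
    """
    def decorator(fn: Callable[..., Awaitable[Any]]) -> Callable[..., Awaitable[Any]]:
        @functools.wraps(fn)  # copies __annotations__ and sets __wrapped__
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            ctx = kwargs.get("ctx")
            base = {
                "tool_name": fn.__name__,
                "request_id": getattr(ctx, "request_id", None),
                "session_id": getattr(ctx, "session_id", None),
            }
            logged_args = {k: v for k, v in kwargs.items() if k != "ctx"}
            sink({**base, "timestamp": _now(), "direction": "request", "arguments": logged_args})
            start = time.monotonic()
            try:
                result = await fn(*args, **kwargs)
            except Exception as e:
                sink({**base, "timestamp": _now(), "direction": "error",
                      "error_type": type(e).__name__, "error_message": str(e),
                      "duration_ms": round((time.monotonic() - start) * 1000, 1)})
                raise  # re-raise so the client still receives isError: true
            sink({**base, "timestamp": _now(), "direction": "response", "is_error": False,
                  "duration_ms": round((time.monotonic() - start) * 1000, 1)})
            return result
        return wrapper
    return decorator


# Usage without a server: a list as the sink and a stand-in context object
captured: list[dict] = []


@logged_tool(captured.append)
async def search_docs(ctx: Any, query: str, limit: int = 10) -> str:
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"{query}:{limit}"


fake_ctx = SimpleNamespace(request_id="r1", session_id="s1")
print(asyncio.run(search_docs(ctx=fake_ctx, query="k8s")))  # k8s:10
print([e["direction"] for e in captured])                   # ['request', 'response']
print(list(inspect.signature(search_docs).parameters))      # ['ctx', 'query', 'limit']
```

With FastMCP, the decorator would sit below `@mcp.tool()`, so FastMCP registers the wrapper. The real handler would also keep its `ctx: Context` annotation so that the framework can detect the context parameter.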

## Parsing and Structuring Logs

Once JSONL logs pile up, reading the raw files directly becomes impractical. You need structured queries.

import json
from pathlib import Path
from collections import Counter

def parse_tool_logs(log_dir: Path, date_str: str) -> list[dict]:
    """Parse the JSONL log for a given date and return a list of entries."""
    log_file = log_dir / f"tools_{date_str}.jsonl"
    if not log_file.exists():
        return []
    entries = []
    with open(log_file, "r", encoding="utf-8") as f:
        for line_num, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError:
                # Skip lines corrupted by an abnormal shutdown
                print(f"Skipped malformed line {line_num}")
    return entries

def build_error_report(entries: list[dict]) -> dict:
    """Extract error entries and count error frequency per tool and per error type."""
    errors = [e for e in entries if e.get("direction") == "error"]
    by_tool = Counter(e.get("tool_name", "unknown") for e in errors)
    by_type = Counter(e.get("error_type", "unknown") for e in errors)
    return {
        "total_errors": len(errors),
        "by_tool": dict(by_tool.most_common()),
        "by_error_type": dict(by_type.most_common()),
        "sample_errors": errors[-5:],  # 5 most recent (the log is append-only)
    }

def compute_tool_latency(entries: list[dict]) -> dict[str, dict]:
    """Compute per-tool response-time statistics."""
    from statistics import mean, median
    latencies: dict[str, list[float]] = {}
    for e in entries:
        if e.get("direction") == "response" and "duration_ms" in e:
            tool = e.get("tool_name", "unknown")
            latencies.setdefault(tool, []).append(e["duration_ms"])
    result = {}
    for tool, times in latencies.items():
        result[tool] = {
            "count": len(times),
            "mean_ms": round(mean(times), 1),
            "median_ms": round(median(times), 1),
            "max_ms": round(max(times), 1),
        }
    return result

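build_error_report shows how often each tool fails and with which kinds of errors. compute_tool_latency computes the response-time distribution per tool. As a rough rule of thumb, if a tool's max is 10 or more times its median, suspect intermittent timeouts.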
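The 10× rule of thumb can be applied directly to the output of compute_tool_latency. Below is a small sketch that takes the same `{tool: {"median_ms": ..., "max_ms": ...}}` shape that compute_tool_latency returns. `flag_latency_outliers` is a hypothetical helper, and the ratio is only a starting point to tune.

```python
def flag_latency_outliers(
    latency_stats: dict[str, dict], ratio: float = 10.0
) -> dict[str, float]:
    """Return {tool: max/median} for tools whose max is at least `ratio` times the median."""
    flagged: dict[str, float] = {}
    for tool, stats in latency_stats.items():
        median_ms = stats.get("median_ms", 0)
        max_ms = stats.get("max_ms", 0)
        if median_ms <= 0:
            continue  # medians rounded to 0.0 ms would divide by zero
        spread = max_ms / median_ms
        if spread >= ratio:
            flagged[tool] = round(spread, 1)
    return flagged


stats = {
    "query_db": {"count": 40, "mean_ms": 180.0, "median_ms": 120.0, "max_ms": 4800.0},
    "search_docs": {"count": 55, "mean_ms": 60.0, "median_ms": 55.0, "max_ms": 140.0},
}
print(flag_latency_outliers(stats))  # {'query_db': 40.0}
```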

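## Practical Debugging Patterns for Claude MCP Agents

With the logging infrastructure in place, you need to know the problem patterns you will hit most often and how to read them from the logs.

Tool-selection errors

Suppose the agent calls query_db when search_docs was the right tool. The log shows a pattern like this: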

{"direction":"request","tool_name":"query_db","arguments":{"sql":"SELECT * FROM docs WHERE content LIKE '%kubernetes%'"}}

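If the SQL query is really a full-text search, search_docs is the better fit. Here Claude picked the wrong tool because the tool description was insufficient. The fix is to make the tool description more specific.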

@mcp.tool()
async def search_docs(ctx: Context, query: str, limit: int = 10) -> str:
    """Full-text document search. Use for keyword-based searches over document contents.
    Takes a natural-language search query, not SQL.
    If you need to query database tables, use query_db instead."""
    ...

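Stating in the tool description when to use the tool, and when not to, tends to improve selection accuracy.

Parameter type mismatches

Suppose a tool expects an int but the agent sends "10" (a string). This is not an error at the JSON-RPC level, but the tool raises a TypeError internally.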

{"direction":"error","tool_name":"query_db","error_type":"TypeError","error_message":"'>' not supported between instances of 'str' and 'int'"}

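You can add defensive code, or define types strictly in the inputSchema. FastMCP generates the inputSchema automatically from Python type hints, so the type annotations in the function signature are the schema, and arguments are validated against them before your code runs. Values the schema cannot see still slip through. The failing tool in the log above is query_db. Its `params: dict | None` argument declares no types for its values, so a string can reach the comparison unchecked.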

Mid-chain failures

If step 3 of a five-step chain fails, the inputs to steps 4 and 5 are contaminated. Check the chain with ChainTracker.get_chain_summary():

summary = tracker.get_chain_summary("session-abc123")
# Example output (abridged):
# {
#   "chain_id": "session-abc123",
#   "total_steps": 5,
#   "tools_used": ["search_docs", "query_db", "transform_data", "validate", "write_file"],
#   "has_error": True,
#   "error_steps": [3]
# }

An error_steps value of [3] means transform_data failed. Filtering the JSONL log by that step's request_id shows the raw error message and the input arguments.
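Filtering by request_id takes only a few lines over the daily JSONL file. The sketch below assumes the file layout written by `_append_log`, with one JSON object per line. `find_by_request_id` is a hypothetical helper. IDs are compared as strings because a JSON-RPC id may be a number or a string.

```python
import json
import tempfile
from pathlib import Path


def find_by_request_id(log_file: Path, request_id: str) -> list[dict]:
    """Return every entry (request, response, error) for one request_id, in file order."""
    matches: list[dict] = []
    with open(log_file, "r", encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip corrupted lines
            if isinstance(entry, dict) and str(entry.get("request_id")) == str(request_id):
                matches.append(entry)
    return matches


# Demo on a throwaway file with made-up entries
with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "tools_20250101.jsonl"
    path.write_text(
        '{"direction": "request", "request_id": "7", "tool_name": "transform_data"}\n'
        '{"direction": "error", "request_id": "7", "error_type": "KeyError"}\n'
        '{"direction": "request", "request_id": "8", "tool_name": "validate"}\n',
        encoding="utf-8",
    )
    for e in find_by_request_id(path, "7"):
        print(e["direction"])  # request, then error
```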

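Agent behavior after a mid-chain failure
When Claude receives an `isError: true` response, it may retry or pick a different tool. The catch is that it can fall into a loop, retrying with the same parameters and hitting the same error. Stating in the tool's error message what went wrong and how to fix it makes it more likely that Claude corrects the parameters before retrying.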
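Such loops show up in the JSONL log as the same tool being called repeatedly with identical arguments. The sketch below scans request entries in file order within each session and reports runs of identical consecutive calls. It assumes the log is append-only and therefore chronological. `detect_retry_loops` and the `min_repeats` threshold are hypothetical. The function only looks at identical calls; confirming that those calls also failed means cross-checking their request_ids against the error entries.

```python
import json
from collections import defaultdict


def detect_retry_loops(entries: list[dict], min_repeats: int = 3) -> list[dict]:
    """Find runs of >= min_repeats consecutive identical tool calls within each session."""
    by_session: dict[str, list[dict]] = defaultdict(list)
    for e in entries:
        if e.get("direction") == "request":
            by_session[str(e.get("session_id"))].append(e)

    loops: list[dict] = []
    for session_id, calls in by_session.items():
        run_key, run_len, run_first = None, 0, None
        for call in calls + [None]:  # None is a sentinel that flushes the final run
            key = None if call is None else (
                call.get("tool_name"),
                json.dumps(call.get("arguments"), sort_keys=True, default=str),
            )
            if key is not None and key == run_key:
                run_len += 1
                continue
            # The previous run ended; report it if it was long enough
            if run_first is not None and run_len >= min_repeats:
                loops.append({"session_id": session_id, "tool_name": run_first.get("tool_name"),
                              "arguments": run_first.get("arguments"), "repeats": run_len})
            run_key, run_len, run_first = key, 1, call
    return loops


sample: list[dict] = []
for _ in range(3):
    sample.append({"direction": "request", "session_id": "s1", "tool_name": "query_db",
                   "arguments": {"sql": "SELECT * FROM t WHERE x > '10'"}})
    sample.append({"direction": "error", "tool_name": "query_db"})
sample.append({"direction": "request", "session_id": "s1", "tool_name": "search_docs",
               "arguments": {"query": "t"}})
print(detect_retry_loops(sample))  # one loop: query_db, repeats 3
```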

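## A Log Visualization Pipeline

Once the logs are structured, visualization is comparatively simple. There are two routes, depending on scale.

A lightweight Streamlit dashboard

With 10 or fewer tools and up to a few thousand calls a day, Streamlit is enough. It reads the JSONL files directly and draws charts.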

import streamlit as st
import pandas as pd
from pathlib import Path
from datetime import datetime, timezone

# parse_tool_logs is the function from the log-parsing section above;
# keep it in the same file or import it from wherever you put it.

st.title("MCP Tool Call Dashboard")

log_dir = Path("./mcp_logs")
# Log files are named by UTC date, so default to today's UTC date
selected_date = st.date_input("Select date", value=datetime.now(timezone.utc).date())
date_str = selected_date.strftime("%Y%m%d")

entries = parse_tool_logs(log_dir, date_str)
if not entries:
    st.warning("No logs for this date.")
    st.stop()

df = pd.DataFrame(entries)

# Call count per tool
if "tool_name" in df.columns:
    tool_counts = df[df["direction"] == "request"]["tool_name"].value_counts()
    st.subheader("Calls per tool")
    st.bar_chart(tool_counts)

# Error rate (error entries / request entries)
error_df = df[df["direction"] == "error"]
total_requests = len(df[df["direction"] == "request"])
if total_requests > 0:
    error_rate = len(error_df) / total_requests * 100
    st.metric("Error rate", f"{error_rate:.1f}%")

# Response time per tool (the column only exists if some entry carried duration_ms)
if "duration_ms" in df.columns:
    response_df = df[df["direction"] == "response"].dropna(subset=["duration_ms"])
    if not response_df.empty:
        st.subheader("Response time distribution per tool")
        st.dataframe(
            response_df.groupby("tool_name")["duration_ms"]
            .describe()
            .round(1)
        )

Install the dependencies with pip install streamlit pandas, then launch the dashboard with streamlit run dashboard.py.

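Grafana + Loki

If you handle tens of thousands of calls a day or more, or run several MCP servers, Grafana + Loki is a better fit. Promtail collects the JSONL logs, Loki stores them, and Grafana queries them. Grafana has since deprecated Promtail in favor of Grafana Alloy, so new deployments may prefer Alloy; the pipeline idea stays the same.

Example Promtail configuration: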

scrape_configs:
  - job_name: mcp_tools
    static_configs:
      - targets: [localhost]
        labels:
          job: mcp_debug
          __path__: /path/to/mcp_logs/tools_*.jsonl
    pipeline_stages:
      - json:
          expressions:
            direction: direction
            tool_name: tool_name
            is_error: is_error
            duration_ms: duration_ms
      - labels:
          direction:
          tool_name:
      - metrics:
          tool_duration:
            type: histogram
            description: "Tool call duration"
            source: duration_ms
            config:
              buckets: [10, 50, 100, 500, 1000, 5000]

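The LogQL query below, run in Grafana, keeps only one tool's errors. Failed calls are logged with direction "error" and carry no is_error field (only response entries do). The query therefore filters on the direction label that Promtail extracts:

{job="mcp_debug", tool_name="query_db", direction="error"} | json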

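| Aspect | Streamlit | Grafana + Loki |
|---|---|---|
| Setup complexity | One `pip install` | Docker Compose with 3 services |
| Real-time monitoring | Manual refresh | Auto-refresh (configurable) |
| Suitable scale | Thousands of calls/day | Tens to hundreds of thousands of calls/day |
| Alerting | None (build your own) | Built-in alert rules |
| Log retention | Manage files yourself | Loki retention policies |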

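Choosing by scale
The usual path is to start with Streamlit in development and staging, then migrate to Grafana as production call volume grows. Both read the JSONL logs as their source, so the logging layer does not need to change.

## An Automated Debugging Pipeline

With visualization in place, the last piece is automation. Nobody can sit watching a dashboard waiting for errors to appear.

Error-rate alerts

The following simple monitor sends an alert when the error rate over the last N minutes exceeds a threshold: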

import json
from pathlib import Path
from datetime import datetime, timezone, timedelta

def check_error_rate(
    log_dir: Path,
    window_minutes: int = 5,
    threshold_percent: float = 10.0,
) -> dict | None:
    """Compute the error rate over the last window_minutes.
    Return alert details if it exceeds the threshold, otherwise None.
    Only today's (UTC) log file is read, so a window that crosses
    midnight UTC sees only the part after midnight."""
    log_file = log_dir / f"tools_{datetime.now(timezone.utc).strftime('%Y%m%d')}.jsonl"
    if not log_file.exists():
        return None

    cutoff = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    requests, errors = 0, 0

    with open(log_file, "r", encoding="utf-8") as f:
        for line in f:
            try:
                entry = json.loads(line)
                ts = datetime.fromisoformat(entry.get("timestamp", ""))
            except ValueError:
                # Skip corrupted lines and entries without a valid timestamp
                # (json.JSONDecodeError is a subclass of ValueError)
                continue
            if ts < cutoff:
                continue
            if entry.get("direction") == "request":
                requests += 1
            elif entry.get("direction") == "error":
                errors += 1

    if requests == 0:
        return None

    error_rate = errors / requests * 100
    if error_rate > threshold_percent:
        return {
            "alert": "high_error_rate",
            "error_rate_percent": round(error_rate, 1),
            "errors": errors,
            "requests": requests,
            "window_minutes": window_minutes,
            "threshold_percent": threshold_percent,
        }
    return None

Run this function every minute from cron or a systemd timer. Whenever it returns a result, forward the alert to a Slack webhook or send it by email.

Automatic dumps of failed chains

When an error is detected, dump the session's full chain to a file. During root-cause analysis, you can then see the whole chain at a glance.

def dump_error_chains(tracker: ChainTracker, output_dir: Path) -> list[Path]:
    """Dump every chain that contains an error to its own JSON file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    dumped = []
    for chain_id, steps in tracker.chains.items():
        if not any(s.is_error for s in steps):
            continue
        summary = tracker.get_chain_summary(chain_id)
        dump_data = {
            **summary,
            "steps": [
                {
                    "step": s.step,
                    "tool": s.tool,
                    "request_id": s.request_id,
                    "args_summary": s.args_summary,
                    "result_size": s.result_size,
                    "duration_ms": s.duration_ms,
                    "is_error": s.is_error,
                    "timestamp": s.timestamp,
                }
                for s in steps
            ],
        }
        ts = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
        path = output_dir / f"error_chain_{chain_id}_{ts}.json"
        with open(path, "w", encoding="utf-8") as f:
            json.dump(dump_data, f, indent=2, ensure_ascii=False)
        dumped.append(path)
    return dumped

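Each dumped JSON file contains every step of one chain. The error_steps field shows at once where the chain failed, and the failing step's request_id lets you look up the detailed error message in the raw JSONL log. The function dumps every error chain still held in memory, so running it repeatedly writes the same chains again under new timestamps. If that matters, keep track of which chain IDs have already been dumped.

## Comparing MCP Server Testing and Debugging Tools

There are several tools for testing and debugging MCP servers, and the right one depends on the situation.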

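| Tool | Purpose | Transports | Notes |
|---|---|---|---|
| MCP Inspector | Interactive server testing | stdio, SSE, Streamable HTTP | Browser UI; test tools, resources, and prompts; notification stream |
| Claude Desktop DevTools | Client-side debugging | stdio | Built-in Chrome DevTools; capture messages in the Network panel |
| `ctx.info()` / `ctx.debug()` | Server-internal logging | All | FastMCP Context methods; send logs to the client |
| Custom JSONL logging | Persistent log storage | All | The approach built in this article; supports later analysis and visualization |
| `stderr` output | Quick debug output | stdio only | `print(..., file=sys.stderr)`; captured by the host app |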

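Early in development, MCP Inspector is the fastest way to check that the server exposes its tool schemas correctly and that each tool behaves as expected. It runs directly with npx @modelcontextprotocol/inspector, with no separate installation step. It is not suited to production monitoring; that role belongs to JSONL logging combined with Grafana.

ctx.info() and ctx.debug() are delivered to the client through MCP's notifications/message, so you can watch them in real time in Claude Desktop's log files or in Inspector's notifications pane. The MCP specification's logging utility defines eight RFC 5424 severity levels (debug through emergency), and a client can set the minimum level with a logging/setLevel request. FastMCP's Context provides convenience methods for the common levels (debug, info, warning, error).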
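The minimum-level behavior is a simple ordering over the eight levels. Below is a sketch of that filter. It assumes levels are given as the lowercase names the MCP specification uses. `should_emit` is a hypothetical helper, not an SDK function.

```python
# RFC 5424 severities as named in the MCP logging utility, least to most severe
LEVELS = ["debug", "info", "notice", "warning", "error", "critical", "alert", "emergency"]
_RANK = {name: i for i, name in enumerate(LEVELS)}


def should_emit(level: str, min_level: str) -> bool:
    """True if a message at `level` passes a logging/setLevel threshold of `min_level`."""
    return _RANK[level] >= _RANK[min_level]


# After the client sends logging/setLevel with {"level": "warning"}:
print([lvl for lvl in LEVELS if should_emit(lvl, "warning")])
# ['warning', 'error', 'critical', 'alert', 'emergency']
```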

## The Full Pipeline and Next Steps

In summary, the Claude MCP agent debugging pipeline built in this article flows like this:

flowchart LR
    A["Tool call\n(tools/call)"] --> B["JSONL logging\n+ ChainTracker"]
    B --> C["Log parsing\n(build_error_report)"]
    C --> D{"Scale?"}
    D -->|Small| E["Streamlit\ndashboard"]
    D -->|Large| F["Promtail → Loki\n→ Grafana"]
    B --> G["Error-rate monitor\n(check_error_rate)"]
    G --> H["Alerts\n(Slack/Email)"]
    G --> I["Error chain dump\n(dump_error_chains)"]

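Three things matter. First, record the input and output of every tool call in JSONL. Second, group individual calls into chains using session_id plus a time window. Third, check the isError flag so that errors are classified correctly.

Once this pipeline is in place, a natural next topic is MCP server tool-schema design. Defining inputSchema and outputSchema precisely tends to improve the agent's tool-selection accuracy, and structured output (structuredContent) makes passing data between tools cleaner. The Streamable HTTP transport is also worth a look. It removes the stdout-collision problem of the stdio transport entirely and supports session management at the protocol level (the Mcp-Session-Id header), which makes chain tracing considerably simpler. Finally, it pays to analyze Claude's tool-use patterns themselves. Statistical analysis of the logs can show which tool descriptions raise selection rates and how the agent changes parameters when it retries, so you can improve tool design based on data.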
