v0.3.0 · Self-host free · Apache 2.0

Your agent failed.
Which tool broke — and
how do we stop it next time?

Detect loops. Enforce budgets. Break failing tools. Map blast radius. For MCP servers: health checks, security scanning, and schema drift detection.

Not another prompt, eval, or simulation platform. LangSight is the runtime reliability layer for AI agent toolchains.
$ pip install langsight && langsight init
LangChain · CrewAI · Pydantic AI · Postgres + ClickHouse · 1,003 tests · 77% coverage
langsight · session trace
$ langsight sessions --id sess-f2a9b1
 
Trace: sess-f2a9b1 (support-agent)
5 tool calls · 1 failed · 2,134ms · $0.023
 
sess-f2a9b1
├── jira-mcp/get_issue 89ms ✓
├── postgres-mcp/query 42ms ✓
├── → billing-agent handoff
│ ├── crm-mcp/update 120ms ✓
│ └── slack-mcp/notify — ✗ timeout
 
Root cause: slack-mcp timed out at 14:32 UTC
└── Fix: check SLACK_TIMEOUT (currently 500ms)

What question are you
trying to answer?

Langfuse watches the brain. LangSight watches the hands. Use them together — they never overlap.

Question | Best tool
Did the prompt/model perform well? | LangWatch / Langfuse / LangSmith
Should I change prompts or eval policy? | LangWatch / Langfuse / LangSmith
Is my server CPU/memory healthy? | Datadog / New Relic
Which tool call failed in production? | LangSight
Is my agent stuck in a loop? | LangSight
Is an MCP server unhealthy or drifting? | LangSight
Is an MCP server exposed or risky? | LangSight
Why did this session cost $47 instead of $3? | LangSight
If this tool goes down, which agents break? | LangSight

LLM quality is only
half the problem.

Teams already have ways to inspect prompts and eval scores. What they still cannot answer fast enough:

Agent stuck in a loop

Your agent retries the same tool with the same args 47 times. Burns $200. Produces nothing. Nobody detects it until the invoice arrives.

Tool failure cascades across agents

postgres-mcp goes down. 3 agents depend on it. All sessions fail. You don't know which agents are affected or how many users are impacted.

Cost explosion with no guardrails

A sub-agent retries geocoding-mcp endlessly. At $0.005/call, that's $1,800/week. No budget limit existed to stop it. You need tool-level cost control.
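To make the arithmetic concrete: $1,800 at $0.005 per call is 360,000 calls in a week. The sketch below shows how a per-session budget would cut that loop short. It is an illustration only, not LangSight's implementation; `BudgetGuard`, `BudgetExceeded`, `charge`, and `calls_until_stopped` are hypothetical names, and the $1.00 limit is an assumed setting.

```python
class BudgetExceeded(Exception):
    """Raised when a call would push the session over its limit."""


class BudgetGuard:
    """Hypothetical per-session budget tracker (not the LangSight API)."""

    def __init__(self, max_cost_usd: float) -> None:
        self.max_cost_usd = max_cost_usd
        self.spent_usd = 0.0
        self.calls = 0

    def charge(self, cost_usd: float) -> None:
        # Small epsilon so float rounding does not reject a call that
        # lands exactly on the limit.
        if self.spent_usd + cost_usd > self.max_cost_usd + 1e-9:
            raise BudgetExceeded(
                f"limit ${self.max_cost_usd:.2f} reached after {self.calls} calls"
            )
        self.spent_usd += cost_usd
        self.calls += 1


def calls_until_stopped(max_cost_usd: float, cost_per_call: float) -> int:
    """Simulate a runaway retry loop; return how many calls the guard allows."""
    guard = BudgetGuard(max_cost_usd)
    try:
        while True:
            guard.charge(cost_per_call)
    except BudgetExceeded:
        return guard.calls


weekly_calls = round(1800 / 0.005)            # calls behind the $1,800 bill
allowed = calls_until_stopped(1.00, 0.005)    # calls allowed under a $1.00 cap
```

Under these assumptions the guard stops the session after 200 calls, at $1.00 of spend.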

MCP server changed and nobody noticed

Schema drifted. A field was renamed. Auth expired. The agent keeps calling, gets corrupted data, and hallucinates downstream. Silent until users complain.

Prevent. Detect.
Monitor. Map.

01

Prevent

Stop loops, enforce budgets, break failing tools — before users notice. Configure thresholds per-agent from the dashboard. No code change needed after initial SDK setup.

from langsight.sdk import LangSightClient

client = LangSightClient(
    url="http://localhost:8000",
    loop_detection=True,       # same tool+args 3x → stop
    max_cost_usd=1.00,         # budget limit per session
    max_steps=25,              # step limit
    circuit_breaker=True,      # auto-disable after 5 failures
)

# Override thresholds per-agent from the dashboard —
# Settings → Prevention → Add agent override
# No code change needed.
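
The "same tool+args 3x" rule can be illustrated with a small standalone sketch. This is an assumption about one way to detect such loops, not the SDK's internals; `LoopDetector`, `LoopDetected`, `fingerprint`, and `record` are hypothetical names, and only consecutive identical calls are counted here.

```python
import hashlib
import json


class LoopDetected(Exception):
    """Raised when the same call repeats too many times in a row."""


class LoopDetector:
    """Hypothetical detector for consecutive identical tool calls."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._last = None
        self._streak = 0

    @staticmethod
    def fingerprint(tool: str, args: dict) -> str:
        # sort_keys makes the hash independent of argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, tool: str, args: dict) -> None:
        fp = self.fingerprint(tool, args)
        # Extend the streak on an identical call, otherwise start over.
        self._streak = self._streak + 1 if fp == self._last else 1
        self._last = fp
        if self._streak >= self.max_repeats:
            raise LoopDetected(
                f"{tool} called {self._streak}x with identical args"
            )
```

Calling `record` before each tool invocation raises `LoopDetected` on the third identical call in a row. Any different call in between resets the streak.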
02

Detect

See exactly which tool failed, when, and why. Every session gets a health tag: success, loop_detected, budget_exceeded, tool_failure. Filter and investigate instantly.

$ langsight sessions --id sess-f2a9b1

sess-f2a9b1  (support-agent)  [TOOL_FAILURE]
├── jira-mcp/get_issue        89ms  ✓
├── postgres-mcp/query        42ms  ✓
├──  → billing-agent          handoff
│   ├── crm-mcp/update    120ms  ✓
│   └── slack-mcp/notify    —   ✗  timeout

Root cause: slack-mcp timed out at 14:32
03

Monitor

MCP health checks, security scanning, schema drift detection. Proactive — catches problems before agents start failing. Alerts via Slack, OpsGenie, PagerDuty.

$ langsight mcp-health

Server           Status   Latency   Schema    Circuit
snowflake-mcp    ✅ UP    142ms     Stable    closed
slack-mcp        ⚠️ DEG  1,240ms   Stable    closed
jira-mcp         ❌ DOWN  —         —         open
postgres-mcp     ✅ UP    31ms      Changed   closed
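
The "Changed" value in the Schema column implies comparing a server's current tool schemas against a stored baseline. The sketch below shows one way to do that by hashing canonical JSON. It is an assumption, not LangSight's actual method; `schema_hash`, `detect_drift`, and the sample schemas are hypothetical.

```python
import hashlib
import json


def schema_hash(schema: dict) -> str:
    """Hash a tool's input schema in a key-order-independent way."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def detect_drift(baseline: dict, current: dict) -> dict:
    """Compare two {tool_name: schema} snapshots and report differences."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        name
        for name in set(baseline) & set(current)
        if schema_hash(baseline[name]) != schema_hash(current[name])
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots: the "sql" field was renamed to "statement".
baseline = {"query": {"type": "object", "properties": {"sql": {"type": "string"}}}}
current = {"query": {"type": "object", "properties": {"statement": {"type": "string"}}}}

drift = detect_drift(baseline, current)
```

Here `drift["changed"]` contains `"query"`, which would flag the renamed field before an agent starts sending the old argument name.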
04

Map

Lineage shows which agents call which tools. Blast radius shows what breaks when a tool goes down. Impact alerts include affected agents and session counts.

postgres-mcp ❌ DOWN

Blast radius:
  support-agent   200 sessions/day  HIGH
  billing-agent    50 sessions/day  MEDIUM
  data-agent       10 sessions/day  LOW

Total: ~260 sessions/day affected
Circuit breaker: active
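
A blast radius like the one above can be derived from a lineage map of agents to the tools they call. The sketch below is illustrative only: the tool sets, the `docs-agent` entry, and the severity thresholds (100 and 25 sessions/day) are assumptions chosen to reproduce the example. Only the session counts come from the output shown.

```python
# Hypothetical lineage: which tools each agent calls, plus daily volume.
lineage = {
    "support-agent": {"tools": {"jira-mcp", "postgres-mcp", "slack-mcp"}, "sessions_per_day": 200},
    "billing-agent": {"tools": {"crm-mcp", "postgres-mcp"}, "sessions_per_day": 50},
    "data-agent": {"tools": {"postgres-mcp", "snowflake-mcp"}, "sessions_per_day": 10},
    "docs-agent": {"tools": {"jira-mcp"}, "sessions_per_day": 30},
}


def severity(sessions_per_day: int) -> str:
    """Assumed thresholds; a real deployment would tune these."""
    if sessions_per_day >= 100:
        return "HIGH"
    if sessions_per_day >= 25:
        return "MEDIUM"
    return "LOW"


def blast_radius(lineage: dict, tool: str) -> list:
    """Return (agent, sessions/day, severity) for every agent using `tool`."""
    affected = [
        (agent, info["sessions_per_day"], severity(info["sessions_per_day"]))
        for agent, info in lineage.items()
        if tool in info["tools"]
    ]
    # Largest impact first.
    affected.sort(key=lambda row: row[1], reverse=True)
    return affected


rows = blast_radius(lineage, "postgres-mcp")
total = sum(sessions for _, sessions, _ in rows)
```

For `postgres-mcp` this yields the three agents above, with `total` equal to 260 sessions/day.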
Deep dive: How LangSight detects agent loops · 3 detection patterns, real incident walkthrough
Read the post →

Zero to traced
in 5 minutes.

1

Install & discover

30 seconds

pip install langsight
langsight init

# Auto-discovered 4 MCP servers
2

Instrument your agent

2 lines of code

from langsight.sdk import LangSightClient

client = LangSightClient(url="...")
traced = client.wrap(mcp, server_name="pg")
3

See everything

real-time

langsight sessions
langsight mcp-health
langsight security-scan
langsight costs --hours 24

Built for production.

Prevention Guardrails

v0.3

Loop detection, budget limits, and circuit breakers. Configure thresholds per-agent from the dashboard — no code change needed after initial SDK setup.
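
For readers unfamiliar with the circuit breaker pattern, here is a minimal generic sketch. It is not LangSight's implementation: `CircuitBreaker`, `CircuitOpen`, and the 30-second cooldown are assumptions. Only the five-failure threshold comes from the SDK example on this page.

```python
import time


class CircuitOpen(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=5, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown_s:
            return "half-open"  # allow one trial call after the cooldown
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpen("tool disabled; retry after cooldown")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each tool call in `breaker.call(...)` means a failing tool stops receiving traffic once it hits the threshold, instead of being retried on every session.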

Multi-Agent Call Trees

Core

parent_span_id links sub-agent calls across any depth. See the path from orchestrator to leaf tool.
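
As a rough illustration of how a parent pointer yields a call tree, the sketch below rebuilds the session from the earlier trace. Only `parent_span_id` is taken from the text. The `span_id` and `name` fields and the helper functions are assumed for the example.

```python
from collections import defaultdict

# Hypothetical flat span records, as they might be stored.
spans = [
    {"span_id": "s1", "parent_span_id": None, "name": "support-agent"},
    {"span_id": "s2", "parent_span_id": "s1", "name": "jira-mcp/get_issue"},
    {"span_id": "s3", "parent_span_id": "s1", "name": "postgres-mcp/query"},
    {"span_id": "s4", "parent_span_id": "s1", "name": "billing-agent"},
    {"span_id": "s5", "parent_span_id": "s4", "name": "crm-mcp/update"},
    {"span_id": "s6", "parent_span_id": "s4", "name": "slack-mcp/notify"},
]


def render(spans: list) -> list:
    """Return the call tree as indented lines, any depth."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # roots are spans with no parent
    return lines


def path_to(spans: list, span_id: str) -> list:
    """Follow parent pointers from a leaf up to the orchestrator."""
    by_id = {s["span_id"]: s for s in spans}
    path = []
    current = by_id.get(span_id)
    while current is not None:
        path.append(current["name"])
        current = by_id.get(current["parent_span_id"])
    return list(reversed(path))
```

`path_to(spans, "s6")` walks from `slack-mcp/notify` back through `billing-agent` to `support-agent`.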

Session Replay

v0.2

Re-execute any session against live MCP servers. Compare two runs side-by-side to see what changed.

Anomaly Detection

v0.2

Z-score analysis against 7-day baseline. Warning at |z|>=2, critical at |z|>=3. No manual thresholds.
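
The thresholds above can be expressed in a few lines. This sketch assumes one sample per day and the sample standard deviation. The page does not say how the baseline is aggregated, so `classify` and the sample latencies are illustrative only.

```python
import statistics


def classify(baseline: list, value: float) -> str:
    """Score `value` against a baseline using z = (value - mean) / stdev."""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)  # sample standard deviation
    if stdev == 0:
        # A flat baseline has no spread; any deviation is treated as critical.
        return "ok" if value == mean else "critical"
    z = (value - mean) / stdev
    if abs(z) >= 3:
        return "critical"
    if abs(z) >= 2:
        return "warning"
    return "ok"


# Hypothetical 7-day latency baseline in milliseconds (mean 100, stdev ~6.5).
baseline_ms = [100, 110, 90, 105, 95, 100, 100]

status_normal = classify(baseline_ms, 104)   # z ~ 0.6
status_warn = classify(baseline_ms, 115)     # z ~ 2.3
status_crit = classify(baseline_ms, 125)     # z ~ 3.9
```

Because the thresholds are relative to each series' own spread, a noisy tool and a stable tool get different absolute limits without manual tuning.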

Agent SLO Tracking

v0.2

Define success_rate and latency_p99 targets per agent. Get alerted before you breach them.
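
One way to picture an SLO check over a window of sessions is sketched below. The function names, the nearest-rank percentile, and the "at_risk" rule (80% of the error budget or latency target consumed) are all assumptions, not documented LangSight behavior.

```python
def p99_nearest_rank(latencies_ms: list) -> float:
    """99th percentile by the nearest-rank method (integer arithmetic)."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100  # ceil(0.99 * n)
    return ordered[rank - 1]


def status(observed: float, limit: float, warn_fraction: float = 0.8) -> str:
    """Compare a 'lower is better' metric against its limit."""
    if observed > limit:
        return "breached"
    if observed > warn_fraction * limit:
        return "at_risk"
    return "ok"


def evaluate_slo(outcomes, latencies_ms, target_success_rate, target_p99_ms):
    """Return per-metric status for one agent over one window."""
    error_rate = outcomes.count(False) / len(outcomes)
    error_budget = 1.0 - target_success_rate
    return {
        "success_rate": status(error_rate, error_budget),
        "latency_p99": status(p99_nearest_rank(latencies_ms), target_p99_ms),
    }


# Hypothetical window: 1,000 sessions with 9 failures, latencies of 1..100 ms.
report = evaluate_slo(
    outcomes=[True] * 991 + [False] * 9,
    latencies_ms=list(range(1, 101)),
    target_success_rate=0.99,
    target_p99_ms=200,
)
```

In this window the agent has used 90% of its error budget, so `success_rate` is reported as `at_risk` while `latency_p99` stays `ok`.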

AI Root Cause Analysis

4 LLMs

langsight investigate sends evidence to Claude, GPT-4o, Gemini, or Ollama and returns remediation steps.

Prometheus Metrics

v0.2

Native /metrics endpoint. Plug into your existing Grafana stack. Request counts, latencies, SSE connections.

Drop into any framework.

LangChain
LangGraph · Langflow
CrewAI
Multi-agent orchestration
Pydantic AI
Type-safe agents
LibreChat
Self-hosted chat
OTLP
Any OpenTelemetry framework
Claude · Cursor
Auto-discovered by init

Langfuse watches the brain. LangSight watches the hands.

Use alongside Langfuse, LangWatch, or LangSmith. They trace model reasoning. LangSight guards the tool layer — loops, budgets, health, security, blast radius.

Apache 2.0 · Self-host free forever

Your data. Your infra.
No vendor dependency.

Self-host on your own infrastructure. No data ever leaves your network. No paid tiers. No gated features. No usage limits.

Your data stays yours

PostgreSQL + ClickHouse via docker compose up. Both fully under your control. No telemetry phoning home.

No vendor lock-in

Apache 2.0 — fork it, modify it, embed it, sell it. No restrictions.

5-minute setup

One script generates secrets, starts 5 containers, seeds demo data. You're looking at traces before your coffee is ready.

Own the runtime layer
of your agent systems.

If your agents depend on tools, LangSight keeps them reliable, safe, and within budget.

Prevent loops. Enforce budgets. Break failing tools. Map blast radius.

$ pip install langsight && langsight init
Apache 2.0 — self-host free · No account needed · docker compose up — full stack in 5 min