Troubleshooting
A checklist of the failures you’re most likely to hit, with the
fix. The agent logs to pino — set LOG_LEVEL=debug for verbose
output, and check the logs first before guessing.
Boot fails
LLM_PROVIDER=anthropic but ANTHROPIC_API_KEY is not set
Self-explanatory. Set ANTHROPIC_API_KEY (or OPENAI_API_KEY,
matching LLM_PROVIDER). Not needed for ollama.
LLM_PROVIDER=ollama requires LLM_MODEL
Ollama has no universal default; every operator pulls their own
model. Set LLM_MODEL to the exact name you ran ollama pull
with:
    ollama pull llama3.1
    LLM_PROVIDER=ollama LLM_MODEL=llama3.1 npx davepi-agent
AGENT_SESSION_SECRET must be set when using per-user auth
The HTTP channel signs its session cookie with this secret. Set it to a high-entropy value:
    AGENT_SESSION_SECRET=$(openssl rand -hex 32)
If you’ve left this unset in dev, every restart rotates the implicit secret and invalidates every session cookie. Persist a real one even in dev.
Slack channel enabled but SLACK_BOT_TOKEN / SLACK_SIGNING_SECRET are missing
Either set both, or set SLACK_ENABLED=false to opt out
explicitly. See Channels → Slack for the
full bot setup.
Unknown auth mode: ... / Unknown LLM provider: ...
Typo in AGENT_AUTH_MODE or LLM_PROVIDER. Valid values:
- AGENT_AUTH_MODE: service (default) or per-user.
- LLM_PROVIDER: anthropic (default), openai, or ollama.
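The boot-time checks in this section can be sketched as a preflight script to run before starting the agent. This is an illustrative helper, not part of davepi-agent: the function name preflight and its messages are made up here, and only the variable names, defaults, and valid values come from this page.

```shell
# Hypothetical preflight helper: mirrors the boot checks described on this
# page and prints one line per problem. Returns non-zero if any were found.
preflight() {
  provider="${LLM_PROVIDER:-anthropic}"    # documented default
  auth_mode="${AGENT_AUTH_MODE:-service}"  # documented default
  problems=0

  case "$provider" in
    anthropic)
      [ -n "${ANTHROPIC_API_KEY:-}" ] || {
        echo "LLM_PROVIDER=anthropic but ANTHROPIC_API_KEY is not set"
        problems=$((problems + 1)); } ;;
    openai)
      [ -n "${OPENAI_API_KEY:-}" ] || {
        echo "LLM_PROVIDER=openai but OPENAI_API_KEY is not set"
        problems=$((problems + 1)); } ;;
    ollama)
      [ -n "${LLM_MODEL:-}" ] || {
        echo "LLM_MODEL is required when LLM_PROVIDER=ollama"
        problems=$((problems + 1)); } ;;
    *)
      echo "Unknown LLM provider: $provider"
      problems=$((problems + 1)) ;;
  esac

  case "$auth_mode" in
    service) ;;
    per-user)
      [ -n "${AGENT_SESSION_SECRET:-}" ] || {
        echo "AGENT_SESSION_SECRET must be set when using per-user auth"
        problems=$((problems + 1)); } ;;
    *)
      echo "Unknown auth mode: $auth_mode"
      problems=$((problems + 1)) ;;
  esac

  [ "$problems" -eq 0 ]
}
```

Running preflight && npx davepi-agent in a start script surfaces every misconfiguration at once instead of one boot failure at a time.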
Chat returns errors
401 UNAUTHENTICATED on every chat in service mode
The bearer expired or is invalid.
    # Confirm the token still works against davepi directly:
    curl -s http://localhost:5050/api/user/me -H "authorization: Bearer $DAVEPI_BEARER"
{ error: { code: 'UNAUTHENTICATED' } } → mint a fresh one. The
default ACCESS_TOKEN_TTL is 15 minutes; for development bump it
in your davepi server’s .env (max 2h — the policy ceiling for
access tokens). For production prefer per-user mode — the agent
rotates tokens automatically and access tokens stay short-lived.
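When a service-mode bearer starts returning 401, it can help to check its expiry locally before minting a new one. A minimal sketch, assuming the bearer is a standard three-part JWT carrying a numeric exp claim (the claim layout is an assumption, not something this page documents). jwt_exp is a hypothetical helper and needs a base64 that accepts -d.

```shell
# Hypothetical helper: print the exp claim (Unix seconds) of a JWT.
# It does NOT verify the signature; it only decodes the payload segment.
jwt_exp() {
  # Second dot-separated segment, converted from base64url to plain base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Compare the output of jwt_exp "$DAVEPI_BEARER" with date +%s: a smaller value means the token has already expired.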
401 UNLINKED with a linkUrl
Expected in per-user mode on first contact. Open the URL, complete the email/password form, retry the chat. The link nonce is one-shot and expires after 15 minutes; trigger a new chat to mint a fresh one if the first expired.
If you get UNLINKED repeatedly on the same user, check the agent
log for warnings — STORE_URL might be memory: (loses tokens on
restart) or the file path might not be writable.
403 FORBIDDEN writing memory / customer profile / proposing a persona patch
The agent’s bearer was issued for a user without the agent role.
Field-level ACL on the learning-layer schemas is keyed off this
role:
    # Confirm the role on the user behind the agent's token:
    curl -s http://localhost:5050/api/user/me -H "authorization: Bearer $DAVEPI_BEARER" | jq .roles
    # Expect: ["agent"] (plus possibly other roles for human ops, but `agent` must be present)
Add the role with a one-time update from an admin token, or
register the agent’s user with roles: ['agent'] from the start.
404 link on opening a link URL
The nonce was already consumed (someone hit the page and submitted), or it expired (default 15 minutes). Trigger a fresh chat as the same channel user to issue a new nonce.
403 on POST /oauth/callback
Expected. That endpoint was removed in PR #128 review (refresh
tokens in URL query strings leak via logs/referrer/history) and
retained only as a loud refusal. Use the /link/:nonce flow.
Tool calls fail
Unknown resource: <name> from use_resource
Routing is on (more than AGENT_TOOL_LIMIT MCP tools) and the
model asked for a resource that doesn’t exist. The model usually
recovers by calling list_resources and trying again. If it
doesn’t, the resource name parsing in toolRouter.js
might be misclassifying a tool; open an issue with the tool list.
Tools don’t appear at all
Confirm the MCP server is up:
    curl -s http://localhost:5050/mcp \
      -H "authorization: Bearer $DAVEPI_BEARER" \
      -H "content-type: application/json" \
      -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'
You should see a list of tools. If you don’t:
- Confirm DAVEPI_URL is right and reachable from the agent host.
- Confirm DAVEPI_MCP_PATH (default /mcp) matches the davepi server’s mount point.
- Check the davepi server logs for MCP wiring errors.
The agent caches the tool list on first use; if you added a schema
mid-process and want the agent to see it without a restart, send a
tools/list_changed notification from davepi (emitted automatically
on hot reload in dev) or call mcpClient.refreshTools() from the
programmatic API.
MCP tool calls hang
The davepi MCP transport is a stateless StreamableHTTPServerTransport,
so every call opens a fresh transport. If calls hang:
- Confirm the davepi server isn’t itself stuck (long-running plugin in the request path?).
- Check for a network policy blocking the agent host from reaching DAVEPI_URL.
- If the agent is in a container, make sure DAVEPI_URL points to the host that’s reachable from inside the container (often host.docker.internal rather than localhost).
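The container case in the last bullet is easy to check mechanically. A small sketch, assuming DAVEPI_URL is an http(s) URL with a hostname or IPv4 host (IPv6 literals are not handled). is_loopback_url is a hypothetical helper, not part of the agent.

```shell
# Hypothetical helper: succeed if the URL's host is a loopback-style address,
# which inside a container points at the container itself, not the host.
is_loopback_url() {
  # Strip the scheme, then everything from the first "/" or ":" onward.
  host=$(printf '%s' "$1" |
    sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|[/:].*$||')
  case "$host" in
    localhost|127.*|0.0.0.0) return 0 ;;
    *) return 1 ;;
  esac
}
```

If is_loopback_url "$DAVEPI_URL" succeeds inside the agent container, the URL almost certainly needs to change before any other hang diagnosis is worth doing. Separately, adding --max-time 5 to the curl probe above keeps a hung server from hanging the check as well.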
Slack issues
Bot doesn’t respond to @-mentions
- Confirm the bot is a member of the channel you’re mentioning it from. Run /invite @your-bot if not.
- Confirm the Event Subscriptions include app_mention and message.im.
- Confirm SLACK_BOT_TOKEN (starts xoxb-) and SLACK_SIGNING_SECRET are set correctly. The signing secret is the Signing Secret on the Basic Information page, not the Verification Token.
- If using socket mode, confirm SLACK_APP_TOKEN (starts xapp-) and SLACK_SOCKET_MODE=true. The agent log should print socket connected on startup.
- If using HTTP mode, confirm your event URL is reachable (ngrok http 5061 or similar) and the URL ends with /slack/events.
invalid_blocks or invalid_arguments from Slack
Usually a render_table with cells that have unusual characters or
a render_chart with an oversized Vega-Lite spec. The agent’s
render tools cap rows at 500; check the model isn’t dumping a
runaway listing. Open the agent log for the render payload and
inspect it.
Slack 401 / not_authed
Token was rotated and the SLACK_BOT_TOKEN env is stale, or the
app was uninstalled from the workspace. Re-install and copy a fresh
bot token.
Anthropic cache isn’t hitting
cache events should fire on every turn with a non-zero
cacheReadInputTokens after the first turn of a session. If
they’re always zero:
| Cause | Fix |
|---|---|
| Provider isn’t Anthropic | OpenAI / Ollama don’t use this caching primitive. Expected. |
| LLM_PROMPT_CACHING=false | You turned it off. Unset or =true to re-enable. |
| Every turn is a NEW session | AGENT_SESSION_IDLE_SECONDS=0 or very small. Bump it (default 1800). |
| Conversation isn’t persisting (no stable conversationId) | Service-mode HTTP has no channelUserId / conversationId, so it can’t persist. Use per-user mode or pass an explicit channelCtx with conversationId programmatically. |
| Persona / memory was just rewritten and the prefix changed | Snapshot is frozen within a session. Mid-session writes don’t bust the cache; the cache rebuilds on the next session. Expected once. |
| Tool list changed (hot reload, new schema) | Cache invalidates when the tool descriptions change. Expected after a hot reload. |
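To see at a glance whether the cache is hitting, the per-turn metrics can be tallied from a saved log. A rough sketch: it assumes each turn logs one line containing cacheReadInputTokens followed by a number, in either key=value or JSON form, which may not match the exact log layout. cache_hits is a hypothetical helper.

```shell
# Hypothetical helper: read agent log lines on stdin and print
# "<turns> <turns with a non-zero cache read>".
cache_hits() {
  # Pull the number that follows cacheReadInputTokens, one per matching line.
  sed -n 's/.*cacheReadInputTokens[^0-9]*\([0-9][0-9]*\).*/\1/p' |
    awk '{ turns++; if ($1 > 0) hits++ }
         END { printf "%d %d\n", turns, hits }'
}
```

For example, cache_hits < agent.log (agent.log being a placeholder file name) printing a second number of 0 across many turns points at one of the causes in the table above.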
Conversation persistence
History resets across restarts
Either AGENT_PERSIST_CONVERSATIONS=false, or you don’t have a
stable conversationId (service-mode HTTP has none). In per-user
HTTP and Slack the conversation row should survive; confirm
conversation rows are being written:
    curl -s http://localhost:5050/api/conversation \
      -H "authorization: Bearer $DAVEPI_BEARER" | jq '.results[] | .conversationId'
mid-session writes can't alter the in-flight prefix
Working as intended. The frozen snapshot is captured once at
session start and held byte-stable for the whole conversation —
that’s what makes the Anthropic cache hit, and what keeps
prompt-injection in a customerProfile.notes write from rewriting
this session’s identity tier. Self-authored memory writes take
effect on the next session.
If a write must take effect mid-session (operational reasons), an
operator can force a new session by setting lastTurnAt to far in
the past on the conversation row — the next turn will re-snapshot.
Ollama tool-calling issues
Model ignores tools, hand-writes markdown tables
Smaller / older models often have weak tool-calling. Try:
- A model with better tool support: llama3.1, qwen2.5 (work well). Smaller variants are flakier.
- AGENT_TOOL_LIMIT=20 to force the router earlier; a shorter tool list helps weaker models stay focused.
- A stricter LLM_SYSTEM_PROMPT that explicitly demands render_table for tabular data.
model not found from Ollama
The LLM_MODEL you set doesn’t match a pulled model. Run ollama list
to see what you’ve got; ollama pull <model> to fetch one.
Tool-call parameters look mangled
Some Ollama models reject strict JSON-schema envelopes. The agent
already sets compatibility: 'compatible' on the OpenAI-compatible
client to relax this. If you’re still seeing breakage with a small
model, the model itself is the limitation — try a larger one.
“It works on my machine”
Agent runs locally but not in production
Check, in order:
- DAVEPI_URL: is it reachable from the production agent host? Run curl $DAVEPI_URL/health from the same network.
- Provider key: is ANTHROPIC_API_KEY / OPENAI_API_KEY set in the production env (and not committed to a .env that didn’t ship)?
- Service-mode bearer: a production deployment shouldn’t be using a short-lived dev JWT. Mint a long-lived one or switch to per-user.
- CORS: does AGENT_CORS_ORIGINS include the production front-end origin? Empty disables CORS entirely; an exact string match is required.
- Cookie security: AGENT_COOKIE_SECURE=true (the default) means the session cookie is only sent over HTTPS. If the agent is fronted by HTTP, the cookie won’t round-trip and per-user mode will look “stuck”. Use HTTPS, or AGENT_COOKIE_SECURE=false for staging.
- STORE_URL: file:./... is relative to the agent’s cwd. In a container that cwd is rarely what you expect. Use an absolute path, mount a volume, or switch to memory: (and accept that restarts lose links).
Reading the agent log
    LOG_LEVEL=debug npx davepi-agent
Look for:
- agent built: startup info (provider, model id, auth mode).
- mcp tool list loaded count=N: confirms MCP is reachable.
- prompt cache usage cacheReadInputTokens=X cacheCreationInputTokens=Y: per-turn cache metrics.
- snapshot fetch failed; omitting slot: a persona/memory/profile fetch threw. Tenancy issue or schema not loaded.
- conversation persist failed; history not saved this turn: davepi rejected the write. Probably an ACL or validation issue.
Still stuck?
Open an issue with:
- The exact env (with secrets redacted).
- The agent log at LOG_LEVEL=debug.
- The davepi server log for the same request window.
- The MCP tool list (curl ... tools/list as shown above).