Commit: <sha>
Bench: LongMemEval_s / coding-agent-life-v1 / ...
N: 500 / 15 / ...
K: 5
Hardware: macos-15 / ubuntu-22.04 / ...
OpenAI model: text-embedding-3-small
Anthropic model: N/A (no LLM in retrieval loop)

Headline

agentmemory-hybrid: R@5 = XX.XX%, P@5 = XX.XX%, p50 latency = XXms

Beats the grep baseline by +X.X pt R@5 and the vector baseline by +X.X pt R@5.

Per-adapter

Adapter            | P@5 | R@5 | Hit rate | p50 latency
grep               |     |     |          |
vector             |     |     |          |
agentmemory-hybrid |     |     |          |
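
The columns above are not defined in this template, so here is a minimal sketch of how they could be computed, assuming the standard definitions: P@K divides relevant hits by K, R@K divides them by the number of relevant sessions, hit rate counts questions with at least one relevant hit, and p50 is the median latency. The function names `scoreAtK` and `p50` are hypothetical and may not match the eval harness.

```typescript
// Hypothetical helpers; the real harness may define these metrics differently.

// Scores one question. `ranked` is the deduplicated list of session IDs
// returned for the query, best first; `relevant` is the gold set.
function scoreAtK(
  ranked: string[],
  relevant: Set<string>,
  k: number
): { precision: number; recall: number; hit: boolean } {
  const topK = ranked.slice(0, k);
  const hits = topK.filter((id) => relevant.has(id)).length;
  return {
    // Assumption: P@K divides by K, not by the number of results returned.
    precision: k > 0 ? hits / k : 0,
    recall: relevant.size > 0 ? hits / relevant.size : 0,
    hit: hits > 0,
  };
}

// Median of per-question latencies in milliseconds.
function p50(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return NaN;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

Averaging `precision` and `recall` over all N questions, and taking the fraction of questions with `hit` set, would fill one row of the table.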

Per-question-type

Type                    | grep R@5 | vector R@5 | agentmemory R@5
single-session-bug      |          |            |
single-session-refactor |          |            |
preference              |          |            |
multi-session-causal    |          |            |
temporal                |          |            |

Methodology

  • Sessions ingested via POST /agentmemory/remember with type=eval-session
  • Queries hit POST /agentmemory/smart-search with limit=k*4
  • No LLM in the retrieval loop; results are ranked directly by the hybrid score.
  • Ranked results are deduplicated by sessionId before truncating to K
  • Latency is measured as init+query for LongMemEval (fresh state per question) and query-only for coding-agent-life-v1 (shared state)
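
The over-fetch and dedup steps above can be sketched as follows. This is an illustration only: the `Hit` shape and the names `searchLimit` and `dedupTopK` are hypothetical, and the actual smart-search response fields may differ. Fetching k*4 results leaves room for several hits from the same session to collapse into one before the list is cut to K.

```typescript
// Hypothetical shape of one search hit; the real response may differ.
interface Hit {
  sessionId: string;
  score: number;
}

// Over-fetch factor from the methodology: request limit = k * 4.
function searchLimit(k: number): number {
  return k * 4;
}

// Keep the best-ranked hit per session, then stop at k distinct sessions.
// Assumes `hits` is already sorted by descending hybrid score.
function dedupTopK(hits: Hit[], k: number): string[] {
  const seen = new Set<string>();
  const ranked: string[] = [];
  for (const hit of hits) {
    if (ranked.length >= k) break;
    if (seen.has(hit.sessionId)) continue;
    seen.add(hit.sessionId);
    ranked.push(hit.sessionId);
  }
  return ranked;
}
```

Deduplicating before truncation matters: truncating first could leave fewer than K distinct sessions when one session dominates the top of the ranking.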

Reproduce

git checkout <sha>
npm install --legacy-peer-deps
OPENAI_API_KEY=sk-... AGENTMEMORY_BASE_URL=http://localhost:3111 \
  npm run eval:longmemeval -- --stratify 10

Notes

<what surprised, what regressed, what's load-bearing>