Spaces:

Yash030
/

agentmemory-python

Running

Initialize Hugging Face Space deployment for AgentMemory Python (clean without assets)

b2d9e47 4 days ago

1.45 kB

	# <YYYY-MM-DD> — <benchmark-name>

	Commit: `<sha>`
	Bench: LongMemEval `_s` / coding-agent-life-v1 / ...
	N: 500 / 15 / ...
	K: 5
	Hardware: macos-15 / ubuntu-22.04 / ...
	OpenAI model: text-embedding-3-small
	Anthropic model: N/A (no LLM in retrieval loop)

	## Headline

	agentmemory-hybrid: R@5 = XX.XX%, P@5 = XX.XX%, p50 latency = XXms

	Beats grep baseline by +X.Xpt R@5, vector by +X.Xpt R@5.

	## Per-adapter

	\| Adapter \| P@5 \| R@5 \| Hit rate \| p50 latency \|
	\|---\|---\|---\|---\|---\|
	\| grep \| \| \| \| \|
	\| vector \| \| \| \| \|
	\| agentmemory-hybrid \| \| \| \| \|

	## Per-question-type

	\| Type \| grep R@5 \| vector R@5 \| agentmemory R@5 \|
	\|---\|---\|---\|---\|
	\| single-session-bug \| \| \| \|
	\| single-session-refactor \| \| \| \|
	\| preference \| \| \| \|
	\| multi-session-causal \| \| \| \|
	\| temporal \| \| \| \|

	## Methodology

	- Sessions ingested via `POST /agentmemory/remember` with `type=eval-session`
	- Queries hit `POST /agentmemory/smart-search` with `limit=k*4`
	- No LLM in retrieval loop. Direct rank from hybrid scoring.
	- Ranks dedup by sessionId before truncating to K
	- Latency measured as init+query for LongMemEval (per-question fresh state), query-only for coding-life (shared state)

	## Reproduce

	```sh
	git checkout <sha>
	npm install --legacy-peer-deps
	OPENAI_API_KEY=sk-... AGENTMEMORY_BASE_URL=http://localhost:3111 \
	npm run eval:longmemeval -- --stratify 10
	```

	## Notes

	<what surprised, what regressed, what's load-bearing>