An in-memory decision database: your app learns from every interaction and makes smarter decisions automatically.
Four moving parts. Four failure modes. Weeks to build. Months to maintain.
Four API calls: one to define, three to learn.
BanditDB keeps weight matrices in memory. Every outcome you report updates those weights in microseconds, gradually building intuition about which choice wins for which context.
Name the campaign, list the arms, and set the context dimension. BanditDB initialises the weight matrices and is immediately ready to serve predictions.
curl -X POST http://localhost:8080/campaign \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id": "sleep",
    "arms": [
      "decrease_temperature",
      "decrease_light",
      "decrease_noise"
    ],
    "feature_dim": 5
  }'
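"Initialising the weight matrices" for a linear contextual bandit typically means keeping one d×d matrix and one d-vector per arm. A minimal stdlib sketch of what that per-campaign state could look like, assuming LinUCB-style internals (the names and representation are illustrative, not BanditDB's documented implementation):

```python
def init_campaign(arms, feature_dim):
    """One (A, b) pair per arm: A starts as the identity, b as zeros."""
    state = {}
    for arm in arms:
        A = [[1.0 if i == j else 0.0 for j in range(feature_dim)]
             for i in range(feature_dim)]   # d x d identity matrix
        b = [0.0] * feature_dim             # d-dimensional zero vector
        state[arm] = {"A": A, "b": b}
    return state

campaign = init_campaign(
    ["decrease_temperature", "decrease_light", "decrease_noise"], 5)
```

Starting from the identity keeps the matrices invertible before any data arrives, which is why the campaign can serve predictions immediately.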
Pass the participant's context vector. Get back the recommended intervention and an interaction ID to track it.
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id": "sleep",
    "context": [1.0, 0.35, 0.50, 0.60, 0.96]
  }'
# → {"arm_id": "decrease_temperature",
#    "interaction_id": "a1b2c3..."}
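Conceptually, prediction scores every arm against the context vector and returns the winner. A simplified greedy sketch (a real bandit adds an exploration bonus on top of this score; the weight values here are made up for illustration):

```python
def predict(weights, context):
    """Pick the arm whose weight vector scores highest on this context."""
    def score(w):
        return sum(wi * xi for wi, xi in zip(w, context))
    return max(weights, key=lambda arm: score(weights[arm]))

# Toy per-arm weights after some learning (illustrative values):
weights = {
    "decrease_temperature": [0.9, 0.1, 0.0, 0.2, 0.4],
    "decrease_light":       [0.2, 0.8, 0.1, 0.0, 0.1],
    "decrease_noise":       [0.1, 0.0, 0.7, 0.1, 0.0],
}
print(predict(weights, [1.0, 0.35, 0.50, 0.60, 0.96]))
# prints "decrease_temperature"
```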
Apply the chosen intervention. BanditDB holds the context in its TTL cache, ready to receive the reward when the outcome is known.
# arm = "decrease_temperature"
apply_intervention(user_id, arm)
# lower bedroom temperature to 17°C
# BanditDB waits for the outcome...
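The pending interaction can be modeled as a TTL cache entry keyed by interaction ID, holding the context and chosen arm until the reward arrives or the entry expires. A minimal stdlib sketch of that mechanism (the TTL value and entry structure are illustrative assumptions, not BanditDB's actual defaults):

```python
import time

class TTLCache:
    """Maps interaction_id -> (context, arm); entries expire after ttl seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # interaction_id -> (expires_at, value)

    def put(self, key, value):
        self.entries[key] = (time.monotonic() + self.ttl, value)

    def pop(self, key):
        expires_at, value = self.entries.pop(key)
        if time.monotonic() > expires_at:
            raise KeyError(f"interaction {key} expired")
        return value

pending = TTLCache(ttl_seconds=24 * 3600)  # long enough to wait overnight
pending.put("a1b2c3...", ([1.0, 0.35, 0.50, 0.60, 0.96],
                          "decrease_temperature"))
```

When the reward arrives, the server pops the entry, recovers the exact context the prediction was made with, and applies the update; if the reward never comes, the entry simply ages out.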
Report the outcome the next morning. Matrices update in microseconds. Every subsequent participant gets a smarter recommendation.
curl -X POST http://localhost:8080/reward \
  -H "Content-Type: application/json" \
  -d '{
    "interaction_id": "a1b2c3...",
    "reward": 0.27
  }'
# → "OK"
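The reward step is fast because it is a rank-one accumulation per arm, not a retraining pass: A ← A + x·xᵀ and b ← b + r·x for the chosen arm. A stdlib sketch, again assuming LinUCB-style internals (illustrative, not BanditDB's documented implementation):

```python
def reward_update(A, b, context, reward):
    """In-place rank-one update: A += x x^T, b += r * x."""
    d = len(context)
    for i in range(d):
        for j in range(d):
            A[i][j] += context[i] * context[j]
        b[i] += reward * context[i]

d = 5
A = [[float(i == j) for j in range(d)] for i in range(d)]  # identity
b = [0.0] * d
reward_update(A, b, [1.0, 0.35, 0.50, 0.60, 0.96], 0.27)
```

The update touches only d² + d numbers, which is what makes microsecond-scale writes plausible even under load.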
Standard LLM agents are stateless: if they make a bad decision, they repeat it tomorrow. BanditDB's built-in MCP server gives your entire agent swarm shared persistent memory.
Two commands and BanditDB is a native tool in Claude. Your agents can get intuition, record outcomes, and inspect learning state โ no config file editing required.
Every decision made by any agent in the network improves the routing for all future agents.
BanditDB works wherever you can define a context, a set of choices, and a reward.
Route tasks to the right model. Learn which model wins for which task type across your entire agent fleet.
Learn which price maximises revenue for which customer profile. Adapt in real time as behaviour changes.
Show the right content, offer, or layout to each user segment without a data science team.
Adaptive trial designs that route patients to the most promising treatment arm as evidence accumulates.
Learn which upsell offer converts best for which cart composition and customer history.
Route inbound matters to the right response (consult, intake form, referral, or decline) based on case value, capacity, and conflict risk.
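In every case the integration work is the same: encode what you know into a fixed-length context vector. A hypothetical encoder for the checkout-offer case (the feature names, caps, and scaling are invented for illustration):

```python
def checkout_context(cart_value, items, is_returning, hours_since_visit):
    """Encode a shopper into a fixed-length, roughly [0, 1]-scaled vector."""
    return [
        1.0,                                 # bias term
        min(cart_value / 500.0, 1.0),        # cart value, capped at $500
        min(items / 10.0, 1.0),              # item count, capped at 10
        1.0 if is_returning else 0.0,        # returning-customer flag
        min(hours_since_visit / 72.0, 1.0),  # recency, capped at 3 days
    ]

ctx = checkout_context(cart_value=180.0, items=3, is_returning=True,
                       hours_since_visit=6)
# feature_dim for this campaign would be 5
```

Keeping features on a comparable scale matters for any linear model: a raw dollar amount next to a 0/1 flag would dominate the weight updates.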
Six end-to-end examples, ordered from simplest to most advanced.
Temperature, light, or noise: which adjustment works best for each person? A pure curl walkthrough, no SDK needed.
Discount, free shipping, or nothing: learns which checkout offer closes each shopper without giving margin away.
Consult, intake form, refer, or decline: learns which response maximises matter value for each enquiry profile, accounting for capacity and conflict risk.
Hold margin or liquidate? Learns from sell-through rate, holiday proximity, and competitor pricing; the context describes the market, not the user.
Learns which prompt strategy (zero-shot, chain-of-thought, few-shot, structured) produces the best response for each task type. Your evals run in production, not in a spreadsheet.
Routes patients toward the most effective treatment arm in real time as evidence accumulates; no waiting months for interim analysis.
No sign-up. No cloud account. No configuration required.
Binary: Linux, macOS, Windows
Docker