An in-memory decision database: your app learns from every interaction and makes smarter decisions automatically.
Four moving parts. Four failure modes. Weeks to build. Months to maintain.
Four API calls: one to define, three to learn.
BanditDB keeps weight matrices in memory. Every outcome you report updates those weights in microseconds, gradually building intuition about which choice wins for which context.
Name the campaign, list the arms, and set the context dimension. BanditDB initialises the weight matrices and is immediately ready to serve predictions.
curl -X POST http://localhost:8080/campaign \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id": "sleep",
    "arms": [
      "decrease_temperature",
      "decrease_light",
      "decrease_noise"
    ],
    "feature_dim": 5
  }'
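"Initialising the weight matrices" for a linear contextual bandit typically means keeping one d×d matrix and one d-vector per arm. A minimal stdlib sketch of what that per-campaign state could look like, assuming LinUCB-style internals (the names and representation are illustrative, not BanditDB's documented implementation):

```python
def init_campaign(arms, feature_dim):
    """One (A, b) pair per arm: A starts as the identity, b as zeros."""
    state = {}
    for arm in arms:
        A = [[1.0 if i == j else 0.0 for j in range(feature_dim)]
             for i in range(feature_dim)]   # d x d identity matrix
        b = [0.0] * feature_dim             # d-dimensional zero vector
        state[arm] = {"A": A, "b": b}
    return state

campaign = init_campaign(
    ["decrease_temperature", "decrease_light", "decrease_noise"], 5)
```

Starting from the identity keeps the matrices invertible before any data arrives, which is why the campaign can serve predictions immediately.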
Pass the participant's context vector. Get back the recommended intervention and an interaction ID to track it.
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "campaign_id": "sleep",
    "context": [1.0, 0.35, 0.50, 0.60, 0.96]
  }'
# → {"arm_id": "decrease_temperature",
#    "interaction_id": "a1b2c3..."}
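Conceptually, prediction scores every arm against the context vector and returns the winner. A simplified greedy sketch (a real bandit adds an exploration bonus on top of this score; the weight values here are made up for illustration):

```python
def predict(weights, context):
    """Pick the arm whose weight vector scores highest on this context."""
    def score(w):
        return sum(wi * xi for wi, xi in zip(w, context))
    return max(weights, key=lambda arm: score(weights[arm]))

# Toy per-arm weights after some learning (illustrative values):
weights = {
    "decrease_temperature": [0.9, 0.1, 0.0, 0.2, 0.4],
    "decrease_light":       [0.2, 0.8, 0.1, 0.0, 0.1],
    "decrease_noise":       [0.1, 0.0, 0.7, 0.1, 0.0],
}
print(predict(weights, [1.0, 0.35, 0.50, 0.60, 0.96]))
# prints "decrease_temperature"
```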
Apply the chosen intervention. BanditDB holds the context in its TTL cache, ready to receive the reward when the outcome is known.
# arm = "decrease_temperature"
apply_intervention(user_id, arm)
# lower bedroom temperature to 17°C
# BanditDB waits for the outcome...
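The pending interaction can be modeled as a TTL cache entry keyed by interaction ID, holding the context and chosen arm until the reward arrives or the entry expires. A minimal stdlib sketch of that mechanism (the TTL value and entry structure are illustrative assumptions, not BanditDB's actual defaults):

```python
import time

class TTLCache:
    """Maps interaction_id -> (context, arm); entries expire after ttl seconds."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # interaction_id -> (expires_at, value)

    def put(self, key, value):
        self.entries[key] = (time.monotonic() + self.ttl, value)

    def pop(self, key):
        expires_at, value = self.entries.pop(key)
        if time.monotonic() > expires_at:
            raise KeyError(f"interaction {key} expired")
        return value

pending = TTLCache(ttl_seconds=24 * 3600)  # long enough to wait overnight
pending.put("a1b2c3...", ([1.0, 0.35, 0.50, 0.60, 0.96],
                          "decrease_temperature"))
```

When the reward arrives, the server pops the entry, recovers the exact context the prediction was made with, and applies the update; if the reward never comes, the entry simply ages out.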
Report the outcome the next morning. Matrices update in microseconds. Every subsequent participant gets a smarter recommendation.
curl -X POST http://localhost:8080/reward \
  -H "Content-Type: application/json" \
  -d '{
    "interaction_id": "a1b2c3...",
    "reward": 0.27
  }'
# → "OK"
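The reward step is fast because it is a rank-one accumulation per arm, not a retraining pass: A ← A + x·xᵀ and b ← b + r·x for the chosen arm. A stdlib sketch, again assuming LinUCB-style internals (illustrative, not BanditDB's documented implementation):

```python
def reward_update(A, b, context, reward):
    """In-place rank-one update: A += x x^T, b += r * x."""
    d = len(context)
    for i in range(d):
        for j in range(d):
            A[i][j] += context[i] * context[j]
        b[i] += reward * context[i]

d = 5
A = [[float(i == j) for j in range(d)] for i in range(d)]  # identity
b = [0.0] * d
reward_update(A, b, [1.0, 0.35, 0.50, 0.60, 0.96], 0.27)
```

The update touches only d² + d numbers, which is what makes microsecond-scale writes plausible even under load.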
Standard LLM agents are stateless: if they make a bad decision, they repeat it tomorrow. BanditDB's built-in MCP server gives your entire agent swarm shared persistent memory.
Two commands and BanditDB is a native tool in Claude. Your agents can get intuition, record outcomes, and inspect learning state โ no config file editing required.
Every decision made by any agent in the network improves the routing for all future agents.
BanditDB works wherever you can define a context, a set of choices, and a reward.
Route tasks to the right model. Learn which model wins for which task type across your entire agent fleet.
Learn which price maximises revenue for which customer profile. Adapt in real time as behaviour changes.
Show the right content, offer, or layout to each user segment without a data science team.
Adaptive trial designs that route patients to the most promising treatment arm as evidence accumulates.
Learn which upsell offer converts best for which cart composition and customer history.
Route inbound matters to the right response (consult, intake form, referral, or decline) based on case value, capacity, and conflict risk.
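In every case the integration work is the same: encode what you know into a fixed-length context vector. A hypothetical encoder for the checkout-offer case (the feature names, caps, and scaling are invented for illustration):

```python
def checkout_context(cart_value, items, is_returning, hours_since_visit):
    """Encode a shopper into a fixed-length, roughly [0, 1]-scaled vector."""
    return [
        1.0,                                 # bias term
        min(cart_value / 500.0, 1.0),        # cart value, capped at $500
        min(items / 10.0, 1.0),              # item count, capped at 10
        1.0 if is_returning else 0.0,        # returning-customer flag
        min(hours_since_visit / 72.0, 1.0),  # recency, capped at 3 days
    ]

ctx = checkout_context(cart_value=180.0, items=3, is_returning=True,
                       hours_since_visit=6)
# feature_dim for this campaign would be 5
```

Keeping features on a comparable scale matters for any linear model: a raw dollar amount next to a 0/1 flag would dominate the weight updates.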
Six end-to-end examples, ordered from simplest to most advanced.
Temperature, light, or noise: which adjustment works best for each person? A pure curl walkthrough, no SDK needed.
Discount, free shipping, or nothing: learns which checkout offer closes each shopper without giving margin away.
Consult, intake form, refer, or decline: learns which response maximises matter value for each enquiry profile, accounting for capacity and conflict risk.
Hold margin or liquidate? Learns from sell-through rate, holiday proximity, and competitor pricing; the context describes the market, not the user.
Learns which prompt strategy (zero-shot, chain-of-thought, few-shot, structured) produces the best response for each task type. Your evals run in production, not in a spreadsheet.
Routes patients toward the most effective treatment arm in real time as evidence accumulates; no waiting months for interim analysis.
No sign-up. No cloud account. No configuration required.
Binary: Linux, macOS, Windows
Docker