# Disk fills / database grows unbounded

> The observability database grows far beyond its live data and fills the disk.

## Symptom

The box runs low on disk, and the observability database file is enormous relative to how much data it should hold — for example tens of GB on disk while only ~1 GB of predictions are actually retained. Deleting old rows (or shortening retention) does **not** shrink the file.

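The cause: old rows are deleted on schedule, but on a database created without incremental auto-vacuum the freed pages go onto SQLite's internal free-list instead of being returned to the operating system. New writes reuse free-list pages before extending the file, but nothing ever shrinks it. The file stays at its high-water mark until it is explicitly compacted.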
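You can reproduce this on a throwaway file. The sketch below only assumes that the `sqlite3` CLI is installed where you run it, and it never touches `/data/aiboard.db`. The table `t` and the file `demo.db` are illustrative names. With auto-vacuum off, it writes about 10 MB of rows and then deletes them all. The file keeps its size while the pages move to the free-list. It then applies the same `PRAGMA auto_vacuum=INCREMENTAL; VACUUM;` that the Fix section uses.

```bash
# Reproduce free-list bloat on a scratch SQLite file, then compact it.
# Run with: sh demo.sh   (needs the sqlite3 CLI; works only in a temp dir)
set -eu

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
db="$dir/demo.db"

size() { wc -c < "$db" | tr -d ' '; }

# 1. Auto-vacuum explicitly off (NONE): insert ~10 MB of rows, then delete them.
sqlite3 "$db" "PRAGMA auto_vacuum=NONE;
CREATE TABLE t(x BLOB);
WITH RECURSIVE n(i) AS (SELECT 1 UNION ALL SELECT i + 1 FROM n WHERE i < 10000)
INSERT INTO t SELECT randomblob(1000) FROM n;"
sqlite3 "$db" "DELETE FROM t;"

bloated=$(size)
free_pages=$(sqlite3 "$db" "PRAGMA freelist_count;")
echo "after DELETE: $bloated bytes on disk, $free_pages pages on the free-list"

# 2. One-time reclaim: switch to INCREMENTAL and rebuild the file.
sqlite3 "$db" "PRAGMA auto_vacuum=INCREMENTAL; VACUUM;"
compacted=$(size)
echo "after VACUUM: $compacted bytes on disk, auto_vacuum=$(sqlite3 "$db" 'PRAGMA auto_vacuum;')"
```

The DELETE leaves the file size unchanged. Only the VACUUM gives the space back.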

## Confirm

Check the database file size inside the backend container against the expected live size:

```bash
docker compose -f docker-compose.release.yml exec backend \
  sh -c 'ls -lh /data/aiboard.db'
```

If the file is many times larger than the volume of data your retention window should hold (a multi-GB file for a few days of predictions), you have free-list bloat. A single high-rate write burst can inflate the file far past steady-state size.
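SQLite's own counters can put a number on the bloat more precisely than `ls`. `PRAGMA page_count` and `PRAGMA freelist_count`, each multiplied by `PRAGMA page_size`, give the database size and its reclaimable part. `PRAGMA auto_vacuum` shows whether the file can shrink at all (0 = NONE, 1 = FULL, 2 = INCREMENTAL). The sketch below is minimal and assumes the `sqlite3` CLI is available in the backend image, which the Fix steps below already require. `freelist-report.sh` is a hypothetical file name.

```bash
# Report how much of an SQLite database is reclaimable free-list space.
# Usage: sh freelist-report.sh /path/to/file.db
set -eu

freelist_report() {
  db="$1"
  page_size=$(sqlite3 "$db" 'PRAGMA page_size;')
  page_count=$(sqlite3 "$db" 'PRAGMA page_count;')
  free_pages=$(sqlite3 "$db" 'PRAGMA freelist_count;')
  mode=$(sqlite3 "$db" 'PRAGMA auto_vacuum;')  # 0 = NONE, 1 = FULL, 2 = INCREMENTAL

  echo "db size:     $((page_size * page_count)) bytes"
  echo "free-list:   $((page_size * free_pages)) bytes (reclaimable)"
  echo "in use:      $((page_size * (page_count - free_pages))) bytes"
  echo "auto_vacuum: $mode"
}

# Only run when a path is given, so the function can also be sourced.
if [ "$#" -gt 0 ]; then
  freelist_report "$1"
fi
```

To run it inside the container, pipe the script into `sh`: `docker compose -f docker-compose.release.yml exec -T backend sh -s /data/aiboard.db < freelist-report.sh`. This page describes a free-list that accounts for most of the database together with `auto_vacuum: 0`.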

## Fix

Current releases reclaim space automatically after each retention sweep, so a healthy box self-corrects. A database that bloated **before** that behavior was in place needs a one-time reclaim.

1. **Stop the backend** so the database is not being written during the reclaim:

   ```bash
   docker compose -f docker-compose.release.yml stop backend
   ```

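2. **Run the one-time reclaim** against the database file. This converts the file to incremental auto-vacuum and compacts it, returning the free-list pages to the OS. `VACUUM` copies the live data into a temporary database and then overwrites the original from it, so it needs some free disk space to run. If the disk is completely full, free some space first.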

   ```bash
   docker compose -f docker-compose.release.yml run --rm --entrypoint sh backend -c \
     'sqlite3 /data/aiboard.db "PRAGMA auto_vacuum=INCREMENTAL; VACUUM;"'
   ```

3. **Verify** the file shrank and the data is intact:

   ```bash
   docker compose -f docker-compose.release.yml run --rm --entrypoint sh backend -c \
     'ls -lh /data/aiboard.db; sqlite3 /data/aiboard.db "PRAGMA integrity_check;"'
   # expect a much smaller file, and integrity_check should print: ok
   ```

4. **Start the backend** again:

   ```bash
   docker compose -f docker-compose.release.yml start backend
   ```

The one-time reclaim is only needed for a database that bloated before automatic reclaim was in place. Fresh deployments are created with incremental auto-vacuum enabled, so they shrink after each retention sweep and stay compact without this step.
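Once a file is in INCREMENTAL mode (`PRAGMA auto_vacuum;` returns 2), SQLite's `PRAGMA incremental_vacuum;` can release free-list pages without rewriting the whole file. This guide does not say how the backend's automatic reclaim is implemented. Treat the sketch below as a manual, on-demand equivalent under that assumption, not as a description of what the backend does. It assumes the backend is stopped as in the Fix steps, and `reclaim.sh` is a hypothetical file name.

```bash
# Release every free-list page of a database already in INCREMENTAL mode.
# Usage: sh reclaim.sh /path/to/file.db   (backend stopped)
set -eu

reclaim_free_pages() {
  db="$1"
  mode=$(sqlite3 "$db" 'PRAGMA auto_vacuum;')
  if [ "$mode" != 2 ]; then
    echo "auto_vacuum=$mode is not INCREMENTAL; use the one-time VACUUM reclaim instead" >&2
    return 1
  fi
  before=$(sqlite3 "$db" 'PRAGMA freelist_count;')
  # Without a page-count argument, incremental_vacuum empties the whole
  # free-list and truncates the file by that many pages.
  sqlite3 "$db" 'PRAGMA incremental_vacuum;'
  after=$(sqlite3 "$db" 'PRAGMA freelist_count;')
  echo "free-list pages: $before -> $after"
}

if [ "$#" -gt 0 ]; then
  reclaim_free_pages "$1"
fi
```

With the backend stopped, you could run it as `docker compose -f docker-compose.release.yml run --rm -T --entrypoint sh backend -s /data/aiboard.db < reclaim.sh`. On a file that was never converted, the script refuses to run and points back to the one-time reclaim.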

## Prevent

- **Keep retention bounded.** The retention window (`InferenceObservability:RetentionDays`, default 3 days) deletes old prediction rows on a schedule. The post-delete reclaim then returns the freed pages to the operating system so the file stays lean.
- **Cap row count as a burst guard.** `InferenceObservability:MaxRows` (default 5,000,000) trims to the newest N rows even inside the time window, so a sudden high-rate spike cannot fill the disk before the time-based sweep runs. Lower it if your box has limited disk.
- **Be careful with throughput/stub testing.** High-rate write bursts (for example sensor-pipeline throughput tests) are what inflate the file in the first place. Avoid leaving a high-rate test running against persisted storage on a production box.
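When you do run a throughput test, a small watcher can reveal a runaway burst before it fills the disk. This is a hedged sketch and not part of the product. It assumes GNU or BusyBox `stat` and `df` in the backend container. The 85% limit, the 10-second interval and the name `watch-db.sh` are arbitrary choices made for illustration.

```bash
# Print the database size and data-volume usage at a fixed interval while a
# high-rate test runs; return non-zero once usage reaches a limit.
# Usage: sh watch-db.sh /path/to/file.db [limit-percent] [interval-seconds]
set -eu

watch_db() {
  db="$1"
  limit_pct="${2:-85}"   # illustrative default; pick what suits your disk
  interval="${3:-10}"
  while :; do
    bytes=$(stat -c %s "$db")   # GNU/BusyBox stat syntax
    # df -P keeps each filesystem on one line; field 5 is the usage, e.g. "83%".
    used_pct=$(df -P "$db" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    echo "$(date +%H:%M:%S) db=${bytes} bytes volume=${used_pct}%"
    if [ "$used_pct" -ge "$limit_pct" ]; then
      echo "volume at ${used_pct}% (limit ${limit_pct}%): stop the test" >&2
      return 1
    fi
    sleep "$interval"
  done
}

if [ "$#" -gt 0 ]; then
  watch_db "$@"
fi
```

Run it in a second terminal during the test, for example `docker compose -f docker-compose.release.yml exec -T backend sh -s /data/aiboard.db 85 < watch-db.sh`. It only reports. When it exits non-zero, stopping the test is up to you.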

## Related

- [Observability & Alerts](/operate/monitoring/) — what the observability database stores.
