# X-Edge AI Box & Runtime

> Industrial Edge AI platform for real-time predictive analytics, anomaly detection, and operational optimization.

> Full documentation corpus. Source: https://xisom-docs.pages.dev/

---

## Product Overview

> What Xisom Edge AI Box & Runtime is, who it's for, and how it fits your industrial stack.

**Xisom Edge AI Box** is an industrial edge appliance and runtime that brings real-time AI inference to your plant floor — no cloud round-trip required.

### Core capabilities

- **Real-time inference** — ONNX models with TensorRT, CUDA, or CPU execution to match your hardware.
- **Streaming ingestion** — connect OPC-UA, MQTT, and CSV input sources.
- **Output writeback** — send predictions and setpoints to PLCs, brokers, and files.
- **Model lifecycle** — upload, pair with a datasource, and activate.
- **Observability** — built-in latency, throughput, and execution-provider dashboard.
- **Hardened deployment** — air-gapped offline install, non-root containers, RBAC, audit log.

### Who it's for

- **Plant engineers** automating quality inspection and predictive maintenance.
- **System integrators** connecting AI models to existing SCADA/MES systems.
- **OT/IT teams** needing on-prem inference without exposing data to the cloud.

### Architecture at a glance

```mermaid
flowchart LR
  Sensors[PLC / Sensors] --> Gateway[Datasource Gateway]
  Gateway --> Runtime[Inference Runtime]
  Runtime --> Bus[Event Bus / SignalR]
  Bus --> UI[Web Dashboard]
  Bus --> Mes[MES / Historian]
```

### Next steps

- Try the [Quickstart](/install-deploy/quickstart/) (≈15 min)
- Install on an edge box with the [Offline Bundle Install](/install-deploy/offline-bundle-install/)
- Browse [Use Cases](/use-cases/)

---

## Use Cases

> How customers deploy Xisom Edge AI in production.

### Predictive maintenance

Stream vibration and temperature data from rotating equipment. Detect bearing wear and motor faults hours before failure.

### Quality inspection

Run vision models on conveyor cameras. Reject defective parts in real time and log every decision with image evidence.

### Anomaly detection

Continuously monitor process variables and surface unusual patterns that rule-based SCADA alarms miss.

### Energy & yield optimization

Combine sensor streams and model predictions to recommend setpoint adjustments to operators or controllers.

### Operational visibility

Aggregate cell- and line-level KPIs into a unified plant view with drill-down to inference-level detail.

---

## Offline Bundle Install

> Install the Xisom Edge AI Box on an air-gapped site from the offline release bundle — load images, run the installer, set secrets, bring the stack up, and verify health.

This is the recommended way to install on a customer edge site with **no internet, no
registry access, and no image building on the box**. You install everything from a
single self-contained **release bundle** that ships on USB media.

**When to use this page**

Use the offline bundle for production edge boxes at a plant or OT site. For a quick
laptop evaluation with internet access, the [Quickstart](/install-deploy/quickstart/)
is faster.

### What you receive

A release bundle is one directory, named for its hardware profile and version
(for example `amd64-cpu-v1.5.0`). Choose the bundle that matches your hardware:

| Bundle profile | Use it when the box has… |
|----------------|---------------------------|
| `amd64-cpu`    | x86_64 CPU, no GPU — CPU-only inference |
| `amd64-gpu`    | x86_64 CPU + NVIDIA GPU — TensorRT acceleration |
| `jetson`       | NVIDIA Jetson (arm64 / L4T) edge module |

Inside the bundle:

- `amd64-cpu-v1.5.0/`
  - `images/`
    - `amd64-cpu-v1.5.0.images.tar.zst`: the three service images, compressed
    - `manifest.txt`: image tags + checksums
  - `compose/`
    - `docker-compose.release.yml`: image-based stack (no build step)
    - `.env.template`: port + secret placeholders
  - `models/`
    - `predictive_maintenance.onnx`: demo seed model (random weights)
  - `install.sh`
  - `update.sh`
  - `uninstall.sh`
  - `README.txt`: operator one-pager

**The seed model is a demo**

The bundled model has random weights — its predictions are meaningless until you
replace it with your trained model. See step 5 of the [install runbook](#install-runbook).

### Before you start

The target box must already have these installed (they ship on the golden OS
image — the installer **checks** for them and stops if anything is missing; it does
not install them for you):

- Docker Engine 20.10 or newer, with the `docker compose` v2 plugin
- `zstd` and `openssl`
- **NVIDIA Container Toolkit** — GPU and Jetson bundles only

Install on a **clean target**. The release stack reuses the same container names,
volumes, and host ports as a development stack — running it on a box that already
has Xisom running will collide.
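
As a rough pre-check you can run yourself before starting the installer, the sketch below looks for the command-line tools listed above on `PATH`. It is not the installer's own preflight: the function name `missing_tools` is hypothetical, and the sketch does not check the Docker version, the `docker compose` plugin, or the NVIDIA Container Toolkit.

```python
import shutil

# Tools named in the prerequisites above. Version checks, the compose
# plugin, and the NVIDIA toolkit are NOT covered by this sketch.
REQUIRED_TOOLS = ("docker", "zstd", "openssl")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

An empty list from `missing_tools()` means all three binaries were found; anything else names what the OS image is still missing.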

### Install runbook

1. ### Copy the bundle to the box

   Copy the whole bundle directory from your USB media to the target box, then
   open a terminal in it:

   ```bash
   cd amd64-cpu-v1.5.0/
   ```

2. ### Run the installer

   The installer auto-detects your hardware profile, loads the container images,
   generates secrets, brings the stack up, and waits for health.

   ```bash
   sudo ./install.sh
   ```

   To force a specific profile, or to also install a boot-start service:

   ```bash
   sudo ./install.sh --profile amd64-cpu   # force the profile
   sudo ./install.sh --with-systemd        # also start on boot
   ```

   Under the hood `install.sh` runs these steps in order:

   1. **Preflight** — checks Docker, the GPU toolkit (GPU/Jetson), and disk.
   2. **Verify** the image archive checksum (detects corrupt or tampered media).
   3. **Load images** — `docker load` from the bundled archive (no pull, no build).
   4. **Generate secrets** — creates the JWT signing secret and admin password on
      the box and writes them to a protected `.env` (mode `600`). No secret ever
      ships inside the bundle.
   5. **Start the stack** — `docker compose up -d`.
   6. **Health gate** — waits for all three services to report healthy.
   7. **Seed the admin** and **print the dashboard URL + admin password once**.

3. ### Save the admin password

   
   **Printed only once**

   The installer prints the dashboard URL and the generated admin password **a
   single time** at the end of the run. Copy it into your password manager
   immediately — it is not stored anywhere you can read it back.
   

4. ### Confirm the secret was written

   The installer generates the JWT signing secret automatically. You normally never
   touch it, but you can confirm the protected `.env` exists:

   ```bash
   ls -l .env        # expect mode -rw------- (600)
   ```

   If you ever need to provide your own value (e.g. a corporate secret-management
   policy), edit `.env` before the first `up` and set a strong random value:

   ```ini
   # .env  (placeholder — generate a strong random value, do not reuse)
   AIBOARD_JWT_SECRET=your-strong-random-jwt-secret
   ```

5. ### Replace the demo model

   The bundle ships a demo model with random weights. Replace it with your trained
   `predictive_maintenance.onnx` so predictions are meaningful:

   - **Before first install:** overwrite `models/predictive_maintenance.onnx` in the
     bundle, then run `install.sh`.
   - **After install:** drop your model into the model-data volume at `/data/models`
     and restart the inference service.

   You can also upload models later from the dashboard — see
   [Deploying Models](/configure/models/).

6. ### Verify health

   Confirm all three services are healthy:

   ```bash
   docker compose -f compose/docker-compose.release.yml ps
   ```

   Expect three containers in a `healthy` state. Then open the dashboard URL the
   installer printed and sign in with the admin account.

   
   **GPU first run is slow**

   On GPU and Jetson profiles, the inference service may sit in `unhealthy` for
   2–3 minutes on first boot while it compiles the TensorRT engine. This is
   expected — wait, then re-check `ps`.
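
The health gate and the GPU warm-up both amount to "poll until every service is healthy, or give up". A minimal sketch of that loop follows. The `probe` callable is hypothetical (it stands in for however you read container health, for example by parsing `docker compose ps` output), and the timeout and interval defaults are illustrative values, not the installer's.

```python
import time


def wait_until_healthy(probe, timeout_s=300.0, interval_s=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll `probe()` until every service reports "healthy" or time runs out.

    `probe` is a hypothetical callable returning {service_name: status}.
    Returns True if all services became healthy before the deadline.
    """
    deadline = clock() + timeout_s
    while True:
        statuses = probe()
        if statuses and all(s == "healthy" for s in statuses.values()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

On a GPU or Jetson profile, a timeout comfortably above the 2–3 minute TensorRT compile avoids reporting a false failure on first boot.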
   

### Day-2 operations

  
**Update**

   Deploy a newer bundle. The update snapshots the database first so you have a
   rollback path.

   ```bash
   sudo ./update.sh
   ```

   
**Carry your live .env into the new bundle**

   `update.sh` reads the `.env` from the bundle directory — it holds your
   per-deployment secret, ports, and image references. A freshly downloaded bundle
   has none, so copy the existing install's `.env` into the new bundle directory
   before running the update, or it will abort.
   

   The pre-update snapshot covers the **database only** (not uploaded models or the
   TensorRT cache). Back those volumes up separately if you need to roll models back.

  
  
**Uninstall**

   ```bash
   ./uninstall.sh          # stop the stack, KEEP all data
   ./uninstall.sh --purge  # also delete models + database (confirmation prompt)
   ```

  

### If something goes wrong

| Symptom | What it means |
|---------|---------------|
| `Bundle manifest missing` | You ran the installer from a source folder, not an assembled bundle. Use the bundle directory you copied from USB. |
| `Architecture mismatch` | Wrong bundle profile for this box's CPU. Use the matching profile. |
| `NVIDIA toolkit not wired in` | A GPU/Jetson bundle on a box without the container toolkit. Use the `amd64-cpu` bundle, or fix the GPU stack on the OS image. |
| Inference stuck `unhealthy` | First-run TensorRT compile (GPU) takes 2–3 min — wait. Otherwise check the inference logs. |
| Archive checksum mismatch | Corrupted or tampered USB media. Re-copy the bundle and retry. |

See the full [Troubleshooting](/troubleshooting/) runbook for symptom-keyed fixes.
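
To check a copied archive by hand before re-running the installer, you can hash it and compare the result with the value in `manifest.txt`. This page does not state the hash algorithm or the manifest layout, so the sketch below assumes SHA-256 and that you read the expected hex digest from the manifest yourself; both function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large archives need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_matches(path, expected_hex):
    """Compare the file's digest to an expected hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

If `archive_matches(...)` returns `False`, treat the media as corrupt or tampered and re-copy the bundle, as the table above recommends.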

### Next steps

  - [Connect an input datasource](/configure/input-datasources/) — Stream sensor data into the runtime.
  - [Deploy a model](/configure/models/) — Upload and activate your trained model.
  - [Your first inference](/install-deploy/first-inference/) — Seed a model and see live predictions.

---

## Quickstart

> Run your first inference on Xisom Edge AI Box in 15 minutes.

This quickstart spins up the demo stack on your laptop using Docker Compose — a fast
way to try the platform. No edge hardware required. For a production edge install, use
the [Offline Bundle Install](/install-deploy/offline-bundle-install/).

### Prerequisites

- Docker 24+ and Docker Compose v2
- 8 GB RAM, 10 GB free disk
- (Optional) NVIDIA GPU + driver for TensorRT acceleration

### 1. Pull the demo stack

```bash
git clone https://github.com/xisom/edge-demo.git
cd edge-demo
docker compose pull
```

### 2. Start the runtime

```bash
docker compose up -d
docker compose ps
```

Open the web dashboard at `http://localhost:8080`.

### 3. Send a sample inference

```bash
curl -X POST http://localhost:8080/api/inference \
  -H 'Content-Type: application/json' \
  -d '{"model": "demo-anomaly", "payload": {"temp": 78.4, "rpm": 1450}}'
```

You should see the prediction returned and the latency tile updated in the dashboard.
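
If you would rather script the call than use `curl`, the same request can be built with the Python standard library. This sketch only mirrors the URL, headers, and body of the example above; the helper name `build_inference_request` is hypothetical, and the response format is not described on this page, so the sketch does not parse it.

```python
import json
import urllib.request


def build_inference_request(base_url="http://localhost:8080"):
    """Build (but do not send) the sample POST from the curl example."""
    body = {"model": "demo-anomaly", "payload": {"temp": 78.4, "rpm": 1450}}
    return urllib.request.Request(
        url=f"{base_url}/api/inference",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the demo stack running, `urllib.request.urlopen(build_inference_request(), timeout=10)` sends it.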

### 4. Next steps

- [Hardware Setup](/install-deploy/hardware-setup/) — deploy on real edge hardware
- [First Inference](/install-deploy/first-inference/) — load your own model
- [Connect an input datasource](/configure/input-datasources/)

---

## Hardware Setup

> Provision the Xisom Edge AI Box, pick an execution-provider mode, and connect it to your network.

Provision the box, choose how inference runs (GPU vs CPU), and wire it into your
plant network before installing the software.

### Supported hardware

| Tier | CPU | GPU | RAM | Inference mode |
|------|-----|-----|-----|----------------|
| Lite | 8-core x86_64 | — | 16 GB | CPU-only |
| Pro  | 8-core x86_64 | NVIDIA T4 / A2 | 32 GB | TensorRT-ready |
| Max  | 16-core x86_64 | NVIDIA L4 / A10 | 64 GB | Multi-model |
| Edge | NVIDIA Jetson (Orin / Xavier) | integrated | 16–32 GB | TensorRT on L4T (arm64) |

### Edge GPU prerequisites

If the box has an NVIDIA GPU (Pro, Max, or Edge tiers), the operating-system
image must have these in place before you install Xisom:

- A compatible **NVIDIA driver** for the GPU.
- The **NVIDIA Container Toolkit**, so containers can access the GPU.
- For Jetson modules: an **L4T (arm64)** base image with the toolkit wired in.

On a CPU-only (Lite) box you need none of the above — inference runs on the CPU.

### Execution-provider modes

The inference runtime picks an **execution provider** that matches your hardware.
You choose the matching profile when you install (see
[Offline Bundle Install](/install-deploy/offline-bundle-install/)).

  
**TensorRT (GPU)**

  **Best performance.** Uses the NVIDIA GPU with TensorRT acceleration for the
  lowest inference latency. Available on the Pro, Max, and Edge (Jetson) tiers.

  - Requires the NVIDIA driver + Container Toolkit on the host.
  - The first run compiles a TensorRT engine — expect a 2–3 minute warm-up before
    the inference service reports healthy.

  
  
**CUDA (GPU)**

  **GPU acceleration without TensorRT engine compilation.** Runs models on the GPU
  via CUDA. A good fallback when a model is not TensorRT-compatible, or to avoid the
  first-run compile delay.

  - Requires the NVIDIA driver + Container Toolkit on the host.

  
  
**CPU**

  **No GPU required.** Inference runs entirely on the CPU. Use the `amd64-cpu`
  install profile on Lite-tier boxes or anywhere a GPU is unavailable.

  - Highest portability, higher latency than the GPU modes.

  

Pick the GPU mode that matches your model. If a GPU box falls back to CPU
unexpectedly, the dashboard shows the active execution provider so you can spot it.
See [Monitoring](/operate/monitoring/).
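
One way to picture the three modes is as a preference order with CPU as the last resort. The sketch below picks the first available provider from that order. The identifiers are ONNX Runtime's provider names; using them here is an assumption, since this page does not say which engine the Xisom runtime embeds or how it chooses a provider internally.

```python
# Preference order mirrors the modes above: TensorRT, then CUDA, then CPU.
# These are ONNX Runtime provider names, used here as an assumption.
PREFERRED = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_provider(available, preferred=PREFERRED):
    """Return the first preferred provider present in `available`."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError("no supported execution provider available")
```

On a host with the `onnxruntime` package installed, `onnxruntime.get_available_providers()` returns a list you could pass as `available`. A GPU box where this yields `CPUExecutionProvider` is the kind of silent fallback the dashboard is meant to expose.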

### Network requirements

- One LAN port for management (dashboard + API).
- An optional second port for the OT network where your sensors and PLCs live.
- Outbound HTTPS to your license and update servers, if used (configurable
  allowlist). Air-gapped sites install from the [offline
  bundle](/install-deploy/offline-bundle-install/) and need no outbound access.

### First boot

1. Connect power and the management port, plus the OT port if you use one.
2. Browse to `https://your-device-ip` (replace with the box's management IP).
3. Sign in with the admin account created during install.
4. Continue to [Connect an input datasource](/configure/input-datasources/).

### If something goes wrong

- GPU box falling back to CPU, or stuck `unhealthy` on first boot — see the
  [Troubleshooting](/troubleshooting/) runbook.

### Next steps

  - [Install from the offline bundle](/install-deploy/offline-bundle-install/) — Air-gapped, USB-shippable install runbook.
  - [Connect a datasource](/configure/input-datasources/) — Stream sensor and process data in.

---

## Your First Inference

> Seed a model, enable a datasource, and watch live predictions appear on the dashboard.

After install, run one end-to-end loop: **seed a model → enable a datasource → see
predictions**. This is the quickest way to confirm the runtime is working.

Each step below has a dedicated guide if you want the full detail. This page is the
happy-path walkthrough.

1. ### Sign in

   Open the dashboard URL printed during install and sign in with your admin account.

2. ### Upload a model

   Go to **Models** and upload your ONNX model. Note its **window size** and
   **feature count** — they must match the datasource you pair it with.

   See [Deploying Models](/configure/models/) for formats and versioning.

3. ### Add an input datasource

   Go to **Datasources → Input** and add a source — for example an MQTT topic or an
   OPC-UA endpoint. Map each sensor tag to a model input channel, and set the
   **window size** to match the model.

   See [Connecting Input Datasources](/configure/input-datasources/) for the full
   per-protocol walkthrough.

4. ### Pair the model with the datasource

   On the datasource, pair the model you uploaded. The platform runs a compatibility
   check — `window size × feature count` must agree on both sides. If it fails,
   adjust the datasource window size or regenerate the model.

5. ### Enable, then start streaming

   Toggle **Enable** on the datasource row. Enabling validates the configuration,
   loads the paired model, and connects the adapter — but it does **not** start
   streaming yet. Click **Start** to begin inference.

   
   **Enable and Start are two steps**

   Enable prepares the runtime (loads model, connects the source). Start begins the
   data stream. Only one input datasource can be enabled at a time — enabling a new
   one disables the previous.
   

6. ### Watch predictions

   Open the **Dashboard**. Within about 20 seconds (once the first window of samples
   arrives) the predictions card updates in real time, and the sensor cards show
   live readings.

### Inspect results

The monitoring view shows latency (p50 / p95 / p99), throughput, error rate, and the
active execution provider. See [Monitoring](/operate/monitoring/) to read the KPIs.

### If something goes wrong

- Pairing rejected, datasource won't enable, or no predictions appear — see the
  [Troubleshooting](/troubleshooting/) runbook.

### Next steps

  - [Send results out](/configure/output-datasources/) — Write predictions back to a PLC or broker.
  - [Monitor the runtime](/operate/monitoring/) — Latency, throughput, and drift.

---

## Connecting Input Datasources

> Stream sensor and process data into the Xisom runtime — add an OPC-UA, MQTT, or CSV source, map tags, pair a model, enable, and start inference.

An **input datasource** is one logical source of sensor data — an OPC-UA endpoint, an
MQTT topic group, a CSV replay file, and so on. Each source is a typed adapter with
its own connection settings and tag list. You map its tags to your model's input
channels, then enable it to start inference.

### Supported input sources

- **OPC-UA** — subscribe to tags from any compliant server.
- **MQTT** — bring-your-own broker, TLS supported.
- **CSV** — replay recorded data from a file (great for testing).
- **PLC** — Modbus TCP and other industrial protocols.

**One input enabled at a time**

At most **one** input datasource can be enabled at any moment. Enabling a new source
automatically disables the previous one. This keeps the runtime bound to a single,
predictable stream.

### Walkthrough: add a source and start inference

1. ### Open the Datasources page

   In the dashboard, go to **Datasources → Input**, then click **+ Create**.

2. ### Pick the adapter type and fill the connection

   Select the source type. The form adapts to the protocol you choose — fill in the
   endpoint, credentials, and any protocol-specific fields. Secret fields (passwords,
   tokens) are masked as `***` after saving.

   
     
   **OPC-UA**

     | Field | Example |
     |-------|---------|
     | Endpoint | `opc.tcp://your-opcua-host:4840/factory/line1` |
     | Security policy | `None` (or a `Basic256Sha256` profile with certificates) |
     | Authentication | `Anonymous` (or username/password) |

     Tags are referenced by **NodeId**, e.g. `ns=2;i=2` (numeric) or
     `ns=2;s=Temperature` (string).

     
     
   **MQTT**

     | Field | Example |
     |-------|---------|
     | Broker URL | `mqtt://your-broker-host:1883` (or `mqtts://` for TLS) |
     | Topic | `factory/line1/temp` |
     | JSON path | `value` (extracts the numeric reading from the message) |
     | Username / password | optional (masked on save) |

     
     
   **CSV**

     | Field | Example |
     |-------|---------|
     | File | a recorded dataset to replay |
     | Columns | the sensor columns to feed as channels |

     CSV is replay-only — useful for testing a model without a live plant connection.

     
   

3. ### Map tags to model channels

   List the tags you want to stream and assign each to a numbered **input channel**
   (`0`, `1`, `2`, …). Each channel feeds one model input. Set the **window size** —
   how many consecutive samples the model consumes per prediction — to match the
   model you will pair.

   
   **Channel mapping must be complete**

   Provide exactly one tag per channel, with no gaps. Validation rejects an enable if
   any channel is unmapped or over-filled. String tags (e.g. a `RUN`/`STOP` status)
   cannot be used as numeric features — skip them when defining channels.
   

4. ### Save and pair a model

   Save the datasource. Then pair it with a model you uploaded under **Models**. The
   platform runs a compatibility check — `window size × feature count` must agree on
   both sides — and rejects the pairing if they differ.

5. ### Enable, then start

   Toggle **Enable** on the row. This validates the configuration, loads the paired
   model, and connects the adapter. The connection handshake (OPC-UA / MQTT / Modbus)
   happens here, so a network or credential problem surfaces now and rolls the row
   back to **Disabled** with an error message.

   Enabling does **not** auto-start streaming — click **Start** to begin inference.
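
The channel-mapping rule from step 3 (exactly one tag per channel, no gaps) is easy to check before you click **Enable**. The sketch below is an illustration of that rule, not the platform's validator: the function name, the `(tag, channel)` pair format, and the error wording are all hypothetical.

```python
def validate_channel_mapping(mapping, feature_count):
    """Check a list of (tag, channel) pairs against the rules in step 3.

    Returns a list of human-readable problems; an empty list means the
    mapping is complete: channels 0..feature_count-1, one tag each.
    """
    problems = []
    seen = {}
    for tag, channel in mapping:
        seen.setdefault(channel, []).append(tag)
    for channel in range(feature_count):
        tags = seen.pop(channel, [])
        if not tags:
            problems.append(f"channel {channel} is unmapped")
        elif len(tags) > 1:
            problems.append(f"channel {channel} has {len(tags)} tags")
    # Anything left over points at a channel the model does not have.
    for channel in sorted(seen):
        problems.append(f"channel {channel} is outside 0..{feature_count - 1}")
    return problems
```

For example, mapping tags to channels `0` and `2` of a three-feature model reports that channel `1` is unmapped.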

### Test the connection first

Before enabling, use **Test Connection** on the source. It verifies the live form
values reach the endpoint, so you catch a wrong host or credential before wiring tags.

### Rate limiting and buffering

Each datasource has independent rate limits and a bounded buffer. When a source
produces faster than the runtime consumes, backpressure is reported to the dashboard
so you can spot saturation.
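
A bounded buffer with a saturation counter can be sketched in a few lines. This page does not say what the runtime does with samples that arrive while the buffer is full, so the drop-oldest policy below is an assumption, and the `BoundedBuffer` class is hypothetical.

```python
from collections import deque


class BoundedBuffer:
    """Fixed-capacity sample buffer that counts overflow as backpressure.

    Drop-oldest on overflow is an assumption made for this sketch.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque(maxlen=capacity)
        self.dropped = 0

    def push(self, sample):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # deque evicts the oldest item on append
        self._items.append(sample)

    def pop(self):
        return self._items.popleft() if self._items else None

    def __len__(self):
        return len(self._items)
```

A steadily rising `dropped` count is the signal that the source is producing faster than the consumer drains.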

### If something goes wrong

| Symptom | Likely cause |
|---------|--------------|
| Enable fails with `model_not_paired` | Pair a model with the datasource first. |
| Enable rolls back on connect | Endpoint unreachable, wrong credentials, or firewall — verify with **Test Connection**. |
| Row says Enabled but no predictions | By design — press **Start** to begin streaming. |
| Pairing rejected | `window size × feature count` mismatch between model and datasource. |

See the [Troubleshooting](/troubleshooting/) runbook for more.

### Next steps

  - [Send results out](/configure/output-datasources/) — Write predictions back to a PLC, broker, or file.
  - [Deploy a model](/configure/models/) — Upload, pair, and activate models.

---

## Output Datasources

> Write predictions and setpoints back to external systems — configure an OPC-UA, MQTT, or Modbus output and verify it with a test write.

An **output datasource** writes data out to an external system — a PLC, broker, or
control application. Unlike input datasources, outputs have no model or tag mapping:
they accept a JSON payload and deliver it to the target.

**Test-write today**

Output datasources currently support **on-demand test writes** — each write is sent
individually so you can verify connectivity and addressing. Scheduled, batched
production write paths are a future enhancement.

### Supported outputs

| Protocol | Capability |
|----------|------------|
| OPC-UA   | Write to a node (synchronous) |
| MQTT     | Publish to a topic (configurable QoS / retain) |
| Modbus TCP | Write to holding registers or coils |

### Configure an output

1. ### Create the output

   Go to **Datasources → Output**, click **+ Create**, and pick the protocol.

2. ### Fill the connection config

   Each protocol has its own fields. Secret fields are masked as `***` after saving.

   
     
   **OPC-UA**

     ```json
     {
       "endpoint": "opc.tcp://your-plc-host:4840",
       "securityPolicy": "None",
       "authentication": "Anonymous"
     }
     ```

     
     
   **MQTT**

     ```json
     {
       "brokerUrl": "mqtt://your-broker-host:1883",
       "topic": "devices/plc/setpoint",
       "qos": 1,
       "retain": false,
       "username": "your-username",
       "password": "your-password"
     }
     ```

     
     
   **Modbus TCP**

     ```json
     {
       "host": "your-plc-host",
       "port": 502,
       "slaveId": 1,
       "defaultCoilAddress": 1000,
       "defaultRegisterAddress": 40001
     }
     ```

     Addresses follow the Modicon convention (coils from `1`, holding registers from
     `40001`).

     
   

3. ### Save and enable

   Save the output, then enable it. An output must be **enabled** before it will
   accept a test write.
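
The Modicon convention used in the Modbus config packs two things into one number: which data table is addressed and a 1-based position inside it. The sketch below splits a reference into those parts using the common 5-digit ranges. It is a general illustration of the convention. This page only mentions coils and holding registers, and it does not say how the product translates references on the wire, so the function name and return shape are hypothetical.

```python
# Modicon-style 1-based reference ranges and the data table each one selects.
_RANGES = (
    (1, 9999, "coil"),
    (10001, 19999, "discrete_input"),
    (30001, 39999, "input_register"),
    (40001, 49999, "holding_register"),
)


def split_modicon_address(reference):
    """Map a 5-digit Modicon reference to (table, zero_based_offset)."""
    for first, last, table in _RANGES:
        if first <= reference <= last:
            return table, reference - first
    raise ValueError(f"{reference} is outside the Modicon ranges")
```

Under this convention `40001` is the first holding register (offset `0`) and `1000` is a coil (offset `999`).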

### Send a test write

Open the output's details, find the **Test Write** action, enter a JSON payload, and
send it. The result shows success or the protocol error returned by the target.

  
**OPC-UA**

  Write a single node:

  ```json
  {
    "nodeId": "ns=2;s=Temperature",
    "value": 42.5
  }
  ```

  
  
**MQTT**

  Publish a payload:

  ```json
  {
    "setpoint": 65.0,
    "mode": "heating"
  }
  ```

  
  
**Modbus**

  Write one or more registers:

  ```json
  {
    "addresses": [
      { "address": 40001, "value": 100 },
      { "address": 40003, "value": 250 }
    ]
  }
  ```

  

A successful write returns a success result; a protocol error returns a message such
as `OPC UA: node not found` or `Modbus: address out of range`.

### If something goes wrong

| Symptom | Likely cause |
|---------|--------------|
| `Connection timeout` | Target unreachable. Check the network path, host, and port (the examples above use OPC-UA `4840`, MQTT `1883`, Modbus `502`). |
| Datasource disabled error | Enable the output before sending a test write. |
| `Node not found` (OPC-UA) | Check the NodeId format and that the node exists on the server. |
| Address out of range (Modbus) | Stay within Modicon ranges (coils `1+`, holding registers `40001+`). |
| `Admin only` (403) | Test writes require an admin account. |

See the [Troubleshooting](/troubleshooting/) runbook for more.
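
For the `Node not found` case, a quick format check on the NodeId string can rule out typos before you look at the server. The sketch below parses the standard OPC UA string form (`ns=<index>;<type>=<identifier>`). It checks syntax only and cannot tell you whether the node exists; the function name is hypothetical.

```python
import re

# ns=<namespace index>;<type>=<identifier>, where type is i (numeric),
# s (string), g (GUID) or b (opaque). The "ns=<n>;" prefix is optional in
# the OPC UA string form and defaults to namespace 0.
_NODE_ID = re.compile(r"^(?:ns=(\d+);)?([isgb])=(.+)$")


def parse_node_id(text):
    """Return (namespace, id_type, identifier) or raise ValueError."""
    match = _NODE_ID.match(text.strip())
    if not match:
        raise ValueError(f"not a NodeId string: {text!r}")
    namespace = int(match.group(1) or 0)
    id_type, identifier = match.group(2), match.group(3)
    if id_type == "i":
        if not identifier.isdigit():
            raise ValueError(f"numeric NodeId needs digits: {text!r}")
        return namespace, id_type, int(identifier)
    return namespace, id_type, identifier
```

A bare tag name such as `Temperature`, without the `s=` prefix, is rejected by this check.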

### Next steps

  - [Connect an input](/configure/input-datasources/) — Stream sensor data in to drive inference.
  - [Monitor the runtime](/operate/monitoring/) — Latency, throughput, and KPIs.

---

## Deploying Models

> Upload an ONNX model, pair it with an input datasource, and activate it on the Xisom runtime.

Models run on the Xisom runtime in **ONNX** format. You upload a model, pair it with
an input datasource, and activate it to start inference.

### Supported format

- **ONNX** — the runtime accepts ONNX models and executes them with the execution
  provider that matches your hardware (TensorRT, CUDA, or CPU). See
  [Hardware Setup](/install-deploy/hardware-setup/) for the modes.

**Window size and features must match the datasource**

A model is defined by its **window size** (samples per prediction) and **feature
count** (input channels). When you pair a model with a datasource, the platform
checks that the `window size × feature count` input shape agrees on both sides. Both
values must match, not just their product, so note them before you pair.
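
The compatibility check can be pictured as a comparison of two small shapes. The sketch below assumes both values must match individually, which is how the pairing step in this section describes a rejection. The `InputShape` type and `pairing_problems` function are hypothetical names for illustration, not the platform's API.

```python
from typing import NamedTuple


class InputShape(NamedTuple):
    window_size: int    # samples per prediction
    feature_count: int  # input channels


def pairing_problems(model, datasource):
    """List the mismatches between a model and a datasource input shape."""
    problems = []
    if model.window_size != datasource.window_size:
        problems.append(
            f"window size: model {model.window_size}, "
            f"datasource {datasource.window_size}")
    if model.feature_count != datasource.feature_count:
        problems.append(
            f"feature count: model {model.feature_count}, "
            f"datasource {datasource.feature_count}")
    return problems
```

Note that a `20 × 4` model and a `10 × 8` datasource carry the same number of values per window but still produce two problems here.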

### Upload, pair, activate

1. ### Upload the model

   Go to **Models** and upload your `.onnx` file. The platform validates the file
   (extension, type, and schema) before it is available.

2. ### Pair it with a datasource

   On an input datasource, pair this model. The compatibility check runs here — if
   the window size or feature count disagrees, the pairing is rejected. Adjust the
   datasource window size or regenerate the model to match.

3. ### Activate

   Enable the datasource, then **Start** streaming. The runtime loads the paired
   model into the inference service and begins producing predictions. See
   [Your First Inference](/install-deploy/first-inference/) for the full loop.

### Replacing the demo seed model

A fresh install ships a **demo model with random weights** — predictions are
meaningless until you replace it. Upload your trained model as above, or drop it into
the model-data directory and restart the inference service. The offline-bundle
install page covers the file-drop path:
[Replace the demo model](/install-deploy/offline-bundle-install/#install-runbook) (step 5).

### If something goes wrong

- Upload rejected, pairing fails, or the runtime won't load the model — see the
  [Troubleshooting](/troubleshooting/) runbook.

### Next steps

  - [Connect an input datasource](/configure/input-datasources/) — Feed the model with live sensor data.
  - [Monitor predictions](/operate/monitoring/) — Latency, throughput, and KPIs.

---

## Monitoring

> Read the inference dashboard — latency breakdown, throughput, error rate, and the window/bucket time controls.

The **Inference Stats** dashboard shows how your runtime is performing in real time:
latency, throughput, error rate, and where time is spent in the pipeline. It updates
continuously while a datasource is streaming.

### Key indicators

- **Latency percentiles** — p50 / p95 / p99 of end-to-end inference time.
- **Throughput** — predictions per second.
- **Error rate** — share of inferences that failed.
- **Execution provider** — whether inference is running on TensorRT, CUDA, or CPU.
  A GPU box that falls back to CPU shows up here.
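
If you want to reproduce a percentile from raw latency samples, for example to sanity-check an exported log, the nearest-rank method is the simplest definition. The dashboard's exact method is not stated on this page, so treat the sketch below as one common definition, not as the platform's implementation.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

With this definition p99 is always an observed sample, which makes tail spikes easy to trace back to a single inference.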

### Latency breakdown

Each inference is decomposed so you can see where time goes:

- **Queue wait** — time spent waiting in the request queue
- **Pre-process** — input normalization
- **Model exec** — pure inference time
- **Post-process** — output decoding

If latency rises, the breakdown tells you whether the model itself slowed down or the
box is saturated upstream.

### Window and bucket controls

Two controls shape the time view:

- **Window** — how far back you look (for example `5m`, `1h`, `24h`).
- **Bucket** — how wide each point on the chart is (for example `10s`, `1m`).

The number of points is `window ÷ bucket`. The dashboard auto-snaps the bucket when
you change the window so charts stay readable — wider windows use wider buckets.
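
The `window ÷ bucket` arithmetic is worth working through once. The sketch below parses spans written like the examples above (`10s`, `5m`, `24h`) and returns the point count. The helper names are hypothetical, and only the `s`, `m`, and `h` suffixes shown on this page are handled.

```python
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(span):
    """Convert a span such as "10s", "5m" or "24h" to seconds."""
    value, unit = span[:-1], span[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported span: {span!r}")
    return int(value) * _UNIT_SECONDS[unit]


def point_count(window, bucket):
    """Number of chart points for a window and bucket (window ÷ bucket)."""
    bucket_s = to_seconds(bucket)
    if bucket_s == 0:
        raise ValueError("bucket must be longer than zero")
    return to_seconds(window) // bucket_s
```

A `5m` window with `10s` buckets gives 30 points, while `24h` with `10s` buckets would give 8640, which is why wider windows are paired with wider buckets.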

**How the chart stays steady**

The latest point is aligned to a bucket boundary, so the current bucket never shows
an artificially low count. Empty buckets are filled with zeros, so the line stays
continuous even when traffic is sparse.
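
One plausible reading of that behavior is sketched below: the series ends at the most recent bucket boundary, so the bucket that is still filling is left out, and every bucket starts at zero. This is an interpretation of the description above and not the dashboard's code; the function name and the use of plain numeric timestamps in seconds are assumptions.

```python
def bucket_counts(timestamps, now, window_s, bucket_s):
    """Count events per bucket over the last `window_s` seconds.

    The series ends at the last bucket boundary at or before `now`, so the
    still-filling bucket is left out; empty buckets are reported as 0.
    """
    if bucket_s <= 0 or window_s % bucket_s != 0:
        raise ValueError("window_s must be a positive multiple of bucket_s")
    end = (int(now) // bucket_s) * bucket_s  # latest completed boundary
    start = end - window_s
    counts = [0] * (window_s // bucket_s)    # zero-fill up front
    for ts in timestamps:
        if start <= ts < end:
            counts[int(ts - start) // bucket_s] += 1
    return counts
```

For events at `100`, `101`, `119`, and `125` with `now=127`, a 30-second window, and 10-second buckets, the result is `[0, 2, 1]`: the event at `125` belongs to the bucket still in progress and is not shown yet.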

**Data retention**

Live metrics are retained for **3 days**. Windows longer than that return no data. A
brand-new install needs a short warm-up before the charts show traffic.

### Reading a fresh install

If the chart says "no data", the most common reasons are:

- No datasource is **streaming** yet — enable and **Start** a source.
- The install is still warming up — wait for the first window of samples.

### If something goes wrong

- Empty charts, unexpected CPU fallback, or rising error rate — see the
  [Troubleshooting](/troubleshooting/) runbook.

### Next steps

  - [Check versions](/operate/versioning/) — Confirm frontend and backend versions match.
  - [Connect a datasource](/configure/input-datasources/) — Start streaming to populate the dashboard.

---

## Versions & Updates

> Read the About panel, understand frontend/backend version drift, and know what it means operationally.

Every Xisom build carries a version. Knowing which version is running — and whether
all parts of the system agree — is the first thing to check after an update or when
something behaves unexpectedly.

### The About panel

Open **Settings → About**. It shows the running version of the platform, along with
the build identity. The version follows the standard `MAJOR.MINOR.PATCH` format (for
example `v1.5.0`).

You can also confirm the backend version from the health endpoint:

```bash
curl -s https://your-device-ip/api/system/health
# → { "version": "v1.5.0", "commit": "...", "buildTime": "..." }
```

### What version drift means

The About panel shows **both** the frontend version (the dashboard in your browser)
and the backend version (the running service). When they differ, the panel highlights
the mismatch.

**Why drift happens**

Version drift usually means **one part was updated and the other wasn't**, or your
browser is showing a stale, cached dashboard. It is a signal that the system is not in
a fully consistent state.

Operationally:

- **Stale browser tab** — the most common, harmless cause. Hard-refresh the dashboard
  (or close and reopen it) to load the current version.
- **Partial update** — one service was redeployed and another wasn't. Re-run the
  update so all parts land on the same version. See
  [Offline Bundle Install → Update](/install-deploy/offline-bundle-install/).

When in doubt, match the frontend and backend versions before troubleshooting other
behavior — a mismatch can explain symptoms that look like bugs.

### After an update

1. Re-open **Settings → About**.
2. Confirm the frontend and backend versions **match** and read the version you
   expect.
3. If they don't match, hard-refresh; if it persists, re-run the update.
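
The backend half of this check can also be scripted against the health endpoint. The following is a minimal sketch, not part of the product: `health_version` and `check_backend_version` are hypothetical helper names, and the JSON shape is assumed to match the example health response shown earlier on this page. The frontend version still has to be read from the About panel.

```bash
# Hypothetical helper: print the "version" field of a health JSON body read from stdin.
# sed keeps the sketch dependency-free; jq is a cleaner choice where it is installed.
health_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: compare the reported backend version with the one you expect.
# Usage: <health JSON on stdin> | check_backend_version EXPECTED
check_backend_version() {
  expected="$1"
  actual="$(health_version)"
  if [ "$actual" = "$expected" ]; then
    echo "OK: backend is $actual"
  else
    echo "DRIFT: backend is ${actual:-unknown}, expected $expected"
    return 1
  fi
}

# Against a live box (the address is a placeholder):
#   curl -s https://your-device-ip/api/system/health | check_backend_version v1.5.0
```

The non-zero return code on a mismatch lets a deployment script stop before anyone starts troubleshooting unrelated symptoms.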

### If something goes wrong

- Persistent version mismatch after a refresh and re-update — see the
  [Troubleshooting](/troubleshooting/) runbook.

### Next steps

- [Update the box](/install-deploy/offline-bundle-install/) — Deploy a newer release bundle.
- [Release notes](/release-notes/) — What changed in each release.

---

## Docker Lab Walkthrough

> Stand up the full Xisom stack locally with Docker Compose for evaluation and integration testing.

This walkthrough mirrors the production deployment topology on a single host — handy
for evaluation and integration testing before you deploy to an edge box.

### What you'll build

- Inference runtime (ONNX/TensorRT)
- Datasource gateway with MQTT broker
- Web dashboard
- Postgres for metadata + Redis for cache

### Steps

1. Clone the lab repo.
2. Copy `.env.example` to `.env` and set `XISOM_LICENSE_KEY`.
3. `docker compose --profile full up -d`
4. Wait for `health: ok` on all services.
5. Open the dashboard and run the seed script.

### Verify

- Open **Observability → Inference Stats** — you should see traffic from the seed model.
- Trigger the synthetic anomaly with `make synthetic-anomaly`; the alerts panel should light up.

### Tear down

```bash
docker compose down -v
```

### If something goes wrong

Services not reaching `health: ok`, or no traffic in the dashboard — see the
[Troubleshooting](/troubleshooting/) runbook.

---

## Troubleshooting

> Symptom-first runbook for diagnosing and fixing common Xisom Edge AI Box issues.

Find your symptom in the table, then jump to the fix. Every runbook page follows the
same shape: **Symptom → Confirm → Fix → Prevent**.

### Find your symptom

| If you see… | Go to |
|---|---|
| GPU installed but inference runs on CPU; log shows `CUDA failure 500: named symbol not found` | [GPU not used — CUDA error 500](/troubleshooting/gpu-cuda-error-500/) |
| Inference container exits/restarts when GPU is enforced and the host GPU is broken | [GPU not used — CUDA error 500](/troubleshooting/gpu-cuda-error-500/) |
| Host GPU works, but provider shows a lower tier than expected (e.g. CUDA instead of TensorRT, or CPU) | [Running on CPU when GPU expected](/troubleshooting/execution-provider-fallback/) |
| Log line `Failed to load library libonnxruntime_providers_tensorrt.so` | [Running on CPU when GPU expected](/troubleshooting/execution-provider-fallback/) |
| Enabling a datasource bounces back to Disabled with `model_not_paired` or `adapter_connect_failed` | [Datasource down or faulted](/troubleshooting/datasource-down-or-faulted/) |
| A streaming OPC-UA / MQTT / CSV source stops; row shows a fault indicator / red banner | [Datasource down or faulted](/troubleshooting/datasource-down-or-faulted/) |
| Datasource is Enabled but no predictions appear | [Datasource down or faulted](/troubleshooting/datasource-down-or-faulted/) |
| **Settings → About** highlights the frontend / backend version in amber | [Frontend / backend version mismatch](/troubleshooting/version-drift-fe-be/) |
| Version reads `0.0.0-dev…` or ends in `-dirty` | [Frontend / backend version mismatch](/troubleshooting/version-drift-fe-be/) |
| Disk filling up; database file far larger than the data it holds; deleting rows doesn't shrink it | [Disk fills / database grows unbounded](/troubleshooting/sqlite-retention-disk/) |
| A container shows `unhealthy` in `docker ps` but the app serves fine | [Container marked unhealthy but service works](/troubleshooting/healthcheck-ipv6-false-negative/) |

The deployed stack uses the release compose file. Commands in these pages run through
`docker compose -f docker-compose.release.yml …` and address the compose services by name
(`inference`, `backend`, `frontend`). In `docker ps` the same containers appear as
`aiboard-inference-real`, `aiboard-backend-real`, and `aiboard-frontend-real`.

---

## GPU not used — CUDA error 500

> Inference container logs "CUDA failure 500" and falls back to CPU though a GPU is installed.

### Symptom

The box has an NVIDIA GPU, but inference runs on CPU. The inference container log shows:

```
CUDA failure 500: named symbol not found
... resolved to CPU only
```

In the dashboard, the **Python Inference** card shows CPU (fallback) instead of `TensorRT` or `CUDA`. With strict GPU enforcement enabled, the container instead exits and keeps restarting.

`nvidia-smi` reporting your GPU as present does **not** mean CUDA compute works. The
management layer (`nvidia-smi`) and the compute driver are different things — the
compute driver is what failed here.

### Confirm

The decisive test is NVIDIA's own compute sample. Run it on the host:

```bash
docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
```

- **Healthy host** → prints `Test PASSED`.
- **Broken host** → `Failed to allocate device vector A (error code named symbol not found)!`

If the GPUs are listed by `nvidia-smi -L` but this sample fails, the GPU is unusable for compute in containers. The problem is the host GPU stack, not the Xisom image.

You can also confirm the fallback in the running container's log:

```bash
docker compose -f docker-compose.release.yml logs inference | grep -E "CUDA failure 500|resolved to CPU"
```
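
When this check is repeated across several hosts, it can help to reduce the sample's output to a single verdict. The sketch below is illustrative only: `gpu_compute_verdict` is a hypothetical helper, and it keys on the two strings quoted above (`Test PASSED` and `named symbol not found`). Any other output is reported as inconclusive rather than guessed at.

```bash
# Hypothetical helper: classify the compute sample's output (read from stdin).
gpu_compute_verdict() {
  out="$(cat)"
  case "$out" in
    *"Test PASSED"*)            echo "healthy" ;;
    *"named symbol not found"*) echo "broken-compute" ;;
    *)                          echo "inconclusive" ;;
  esac
}

# On the host:
#   docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0 2>&1 \
#     | gpu_compute_verdict
```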

### Fix

This is a host-level GPU-compute problem in the driver layer, so the fix is on the host, not in the Xisom stack.

1. Update the **NVIDIA GPU driver** on the host to a current build with full CUDA compute support. On a Windows + WSL2 host, also run `wsl --update` then `wsl --shutdown` afterwards.

   
   Updating Docker / the container platform alone does **not** fix this. The compute
   failure lives in the GPU driver layer below the container runtime. A platform update
   can even reset GPU passthrough settings and make things worse.
   

2. Re-wire the container runtime to the refreshed driver:

   ```bash
   sudo nvidia-ctk runtime configure --runtime=docker
   sudo systemctl restart docker
   ```

3. Re-run the compute sample until it prints `Test PASSED`:

   ```bash
   docker run --rm --gpus all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
   ```

4. Restart the inference container and confirm it picked up the GPU. No config change is needed — the default execution mode auto-selects the best available provider:

   ```bash
   docker compose -f docker-compose.release.yml restart inference
   docker compose -f docker-compose.release.yml logs inference | grep "EP enabled"
   # expect: "TensorRT EP enabled" or "CUDA EP enabled"
   ```

#### While the host is still broken

Run cleanly on CPU instead of fighting the GPU probe. Set `EXECUTION_MODE=cpu` (or `FORCE_CPU=1`) on the inference service and restart. The box keeps predicting on CPU until the driver is fixed.

Do **not** enable strict GPU enforcement (`STRICT_EP=1`) with a GPU mode on a broken host — it will correctly refuse to start rather than silently run on CPU, which is the opposite of what you want during the workaround.

### Prevent

- **Make a silent CPU fallback loud where GPU is mandatory.** On boxes that *must* run on GPU, pin the provider and enable strict enforcement (`EXECUTION_MODE=cuda` + `STRICT_EP=1`) so a missing GPU library fails the container at startup instead of quietly degrading to CPU.
- **Treat the compute sample as the acceptance gate** after any GPU-driver or host update — `nvidia-smi` passing is not sufficient evidence that inference will use the GPU.
- **First run is slow, not stuck.** On first GPU start the engine is compiled and cached (2–3 minutes). Subsequent restarts are fast. Don't mistake the long startup window for the failure above.

### Related

- [Running on CPU when GPU expected](/troubleshooting/execution-provider-fallback/) — when the host GPU is fine but the provider still drops a tier.
- [Hardware Setup](/install-deploy/hardware-setup/) — supported GPU tiers.
- [Observability & Alerts](/operate/monitoring/) — where the active provider is shown.

---

## Running on CPU when GPU expected

> Inference degrades to a lower execution provider (TensorRT → CUDA → CPU) though the GPU works.

### Symptom

The host GPU works (the CUDA compute sample passes), but inference is slower than expected and the **Python Inference** card shows a lower tier than you provisioned:

- You expect TensorRT but see CUDA, or
- You expect a GPU provider but see CPU (fallback) or CPU.

The inference container log often contains a benign-looking line such as:

```
Failed to load library libonnxruntime_providers_tensorrt.so
```

This is the provider chain degrading one tier at a time: **TensorRT → CUDA → CPU**.

This page is for when the host GPU is healthy but the provider still drops a tier.
If the container logs `CUDA failure 500`, the GPU itself is broken — see
[GPU not used — CUDA error 500](/troubleshooting/gpu-cuda-error-500/) first.

### Confirm

Check which provider the running service actually initialised:

```bash
docker compose -f docker-compose.release.yml logs inference | grep -E "Active EP|EP enabled|resolved to"
```

- `TensorRT EP enabled` → top tier, nothing to do.
- `CUDA EP enabled` while you expected TensorRT → the TensorRT provider was dropped (missing native parser library).
- `resolved to CPU only` while the GPU works → both GPU providers were dropped.

The default execution mode is `auto`, which is *designed* to degrade gracefully rather than fail. That safety net is also what hides a missing GPU library — a degraded box keeps running, just slower.
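
The three log outcomes above can be collapsed into one word for scripting. This is a sketch under assumptions: `active_provider` is a hypothetical helper, it matches only the log phrases quoted on this page, and it reports the highest tier found anywhere in its input. Feed it the log of the current run only, since older runs in the same log could otherwise mask a recent drop.

```bash
# Hypothetical helper: report the highest provider tier named in a log read from stdin.
active_provider() {
  log="$(cat)"
  case "$log" in
    *"TensorRT EP enabled"*)  echo "tensorrt" ;;
    *"CUDA EP enabled"*)      echo "cuda" ;;
    *"resolved to CPU only"*) echo "cpu" ;;
    *)                        echo "unknown" ;;
  esac
}

# On the box:
#   docker compose -f docker-compose.release.yml logs inference | active_provider
```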

### Fix

1. Decide whether the box **must** run on GPU. If GPU is mandatory, make the fallback loud so the gap is visible instead of silent — pin the provider and enable strict enforcement on the inference service:

   ```
   EXECUTION_MODE=tensorrt   # or: cuda
   STRICT_EP=1
   ```

   Restart inference. The container now refuses to start (instead of degrading) if the requested provider is unavailable — turning a silent slowdown into an obvious startup failure you can act on.

2. If the requested tier is **TensorRT** and it was dropped, the TensorRT provider's native parser library is missing from the image. Use a GPU-enabled inference image profile (TensorRT) for this box rather than the CPU profile. Confirm the image tier matches your hardware tier before redeploying.

3. Restart and re-confirm the active provider:

   ```bash
   docker compose -f docker-compose.release.yml restart inference
   docker compose -f docker-compose.release.yml logs inference | grep "EP enabled"
   ```
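
To make the interaction between `EXECUTION_MODE` and `STRICT_EP` concrete, here is a small model of the behaviour described on this page. It is a sketch, not the runtime's actual code: `resolve_provider` is a hypothetical function, the degradation order TensorRT → CUDA → CPU is taken from the text above, and two points are assumptions (a pinned mode without strict enforcement degrades like `auto`, and `STRICT_EP=1` has no effect in `auto` mode).

```bash
# Hypothetical model: resolve_provider MODE STRICT AVAILABLE
#   MODE      auto | tensorrt | cuda | cpu
#   STRICT    1 to fail instead of degrading, anything else to allow fallback
#   AVAILABLE space-separated providers that can load, e.g. "cuda cpu"
resolve_provider() {
  mode="$1"; strict="$2"; available=" $3 "
  case "$mode" in
    auto|tensorrt) chain="tensorrt cuda cpu" ;;
    cuda)          chain="cuda cpu" ;;
    cpu)           chain="cpu" ;;
    *) echo "error: unknown mode $mode" >&2; return 2 ;;
  esac
  for p in $chain; do
    case "$available" in
      *" $p "*)
        # First loadable provider in the chain. Under strict enforcement a pinned
        # mode must get exactly the provider it asked for.
        if [ "$strict" = "1" ] && [ "$mode" != "auto" ] && [ "$p" != "$mode" ]; then
          echo "error: $mode requested but unavailable (strict)" >&2
          return 1
        fi
        echo "$p"
        return 0 ;;
    esac
  done
  echo "error: no provider available" >&2
  return 1
}
```

With this model, `resolve_provider auto 0 "cuda cpu"` prints `cuda` (a silent one-tier drop), while `resolve_provider tensorrt 1 "cuda cpu"` prints an error and returns non-zero, which corresponds to the container refusing to start.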

### Prevent

- **Match the image profile to the hardware.** The inference image ships in CPU, TensorRT, and Jetson profiles. Deploying the CPU profile on a GPU box caps you at CPU by construction — there is no GPU provider in that image to fall back *from*.
- **Pin + strict on GPU-mandatory boxes.** `EXECUTION_MODE=auto` is the right default for mixed fleets, but on a box that must use the GPU, `EXECUTION_MODE=<tier>` + `STRICT_EP=1` converts an invisible degradation into a fail-fast startup error.
- **Watch the provider chip after every deploy.** The dashboard's Python Inference card reflects the live provider; treat a tier drop there as a deployment defect, not normal variance.

### Related

- [GPU not used — CUDA error 500](/troubleshooting/gpu-cuda-error-500/) — when the host GPU compute layer is actually broken.
- [Hardware Setup](/install-deploy/hardware-setup/) — which GPU tier maps to which image profile.
- [Observability & Alerts](/operate/monitoring/) — reading the active provider and latency.

---

## Datasource down or faulted

> An OPC-UA / MQTT / CSV datasource stops streaming, or a row shows a fault indicator.

### Symptom

One of two things:

- **Enable fails.** Toggling **Enable** on an input datasource row bounces back to **Disabled** and shows an error code such as `model_not_paired` or `adapter_connect_failed`.
- **A running source faults mid-stream.** A previously-streaming datasource stops producing predictions; the row shows a fault indicator and a red banner appears. Under the hood the runtime dropped to a faulted, then idle, state.

Enabling a datasource does **not** start streaming. It prepares the runtime (loads the
model, connects the adapter). You press **Start** to begin inference. A row that is
Enabled but not predicting is often just waiting for Start — that is by design.

### Confirm

Identify which stage failed.

- **Read the error code on the row.** The code names the failed step directly:
  - `model_not_paired` → the datasource has no model assigned.
  - `adapter_connect_failed` → the protocol connection (OPC-UA handshake / MQTT broker connect / Modbus probe) failed — almost always a network or endpoint problem.
- **Check the backend log** for the connect attempt and fault reason:

  ```bash
  docker compose -f docker-compose.release.yml logs backend | grep -iE "datasource|adapter|fault|enable"
  ```

- **A mid-stream fault** comes from a read error on the live source (a tag stopped responding). When that happens the runtime tears down to a clean idle state: the model is unloaded and the adapter is unbound. **Recovery is not automatic** — the source must be re-enabled from scratch.

### Fix

1. **`model_not_paired`** — assign a model to the datasource first, then Enable again. A datasource must be paired with a model before it can run.

2. **`adapter_connect_failed`** — the runtime could not reach the source. The Enable already rolled the row back to Disabled cleanly, so fix the connection and retry:
   - Verify the endpoint address, port, and credentials in the datasource config.
   - Confirm the box has network reachability to the OPC-UA server / MQTT broker / device.
   - For a CSV source, confirm the file path and that the file is present and readable.
   - Use the form's **Test Connection** against the *current* values, then Enable again.

3. **A source that faulted mid-stream** has been torn down to idle — the model was unloaded and the adapter unbound. Re-establish it fully:
   - Fix the underlying source problem (reconnect the sensor / restore the tag / fix the network).
   - **Enable** the datasource again (this reloads the model and reconnects the adapter).
   - Press **Start** to resume streaming.

Only one input datasource can be Enabled at a time. Enabling source B automatically
disables source A. If a source you expected to be running shows Disabled, check whether
enabling a different source switched it off.

### Prevent

- **Pair the model and map all channels before going live.** Most Enable failures are pre-flight validation: a missing model pairing or an incomplete tag-to-channel mapping (gaps or over-fills). Complete these once and Enable succeeds first try.
- **Stabilise the source before Start.** A flaky OPC-UA endpoint or intermittent broker will fault the stream and force a full re-enable. Confirm a stable connection with Test Connection before pressing Start.
- **Expect the two-step flow.** Enable prepares; Start streams. Building this into the operating procedure avoids "it's enabled but nothing's happening" tickets.
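
The two-step flow and the fault behaviour described on this page can be summarised as a small state table. This is an illustrative sketch only: the state and event names (`disabled`, `enabled`, `streaming`, `enable_ok`, `enable_fail`, `start`, `fault`, `other_enabled`) are labels invented here, not identifiers from the product.

```bash
# Hypothetical model: next_state STATE EVENT prints the state a datasource row ends up in.
next_state() {
  case "$1:$2" in
    disabled:enable_ok)      echo "enabled" ;;    # model loaded, adapter connected
    disabled:enable_fail)    echo "disabled" ;;   # e.g. model_not_paired, adapter_connect_failed
    enabled:start)           echo "streaming" ;;  # inference begins only on Start
    streaming:fault)         echo "disabled" ;;   # torn down to idle; needs a full re-enable
    enabled:other_enabled)   echo "disabled" ;;   # enabling another source switches this one off
    streaming:other_enabled) echo "disabled" ;;
    *)                       echo "$1" ;;         # anything else leaves the state unchanged
  esac
}
```

Walking a source through `enable_ok`, `start`, and `fault` ends in `disabled`, which is why recovery needs both Enable and Start again. Pressing Start on a disabled row changes nothing in this model.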

### Related

- [Connecting Datasources](/configure/input-datasources/) — adapter types, connection config, tag mapping.
- [Your First Inference](/install-deploy/first-inference/) — model load and run flow.

---

## Frontend / backend version mismatch

> The About panel highlights a version mismatch between the frontend and backend in amber.

### Symptom

In **Settings → About**, the frontend version and the backend version are shown side by side and the row is highlighted in **amber** because they differ, for example:

- Frontend `v1.5.0` vs backend `v1.5.0-1-gabc1234`, or
- Frontend `v1.5.0` vs backend `v1.4.0`.

The panel highlights the mismatch on purpose — it is the built-in drift detector. The two versions are expected to match on a correctly deployed box.

### Confirm

Read the backend's reported version directly and compare it to what the About panel shows for the frontend:

```bash
curl -s http://localhost:5000/api/system/health | jq '{version, commit, buildTime}'
```

There are two common causes:

1. **A stale browser tab.** The frontend bundle is baked at build time, so an old tab keeps showing the version it was loaded with even after a redeploy.
2. **One container redeployed without the other.** A new backend image was deployed but the frontend image was not (or vice versa), so they genuinely run different versions.

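A version string with a suffix after the tag (`...-1-gabc1234`) means that container was built from a commit *past* the release tag. In the `git describe` convention, that suffix reads as one commit after the tag, at abbreviated commit `abc1234`. The container is ahead of the clean tagged build.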

### Fix

1. **First, rule out a stale tab** — it is the cheapest cause. Hard-refresh the browser (reload ignoring cache). If frontend and backend now match, you are done; nothing was actually wrong with the deployment.

2. **If they still differ, redeploy so both images are the same version.** Rebuild and bring the stack up together so frontend and backend are built from the same release:

   ```bash
   docker compose -f docker-compose.release.yml up -d
   ```

3. **Verify they match.** Reload **Settings → About** — the amber highlight should clear. Confirm the backend version too:

   ```bash
   curl -s http://localhost:5000/api/system/health | jq .version
   ```

Do not ship a build whose version reads `0.0.0-dev...` or ends in `-dirty`. The first
means no release was tagged for that build; the second means it carries uncommitted
local edits. Neither is reproducible — deploy a properly versioned release instead.
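
As an illustration of these rules, a deploy script could classify a version string before accepting it. This is a sketch under assumptions: `classify_version` is a hypothetical helper, and its patterns assume versions look like the examples on this page (`v1.5.0`, `v1.5.0-1-gabc1234`, a `0.0.0-dev` prefix, a `-dirty` suffix).

```bash
# Hypothetical helper: classify_version VERSION
# Prints one of: release, ahead-of-tag, dev, dirty, unknown
classify_version() {
  v="$1"
  case "$v" in
    *-dirty)                echo "dirty"; return 0 ;;  # uncommitted local edits
    0.0.0-dev*|v0.0.0-dev*) echo "dev";   return 0 ;;  # no release tag for this build
  esac
  if printf '%s\n' "$v" | grep -Eq '^v?[0-9]+\.[0-9]+\.[0-9]+$'; then
    echo "release"        # clean MAJOR.MINOR.PATCH build
  elif printf '%s\n' "$v" | grep -Eq '^v?[0-9]+\.[0-9]+\.[0-9]+-[0-9]+-g[0-9a-f]+$'; then
    echo "ahead-of-tag"   # built from a commit past the release tag
  else
    echo "unknown"
  fi
}
```

Only `release` describes a build that is safe to ship by the rule above; a script might refuse every other result.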

### Prevent

- **Deploy frontend and backend together.** Both images carry a version baked in at build time; deploying them as a matched set from one release keeps the About panel green.
- **Hard-refresh after every deploy.** Browser tabs hold the old bundle until reloaded. A quick hard refresh after each deployment avoids false drift reports.
- **Use the About panel as a post-deploy check.** Treat it as the final acceptance step — if it is amber, the deployment is not finished.

### Related

- [Release Notes](/release-notes/) — what shipped in each version.
- [Observability & Alerts](/operate/monitoring/) — runtime health alongside version.

---

## Disk fills / database grows unbounded

> The observability database grows far beyond its live data and fills the disk.

### Symptom

The box runs low on disk, and the observability database file is enormous relative to how much data it should hold — for example tens of GB on disk while only ~1 GB of predictions are actually retained. Deleting old rows (or shortening retention) does **not** shrink the file.

The cause: old rows are deleted on schedule, but on a database created without incremental auto-vacuum the freed pages are kept in an internal free-list and never returned to the operating system. The file stays at its high-water mark forever.

### Confirm

Check the database file size inside the backend container against the expected live size:

```bash
docker compose -f docker-compose.release.yml exec backend \
  sh -c 'ls -lh /data/aiboard.db'
```

If the file is many times larger than the volume of data your retention window should hold (a multi-GB file for a few days of predictions), you have free-list bloat. A single high-rate write burst can inflate the file far past steady-state size.
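
File size alone is a rough signal. SQLite can also report how many of the file's pages sit on the free-list, through the standard `PRAGMA freelist_count` and `PRAGMA page_count`. The sketch below turns those two numbers into a percentage; `freelist_percent` is a hypothetical helper, and the commented usage assumes the `sqlite3` CLI is available in the backend image, as the reclaim step on this page does.

```bash
# Hypothetical helper: freelist_percent FREE_PAGES TOTAL_PAGES
# Prints the share of the database file made up of free-list pages (integer percent).
freelist_percent() {
  free="$1"; total="$2"
  if [ "$total" -le 0 ]; then
    echo 0
    return 0
  fi
  echo $(( free * 100 / total ))
}

# Inside the backend container (path as used on this page):
#   db=/data/aiboard.db
#   freelist_percent "$(sqlite3 "$db" 'PRAGMA freelist_count;')" \
#                    "$(sqlite3 "$db" 'PRAGMA page_count;')"
```

A high percentage means most of the file is reclaimable space rather than retained data.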

### Fix

Current releases reclaim space automatically after each retention sweep, so a healthy box self-corrects. A database that bloated **before** that behavior was in place needs a one-time reclaim.

1. **Stop the backend** so the database is not being written during the reclaim:

   ```bash
   docker compose -f docker-compose.release.yml stop backend
   ```

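2. **Run the one-time reclaim** against the database file. This converts the file to incremental auto-vacuum and compacts it, returning the free-list pages to the OS. SQLite's `VACUUM` works by rebuilding the database into a temporary copy, so it needs extra free disk space while it runs (SQLite documents up to twice the original file size in the worst case). Free up space first if the disk is nearly full: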

   ```bash
   docker compose -f docker-compose.release.yml run --rm --entrypoint sh backend -c \
     'sqlite3 /data/aiboard.db "PRAGMA auto_vacuum=INCREMENTAL; VACUUM;"'
   ```

3. **Verify** the file shrank and the data is intact:

   ```bash
   docker compose -f docker-compose.release.yml run --rm --entrypoint sh backend -c \
     'ls -lh /data/aiboard.db; sqlite3 /data/aiboard.db "PRAGMA integrity_check;"'
   # expect a much smaller file and: ok
   ```

4. **Start the backend** again:

   ```bash
   docker compose -f docker-compose.release.yml start backend
   ```

The one-time reclaim is only needed for a database that grew before automatic
reclaim was in place. Fresh deployments are created able to shrink and keep
themselves compact after each retention sweep.

### Prevent

- **Keep retention bounded.** The retention window (`InferenceObservability:RetentionDays`, default 3 days) deletes old prediction rows on a schedule; the post-delete reclaim returns the freed pages to disk so the file stays lean.
- **Cap row count as a burst guard.** `InferenceObservability:MaxRows` (default 5,000,000) trims to the newest N rows even inside the time window, so a sudden high-rate spike cannot fill the disk before the time-based sweep runs. Lower it if your box has limited disk.
- **Be careful with throughput/stub testing.** High-rate write bursts (for example sensor-pipeline throughput tests) are what inflate the file in the first place. Avoid leaving a high-rate test running against persisted storage on a production box.
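
The way the two limits combine can be shown with a little arithmetic. This is a sketch of the described behaviour, not the backend's code: `rows_to_trim` is a hypothetical helper, and it assumes the time-based sweep is applied first and the row cap then trims whatever still exceeds `MaxRows`.

```bash
# Hypothetical model: rows_to_trim TOTAL_ROWS EXPIRED_ROWS MAX_ROWS
#   EXPIRED_ROWS = rows older than the retention window
# Prints how many rows a sweep would remove in total.
rows_to_trim() {
  total="$1"; expired="$2"; max_rows="$3"
  remaining=$(( total - expired ))    # rows left after the time-based sweep
  over=$(( remaining - max_rows ))    # rows still above the cap
  if [ "$over" -lt 0 ]; then
    over=0
  fi
  echo $(( expired + over ))
}
```

For example, with 6,000,000 rows of which 500,000 are past the window and the default cap of 5,000,000, `rows_to_trim 6000000 500000 5000000` prints `1000000`: half expired by age and half trimmed by the cap.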

### Related

- [Observability & Alerts](/operate/monitoring/) — what the observability database stores.

---

## Container marked unhealthy but service works

> A container reports unhealthy while it serves traffic fine — a health-probe false negative.

### Symptom

A container — typically the frontend — shows unhealthy in `docker ps`, yet the application is reachable and serving normally in the browser. The health probe logs a "connection refused" even though the service is up.

The cause: the probe targets `localhost`, which resolves to **both** IPv6 (`::1`) and IPv4 (`127.0.0.1`). The probe tool tries IPv6 first, but the server inside the container listens on IPv4 only — so the IPv6 attempt is refused and the probe wrongly reports the container down.

### Confirm

Check the container's reported health versus its real reachability:

```bash
docker ps --format '{{.Names}}\t{{.Status}}' | grep aiboard
```

If a container reads `(unhealthy)` but the app responds when you hit it directly on IPv4, it is this false negative:

```bash
# from inside the container — IPv4 explicitly
docker compose -f docker-compose.release.yml exec frontend \
  sh -c 'wget -qO- http://127.0.0.1:8080/ >/dev/null && echo "IPv4 OK"'
```

If `IPv4 OK` prints, the service is healthy and only the probe was wrong.
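
To spot affected containers quickly, the `docker ps` output above can be filtered down to the names that report `(unhealthy)`. A minimal sketch, assuming the tab-separated format used on this page; `unhealthy_containers` is a hypothetical helper name.

```bash
# Hypothetical helper: read "NAME<TAB>STATUS" lines on stdin and print the names
# of containers whose status contains "(unhealthy)".
unhealthy_containers() {
  awk -F '\t' '$2 ~ /\(unhealthy\)/ { print $1 }'
}

# On the host:
#   docker ps --format '{{.Names}}\t{{.Status}}' | unhealthy_containers
```

Each name it prints is a candidate for the direct IPv4 check before you treat the status as a real outage.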

### Fix

Current images already probe IPv4 explicitly, so a healthy box should not hit this. If you do see it on an older image:

1. **Confirm the service is actually serving** (the IPv4 check above prints `IPv4 OK`). If it does, no application action is needed — the container is healthy.

2. **Update to a current release image**, where the health probe targets `127.0.0.1` directly and the false negative is gone:

   ```bash
   docker compose -f docker-compose.release.yml up -d
   ```

3. **Re-check** the status clears:

   ```bash
   docker ps --format '{{.Names}}\t{{.Status}}' | grep aiboard
   ```

Because the probe is what was wrong (not the service), a dependent container waiting on
this one's health can be held back even though everything is actually serving. Fix the
probe rather than disabling the healthcheck.

### Prevent

- **Probe the address the server actually binds.** Health probes should target `127.0.0.1` explicitly when the server listens on IPv4 only, so name resolution can't send the probe to an unbound IPv6 address.
- **Distinguish probe failures from service failures.** Before reacting to an `unhealthy` status, confirm with a direct IPv4 request. A green app behind a red probe is a probe bug, not an outage.
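
If you maintain your own probes or monitoring checks against the box, the first rule can be applied mechanically by rewriting `localhost` to `127.0.0.1` in the probe URL. The helper below is a sketch with a hypothetical name, `ipv4_probe_url`. It rewrites only the exact host `localhost` and leaves other hosts untouched.

```bash
# Hypothetical helper: ipv4_probe_url URL
# Rewrites http(s)://localhost... to http(s)://127.0.0.1... so the probe cannot
# be resolved to the IPv6 loopback address.
ipv4_probe_url() {
  printf '%s\n' "$1" | sed -E 's#^(https?://)localhost([:/]|$)#\1127.0.0.1\2#'
}
```

For instance, `ipv4_probe_url http://localhost:8080/` prints `http://127.0.0.1:8080/`.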

### Related

- [Observability & Alerts](/operate/monitoring/) — interpreting health and status signals.

---

## API Reference

> REST and gRPC APIs exposed by the Xisom runtime.

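HTTP endpoints in this reference are versioned under `/api/v1/` and require a bearer token. Sign-in and the health check (`/api/system/health`) are the only endpoints reachable without authentication. gRPC services are exposed on port `50051`.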

### Authentication

```http
Authorization: Bearer <token>
```

Obtain tokens via the **Admin → API Keys** page or the `/api/v1/auth/token` endpoint.
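
For illustration, a bearer-token call from a shell might look like the sketch below. `xisom_api`, `XISOM_TOKEN`, `XISOM_BASE_URL`, and `XISOM_CURL` are hypothetical names introduced here, and `https://your-device-ip` is a placeholder. Only the `Authorization: Bearer` header and the `/api/v1/` prefix come from this page.

```bash
# Hypothetical wrapper: xisom_api METHOD PATH
# Sends an authenticated request to a /api/v1/ endpoint.
#   XISOM_TOKEN     bearer token (required)
#   XISOM_BASE_URL  device address, defaults to a placeholder
#   XISOM_CURL      command to run instead of curl (useful for a dry run)
xisom_api() {
  method="$1"
  path="${2#/}"   # accept "models" or "/models"
  ${XISOM_CURL:-curl} -sS -X "$method" \
    -H "Authorization: Bearer ${XISOM_TOKEN:?set XISOM_TOKEN first}" \
    "${XISOM_BASE_URL:-https://your-device-ip}/api/v1/${path}"
}

# Example (requires a reachable device and a valid token):
#   XISOM_TOKEN=... XISOM_BASE_URL=https://your-device-ip xisom_api GET /models
```

Keeping the token in an environment variable rather than on the command line avoids leaving it in shell history.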

### Inference

- **POST** `/api/v1/inference`
- **GET** `/api/v1/inference/{id}`

### Models

- **GET** `/api/v1/models`
- **POST** `/api/v1/models`
- **POST** `/api/v1/models/{name}/promote`

### Datasources

- **GET** `/api/v1/datasources`
- **POST** `/api/v1/datasources`
- **DELETE** `/api/v1/datasources/{id}`

### Observability

- **GET** `/api/v1/stats/inference`

### OpenAPI

The full OpenAPI spec is published at `/api/v1/openapi.json` on every running device.

---

## FAQ

> Frequently asked questions about Xisom Edge AI Box & Runtime.

### Do I need a GPU?

No. CPU-only inference works on the Lite tier. GPUs are recommended for vision models and high-throughput workloads.

### Can I run without internet?

Yes. The runtime is fully air-gappable. License activation can be done via offline file.

### Which model formats are supported?

ONNX is the primary format. Compiled TensorRT engines and sandboxed Python handlers are also supported.

### How is data secured?

All traffic on the device is mTLS. At-rest data is encrypted (LUKS) on supported hardware. RBAC is enforced for every API call.

### How do I update?

Updates are signed container images. Roll forward and rollback are one-click in the dashboard.

### What SLAs are offered?

Standard, Business, and 24/7 Critical. See [Support](/support/).

---

## Release Notes

> Operator-relevant changes for the Xisom Edge AI Box & Runtime.

What changed in each release, filtered to what matters for operators running an edge
box. For how versions are reported on your device, see
[Versions & Updates](/operate/versioning/).

Entries focus on operator-visible behavior — new capabilities, deployment changes,
and fixes you might notice. Internal refactors and test-only changes are omitted.

### Latest

#### Execution-provider modes for edge hardware

- The runtime now classifies its execution provider explicitly: **TensorRT**, **CUDA**,
  or **CPU**. GPU boxes prioritize GPU acceleration by default and fall back gracefully
  if a GPU library is unavailable.
- A strict mode can turn a silent CPU fallback into a startup failure, so a GPU box
  never quietly runs on CPU.
- The dashboard now shows the **active execution provider** so you can confirm what
  your box is really using. See [Monitoring](/operate/monitoring/).

#### CSV output datasource

- A new **CSV output** writes prediction results to rotating CSV files, with
  configurable file size, rotation, and flush behavior. See
  [Output Datasources](/configure/output-datasources/).

#### Reliability & observability

- **Honest CPU temperature.** When a box has no readable CPU thermal sensor, the
  dashboard now shows **N/A** instead of a misleading `0 °C`.
- **Disk space reclamation.** The metrics database now returns freed space to the
  operating system after its retention sweep, and caps total rows so a traffic burst
  can't fill the disk.
- **Accurate runtime status.** The Model Manager runtime card no longer shows a false
  "Desync" when idle or warming up.

#### Hardened containers

- All services now run as **non-root** users.
- The web frontend listens on an unprivileged port internally (host ports are
  unchanged).

**One-time step on upgrade**

If you are upgrading from an older, root-based deployment, storage volumes need a
one-time ownership fix on first redeploy. Fresh installs are unaffected. See the
[Offline Bundle Install](/install-deploy/offline-bundle-install/) update flow.

### v0.5 — Inference observability dashboard

- Latency decomposition (queue / pre-process / model exec / post-process)
- Window & bucket controls for time aggregation
- Per-model latency thresholds

### v0.4 — Datasource platform

- OPC-UA, MQTT, and Modbus TCP adapters
- Rate limiting and backpressure reporting

### v0.3 — Model lifecycle

- Model versioning and staged promotion

### v0.2 — Web dashboard

- Live, push-driven dashboard updates
- Role-based access control and an audit log

---

## Security

> How Xisom protects access to your models, data, and operations — authentication, access control, hardened containers, and the audit trail.

Xisom is built for an industrial network and hardened against the OWASP Top 10. This
page summarizes what protects your box and what you, as the operator, are responsible
for.

### Authentication

Two separate sign-in paths, split by purpose:

- **Operators and admins** sign in to the dashboard and API with a username and
  password and receive a **JWT** bearer token. Passwords are stored only as a
  BCrypt hash — never in plain text.
- **Partner systems** that call the external API use a **static API key**, sent in a
  request header and stored only as a BCrypt hash.

**Protect the JWT signing secret**

Each deployment has its own JWT signing secret, generated on the box at install time
and stored in a protected file. Treat it like any other secret — never share or
commit it. Rotating it signs everyone out, which is the intended effect.

### Access control

- **Deny by default.** Every endpoint requires a signed-in user unless it is
  explicitly marked public. Forgetting to protect an endpoint does not expose it.
- **Role-based.** Admin-only actions (such as sending output test writes or managing
  keys) are gated to admin accounts.
- Only sign-in and the health check are reachable without authentication.

### Brute-force protection

The login endpoint is rate-limited per source IP. Only **failed** logins count against
the limit within a short window, so normal use (including an operator with several
tabs open) is not penalized. This is always on in production.

### Hardened containers

- All service containers run as **non-root** users.
- The web frontend listens on an unprivileged port inside its container.
- Service images carry health checks so the platform can detect a sick service.

### Secrets handling

- User passwords and API keys are stored only as BCrypt hashes.
- Connection-config secrets (datasource passwords, tokens) are **masked as `***`**
  when read back; the stored value is retained when you save without changing it.
- API key values are shown to you **once** at creation — store them immediately.

### Audit trail

Every login outcome (success and failure) and every external API call is recorded in
the audit log, so you can review who accessed the system and when.

### Production hardening checklist

For a production edge deployment, confirm:

- TLS termination at a reverse proxy in front of the dashboard and API.
- A strong, unique JWT signing secret per deployment (generated automatically by the
  installer).
- The interactive API explorer (Swagger) is disabled unless a partner integration
  needs it.
- Datasource connections use authenticated brokers / secured OPC-UA policies — not
  anonymous access.
- Regular backups of the data volume before any destructive operation.

**Threat model**

The platform is designed for an internal industrial network, not direct exposure to
the public internet. Place it behind your plant network's perimeter controls.

### Reporting a vulnerability

Email **security@xisom.ai** for responsible disclosure.

---

## Support

> How to reach the Xisom team and get help.

### Channels

- **Email:** support@xisom.ai
- **Status page:** [status.xisom.ai](https://status.xisom.ai)
- **Community:** [github.com/xisom/discussions](https://github.com/xisom/discussions)

### SLA tiers

| Tier | First response | Coverage hours |
|------|----------------|----------------|
| Standard | 1 business day | 09:00–18:00 KST |
| Business | 4 business hours | 09:00–18:00 KST, plus Saturday |
| Critical | 1 hour | 24/7 |

### When opening a ticket

Please include:

1. Device serial or fleet ID
2. Runtime version
3. Reproduction steps
4. Relevant logs (download from **Admin → Diagnostics**)