# CV Platform — Agent API Guide

Reference for calling the CV Platform API with an API key. Covers the object model, authentication, the primary end-to-end workflow, every available endpoint, and error recovery.

> **Prefer a machine-readable spec or a try-it-out UI?**
> - OpenAPI 3.1 spec: [https://api.mlpractitioners.com/openapi.yaml](https://api.mlpractitioners.com/openapi.yaml)
> - Interactive Swagger UI: [https://api.mlpractitioners.com/docs](https://api.mlpractitioners.com/docs)
>
> The OpenAPI spec covers the same API-key-callable surface described in §4 below, with schemas for the response envelope, error codes, and every request/response shape. This document is the narrative companion — read it for workflow, recipes, and error recovery.

## Contents

| § | Section | Read this when you need to... |
|---|---|---|
| 0 | [Do you have an API key yet?](#0-do-you-have-an-api-key-yet) | Confirm the user gave you an `mlp_live_...` key, or send them the request message if they have not |
| 1 | [What this platform is](#1-what-this-platform-is) | Learn the object model, the lifecycle diagram, and the rules you must follow before writing any code |
| 2 | [Talking to the API](#2-talking-to-the-api) | Send authenticated requests, parse the response envelope, use `X-Request-Id`, respect rate limits |
| 3 | [Recipe: Train a detector from scratch](#3-recipe-train-a-detector-from-scratch) | Walk a user through the main workflow — upload, label, train, download |
| 4 | [Endpoint cheatsheet](#4-endpoint-cheatsheet) | Look up the method, path, and body shape of any endpoint the recipe does not cover |
| 5 | [Error playbook](#5-error-playbook) | Figure out what to do when you receive an error code |
| 6 | [Limits and platform behavior](#6-limits-and-platform-behavior) | Know file size / rate caps, workspace rules, and idempotency behavior |

Recommended first read: §0 → §1 → §2 → §3. Sections 4–6 are reference; consult them on demand.

---

## 0. Do you have an API key yet?

Every endpoint in this guide requires a Bearer token that starts with `mlp_live_`. **Stop here and confirm the user gave you one before doing anything else.**

**If the user has already supplied a key**, skip to §1. Run `GET /me` (§2 → Authentication) as your very first call so you can fail fast on a bad key.

**If the user has not given you a key**, paste the following to them verbatim and wait for a reply:

> To use the MLPractitioners CV platform I'll need an API key. If you already have one, paste it here. If not: subscribe at https://www.mlpractitioners.com/pricing.html, then create a key in the **AI Access Keys** tab at https://api.mlpractitioners.com/account. Keys start with `mlp_live_`. Paste it back to me when ready.

Do not invent a key, do not retry without one, and do not start any other workflow until the user has supplied one. Once they reply with a key, jump to §1.

---

## 1. What this platform is

CV Platform lets users train custom computer vision models on their own images without writing ML code. Through this API you can:

- Create projects and upload images to them.
- Set up labeling tasks (object detection today, more task types coming).
- Start training jobs on labeled data and track them to completion.
- Download trained models as `.onnx` or `.pt` files.
- Run inference with a trained model on individual images or in batch against a project's images.
- Read the user's token balance to decide whether training is affordable.

You cannot create labels programmatically. Labeling is done by the user in the labeler UI at `https://api.mlpractitioners.com/projects` (open the project, then a label set, then an image). Your role on that step is to stop, instruct the user, and then poll for completion.

### Objects

| Object | What it is |
|---|---|
| **Workspace** | Container for one team's projects and members. Every user has a Personal workspace and may be a member of any number of shared workspaces. Projects live in a workspace. Training tokens are personal — see "Tokens in one paragraph" below. |
| **Project** | Container for one dataset. Every project belongs to exactly one workspace. |
| **Image** | One uploaded photo, stored in a project. |
| **LabelSet** | A labeled view of a project's images for a specific task (e.g. object detection with classes `["crack", "dent"]`). A project can have many LabelSets. |
| **Label** | One annotation on one image for one LabelSet (e.g. a bounding box with a class). |
| **TrainingJob** | A request to train a model from a LabelSet. Asynchronous. |
| **Model** | A trained `.onnx` or `.pt` file produced by a successful TrainingJob. |
| **Inference** | Running a trained Model against images to get predictions. |
| **Invite** | A pending invitation for a user (by email) to join a workspace. Accepted or declined by the recipient. |

### Workspaces in one paragraph

Every project lives in a workspace. Each user has a **Personal** workspace (created by default, and recreated empty on next sign-in if deleted) and can additionally belong to one or more **Shared** workspaces, either ones they created themselves or ones they joined by invitation. The Personal workspace has full parity with shared workspaces: it can have members invited to it and can be deleted. When you don't pass `workspaceId`, project list and create default to the caller's Personal workspace; pass `workspaceId` explicitly to read or write inside a shared workspace the user belongs to.

### Tokens in one paragraph

Tokens are **personal**: every user has exactly one pool (a monthly allowance refilled by their subscription, plus any purchased bundles, which never expire). When a training run starts, the bill goes to the **owner of the workspace the project lives in**, not to whoever triggered the run. Practical consequence: if you act inside a shared workspace, the owner's tokens are spent (the run's `actor` is recorded for audit), and you don't need your own subscription to train there. `GET /tokens/*` always reports the calling user's personal pool, so a member cannot read the owner's balance; there is no per-workspace token endpoint.

### Lifecycle and endpoints

```mermaid
flowchart TD
    Start([You have an API key])
    Start -->|"POST /projects"| P[Project]
    P -->|"POST /projects/:id/images<br/>(multipart upload)"| I["Images<br/>(many per project)"]
    P -->|"POST /projects/:id/labelsets"| L["LabelSet<br/>(many per project)"]
    I -.->|"labeled by the user in the UI<br/>(no API for creating labels)"| Lab["Labels<br/>(one per image per LabelSet)"]
    L -.-> Lab
    L -->|"POST /label-sets/:id/training"| T["TrainingJob<br/>(many per LabelSet)"]
    T -->|"POST /training/:id/start<br/>then poll GET /training/:id"| M["Model<br/>(.onnx or .pt)"]
    M -->|"GET /training/:id/download"| Done([Download the model file])
    M -->|"POST /training/:id/infer/:imageId<br/>or POST /training/:id/infer-all"| Inf[Inference results]
```

### Rules the agent must know up front

1. **Labels come from the UI, not the API.** There is no endpoint that lets you create labels on behalf of the user. When a LabelSet needs labeling, stop and tell the user to label in the browser, then poll `GET /projects/:id/labelsets` and read `label_count` to track their progress (`GET /label-sets/:id` does not include the count).
2. **Training is asynchronous.** Start it, poll `GET /training/:id` on an interval, act on the terminal status (`completed` or `failed`). Typical duration 10–60 minutes.
3. **A LabelSet can have many TrainingJobs.** Retraining with different settings is normal and does not discard existing labels.
4. **Workspace defaults are implicit.** When you call `GET /projects` or `POST /projects` without `workspaceId`, you operate on the user's Personal workspace. To target a shared workspace the user belongs to, pass `?workspaceId=…` (read) or `workspaceId` in the JSON body (create). Use `GET /workspaces` to discover workspace IDs. The `/tokens/*` endpoints don't take a `workspaceId` — see "Tokens in one paragraph" above.

---

## 2. Talking to the API

### Base URL

All paths in this guide are relative to:

```
https://api.mlpractitioners.com
```

### Authentication

Send your API key in the `Authorization` header on every request:

```
Authorization: Bearer mlp_live_xxxxxxxxxxxxxxxxxxxxxxxx
```

API keys start with the prefix `mlp_live_`. Before doing anything else, call `GET /me` to confirm the key is valid:

```json
{
  "ok": true,
  "data": { "userId": "…", "email": "…", "authSource": "apiKey" },
  "requestId": "…"
}
```

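If `authSource` is `"apiKey"`, you are authenticated via Bearer token and should proceed. A request with no key returns `401 UNAUTHENTICATED`; an invalid, revoked, or expired key returns 401 with `API_KEY_INVALID`, `API_KEY_REVOKED`, or `API_KEY_EXPIRED` respectively (see §5).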
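
The key check can be sketched in Python with only the standard library. The helper names (`auth_headers`, `is_api_key_session`, `fetch_me`) are illustrative and not part of the platform; the envelope fields are the ones shown above.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.mlpractitioners.com"


def auth_headers(api_key: str) -> dict:
    """Build the Authorization header; reject strings that cannot be a key."""
    if not api_key.startswith("mlp_live_"):
        raise ValueError("API keys start with 'mlp_live_'")
    return {"Authorization": f"Bearer {api_key}"}


def is_api_key_session(envelope: dict) -> bool:
    """True when a GET /me envelope confirms Bearer-token authentication."""
    data = envelope.get("data") or {}
    return envelope.get("ok") is True and data.get("authSource") == "apiKey"


def fetch_me(api_key: str) -> dict:
    """Call GET /me and return the parsed envelope (performs a network call)."""
    req = urllib.request.Request(f"{BASE_URL}/me", headers=auth_headers(api_key))
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error envelope is in the response body.
        return json.load(err)
```

A caller would run `is_api_key_session(fetch_me(key))` once at startup and stop if it returns `False`.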

### Response envelope

Every JSON response (success or error) uses this shape:

**Success** — HTTP 200, 201, or 207:

```json
{
  "ok": true,
  "data": <endpoint-specific payload>,
  "requestId": "01H…"
}
```

**Error** — HTTP 4xx or 5xx:

```json
{
  "ok": false,
  "error": {
    "code": "STABLE_MACHINE_CODE",
    "message": "Human readable message",
    "details": { /* optional, code-specific */ }
  },
  "requestId": "01H…"
}
```

Rules:

- Read the envelope before looking at the HTTP status. `ok: false` is always an error regardless of anything else.
- Branch on `error.code`, not on `error.message`. Messages may change; codes are stable.
- `HTTP 207 Multi-Status` is used for batch uploads where some items succeeded and some failed. The envelope is still `ok: true` and `data` contains both arrays (`results` and `errors`). See Step 3 of the recipe for handling.
- **File downloads (`GET /training/:id/download`) do NOT use this envelope.** They return an HTTP redirect (302) to a temporary download URL. Follow it with curl `-L` and save the binary response.
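
The envelope rules above can be sketched as a single unwrapping helper. `ApiError` and `unwrap` are hypothetical names introduced for illustration, and the `"UNKNOWN"` fallback code is an assumption for malformed envelopes, not a code the API is documented to emit.

```python
class ApiError(Exception):
    """Raised for any envelope with ok: false; callers branch on .code."""

    def __init__(self, code, message, details=None, request_id=None):
        super().__init__(f"{code}: {message} (requestId={request_id})")
        self.code = code
        self.message = message
        self.details = details or {}
        self.request_id = request_id


def unwrap(envelope: dict):
    """Return `data` on success; raise ApiError carrying the stable error code."""
    if envelope.get("ok") is True:
        return envelope.get("data")
    err = envelope.get("error") or {}
    raise ApiError(
        err.get("code", "UNKNOWN"),  # fallback is an assumption, see lead-in
        err.get("message", ""),
        err.get("details"),
        envelope.get("requestId"),
    )
```

Because the error carries `request_id`, any report to the user can quote it without re-reading the response.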

### Request tracing with `X-Request-Id`

Every response includes an `X-Request-Id` header whose value equals the `requestId` field in the envelope. When reporting outcomes or errors to the user, include the request IDs of key operations (training start, model download, any failed call). They map 1:1 to rows in the server-side request log, so the user or support can trace exactly what happened.

### Rate limits

Per API key: **60 requests per minute**, with a burst capacity of **120** (so you can spike up to 120 in quick succession, then refill at 1/second).

When you exceed the limit, you get:

- HTTP `429 Too Many Requests`
- `Retry-After` response header (seconds)
- Envelope error code `RATE_LIMITED` with `error.details.retryAfterSeconds`

Recommended behavior:

1. On a single 429, sleep `retryAfterSeconds` and retry the same call once.
2. On a second 429 for the same operation, back off further (e.g., 2× the reported delay) or stop and tell the user.
3. During any long-running loop (polling training, uploading many images), pace your calls — never spin.
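
A minimal sketch of the recommended 429 handling follows. The `send` callable is hypothetical: it is assumed to perform one request and return `(status, envelope)`. The one-second fallback when `retryAfterSeconds` is missing is also an assumption, chosen to match the refill rate.

```python
import time


def call_with_rate_limit_retry(send, sleep=time.sleep, max_attempts=3):
    """Retry a call on HTTP 429: wait the reported delay, then 2x, then give up."""
    multiplier = 1
    status, envelope = None, None
    for attempt in range(1, max_attempts + 1):
        status, envelope = send()
        if status != 429:
            return status, envelope
        if attempt == max_attempts:
            break  # still rate limited; let the caller stop and tell the user
        details = (envelope.get("error") or {}).get("details") or {}
        wait = details.get("retryAfterSeconds", 1)  # assumed fallback
        sleep(wait * multiplier)
        multiplier = 2  # back off further on the second 429
    return status, envelope
```

Passing `sleep` as a parameter keeps the function testable without real waiting.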

---

## 3. Recipe: Train a detector from scratch

This is the primary end-to-end workflow. Follow it step by step when the user's request matches the pattern below.

**Scenario.** The user has a folder of images of some object (defects, parts, animals, pallets, etc.) and wants a trained model that can detect that object in new images. They may or may not have labels yet.

**Recognize this pattern when the user says** things like *"train a model to detect X"*, *"I have photos of Y, help me build a detector"*, *"learn what these images contain"*. If the user says they *already have a trained model* and just want to run it on new images, this is the wrong recipe — see the batch-inference recipe (coming soon).

### Step 0. Verify your setup

```http
GET /me
Authorization: Bearer mlp_live_...
```

Expected response:

```json
{
  "ok": true,
  "data": { "userId": "...", "email": "...", "authSource": "apiKey" },
  "requestId": "..."
}
```

If `authSource !== "apiKey"` or you get 401, stop and tell the user their key is missing or invalid. If `ok: true`, continue.

### Step 1. Confirm there are enough training tokens

Training consumes tokens. Check the balance first so you can stop early and ask the user to top up instead of starting a workflow you cannot finish.

`GET /tokens/summary` reports the calling user's personal pool. If you'll be running training inside a shared workspace, the **owner of that workspace** is the one whose pool will be debited — so confirm with the user that the owner has tokens, or train inside one of the user's own workspaces. Get workspace ownership info from `GET /workspaces`.

```http
GET /tokens/summary
```

Response:

```json
{
  "ok": true,
  "data": {
    "monthly":   { "remaining": 25, "periodEnd": "2026-05-01T00:00:00Z" },
    "purchased": { "remaining": 0 },
    "totalAvailable": 25,
    "renewal": "2026-05-01T00:00:00Z"
  },
  "requestId": "..."
}
```

Heuristic: **require at least 1 token in `totalAvailable`** before starting the workflow; training costs 1 token per run minimum. If `totalAvailable === 0`, stop and tell the user to purchase a token bundle or wait until `renewal`.
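
The heuristic can be expressed as a small check over the `GET /tokens/summary` envelope. The function name and the returned message wording are illustrative; only `totalAvailable` and `renewal` come from the response above.

```python
def training_affordable(summary_envelope: dict, min_tokens: int = 1):
    """Return (ok, message) based on the caller's personal token pool."""
    data = summary_envelope["data"]
    total = data["totalAvailable"]
    if total >= min_tokens:
        return True, f"{total} token(s) available"
    return False, (
        "No training tokens available; purchase a bundle "
        f"or wait until {data['renewal']}"
    )
```

Note that this reads the caller's pool only; for a project in a shared workspace it says nothing about the owner's balance.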

### Step 2. Create the project

```http
POST /projects
Content-Type: application/json

{ "name": "warehouse-pallets" }
```

Without `workspaceId`, the project is created in the user's Personal workspace. To create it inside a shared workspace the user belongs to, pass `workspaceId`:

```json
{ "name": "warehouse-pallets", "workspaceId": "ws_abc123…" }
```

Name must be unique within a workspace.

Response:

```json
{
  "ok": true,
  "data": {
    "id": "9f2a...",
    "name": "warehouse-pallets",
    "workspace_id": "...",
    "workspace_name": "Personal",
    "is_shared": false,
    "createdAt": "..."
  },
  "requestId": "..."
}
```

**Save `data.id` as `PROJECT_ID`** — you need it for every following step. The `workspace_id` / `workspace_name` / `is_shared` fields are informational; you don't need to repeat them on subsequent calls because the project ID is enough to locate the workspace.

Possible errors:
- `PROJECT_NAME_TAKEN` (409): the workspace already has a project by this name. Suggest a variant or ask the user to pick a different name.
- `VALIDATION_FAILED` (400): `name` missing or empty.
- `FORBIDDEN` (403): user is not a member of the requested `workspaceId`. Use `GET /workspaces` to list valid workspaces.

### Step 3. Upload the images

Upload images in batches using `multipart/form-data`:

- Field name: `files` — up to **20 files per request**, max **50 MB per file**. Send multiple files by repeating the field (`-F "files=@a.jpg" -F "files=@b.jpg"`), **not** with `files[]`.
- Field name: `thumbnails` — optional; one per entry in `files`, in the same order. If omitted, the server auto-generates a JPEG thumbnail (≤512px) from the uploaded image at upload time and returns it as `thumbnail_url`.
- Supported formats: JPEG, PNG, WebP.

```bash
curl -X POST "https://api.mlpractitioners.com/projects/$PROJECT_ID/images" \
  -H "Authorization: Bearer $KEY" \
  -F "files=@pallet_001.jpg" \
  -F "files=@pallet_002.jpg" \
  -F "files=@pallet_003.jpg"
```

Response on full success (201):

```json
{
  "ok": true,
  "data": {
    "results": [
      { "id": "img-1", "filename": "pallet_001.jpg", "width": 1920, "height": 1080, "split": "TRAIN", "url": "...", "thumbnail_url": "..." },
      ...
    ],
    "errors": []
  }
}
```

Response on partial success (207 Multi-Status):

```json
{
  "ok": true,
  "data": {
    "results": [ /* succeeded uploads */ ],
    "errors": [ { "filename": "pallet_042.jpg", "error": "Upload failed" } ]
  }
}
```

Rules:

- On a 207 response, **retry only the files listed in `errors`** — the files in `results` are already saved; re-uploading them would create duplicates (or fail the `projectId + filename` uniqueness check).
- If the user has more than 20 images, loop: slice their input into batches of ≤ 20, upload each, accumulate `results` and `errors` across batches, report the overall summary to the user.
- All uploads default to the `TRAIN` split. To split into train/val/test later, use `PATCH /images/split` or `PATCH /images/:id/split`.
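
The batching rules above can be sketched as follows. `upload_batch` is a hypothetical callable assumed to perform one multipart `POST /projects/:id/images` and return the envelope's `data` object (with `results` and `errors`); the HTTP call itself is left out.

```python
from typing import Callable, Iterator, List, Sequence

MAX_FILES_PER_REQUEST = 20  # per-request cap stated in this step


def batches(paths: Sequence[str], size: int = MAX_FILES_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` paths."""
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])


def upload_all(paths: Sequence[str], upload_batch: Callable[[List[str]], dict]) -> dict:
    """Upload every batch and accumulate results and errors across batches."""
    results, errors = [], []
    for batch in batches(paths):
        data = upload_batch(batch)
        results.extend(data.get("results", []))
        errors.extend(data.get("errors", []))
    return {"results": results, "errors": errors}


def files_to_retry(summary: dict) -> List[str]:
    """Filenames the server reported as failed; only these should be re-sent."""
    return [entry["filename"] for entry in summary["errors"]]
```

If the local paths include directories, map the returned filenames back to paths before retrying, since the server reports bare filenames.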

### Step 4. Create the label set

A label set is the user's labeling task. For a detector, use `type: "object_detection"`.

```http
POST /projects/:PROJECT_ID/labelsets
Content-Type: application/json

{
  "name": "pallets-v1",
  "type": "object_detection",
  "classes": [
    { "name": "pallet", "color": "#FF5722" }
  ]
}
```

Valid `type` values: `object_detection`, `segmentation`, `classification`, `pose`, `obb`. For this recipe always use `object_detection`.

`classes` is an array of `{ name, color }`. Pick class names from what the user described. Colors can be any hex string — they are only used in the labeler UI.

Response:

```json
{
  "ok": true,
  "data": {
    "id": "ls-1",
    "name": "pallets-v1",
    "type": "object_detection",
    "classes": [ { "name": "pallet", "color": "#FF5722" } ],
    "label_count": 0,
    "createdAt": "..."
  }
}
```

**Save `data.id` as `LABEL_SET_ID`.**

Possible errors:
- `LABEL_SET_NAME_TAKEN` (409): duplicate name within this project.
- `VALIDATION_FAILED` (400): `name` or `type` missing, or `type` not in the allowed list.

### Step 5. HANDOFF — the user must label the images in the UI

**Stop here. There is no API to create labels.** Labeling is done visually in the browser. Tell the user exactly this:

> Your images are uploaded and the label set is ready. Please open
> **https://api.mlpractitioners.com/projects**, switch to the `warehouse-pallets` project's
> workspace using the switcher in the header (top of the page), open
> the project, choose label set `pallets-v1`, and draw bounding boxes
> around every pallet in the training images. When you are done (or
> want to train on what you have so far), tell me and I will continue.

While waiting, you can poll progress by listing the project's label sets and reading `label_count`:

```http
GET /projects/:PROJECT_ID/labelsets
```

Response (per label set):

```json
{ "id": "ls-1", "name": "pallets-v1", "label_count": 0, "training_count": 0, ... }
```

A reasonable polling strategy:

- Do not poll faster than **once every 5 minutes** for labeling — humans label slowly and you will waste rate-limit budget.
- For a decent detector, expect the user to need **at least ~50 labeled images** per class. If they proceed with fewer, warn them that accuracy will be poor.
- Do not require completion — if the user says "proceed now", continue even if not every image is labeled.
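
A rough sketch of the progress check, assuming the `data` payload of `GET /projects/:id/labelsets` is a list of label-set objects shaped like the one above. The constant and function names are illustrative. Comparing `label_count` against 50 is only a coarse lower bound on the "~50 labeled images per class" guideline, not an exact test of it.

```python
LABEL_POLL_INTERVAL_SECONDS = 5 * 60  # do not poll labeling faster than this
MIN_LABELS_GUIDELINE = 50             # coarse proxy for ~50 labeled images per class


def labeling_progress(label_sets: list, label_set_id: str) -> dict:
    """Find one label set in the list response and summarize its progress."""
    for label_set in label_sets:
        if label_set["id"] == label_set_id:
            count = label_set["label_count"]
            return {
                "label_count": count,
                # True means accuracy is likely to be poor; warn, do not block.
                "below_guideline": count < MIN_LABELS_GUIDELINE,
            }
    raise KeyError(f"label set {label_set_id} not found in project")
```

When `below_guideline` is true and the user asks to proceed, warn them and continue rather than refusing.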

### Step 6. Create the training job

```http
POST /label-sets/:LABEL_SET_ID/training
Content-Type: application/json

{
  "name": "pallets v1 first train",
  "model_size": "n",
  "epochs": 50,
  "batch_size": 16,
  "image_size": 640
}
```

All fields are optional; defaults are `model_size: "n"`, `epochs: 50`, `batch_size: 16`, `image_size: 640`. These defaults are sensible — do not override unless the user asks. `model_size` options are `n` (nano, fastest), `s`, `m`, `l`, `x` (largest, slowest, most accurate).

Response (201):

```json
{
  "ok": true,
  "data": {
    "id": "tj-1",
    "name": "pallets v1 first train",
    "label_set_id": "ls-1",
    "status": "pending",
    "base_model": "yolo11n",
    "epochs": 50,
    "batch_size": 16,
    "image_size": 640,
    "metrics": {},
    "log": "Fine-tuning created",
    "progress": 0,
    "createdAt": "..."
  }
}
```

**Save `data.id` as `TRAINING_ID`.** The job is in status `pending` — created but not yet started.

### Step 7. Start training

```http
POST /training/:TRAINING_ID/start
```

No request body. This builds the dataset from the labeled images, submits it to the training backend, and transitions the job into `preparing` → `queued` → `training`.

Response (200): a refreshed job object with `status: "preparing"` (or later).

Possible errors:
- `INVALID_TRAINING_STATE` (400): the job is already running or already completed. Get the job detail (`GET /training/:id`) to see current status and react accordingly.
- `INSUFFICIENT_TOKENS` (400): the paying pool's `totalAvailable` dropped below 1 since step 1 (concurrent workflow or expired monthly tokens). In a shared workspace this is the workspace owner's pool, not the caller's. Stop and tell the user.
- `UPSTREAM_ERROR` (500): the training backend is unreachable. Retry once after 30 seconds. On second failure, stop and tell the user with the `requestId`.
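
The reactions to these three codes can be sketched as a small decision function. The returned action labels (`"refetch_job"`, `"retry_after_30s"`, `"stop_and_report"`) are hypothetical names for the caller to dispatch on, not values defined by the API.

```python
UPSTREAM_RETRY_DELAY_SECONDS = 30


def action_for_start_error(code: str, attempt: int) -> str:
    """Map an error code from POST /training/:id/start to the next action.

    `attempt` is 1 for the first start call and 2 for the retry.
    """
    if code == "INVALID_TRAINING_STATE":
        # Already running or completed: read GET /training/:id and react.
        return "refetch_job"
    if code == "UPSTREAM_ERROR" and attempt == 1:
        # Backend unreachable: one retry after UPSTREAM_RETRY_DELAY_SECONDS.
        return "retry_after_30s"
    # INSUFFICIENT_TOKENS, a second UPSTREAM_ERROR, or anything unexpected.
    return "stop_and_report"
```

Treating unknown codes as `"stop_and_report"` follows the playbook's rule that only codes marked retryable should be retried.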

### Step 8. Poll training to completion

Training is asynchronous and takes **10–60 minutes typically**, sometimes longer for large datasets or larger `model_size`.

```http
GET /training/:TRAINING_ID
```

Response includes `status`, `progress` (0.0 to 1.0), `metrics` (populated as training runs), and `log` (newline-joined).

`status` transitions, in order:

| Status | Meaning |
|---|---|
| `pending` | Created but not yet started. |
| `preparing` | Dataset is being built and uploaded. |
| `queued` | Waiting for a training worker. |
| `training` | Actively training. `progress` advances 0 → 1. |
| `completed` | Terminal. Model is ready to download. |
| `failed` | Terminal. `error` field explains why. |

Polling rules:

- Poll every **30 seconds**. Do not poll faster — you will not see state change more often.
- Terminal states are `completed` and `failed`. Stop polling at either.
- If still not terminal after **2 hours**, surface to the user with the training ID and `requestId` so they can check the account page.
- On `failed`, read `data.error` and `data.log` to explain to the user what went wrong.
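
The polling rules above can be sketched as a loop. `fetch_job` is a hypothetical callable assumed to call `GET /training/:id` and return the envelope's `data` object; elapsed time is approximated by summing the sleep intervals.

```python
import time

POLL_INTERVAL_SECONDS = 30
MAX_WAIT_SECONDS = 2 * 60 * 60
TERMINAL_STATUSES = {"completed", "failed"}


def poll_training(fetch_job, sleep=time.sleep,
                  interval=POLL_INTERVAL_SECONDS, max_wait=MAX_WAIT_SECONDS):
    """Poll until the job is terminal; raise TimeoutError after max_wait."""
    waited = 0
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited >= max_wait:
            # Surface this to the user with the training ID and requestId.
            raise TimeoutError(f"training still '{job['status']}' after {waited}s")
        sleep(interval)
        waited += interval
```

On return, check `job["status"]`: continue to the download step for `completed`, or read `error` and `log` for `failed`.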

### Step 9. Download the trained model

Once `status === "completed"`:

```http
GET /training/:TRAINING_ID/download?format=onnx
```

`format` is optional; valid values are `onnx` (default, portable) or `pth` (PyTorch weights).

**This endpoint does NOT return an envelope.** It responds with an HTTP redirect (302) to a short-lived download URL. Follow redirects and save the binary body.

```bash
curl -L -o pallets_v1.onnx \
  -H "Authorization: Bearer $KEY" \
  "https://api.mlpractitioners.com/training/$TRAINING_ID/download?format=onnx"
```

Possible errors:
- `TRAINING_NOT_FOUND` (404): bad ID or not owned by this user.
- `TRAINING_NOT_FINISHED` (404): the job's status is not yet `completed`. Go back to step 8 and keep polling.
- `NOT_FOUND` (404): training completed but the model artifact is not in storage yet — rare. The server retries the save on download; if you still see this, wait 30 s and retry once, then report to the user with the `requestId`.

### Recap — things you must save

| Name | From | Used for |
|---|---|---|
| `PROJECT_ID` | Step 2 | image upload, label set creation, labeling progress polling |
| `LABEL_SET_ID` | Step 4 | training creation, finding the label set in the polling response |
| `TRAINING_ID` | Step 6 | starting, polling, downloading |
| `requestId` of training start (Step 7) | response header / envelope | include in any report or support message |

---

## 4. Endpoint cheatsheet

Every endpoint an API key can hit, grouped by resource. All paths are relative to `https://api.mlpractitioners.com`. All requests need `Authorization: Bearer mlp_live_...`, except `GET /health`, which requires no auth. All responses use the envelope described in §2 unless noted.

`:param` = URL path parameter. `?param` = query string. `body: {...}` = JSON body. `multipart: ...` = `multipart/form-data`.

### Auth / identity

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/me` | — | Verify key, return `{ userId, email, authSource }`. Call this first. |
| GET | `/health` | — | Liveness probe. Does not require auth. |

### Workspaces

A user's Personal workspace is auto-created; if it is deleted, an empty one is recreated on next sign-in. Shared workspaces are created by the user (`POST /workspaces`) or joined by accepting an invite from another user (`POST /invites/:id/accept`). Project, image, label-set, training, and inference endpoints work transparently as long as the user is a member of the project's workspace, so there is no per-call workspace header to send.

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/workspaces` | — | List every workspace the user belongs to (Personal + any shared). Each entry includes `id`, `name`, `role` (`owner` or `member`), `isPersonal`, `memberCount`, `projectCount`. |
| POST | `/workspaces` | `body: { name }` | Create a new shared workspace owned by the user. |
| GET | `/workspaces/:id` | — | Workspace detail. |
| PATCH | `/workspaces/:id` | `body: { name }` | Rename a workspace (owner only). |
| DELETE | `/workspaces/:id` | — | Permanently delete a workspace and its projects (owner only). Personal workspaces can be deleted too; an empty one is recreated on next sign-in. |
| GET | `/workspaces/:id/members` | — | List members. |
| DELETE | `/workspaces/:id/members/:userId` | — | Remove a member (owner can remove anyone but the owner; a non-owner can remove only themselves to leave the workspace). |
| GET | `/workspaces/:id/invites` | — | List pending invites for this workspace (owner only). |
| POST | `/workspaces/:id/invites` | `body: { email }` | Invite an existing user to the workspace by email (owner only). The user must already have an account. |
| DELETE | `/workspaces/:id/invites/:inviteId` | — | Cancel a pending invite (owner only). |

### Invites (recipient-side)

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/invites/me` | — | List pending invites addressed to the caller's email. Each entry includes `id`, `workspace.{id,name}`, `invitedBy.{id,email}`, `createdAt`. |
| POST | `/invites/:id/accept` | — | Accept an invite. Adds you to the workspace as a member. Returns `{ workspaceId, role: "member", accepted: true }`. |
| POST | `/invites/:id/decline` | — | Decline (and remove) an invite. |

### Projects

`workspaceId` is optional everywhere it appears below; omit it to operate on the user's Personal workspace.

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/projects` | `?workspaceId=<id>` | List projects the user can see. With no query, this unions across every workspace the user belongs to. With `workspaceId`, restrict to that one workspace. Each project carries `workspace_id`, `workspace_name`, and `is_shared`. |
| POST | `/projects` | `body: { name, workspaceId? }` | Create a project. With no `workspaceId`, the project lands in the user's Personal workspace. |
| GET | `/projects/:id` | — | Project detail with label sets, images, models summary, plus `workspace_id` / `workspace_name` / `is_shared`. |
| PATCH | `/projects/:id` | `body: { name }` | Rename a project. |
| DELETE | `/projects/:id` | — | Delete a project. |
| GET | `/projects/:id/images` | — | List images in a project (each entry includes a temporary `url`). |
| POST | `/projects/:id/images` | `multipart: files (<=20, <=50MB each) + thumbnails (optional)` | Bulk upload images. Field names are `files` / `thumbnails` (no `[]`). Thumbnails are auto-generated when `thumbnails` is omitted. Returns 207 on partial failure. |
| GET | `/projects/:id/labelsets` | — | List label sets in a project (includes `label_count`, `training_count`). |
| POST | `/projects/:id/labelsets` | `body: { name, type, classes? }` | Create a label set. `type` in `object_detection`, `segmentation`, `classification`, `pose`, `obb`. |
| GET | `/projects/:id/background-sets` | — | List background sets (negative-example collections). |
| POST | `/projects/:id/background-sets` | `body: { name }` | Create a background set. |

### Images

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| POST | `/images` | `body: { projectId, filename, contentType, data }` | Single-image upload via base64. Prefer `/projects/:id/images` multipart for bulk. |
| GET | `/images/:id` | — | Image detail with a temporary download `url`. |
| DELETE | `/images/:id` | — | Delete one image. |
| DELETE | `/images/bulk` | `body: { image_ids: [...] }` | Delete many images at once. |
| PATCH | `/images/split` | `body: { image_ids: [...], split }` | Bulk-assign `TRAIN` / `VAL` / `TEST` split. |
| PATCH | `/images/:id/split` | `body: { split }` | Assign split for one image. |
| GET | `/images/:id/labels` | — | Get labels on an image across all label sets. |

### Label sets

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| POST | `/label-sets` | `body: { projectId, name, type }` | Alternative to `POST /projects/:id/labelsets`. |
| GET | `/label-sets/:id` | — | Label set detail (does **not** include label count — use `GET /projects/:id/labelsets` for that). |
| PATCH | `/label-sets/:id` | `body: { name?, classes? }` | Update label set name or class list. |
| DELETE | `/label-sets/:id` | — | Delete a label set. |
| GET | `/label-sets/:id/training` | — | List training jobs for this label set. |
| POST | `/label-sets/:id/training` | `body: { name?, model_size?, epochs?, batch_size?, image_size?, background_set_ids?, augmentation? }` | Create a (pending) training job. Defaults are sensible; omit unless the user requests specifics. |
| GET | `/label-sets/:id/training-dataset` | — | Preview the dataset that would be sent to training (useful for debugging). |

### Training

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/training/:id` | — | Job detail. For active jobs, this also polls the training backend and updates status. |
| POST | `/training/:id/start` | — | Start a pending or failed job. Consumes tokens. |
| GET | `/training/:id/download` | `?format=onnx` or `?format=pth` | **Redirect (302) to a short-lived download URL.** No envelope. Follow with `curl -L`. |
| POST | `/training/:id/infer/:imageId` | `body: { conf_thre? }` (default 0.25) | Synchronous inference on a single project image. Returns predictions. |
| POST | `/training/:id/infer-all` | `body: { split? }` (`TRAIN` \| `VAL` \| `TEST` \| `ALL`, default `ALL`) | Start batch inference across images. Async — poll status. |
| GET | `/training/:id/infer-all/status` | `?inference_job_id=...` | Check batch inference progress (`running` \| `completed` \| `failed`). |
| GET | `/training/:id/infer-all/results` | `?inference_job_id=...&min_conf=0.25` | Fetch per-image predictions once status is `completed`. |
| GET | `/training/:id/evaluate/results` | `?inference_job_id=...` | Fetch evaluation metrics produced by a batch inference run. |
| GET | `/training/:id/inference-jobs` | — | List past batch-inference jobs for this training. |

### Background sets

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/background-sets/:id/images` | — | List images in a background set. |
| POST | `/background-sets/:id/images` | `multipart: files + thumbnails (optional)` | Upload images to a background set (same format as project uploads; field names are `files` / `thumbnails` without `[]`, thumbnails auto-generated if omitted). |
| DELETE | `/background-sets/:id/images/:imageId` | — | Remove one image from a background set. |
| DELETE | `/background-sets/:id` | — | Delete the entire background set. |

### Tokens

Tokens are personal — every endpoint below reports the calling user's pool. Training inside a shared workspace bills the workspace owner's pool, so members see only their own balance here. There is no per-workspace token endpoint.

| Method | Path | Body / Params | Purpose |
|---|---|---|---|
| GET | `/tokens/summary` | — | `{ monthly, purchased, totalAvailable, renewal }`. Check before starting training. |
| GET | `/tokens/monthly` | — | Monthly allowance detail (remaining, period end). `0` if the user has no active monthly subscription. |
| GET | `/tokens/purchased` | — | Purchased (bundle) token balance. Bundles never expire. |
| GET | `/tokens/history` | `?limit=50&offset=0` | Transaction ledger (purchases, usage, refunds). Each entry includes `actor` — for usage spent from a shared workspace, this is the member who triggered the run. |

---

## 5. Error playbook

Every 4xx/5xx response uses the envelope from §2 with a stable `error.code`. **Branch on the code, never on the message.** Below is every code the API emits, grouped by situation, with the HTTP status you'll normally see and the action to take.

General rules:

- On any error, log `requestId` along with the code so it can be traced.
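- Automatically retry only the codes marked **Retryable** below. For everything else, do not resend the same request unchanged: follow the table's action (fix the input, re-fetch an ID, or wait for a state change), and otherwise stop and report to the user.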
- `error.details` is optional and code-specific. When present, it is structured (e.g. `{ fields: [...] }` for validation errors, `{ retryAfterSeconds }` for rate limits). Read it; don't try to parse the human message.
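
One way to apply these rules is to turn every error envelope into a structured exception once, then branch on its `code`. This is a sketch under assumptions: the envelope is taken to look like `{ ok, error: { code, message, details }, requestId }`, with `requestId` at the top level (check §2 for the authoritative shape). `ApiError`, `error_from_envelope`, and the `UNKNOWN` placeholder code are hypothetical names.

```python
from typing import Any, Mapping, Optional


class ApiError(Exception):
    """Structured view of an error envelope. Branch on .code, not .message."""

    def __init__(self, status: int, code: str, message: str,
                 request_id: Optional[str], details: Mapping[str, Any]):
        # Put the code and requestId in the text so they end up in logs.
        super().__init__(f"{status} {code} requestId={request_id}")
        self.status = status
        self.code = code
        self.message = message
        self.request_id = request_id
        self.details = details


def error_from_envelope(status: int, body: Mapping[str, Any]) -> ApiError:
    """Build an ApiError from a parsed 4xx/5xx JSON body."""
    err = body.get("error") or {}
    return ApiError(
        status=status,
        code=err.get("code", "UNKNOWN"),  # local placeholder, not an API code
        message=err.get("message", ""),
        request_id=body.get("requestId"),  # assumed top-level field
        details=err.get("details") or {},  # optional and code-specific
    )
```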

### Auth / authorization

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `UNAUTHENTICATED` | 401 | No credentials sent, or they could not be parsed. | Stop. Tell the user the key is missing from the request. Do not retry. |
| `API_KEY_INVALID` | 401 | The bearer token is not a valid API key. | Stop. Tell the user their key is wrong; they should generate a new one from the account page. |
| `API_KEY_REVOKED` | 401 | The key was valid but has been revoked by the user. | Stop. Tell the user to create a new key. |
| `API_KEY_EXPIRED` | 401 | The key had an `expiresAt` in the past. | Stop. Tell the user to create a new key (optionally without an expiry). |
| `COOKIE_SESSION_REQUIRED` | 403 | You tried something that has to be done by a signed-in user on the website rather than via API. | Stop. Tell the user the action has to be completed at `https://api.mlpractitioners.com/account` (or `https://www.mlpractitioners.com` for sign-in). |
| `SUBSCRIPTION_REQUIRED` | 403 | The user's subscription is inactive or expired. | Stop. Tell the user to renew at `https://www.mlpractitioners.com/pricing.html` before retrying. |
| `FORBIDDEN` | 403 | The resource exists but is not owned by this user. | Stop. You almost certainly have a stale or wrong ID. Re-list the parent resource and pick a valid ID. |

### Rate limits

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `RATE_LIMITED` | 429 | You exceeded 60 req/min (burst 120). `error.details.retryAfterSeconds` and the `Retry-After` header both give the wait. | **Retryable.** Sleep `retryAfterSeconds` and retry the same call once. If it fails again, back off 2× and try once more. If still failing, stop and report. |
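
The retry schedule in the table can be sketched as below. `call` is a hypothetical caller-supplied function that performs the request and returns `(status, parsed_body)`. The 1 s fallback for a missing `retryAfterSeconds` is an assumption, and "back off 2×" is read here as doubling the most recent `retryAfterSeconds`.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, parsed JSON body)


def _rate_limit_wait(body: Dict[str, Any], fallback_s: float = 1.0) -> Optional[float]:
    """Return the wait in seconds for a RATE_LIMITED envelope, else None."""
    err = body.get("error") or {}
    if err.get("code") != "RATE_LIMITED":
        return None
    details = err.get("details") or {}
    return details.get("retryAfterSeconds", fallback_s)


def call_with_rate_limit_retry(
    call: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Make one call plus at most two retries when the API answers RATE_LIMITED."""
    status, body = call()
    for multiplier in (1, 2):  # first retry waits as told, second waits 2x
        wait = _rate_limit_wait(body)
        if wait is None:
            return status, body
        sleep(wait * multiplier)
        status, body = call()
    return status, body  # may still be RATE_LIMITED: stop and report
```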

### Validation / input

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `VALIDATION_FAILED` | 400 | A field is missing, empty, or has an unknown value. `error.details.fields` (when present) lists each bad field and its issue. | Read `details.fields`, fix your payload, retry. If the user supplied the bad value, ask them for a correction instead of guessing. |
| `PAYLOAD_TOO_LARGE` | 413 | Body exceeded the 100 MB JSON limit, or a single file exceeded 50 MB. | Split into smaller batches. For multipart uploads: keep ≤ 20 files and ≤ 50 MB each. |
| `UNSUPPORTED_MEDIA_TYPE` | 415 | You sent the wrong `Content-Type`, or a file format the platform does not accept. | Use `application/json` for JSON endpoints and `multipart/form-data` for image uploads. Only JPEG, PNG, and WebP images are accepted. |
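
To stay clear of `PAYLOAD_TOO_LARGE` on uploads, plan batches before sending anything. A minimal sketch, assuming you already know each file's size in bytes: `plan_upload_batches` is a hypothetical helper, and reading "50 MB" as 50,000,000 bytes is a deliberately conservative assumption.

```python
from typing import List, Sequence, Tuple

MAX_FILES_PER_REQUEST = 20
MAX_FILE_BYTES = 50 * 1000 * 1000  # assumption: "50 MB" as decimal megabytes


def plan_upload_batches(
    files: Sequence[Tuple[str, int]],
) -> Tuple[List[List[str]], List[str]]:
    """Split (filename, size_in_bytes) pairs into batches of at most 20 files.

    Returns (batches, too_large); too_large lists files over the per-file cap.
    """
    accepted = [name for name, size in files if size <= MAX_FILE_BYTES]
    too_large = [name for name, size in files if size > MAX_FILE_BYTES]
    batches = [
        accepted[i:i + MAX_FILES_PER_REQUEST]
        for i in range(0, len(accepted), MAX_FILES_PER_REQUEST)
    ]
    return batches, too_large
```

Report the `too_large` names to the user up front rather than letting the server reject them mid-run.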

### Not found (404)

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `NOT_FOUND` | 404 | Generic unknown route. | Check path spelling against §4. |
| `PROJECT_NOT_FOUND` | 404 | No project by that ID for this user (or deleted). | Re-fetch `GET /projects` and use a current ID. |
| `IMAGE_NOT_FOUND` | 404 | Image ID not in the expected project, or deleted. | Re-fetch `GET /projects/:id/images` and retry. |
| `LABEL_SET_NOT_FOUND` | 404 | Label set ID invalid or deleted. | Re-fetch `GET /projects/:id/labelsets`. |
| `BACKGROUND_SET_NOT_FOUND` | 404 | Background set ID invalid or deleted. | Re-fetch `GET /projects/:id/background-sets`. |
| `TRAINING_NOT_FOUND` | 404 | Training ID invalid or not owned. | Re-fetch `GET /label-sets/:id/training`. |
| `INFERENCE_JOB_NOT_FOUND` | 404 | Inference job ID invalid. | Re-fetch `GET /training/:id/inference-jobs`. |
| `USER_NOT_FOUND` | 404 | The user record backing your key was removed. | Stop. Tell the user to re-register or contact support with the `requestId`. |

### Conflicts (409)

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `PROJECT_NAME_TAKEN` | 409 | The user already has a project with this name. | Suggest a variant (`warehouse-pallets-2`) or ask the user for a different name. |
| `LABEL_SET_NAME_TAKEN` | 409 | Duplicate label set name within a project. | Same: suggest a variant or ask the user. |
| `BACKGROUND_SET_NAME_TAKEN` | 409 | Duplicate background set name within a project. | Same. |
| `BATCH_INFERENCE_ALREADY_RUNNING` | 409 | You tried to start a batch inference while another one for the same training job is still running. | Wait. Poll `GET /training/:id/infer-all/status` until it reaches `completed` or `failed`, then retry. |
| `ALREADY_A_MEMBER` | 409 | You tried to invite or accept an invite for a user who already belongs to the workspace. | Stop. The user is already in. List members with `GET /workspaces/:id/members` to confirm. |
| `INVITE_ALREADY_PENDING` | 409 | You tried to invite an email that already has a pending invite for this workspace. | Stop or cancel the existing invite first via `DELETE /workspaces/:id/invites/:inviteId`, then re-invite. |
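
For the `*_NAME_TAKEN` codes, the suggested variant can be generated mechanically. The helper below is a hypothetical sketch of the `warehouse-pallets-2` convention from the table. It bumps a trailing `-<number>` if one exists, so a name such as `run-2024` becomes `run-2025`. Confirm the suggestion with the user before using it.

```python
import re


def next_name_variant(name: str) -> str:
    """Suggest the next name: append '-2', or increment a trailing '-<n>'."""
    match = re.fullmatch(r"(.*)-(\d+)", name)
    if match:
        return f"{match.group(1)}-{int(match.group(2)) + 1}"
    return f"{name}-2"
```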

### Business / state

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `INSUFFICIENT_TOKENS` | 400 | The pool that pays for this run is empty. For training in a shared workspace that is the workspace owner's pool, not the caller's. | Stop. If the user owns the workspace, tell them to buy a bundle at `https://www.mlpractitioners.com/pricing.html` or wait for monthly renewal. If they're a member of someone else's workspace, the owner needs to top up. |
| `INVALID_TRAINING_STATE` | 400 | You called `/start` on a job that is already running or already finished, or `/download` on a job that is not `completed`. | Call `GET /training/:id` to see real state and branch: if `completed`, download; if `training`, keep polling; if `failed`, surface the `error` field to the user. |
| `NO_IMAGES_FOR_INFERENCE` | 400 | You asked for batch inference on a split (e.g. `VAL`) that has no images. | Check image splits via `GET /projects/:id/images` and choose a split that is populated, or set `split: "ALL"`. |
| `TRAINING_NOT_FINISHED` | 400 or 404 | You tried to download the model or run inference before the training job reached `completed`. | Poll `GET /training/:id` until `status === "completed"`, then retry. |
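
The state branching described for `INVALID_TRAINING_STATE` and `TRAINING_NOT_FINISHED` can be sketched as one function. It assumes `training` is the `data` payload of `GET /training/:id` with `status` and `error` fields. The function name and the returned action strings are hypothetical.

```python
from typing import Any, Mapping


def next_training_action(training: Mapping[str, Any]) -> str:
    """Map the real training state to the next step."""
    status = training.get("status")
    if status == "completed":
        return "download"
    if status == "training":
        return "poll"
    if status == "failed":
        # Surface the server-provided error field to the user.
        return f"report: {training.get('error')}"
    # Any status this guide does not name: do not guess.
    return "inspect"
```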

### Infrastructure (retry with care)

| Code | Status | Meaning | What to do |
|---|---|---|---|
| `UPSTREAM_ERROR` | 500 | The training / inference backend is unreachable or errored. | **Retryable once** after 30 s. On a second failure, stop and report the `requestId` — the user may need to retry later. |
| `UPLOAD_FAILED` | 500 | An image upload could not be stored. | **Retryable once** immediately. If it fails again, drop that file from the batch and report to the user. |
| `INTERNAL` | 500 | Unhandled server error. | **Not retryable automatically.** Stop and report the `requestId` — this likely needs a fix on the server. |
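
The table above reduces to a small lookup: two codes get exactly one retry, each with its own delay, and `INTERNAL` gets none. A sketch, where `call` is a hypothetical caller-supplied function returning `(status, parsed_body)`:

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, parsed JSON body)

# Delay in seconds before the single allowed retry, per error code.
INFRA_RETRY_DELAY_S = {"UPSTREAM_ERROR": 30.0, "UPLOAD_FAILED": 0.0}


def call_with_infra_retry(
    call: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call once; retry exactly once if the error code is in the table."""
    status, body = call()
    code = (body.get("error") or {}).get("code")
    delay = INFRA_RETRY_DELAY_S.get(code)
    if delay is not None:
        if delay > 0:
            sleep(delay)
        status, body = call()
    return status, body  # on a second failure, stop and report the requestId
```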

### Handling `207 Multi-Status`

A batch image upload can succeed for some files and fail for others. In that case the response is `207 Multi-Status` with `ok: true` and both `data.results` (files that succeeded) and `data.errors` (files that failed). See §3 step 3. This is not an error, so do not back off; retry only the failed filenames.
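
A sketch of splitting such a response so that only the failed files are re-sent. The shape of each entry in `data.errors` is an assumption here (a `filename` key); check §3 step 3 for the real shape. `split_upload_result` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def split_upload_result(body: Dict[str, Any]) -> Tuple[List[Any], List[str]]:
    """Return (succeeded_entries, failed_filenames) from an upload response."""
    data = body.get("data") or {}
    succeeded = list(data.get("results") or [])
    # Assumed: each error entry names the file it refers to.
    failed = [e["filename"] for e in (data.get("errors") or []) if "filename" in e]
    return succeeded, failed
```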

### Download endpoint errors

`GET /training/:id/download` does not use the envelope on success (it returns a 302 redirect). On failure it still sends an envelope. Watch for:

- `401` or `403` body: re-check auth / ownership.
- `404` `TRAINING_NOT_FOUND`: the ID is wrong — re-fetch `GET /label-sets/:id/training`.
- `404` `TRAINING_NOT_FINISHED`: keep polling `GET /training/:id` until `completed`.
- `404` `NOT_FOUND` with a message about the model file: rare — the artifact isn't in storage yet. Wait 30 s and retry once; then stop and report the `requestId`.
- Any `5xx`: retry once after 10 s; if still failing, stop.
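
The list above can be sketched as a decision function. The action names are hypothetical. Because messages should not be parsed, this sketch treats any `404 NOT_FOUND` on the download endpoint as the missing-artifact case, which is an assumption.

```python
from typing import Optional, Tuple


def download_failure_action(status: int, code: Optional[str]) -> Tuple[str, float]:
    """Return (action, delay_seconds) for a failed GET /training/:id/download."""
    if status in (401, 403):
        return ("check_auth", 0.0)
    if status == 404 and code == "TRAINING_NOT_FOUND":
        return ("refetch_training_id", 0.0)
    if status == 404 and code == "TRAINING_NOT_FINISHED":
        return ("poll_training", 0.0)
    if status == 404 and code == "NOT_FOUND":
        return ("retry_once", 30.0)  # artifact may not be in storage yet
    if 500 <= status <= 599:
        return ("retry_once", 10.0)
    return ("stop", 0.0)
```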

---

## 6. Limits and platform behavior

Read this before starting a workflow so you don't surprise the user mid-run.

### Hard limits (enforced by the server)

| Limit | Value | Enforced by |
|---|---|---|
| Requests per API key | 60 / minute, burst 120 | `429 RATE_LIMITED` with `Retry-After` |
| JSON body size | 100 MB | `413 PAYLOAD_TOO_LARGE` |
| Image file size | 50 MB each | `413 PAYLOAD_TOO_LARGE` |
| Files per multipart request | 20 | Server drops extras |
| Image formats | JPEG, PNG, WebP | `415 UNSUPPORTED_MEDIA_TYPE` |
| Training job runtime (poll window) | ~2 hours typical upper bound | Surface to user if exceeded; see §3 step 8 |

### Account, billing, and key management

Account profile, subscription, and API-key creation / revocation are
managed by the user on the website at `https://api.mlpractitioners.com/account`. If the user
asks an agent to perform any of those actions, stop and direct them
there. If the user needs to sign in or sign up, point them at
`https://www.mlpractitioners.com`.

### Labels are drawn in the UI

Labeling is a visual task. There is no API to create, edit, or delete
labels — when a label set needs labeling, hand off to the user in the
labeler UI (§3 step 5) and poll `GET /projects/:id/labelsets` for
progress.

### Workspace-related rules

- **The Personal workspace has full parity with shared workspaces.** Members can be invited to it, and it can be deleted like any other workspace. If deleted, an empty Personal workspace (id === userId) is recreated on the user's next sign-in via `ensurePersonalWorkspace`.
- **Owners can't leave their own workspace.** `DELETE /workspaces/:id/members/:userId` on the owner returns `400 VALIDATION_FAILED`. Delete the workspace instead.
- **Invitees must already have an account** on the platform — there's no signup-by-invite. `POST /workspaces/:id/invites` returns `404 USER_NOT_FOUND` if the email isn't on file.

### Not available yet

The following are planned but not shipped. Don't reference them to the user:

- **Active learning / suggested labels.** An active-learning endpoint is on the roadmap for suggesting which unlabeled images to label next. Until it lands, labeling is entirely manual.
- **Task types beyond object detection.** `segmentation`, `classification`, `pose`, and `obb` are accepted as label-set types today, but the training pipeline in this guide is only verified for `object_detection`. For other task types, ask the user to wait until the pipeline is verified for that type.

### Determinism / idempotency

The API is **not** idempotent by default. In particular:

- `POST /projects/:id/images` with the same filename will either fail (`UPLOAD_FAILED` from the uniqueness constraint on `projectId + filename`) or upsert — do not assume you can replay it safely after a partial failure. Use the 207 response's `errors` array, not a blind retry.
- `POST /training/:id/start` on an already-started job returns `INVALID_TRAINING_STATE`. Always check current state with `GET /training/:id` before retrying a start.
- `POST /label-sets/:id/training` creates a new training job each time — replaying the same call will pile up pending jobs and eventually consume tokens. Store the returned `TRAINING_ID` on the first success.
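
One way to avoid piling up training jobs is to record the training ID the first time creation succeeds and reuse it on any replay. A minimal sketch, assuming a caller-supplied `create` function that wraps `POST /label-sets/:id/training` and returns the new training ID. The helper name and the in-memory `store` are hypothetical; a real agent would persist the mapping.

```python
from typing import Callable, Dict


def get_or_create_training(
    store: Dict[str, str],
    label_set_id: str,
    create: Callable[[str], str],
) -> str:
    """Return the stored training ID, calling create() only on first use."""
    if label_set_id not in store:
        # Written only after create() returns, so a failed call stores nothing.
        store[label_set_id] = create(label_set_id)
    return store[label_set_id]
```

Bypass the helper when the user explicitly asks for a fresh training run on the same label set.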