
Execution & SLO

From the editor, Run navigates to /workflows/{slug}/runs/new. The page reads the workflow’s latest version (GET /workflows/{slug}) and renders one AutoForm per input node — the graph itself lists every required value. Struct-typed inputs expand into nested field groups rendered by the same AutoForm widget.

On submit the values are keyed by InputNode node_key and POSTed to /workflows/{slug}/runs. The page then redirects to /workflow-runs/{id} for live polling.

curl -X POST https://api.mecapy.com/workflows/bolted-joint/runs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "bolt_input": {
        "d": 12,
        "p": 1.5,
        "As": 84.3,
        "Re_min": 640,
        "quality_class": "8.8"
      },
      "assembly_input": {
        "dh": 18,
        "do": 13,
        "Rc": 210,
        "mu_p_min": 0.1
      },
      "loads_input": { "FA_max": 15000, "Ft_max": 3500 },
      "tightening_input": {
        "mu_tot_min": 0.08, "mu_tot_max": 0.14, "precision_class": "A"
      }
    }
  }'

The response carries the freshly-frozen execution plan and the initial node_states — every function node in pending. Input nodes do not appear in node_states: they are pre-resolved at submission and injected directly via input_node bindings on each downstream step. No worker jobs have been submitted yet: the run starts advancing on the first /tick.

The submitted values are echoed back on run.inputs so the detail page can render a “Submitted inputs” card without a separate lookup.

A run is a state machine driven by one HTTP endpoint:

POST /workflow-runs/{id}/tick → WorkflowRunResponse

One call performs three phases atomically (one DB commit):

  1. Reconcile running nodes — poll Redis for every node currently running. Write outputs on success, record errors on failure.
  2. Submit ready nodes — every pending function node whose upstream function dependencies are all success has its inputs resolved and is pushed to the worker queue. Input nodes do not count as dependencies — they are always ready.
  3. Finalise — if a terminal condition is reached, update the run-level status (completed / failed / cancelled).

The endpoint is idempotent — calling it with no external progress (no job has completed in Redis since the last call) simply returns the same state.
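The three phases can be sketched as follows. This is an illustrative model, not the real MecaPy internals: Run, JOB_RESULTS, and submit_job are stand-ins for the actual ORM model, the Redis job store, and the worker queue.

```python
from dataclasses import dataclass

# Stand-ins for the real run model and Redis job store (illustrative only).
@dataclass
class Run:
    node_states: dict     # node_key -> {"status": ..., "job_id": ..., ...}
    function_deps: dict   # node_key -> upstream *function* node keys only
    status: str = "running"

JOB_RESULTS = {}          # job_id -> {"ok": bool, "outputs"/"error": ...}
_job_counter = 0

def submit_job(node_key):
    """Push one node to the worker queue; returns a job id (stubbed here)."""
    global _job_counter
    _job_counter += 1
    return f"job-{_job_counter}"

def tick(run):
    # Phase 1 — reconcile: poll the job store for every running node.
    for state in run.node_states.values():
        if state["status"] == "running":
            result = JOB_RESULTS.get(state["job_id"])
            if result is None:
                continue  # still in flight
            if result["ok"]:
                state["status"], state["outputs"] = "success", result["outputs"]
            else:
                state["status"], state["error"] = "failed", result["error"]

    # Phase 2 — submit: pending nodes whose function deps all succeeded.
    # Input nodes never appear in function_deps, so they never block a node.
    for key, state in run.node_states.items():
        if state["status"] == "pending" and all(
            run.node_states[d]["status"] == "success"
            for d in run.function_deps[key]
        ):
            state["status"], state["job_id"] = "running", submit_job(key)

    # Phase 3 — finalise: derive the run-level status (one DB commit here
    # in the real code).
    statuses = {s["status"] for s in run.node_states.values()}
    if "failed" in statuses:
        run.status = "failed"
    elif statuses == {"success"}:
        run.status = "completed"
    return run
```

Calling tick again with no new entries in the job store re-derives the same state, which is the idempotence guarantee above.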

The platform deliberately does not run a background orchestrator. The runtime needed to drive a run comes from one of:

  • The frontend run page — polls every 2 s while the run’s status is pending or running, stops on terminal.
  • A cron or any periodic job — for unattended runs.
  • The tests — pytest mocks Redis and drives ticks synchronously.
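A minimal unattended driver looks like this. The helper is hypothetical (it is not part of the MecaPy SDK); `tick` is any callable that performs POST /workflow-runs/{id}/tick and returns the WorkflowRunResponse as a dict.

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def drive_run(tick, interval=2.0, max_ticks=1000):
    """Advance a run by calling `tick()` until its status is terminal.

    `tick` wraps POST /workflow-runs/{id}/tick; `interval` mirrors the
    frontend's 2 s polling cadence.
    """
    for _ in range(max_ticks):
        run = tick()
        if run["status"] in TERMINAL:
            return run
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal status")
```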

WorkflowRunResponse, as returned by /tick, GET /workflow-runs/{id}, and POST /runs:

{
  "id": "01HQXZ…",
  "workflow_version_id": "01HQXY…",
  "status": "running",
  "started_at": "2026-04-22T10:15:00Z",
  "completed_at": null,
  "inputs": {
    "bolt_input": { "d": 12, "p": 1.5, "As": 84.3 },
    "loads_input": { "FA_max": 15000, "Ft_max": 3500 }
  },
  "terminal_outputs": null,
  "error_message": null,
  "first_failed_node_key": null,
  "plan_snapshot": { "chains": [], "waves": [["chain-0"], ["chain-1"]] },
  "node_states": {
    "check": { "status": "running", "job_id": "job-42" },
    "min_preload": { "status": "pending" }
  }
}

The single-endpoint polling guarantee (plan + node states + inputs in the same response) keeps the frontend DAG view in sync with one HTTP call per tick.

Each step’s input_bindings map every declared function input port to one of three source types:

  • {"source": "input_node", "input_key": "bolt_input", "from_port": "value"} — read run.inputs["bolt_input"] verbatim (primitive, list, or struct alike).
  • {"source": "edge", "from_node_key": "X", "from_port": "y"} — read node_states["X"].outputs["y"]. Used between chains.
  • {"source": "chain", "from_node_key": "X", "from_port": "y"} — same as edge but the source lives in the same chain (fusion hint for v1.5 worker chain-exec).

There is no “free” source anymore — strict mode guarantees every port is covered.
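Resolution of a step's ports can be sketched as a simple dispatch over the three source types. The function name and dict shapes are illustrative, following the binding examples above.

```python
def resolve_inputs(step, run_inputs, node_states):
    """Resolve one step's input ports from its frozen input_bindings.

    - "input_node": read the submitted value from run.inputs verbatim.
    - "edge" / "chain": read an upstream node's output port; "chain" is the
      same read, flagged for same-chain fusion in the v1.5 worker.
    Strict mode guarantees every declared port has exactly one binding.
    """
    resolved = {}
    for port, b in step["input_bindings"].items():
        if b["source"] == "input_node":
            resolved[port] = run_inputs[b["input_key"]]
        elif b["source"] in ("edge", "chain"):
            resolved[port] = node_states[b["from_node_key"]]["outputs"][b["from_port"]]
        else:
            raise ValueError(f"unknown binding source: {b['source']}")
    return resolved
```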

When a function node fails:

  • Every pending function node downstream of the failure is marked cancelled — no new job is submitted for them.
  • Running nodes on independent branches keep running to completion — the spec deliberately lets parallel work finish to avoid wasted compute.
  • The run itself flips to failed with first_failed_node_key set.
A (failed)
├── B (cancelled)     ← was pending, nothing submitted
└── C (cancelled)     ← was pending
    └── F (cancelled) ← downstream of C, cancelled with it
X ── E (running)      ← independent branch keeps going
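The cancellation rule amounts to a breadth-first walk over the failed node's downstream edges, cancelling only pending nodes. A sketch, with `downstream` mapping each node to its direct consumers (names illustrative):

```python
from collections import deque

def cancel_downstream(failed_key, downstream, node_states):
    """Mark every *pending* node reachable from the failed node as cancelled.

    Running nodes are left alone: independent branches finish their work,
    and a node downstream of a failure can never already be running.
    """
    queue = deque(downstream.get(failed_key, []))
    seen = set()
    while queue:
        key = queue.popleft()
        if key in seen:
            continue
        seen.add(key)
        if node_states[key]["status"] == "pending":
            node_states[key]["status"] = "cancelled"
        queue.extend(downstream.get(key, []))
```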

Each function node pays one Python cold start when submitted. Empirically on Scaleway Serverless containers: ~6 s for a noop, ~9 s end-to-end (FRO-perf-01). A naive four-node chain would therefore take ~24 s minimum — which would make workflows dead-on-arrival for real use.

Input nodes add zero cold-start cost — they are resolved in-process during submission.

v1 mitigates this with one optimisation already wired in:

Independent branches are submitted simultaneously on the first /tick after they become ready (FRO-wkf-08). The platform does not serialise on arbitrary work order — a diamond DAG A → {B,C} → D runs B and C in parallel, D starts after both finish.
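The wave grouping behind this can be sketched as topological levelling: a node's wave is one past its deepest dependency, so everything in a wave is submittable in parallel. This is an illustrative computation, not the platform's actual planner.

```python
def waves(deps):
    """Group nodes into parallel waves from `deps` (node -> upstream nodes)."""
    level = {}

    def depth(key):
        # A node sits one level below its deepest dependency; roots are level 0.
        if key not in level:
            level[key] = 1 + max((depth(d) for d in deps[key]), default=-1)
        return level[key]

    for key in deps:
        depth(key)
    out = [[] for _ in range(max(level.values()) + 1)]
    for key, lvl in sorted(level.items()):
        out[lvl].append(key)
    return out
```

For the diamond A → {B, C} → D this yields three waves, with B and C sharing the middle one — exactly the parallel submission described above.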

A workflow run’s “result” is the outputs of terminal function nodes — function nodes that no other node consumes. Input nodes are never terminal. For the diamond above, D’s outputs are the run’s terminal_outputs. Multiple sinks produce multiple terminal output groups keyed by node_key:

{
  "terminal_outputs": {
    "safety_check": { "verdict": "OK", "margin": 2.3 },
    "bom_export": { "items": [] }
  }
}

Intermediate node outputs remain inspectable in node_states[key].outputs but are not aggregated into terminal_outputs — the contract is explicit leaves only.
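The "explicit leaves only" contract is a one-pass filter: keep the outputs of successful nodes that no other node consumes. A hedged sketch, with `consumers` mapping each node to its downstream consumers (names illustrative):

```python
def terminal_outputs(node_states, consumers):
    """Aggregate the run's result: outputs of sink nodes, keyed by node_key.

    A node is terminal when its consumer set is empty; intermediate outputs
    stay in node_states[key]["outputs"] and are never aggregated here.
    """
    return {
        key: state["outputs"]
        for key, state in node_states.items()
        if not consumers.get(key) and state["status"] == "success"
    }
```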

During a run:

  • GET /workflow-runs/{id} — single fetch, same shape as /tick without advancing state.
  • GET /workflow-runs/{id}/plan — the frozen execution plan and current node states (audit / replay).
  • Run detail page — visual DAG at /workflow-runs/{id}, coloured by status (pending grey, running blue pulsing, success green, failed red, cancelled amber). Also displays a “Submitted inputs” card listing run.inputs for audit, and a “Terminal outputs” card on completion.
  • Runs list — /workflow-runs with status filters for batch triage across all your runs.

Not yet in v1 (roadmap):

  • Retry policy / exponential back-off on failed nodes.
  • Partial re-run — continue from a specific node after a fix.
  • Run cancellation — no DELETE /workflow-runs/{id} endpoint yet.
  • Worker chain-exec — one cold start per chain (FRO-wkf-12 v1.5).
  • Warm pool — top-N pre-warmed containers (FRO-wkf-12 v1.5).