
Execution & SLO

From the editor, Run navigates to /workflows/{slug}/runs/new. The page reads the workflow’s latest version (GET /workflows/{slug}) and renders one AutoForm per input node — the graph itself lists every required value. Struct-typed inputs expand into nested field groups rendered by the same AutoForm widget.

On submit the values are keyed by InputNode node_key and POSTed to /workflows/{slug}/runs. The page then redirects to /workflow-runs/{id} for live polling.

curl -X POST https://api.mecapy.com/workflows/bolted-joint/runs \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "bolt_input": {
        "d": 12,
        "p": 1.5,
        "As": 84.3,
        "Re_min": 640,
        "quality_class": "8.8"
      },
      "assembly_input": {
        "dh": 18,
        "do": 13,
        "Rc": 210,
        "mu_p_min": 0.1
      },
      "loads_input": { "FA_max": 15000, "Ft_max": 3500 },
      "tightening_input": {
        "mu_tot_min": 0.08, "mu_tot_max": 0.14, "precision_class": "A"
      }
    }
  }'

The response carries the freshly-frozen execution plan and the initial node_states — every function node in pending. Input nodes do not appear in node_states: they are pre-resolved at submission and injected directly via input_node bindings on each downstream step. No worker jobs have been submitted yet: the run starts advancing on the first /tick.

The submitted values are echoed back on run.inputs so the detail page can render a “Submitted inputs” card without a separate lookup.

A run is a state machine driven by one HTTP endpoint:

POST /workflow-runs/{id}/tick → WorkflowRunResponse

One call performs three phases atomically (one DB commit):

  1. Reconcile running nodes — poll Redis for every node currently running. Write outputs on success, record errors on failure.
  2. Submit ready nodes — every pending function node whose upstream function dependencies are all success has its inputs resolved and is pushed to the worker queue. Input nodes do not count as dependencies — they are always ready.
  3. Finalise — if a terminal condition is reached, update the run-level status (completed / failed / cancelled).

The endpoint is idempotent — calling it with no external progress (no job has completed in Redis since the last call) simply returns the same state.
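The three phases can be sketched as follows. This is an illustrative model, not the real MecaPy internals: Run, JOB_RESULTS, and submit_job are stand-ins for the actual ORM model, the Redis job store, and the worker queue.

```python
from dataclasses import dataclass

# Stand-ins for the real run model and Redis job store (illustrative only).
@dataclass
class Run:
    node_states: dict     # node_key -> {"status": ..., "job_id": ..., ...}
    function_deps: dict   # node_key -> upstream *function* node keys only
    status: str = "running"

JOB_RESULTS = {}          # job_id -> {"ok": bool, "outputs"/"error": ...}
_job_counter = 0

def submit_job(node_key):
    """Push one node to the worker queue; returns a job id (stubbed here)."""
    global _job_counter
    _job_counter += 1
    return f"job-{_job_counter}"

def tick(run):
    # Phase 1 — reconcile: poll the job store for every running node.
    for state in run.node_states.values():
        if state["status"] == "running":
            result = JOB_RESULTS.get(state["job_id"])
            if result is None:
                continue  # still in flight
            if result["ok"]:
                state["status"], state["outputs"] = "success", result["outputs"]
            else:
                state["status"], state["error"] = "failed", result["error"]

    # Phase 2 — submit: pending nodes whose function deps all succeeded.
    # Input nodes never appear in function_deps, so they never block a node.
    for key, state in run.node_states.items():
        if state["status"] == "pending" and all(
            run.node_states[d]["status"] == "success"
            for d in run.function_deps[key]
        ):
            state["status"], state["job_id"] = "running", submit_job(key)

    # Phase 3 — finalise: derive the run-level status (one DB commit here
    # in the real code).
    statuses = {s["status"] for s in run.node_states.values()}
    if "failed" in statuses:
        run.status = "failed"
    elif statuses == {"success"}:
        run.status = "completed"
    return run
```

Calling tick again with no new entries in the job store re-derives the same state, which is the idempotence guarantee above.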

The platform deliberately does not run a background orchestrator. The runtime needed to drive a run comes from one of:

  • The frontend run page — polls every 2 s while the run’s status is pending or running, stops on terminal.
  • A cron or any periodic job — for unattended runs.
  • The tests — pytest mocks Redis and drives ticks synchronously.
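A minimal unattended driver looks like this. The helper is hypothetical (it is not part of the MecaPy SDK); `tick` is any callable that performs POST /workflow-runs/{id}/tick and returns the WorkflowRunResponse as a dict.

```python
import time

TERMINAL = {"completed", "failed", "cancelled"}

def drive_run(tick, interval=2.0, max_ticks=1000):
    """Advance a run by calling `tick()` until its status is terminal.

    `tick` wraps POST /workflow-runs/{id}/tick; `interval` mirrors the
    frontend's 2 s polling cadence.
    """
    for _ in range(max_ticks):
        run = tick()
        if run["status"] in TERMINAL:
            return run
        time.sleep(interval)
    raise TimeoutError("run did not reach a terminal status")
```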

WorkflowRunResponse, as returned by /tick, GET /workflow-runs/{id}, and POST /runs:

{
  "id": "01HQXZ…",
  "workflow_version_id": "01HQXY…",
  "status": "running",
  "started_at": "2026-04-22T10:15:00Z",
  "completed_at": null,
  "inputs": {
    "bolt_input": { "d": 12, "p": 1.5, "As": 84.3 },
    "loads_input": { "FA_max": 15000, "Ft_max": 3500 }
  },
  "terminal_outputs": null,
  "error_message": null,
  "first_failed_node_key": null,
  "plan_snapshot": { "chains": [], "waves": [["chain-0"], ["chain-1"]] },
  "node_states": {
    "check": { "status": "running", "job_id": "job-42" },
    "min_preload": { "status": "pending" }
  }
}

The single-endpoint polling guarantee (plan + node states + inputs in the same response) keeps the frontend DAG view in sync with one HTTP call per tick.

Each step’s input_bindings map every declared function input port to one of three source types:

  • {"source": "input_node", "input_key": "bolt_input", "from_port": "value"} — read run.inputs["bolt_input"] verbatim (primitive, list, or struct alike).
  • {"source": "edge", "from_node_key": "X", "from_port": "y"} — read node_states["X"].outputs["y"]. Used between chains.
  • {"source": "chain", "from_node_key": "X", "from_port": "y"} — same as edge but the source lives in the same chain (fusion hint for v1.5 worker chain-exec).

There is no “free” source anymore — strict mode guarantees every port is covered.
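Resolution of a step's ports can be sketched as a simple dispatch over the three source types. The function name and dict shapes are illustrative, following the binding examples above.

```python
def resolve_inputs(step, run_inputs, node_states):
    """Resolve one step's input ports from its frozen input_bindings.

    - "input_node": read the submitted value from run.inputs verbatim.
    - "edge" / "chain": read an upstream node's output port; "chain" is the
      same read, flagged for same-chain fusion in the v1.5 worker.
    Strict mode guarantees every declared port has exactly one binding.
    """
    resolved = {}
    for port, b in step["input_bindings"].items():
        if b["source"] == "input_node":
            resolved[port] = run_inputs[b["input_key"]]
        elif b["source"] in ("edge", "chain"):
            resolved[port] = node_states[b["from_node_key"]]["outputs"][b["from_port"]]
        else:
            raise ValueError(f"unknown binding source: {b['source']}")
    return resolved
```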

When a function node fails:

  • Every pending function node downstream of the failure is marked cancelled — no new job is submitted for them.
  • Running nodes on independent branches keep running to completion — the spec deliberately lets parallel work finish to avoid wasted compute.
  • The run itself flips to failed with first_failed_node_key set.
A (failed)
├── B (cancelled)     ← was pending, nothing submitted
└── C (cancelled)     ← was pending
    └── F (cancelled) ← downstream of C, cancelled with it
X ── E (running)      ← independent branch keeps going
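The cancellation rule amounts to a breadth-first walk over the failed node's downstream edges, cancelling only pending nodes. A sketch, with `downstream` mapping each node to its direct consumers (names illustrative):

```python
from collections import deque

def cancel_downstream(failed_key, downstream, node_states):
    """Mark every *pending* node reachable from the failed node as cancelled.

    Running nodes are left alone: independent branches finish their work,
    and a node downstream of a failure can never already be running.
    """
    queue = deque(downstream.get(failed_key, []))
    seen = set()
    while queue:
        key = queue.popleft()
        if key in seen:
            continue
        seen.add(key)
        if node_states[key]["status"] == "pending":
            node_states[key]["status"] = "cancelled"
        queue.extend(downstream.get(key, []))
```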

Each function node pays one Python cold start when submitted. Empirically on Scaleway Serverless containers: ~6 s for a noop, ~9 s end-to-end (FRO-perf-01). A naive four-node chain would therefore take ~24 s minimum — which would make workflows dead-on-arrival for real use.

Input nodes add zero cold-start cost — they are resolved in-process during submission.

v1 mitigates this with one optimisation already wired in:

Independent branches are submitted simultaneously on the first /tick after they become ready (FRO-wkf-08). The platform does not serialise on arbitrary work order — a diamond DAG A → {B,C} → D runs B and C in parallel, D starts after both finish.
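The wave grouping behind this can be sketched as topological levelling: a node's wave is one past its deepest dependency, so everything in a wave is submittable in parallel. This is an illustrative computation, not the platform's actual planner.

```python
def waves(deps):
    """Group nodes into parallel waves from `deps` (node -> upstream nodes)."""
    level = {}

    def depth(key):
        # A node sits one level below its deepest dependency; roots are level 0.
        if key not in level:
            level[key] = 1 + max((depth(d) for d in deps[key]), default=-1)
        return level[key]

    for key in deps:
        depth(key)
    out = [[] for _ in range(max(level.values()) + 1)]
    for key, lvl in sorted(level.items()):
        out[lvl].append(key)
    return out
```

For the diamond A → {B, C} → D this yields three waves, with B and C sharing the middle one — exactly the parallel submission described above.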

A workflow run’s “result” is the outputs of terminal function nodes — function nodes that no other node consumes. Input nodes are never terminal. For the diamond above, D’s outputs are the run’s terminal_outputs. Multiple sinks produce multiple terminal output groups keyed by node_key:

{
  "terminal_outputs": {
    "safety_check": { "verdict": "OK", "margin": 2.3 },
    "bom_export": { "items": [] }
  }
}

Intermediate node outputs remain inspectable in node_states[key].outputs but are not aggregated into terminal_outputs — the contract is explicit leaves only.
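The "explicit leaves only" contract is a one-pass filter: keep the outputs of successful nodes that no other node consumes. A hedged sketch, with `consumers` mapping each node to its downstream consumers (names illustrative):

```python
def terminal_outputs(node_states, consumers):
    """Aggregate the run's result: outputs of sink nodes, keyed by node_key.

    A node is terminal when its consumer set is empty; intermediate outputs
    stay in node_states[key]["outputs"] and are never aggregated here.
    """
    return {
        key: state["outputs"]
        for key, state in node_states.items()
        if not consumers.get(key) and state["status"] == "success"
    }
```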

During a run:

  • GET /workflow-runs/{id} — single fetch, same shape as /tick without advancing state.
  • GET /workflow-runs/{id}/plan — the frozen execution plan and current node states (audit / replay).
  • Run detail page — visual DAG at /workflow-runs/{id}, coloured by status (pending grey, running blue pulsing, success green, failed red, cancelled amber). Also displays a “Submitted inputs” card listing run.inputs for audit, and a “Terminal outputs” card on completion.
  • Runs list — /workflow-runs with status filters for batch triage across all your runs.

Not yet in v1 (roadmap):

  • Retry policy / exponential back-off on failed nodes.
  • Partial re-run — continue from a specific node after a fix.
  • Run cancellation — no DELETE /workflow-runs/{id} endpoint yet.
  • Worker chain-exec — one cold start per chain (FRO-wkf-12 v1.5).
  • Warm pool — top-N pre-warmed containers (FRO-wkf-12 v1.5).