Skip to main content

Persistence and State

BtScript flows maintain runtime state in a single, consistent data structure that's persisted to durable storage. Understanding how state is stored and when it's persisted is critical for building reliable, performant flows.

FlowState: The Single Source of Truth

Every flow instance maintains a FlowState object backed by Orleans IPersistentState<FlowState>. This state contains:

  • Inputs (_inputs): Dictionary mapping input names to their most recent value and timestamp. Updated on every message arrival.
  • Rolling Buffers (_rollingBuffers): Time-windowed data for aggregations like rolling-avg, rolling-min, rolling-max, rolling-sum. Each buffer stores (value, timestamp) pairs and automatically prunes old entries.
  • Gates (_gates): Synchronization state tracking which required inputs have arrived for each gate.
  • Key-Value Store (_values): Arbitrary user-defined state accessible via flow expressions.
  • Persistence Metadata: IsDirty flag and LastPersisted timestamp for tracking write operations.

All state is serialized to durable storage via Orleans state management. The flow engine handles persistence automatically based on the configured mode.

Persistence Modes

The persist: flow attribute controls when FlowState is written to durable storage. Choose the mode that matches your throughput and durability requirements.

ModeWhen PersistedTrade-offUse Case
syncAfter each Execute(), blocks until writtenLowest throughput, zero data lossCritical state, financial calculations
asyncAfter each Execute(), fire-and-forgetHigher throughput, small loss windowImportant state with performance needs
timerEvery N seconds (default 5s)High throughput, configurable loss windowMost production flows
on-deactivateOnly on grain deactivationHighest throughput, data loss on crashHigh-frequency non-critical flows
noneNeverMaximum throughput, all state lost on deactivationPure stateless computations

Sync Mode

(flow id: payment-reconciliation
persist: sync

(inputs
(transaction type: double signal: "pos.transaction")
(payment type: double signal: "pos.payment"))

(gate zip: transaction payment)

(trigger on-any: transaction payment)

(let ((balance (- transaction payment)))
(emit balance-delta value: balance)))

Every input triggers an Execute() call, and the flow blocks until the state is written to storage. Use this only when data loss is completely unacceptable, such as financial calculations or regulatory compliance scenarios.

Throughput impact: 5-10× slower than timer mode due to blocking write operations.

Async Mode

(flow id: critical-alert
persist: async

(inputs
(temperature type: double signal: "sensor.temperature")
(pressure type: double signal: "sensor.pressure"))

(trigger on-any: temperature)

(when (> temperature 100)
(emit overheat-alert channel: alarm value: temperature)))

State is persisted after each Execute(), but the write is fire-and-forget. The flow doesn't wait for confirmation. Offers a middle ground between durability and throughput.

Loss window: Typically under 100ms. If the grain crashes between Execute() and the async write completing, recent state changes are lost.

Timer Mode (Default)

(flow id: equipment-monitor
persist: timer

(inputs
(vibration type: double signal: "motor.vibration")
(temperature type: double signal: "motor.temperature")
(rpm type: double signal: "motor.rpm"))

(trigger on-any: vibration temperature rpm)

(rolling-avg window: PT5M input: vibration as: vib-avg)
(when (> vib-avg 50)
(emit high-vibration value: vib-avg)))

State is persisted on a background timer (default: every 5 seconds). This is the recommended mode for most production flows. Provides excellent throughput while limiting data loss to a small time window.

Trade-off table:

PersistIntervalWrites/minMax data loss on crash
1 second60~2 seconds of inputs
5 seconds (default)12~6 seconds of inputs
30 seconds2~31 seconds of inputs

For typical industrial telemetry with 1-15 second sample rates, the 5-second default balances write frequency against data loss risk. A crash loses at most 6 seconds of state, which means the rolling buffers might lose 1-6 input samples depending on input frequency.

On-Deactivate Mode

(flow id: metrics-aggregator
persist: on-deactivate

(inputs
(metric type: double signal: "device.metric"))

(trigger on-any: metric)

(rolling-sum window: PT1H input: metric as: total)
(emit hourly-total value: total))

State is only written when Orleans deactivates the grain due to memory pressure or idle timeout. Offers maximum throughput for high-frequency flows where losing a few seconds of state on crash is acceptable.

Loss window: From the last deactivation until crash. Could be minutes or hours depending on flow activity and cluster load.

None Mode

(flow id: unit-converter
persist: none

(inputs
(celsius type: double signal: "sensor.temperature"))

(trigger on-any: celsius)

(let ((fahrenheit (+ (* celsius 1.8) 32)))
(emit temp-f value: fahrenheit)))

State is never persisted. Use this for pure stateless transformations that don't use rolling aggregations, gates, or the key-value store. Maximum throughput, but any state is lost on deactivation.

Warning: If you use rolling-avg, rolling-min, rolling-max, rolling-sum, or gates with persist: none, those features will reset on every grain deactivation.

Rolling Window Aggregations

BtScript's rolling aggregations (rolling-avg, rolling-min, rolling-max, rolling-sum) maintain time-windowed buffers in FlowState. Each aggregation stores a list of (value, timestamp) pairs and automatically prunes entries older than the window duration.

(flow id: sensor-smoothing
persist: timer

(inputs
(sensor-reading type: double signal: "sensor.value"))

(trigger on-any: sensor-reading)

(rolling-avg window: PT5M input: sensor-reading as: smoothed)
(rolling-max window: PT1H input: sensor-reading as: peak-hour)
(emit smoothed-value value: smoothed)
(emit peak-value value: peak-hour))

On each Execute():

  1. New input value is appended to the buffer with current timestamp
  2. Old entries beyond the window are removed
  3. Aggregation is computed over remaining entries
  4. Buffer state is persisted according to persist: mode

Buffer Memory Usage

Each buffer entry is ~16 bytes (8-byte double + 8-byte timestamp). A 1-hour window receiving 1 input per second stores 3,600 entries ≈ 56 KB. Most flows have 1-5 rolling aggregations, so memory usage is negligible.

Buffer Persistence Impact

Rolling buffers are serialized as part of FlowState. Larger windows (hours or days) increase persistence payload size. For very large windows (>1 hour), consider:

  • Using timer mode with 10-30 second intervals instead of async/sync
  • Downsampling high-frequency inputs before aggregation
  • Storing aggregations in the key-value store instead of rolling buffers

Gates and Input Synchronization

Gates use FlowState to track which required inputs have arrived. When a flow has multiple inputs and uses (gate zip: ...), the gate's state (which inputs are ready) is persisted along with the input values.

(flow id: join-readings
persist: timer

(inputs
(temperature type: double signal: "sensor.temperature")
(pressure type: double signal: "sensor.pressure")
(humidity type: double signal: "sensor.humidity"))

(gate zip: temperature pressure humidity)

(trigger on-any: temperature pressure humidity)

(emit sensor-snapshot
value: (new Snapshot
temp: temperature
pres: pressure
hum: humidity)))

Gate state ensures that after a crash and recovery, the flow doesn't incorrectly fire the gate body. If the flow receives temperature and pressure, crashes before receiving humidity, then restarts—the gate will correctly wait for humidity before emitting.

Decision Guide

Start with persist: timer for most flows. It's the default for good reason: excellent throughput with minimal data loss risk.

Choose sync when:

  • Data loss is completely unacceptable (financial, compliance, critical safety systems)
  • Throughput is low (fewer than 10 inputs/sec per flow instance)
  • You can tolerate 5-10× throughput reduction

Choose async when:

  • Data loss is acceptable but should be minimized
  • Throughput is moderate (10-100 inputs/sec per flow instance)
  • You need faster response than sync but more durability than timer

Choose timer when:

  • Inputs arrive every 1-15 seconds (typical IoT telemetry)
  • Losing a few seconds of state on crash is acceptable
  • You need high throughput with good durability

Choose on-deactivate when:

  • Inputs arrive very frequently (>100/sec per flow instance)
  • Losing minutes of state on crash is acceptable
  • You're aggregating non-critical metrics

Choose none when:

  • Flow is purely stateless (no rolling aggregations, gates, or custom state)
  • You need absolute maximum throughput
  • State loss on deactivation is irrelevant

Key-Value Store

Flows can store arbitrary values in FlowState using the key-value store. These values are persisted according to the flow's persist: mode, just like inputs, gates, and rolling buffers.

Use cases:

  • Counters for events
  • Flags for state machines
  • Last-known-good values for error handling
  • Configuration overrides per flow instance

Available in C# flows via state.GetValue<T>(key)/state.SetValue(key, value). Not yet exposed in BtScript syntax.

Monitoring Persistence

The flow service exposes metrics for monitoring persistence behavior:

  • flow_persist_writes_total: Counter of persistence operations by mode
  • flow_persist_duration_seconds: Histogram of write latency
  • flow_persist_errors_total: Counter of write failures
  • flow_state_size_bytes: Gauge of serialized state size per flow instance

Monitor these metrics to detect:

  • Excessive write latency (storage bottleneck)
  • High error rates (storage unavailability)
  • State size growth (unbounded buffers, memory leaks)

Summary

  • FlowState holds all runtime state: inputs, rolling buffers, gates, key-value store
  • persist: mode controls when state is written to durable storage
  • Default timer mode (5s interval) balances throughput and durability
  • Rolling aggregations store time-windowed buffers persisted with the rest of FlowState
  • Choose persistence mode based on data loss tolerance and throughput requirements