datasynth-audit-fsm
YAML-driven audit FSM engine for methodology-based audit trail and artifact generation.
Overview
datasynth-audit-fsm separates audit methodology definition from data generation. Audit workflows are declared as YAML blueprints that describe procedures, state machines, phases, and standards references. A separate generation overlay controls runtime behaviour: revision probabilities, timing distributions, artifact volumes, and anomaly injection rates. This two-layer architecture means the same blueprint can produce a thorough engagement, a rushed engagement, or anything in between by swapping a single overlay file.
The engine walks each procedure’s finite state machine, visiting procedures in topological order of the precondition DAG and emitting a deterministic, event-sourced audit trail. Every event carries a UUID drawn from a seeded ChaCha8 RNG, so identical inputs always produce identical outputs. Alongside the event trail, the engine dispatches step commands to 14 concrete audit generators via the StepDispatcher, producing typed artifacts (engagements, materiality calculations, risk assessments, workpapers, findings, opinions, and more).
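The determinism property can be illustrated with a small, dependency-free sketch. The crate itself uses ChaCha8; the SplitMix64 generator below is only a stand-in so the example compiles without external crates, and `SplitMix64` and `next_event_id` are hypothetical names rather than part of the crate's API.

```rust
/// Tiny deterministic PRNG (SplitMix64), standing in for ChaCha8 in this sketch.
pub struct SplitMix64(u64);

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64(seed)
    }

    pub fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Draws 128 bits from the RNG and formats them as a version-4-style UUID string.
pub fn next_event_id(rng: &mut SplitMix64) -> String {
    let mut bytes = [0u8; 16];
    bytes[..8].copy_from_slice(&rng.next_u64().to_be_bytes());
    bytes[8..].copy_from_slice(&rng.next_u64().to_be_bytes());
    bytes[6] = (bytes[6] & 0x0F) | 0x40; // version nibble = 4
    bytes[8] = (bytes[8] & 0x3F) | 0x80; // RFC 4122 variant bits
    let hex: String = bytes.iter().map(|b| format!("{:02x}", b)).collect();
    format!(
        "{}-{}-{}-{}-{}",
        &hex[0..8], &hex[8..12], &hex[12..16], &hex[16..20], &hex[20..32]
    )
}
```

Two generators created with the same seed yield the same sequence of IDs, which is the property that makes a whole event trail reproducible from its inputs.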
Nine built-in blueprints ship with the crate, covering ISA, IIA-GIAS, Big 4 firm methodologies, PCAOB, SOC 2, and regulatory examination workflows. Additional methodology blueprints are available at SyntheticDataBlueprints.
The crate also provides streaming execution (event-by-event emission via callbacks or channels), live anomaly injection into already-generated event logs, and analytics inventories that map audit steps to data requirements and analytical procedures.
Architecture
Blueprint YAML ──► Loader ──► Validation ──► Topological Sort
                                                    │
                             EngagementContext ─────┤
                                                    ▼
                                             AuditFsmEngine
                                                    │
                                     ┌──────────────┼──────────────┐
                                     ▼              ▼              ▼
                                FSM Walk     StepDispatcher   Anomaly Injection
                            (per procedure)  (14 generators)  (build-time + live)
                                     │              │              │
                                     ▼              ▼              ▼
                               AuditEvent[]    ArtifactBag  AnomalyRecord[]
                                     │              │
                            ┌────────┴────────┐     │
                            ▼                 ▼     ▼
                        Flat JSON       OCEL 2.0   Orchestrator
                       Event Trail     Projection  AuditSnapshot
                            │
                            ▼
                   Streaming / Channel
                   (callback or mpsc)
Blueprints
Nine built-in blueprints are included:
| Blueprint | Framework | Procedures | Phases | Standards |
|---|---|---|---|---|
| Financial Statement Audit (FSA) | ISA | 9 | 3 | 14 ISA |
| Internal Audit (IA) | IIA-GIAS | 34 | 9 | 52 IIA-GIAS |
| KPMG ISA Complete | ISA | 44 | 7 | 37 ISA |
| PwC ISA Complete | ISA | 44 | 7 | 37 ISA |
| Deloitte ISA Complete | ISA | 46 | 7 | 37 ISA |
| EY GAM Lite | ISA | 52 | 7 | 37 ISA |
| SOC 2 Type II | AICPA-TSC | 12 | 3 | 17 AICPA |
| PCAOB Integrated | PCAOB | 14 | 5 | 17 PCAOB AS |
| Regulatory Exam | Regulatory | 15 | 6 | 10 FFIEC/OCC |
Blueprints are loaded via builtin:fsa, builtin:ia, builtin:kpmg, builtin:pwc, builtin:deloitte, builtin:ey_gam_lite, builtin:soc2, builtin:pcaob, builtin:regulatory, or from custom YAML paths.
A blueprint YAML defines:
- methodology – framework identifier, default depth, and description
- discriminators – top-level dimensions (categories, risk_ratings, engagement_types) used to scope procedure execution
- actors – role definitions (engagement_partner, senior_auditor, audit_staff)
- standards – referenced standards with binding level (requirement vs guidance)
- evidence_templates – reusable evidence types that steps can reference
- phases – ordered audit phases containing procedures
- procedures – each with its own FSM aggregate and ordered steps
Additional blueprints are maintained in the SyntheticDataBlueprints repository.
Procedure State Machines
Each procedure defines a ProcedureAggregate with an initial state, valid states, and directed transitions. Transitions carry optional commands, emitted events, and guard predicates.
The standard 4-state lifecycle used by most procedures:
not_started ──begin──► in_progress ──submit──► under_review ──approve──► completed
                            ▲                         │
                            └────────revise───────────┘
The under_review -> in_progress revision loop is governed by the overlay’s revision_probability. The engine bounds self-loops via max_self_loop_iterations (default 5).
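The bounded revision loop can be sketched as follows. This is a minimal illustration, not the engine's code: `State`, `walk_procedure`, and the `revise` closure are hypothetical, and the sketch assumes the bound counts how many times a procedure may be sent back from review.

```rust
/// The four states of the standard procedure lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    NotStarted,
    InProgress,
    UnderReview,
    Completed,
}

/// Walks one procedure from `NotStarted` to `Completed` and returns the
/// sequence of states visited.
///
/// `revise` is called each time the procedure is under review and returns
/// `true` when the reviewer sends it back. In the engine this decision would
/// come from the overlay's `revision_probability`; here it is a plain closure.
pub fn walk_procedure(mut revise: impl FnMut() -> bool, max_revisions: u32) -> Vec<State> {
    let mut trail = vec![State::NotStarted, State::InProgress, State::UnderReview];
    let mut revisions = 0;
    // Loop back only while the bound has not been reached.
    while revisions < max_revisions && revise() {
        trail.push(State::InProgress);
        trail.push(State::UnderReview);
        revisions += 1;
    }
    trail.push(State::Completed);
    trail
}
```

Even with a reviewer that always rejects, the walk terminates after `max_revisions` loops, which is the purpose of the bound.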
The develop_findings procedure in the IA blueprint uses an expanded 8-state C2CE (Condition-Criteria-Cause-Effect) lifecycle:
not_started → in_progress → condition_identified → criteria_mapped →
cause_analyzed → effect_assessed → under_review → completed
Phase Gates and Preconditions
Procedures are executed in topological order determined by a precondition DAG. Kahn’s algorithm produces the execution sequence. Phase gates use all_of conditions (e.g., procedure.risk_assessment.completed) to control phase entry and exit.
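For readers unfamiliar with Kahn's algorithm, a minimal sketch using only the standard library is shown here. The function name `topological_order` and the map-based input shape are assumptions made for illustration; the crate's own data structures may differ.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns procedure ids in an order that respects `preconditions`
/// (procedure -> procedures that must complete first), or `None` on a cycle.
pub fn topological_order(preconditions: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Collect every node, including ones that only appear as a precondition.
    let mut nodes: BTreeSet<&str> = BTreeSet::new();
    for (proc_id, deps) in preconditions {
        nodes.insert(*proc_id);
        nodes.extend(deps.iter().copied());
    }

    // Count unmet preconditions and record who depends on whom.
    let mut in_degree: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (proc_id, deps) in preconditions {
        for dep in deps {
            *in_degree.get_mut(proc_id).unwrap() += 1;
            dependents.entry(*dep).or_default().push(*proc_id);
        }
    }

    // Start with procedures that have no unmet preconditions.
    let mut ready: VecDeque<&str> = in_degree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(n, _)| *n)
        .collect();

    let mut order = Vec::with_capacity(nodes.len());
    while let Some(current) = ready.pop_front() {
        order.push(current.to_string());
        for next in dependents.get(current).into_iter().flatten() {
            let d = in_degree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(*next);
            }
        }
    }

    // Fewer emitted nodes than inputs means a cycle blocked progress.
    if order.len() == nodes.len() { Some(order) } else { None }
}
```

Using ordered maps keeps the result deterministic when several procedures become ready at once, which fits the engine's reproducibility goal.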
Continuous phases have order < 0 and run in parallel with sequential phases. They are never marked as “completed” in the output. The IA blueprint uses this for ethics, governance, and quality assurance phases.
Generation Overlays
An overlay customises how a blueprint is instantiated without modifying the canonical YAML. It controls:
- Transition timing – log-normal delay distribution (mu/sigma in hours)
- Revision probability – chance that a procedure under review is sent back to in_progress
- Artifact volumes – workpapers per step, evidence items per workpaper
- Anomaly injection – per-type probabilities (skipped approval, late posting, missing evidence, out of sequence)
- Actor profiles – per-role multipliers for revision rate, evidence volume, and guidance step skipping
- Discriminator filters – restrict which procedures execute based on category dimensions
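How the timing parameters map onto a log-normal distribution is not spelled out here. One possible reading, and only an assumption in this sketch, is that mu and sigma are the mean and standard deviation of the delay in hours; under that reading the log-space parameters follow from moment matching. `lognormal_params` and `delay_hours` are hypothetical helpers.

```rust
/// Converts a mean and standard deviation in hours into the log-space
/// parameters (mu_ln, sigma_ln) of a log-normal distribution.
pub fn lognormal_params(mean_hours: f64, std_hours: f64) -> (f64, f64) {
    let variance_ratio = (std_hours / mean_hours).powi(2);
    let sigma_ln_sq = (1.0 + variance_ratio).ln();
    let mu_ln = mean_hours.ln() - 0.5 * sigma_ln_sq;
    (mu_ln, sigma_ln_sq.sqrt())
}

/// Maps a standard-normal draw `z` to a delay in hours.
pub fn delay_hours(mu_ln: f64, sigma_ln: f64, z: f64) -> f64 {
    (mu_ln + sigma_ln * z).exp()
}
```

If the engine instead treats mu and sigma directly as log-space parameters, the conversion step is unnecessary and only `delay_hours` applies.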
Three built-in presets:
| Parameter | default | thorough | rushed |
|---|---|---|---|
| Revision probability | 0.15 | 0.30 | 0.05 |
| Timing mu (hours) | 24.0 | 40.0 | 8.0 |
| Timing sigma (hours) | 8.0 | 12.0 | 4.0 |
| Workpapers per step | 1-3 | 2-5 | 1-2 |
| Evidence per workpaper | 2-5 | 4-8 | 1-3 |
| Skipped approval | 0.02 | 0.005 | 0.08 |
| Late posting | 0.05 | 0.02 | 0.15 |
| Missing evidence | 0.03 | 0.01 | 0.10 |
| Out of sequence | 0.01 | 0.002 | 0.05 |
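The preset table can be read as a plain data structure. The sketch below encodes the values above in a hypothetical `OverlayPreset` struct; the field names are illustrative and do not reproduce the crate's `GenerationOverlay` definition.

```rust
/// Hypothetical, simplified view of an overlay preset.
#[derive(Debug, Clone, PartialEq)]
pub struct OverlayPreset {
    pub revision_probability: f64,
    pub timing_mu_hours: f64,
    pub timing_sigma_hours: f64,
    pub workpapers_per_step: (u32, u32),    // inclusive (min, max)
    pub evidence_per_workpaper: (u32, u32), // inclusive (min, max)
    pub skipped_approval: f64,
    pub late_posting: f64,
    pub missing_evidence: f64,
    pub out_of_sequence: f64,
}

/// Looks up one of the three presets from the table by name.
pub fn preset(name: &str) -> Option<OverlayPreset> {
    match name {
        "default" => Some(OverlayPreset {
            revision_probability: 0.15,
            timing_mu_hours: 24.0,
            timing_sigma_hours: 8.0,
            workpapers_per_step: (1, 3),
            evidence_per_workpaper: (2, 5),
            skipped_approval: 0.02,
            late_posting: 0.05,
            missing_evidence: 0.03,
            out_of_sequence: 0.01,
        }),
        "thorough" => Some(OverlayPreset {
            revision_probability: 0.30,
            timing_mu_hours: 40.0,
            timing_sigma_hours: 12.0,
            workpapers_per_step: (2, 5),
            evidence_per_workpaper: (4, 8),
            skipped_approval: 0.005,
            late_posting: 0.02,
            missing_evidence: 0.01,
            out_of_sequence: 0.002,
        }),
        "rushed" => Some(OverlayPreset {
            revision_probability: 0.05,
            timing_mu_hours: 8.0,
            timing_sigma_hours: 4.0,
            workpapers_per_step: (1, 2),
            evidence_per_workpaper: (1, 3),
            skipped_approval: 0.08,
            late_posting: 0.15,
            missing_evidence: 0.10,
            out_of_sequence: 0.05,
        }),
        _ => None,
    }
}
```

The rushed preset shortens delays and reduces review churn while raising every anomaly rate, which is what makes the resulting trail look hurried.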
StepDispatcher and Artifact Generation
The StepDispatcher bridges FSM step commands to concrete generators. Each step command is routed to the appropriate generator, and the resulting artifacts are accumulated in an ArtifactBag. Unknown commands fall through to a generic workpaper generator so every step produces at least one artifact.
Key command-to-generator mappings:
| Commands | Generator | Artifact Types |
|---|---|---|
| evaluate_client_acceptance, conduct_opening_meeting | AuditEngagementGenerator | AuditEngagement |
| agree_engagement_terms, draft_ia_charter | EngagementLetterGenerator | EngagementLetter |
| determine_overall_materiality | MaterialityGenerator | MaterialityCalculation |
| identify_risks, assess_engagement_risks | RiskAssessmentGenerator | RiskAssessment |
| assess_risks, evaluate_control_effectiveness | CraGenerator | CombinedRiskAssessment |
| design_test_procedures, design_work_program | WorkpaperGenerator + EvidenceGenerator | Workpaper, AuditEvidence |
| perform_tests_of_details, perform_controls_tests | SamplingPlanGenerator | SamplingPlan, SampledItem |
| perform_analytical_procedures | AnalyticalProcedureGenerator | AnalyticalProcedureResult |
| send_confirmations | ConfirmationGenerator | ExternalConfirmation, ConfirmationResponse |
| evaluate_management_assessment | GoingConcernGenerator | GoingConcernAssessment |
| perform_subsequent_events_review | SubsequentEventGenerator | SubsequentEvent |
| identify_condition, draft_finding | FindingGenerator | AuditFinding |
| form_audit_opinion, finalize_audit_report | AuditOpinionGenerator | AuditOpinion, KeyAuditMatter |
For IA blueprints (which lack evaluate_client_acceptance), the dispatcher auto-bootstraps an engagement the first time a substantive command runs.
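The routing, fallback, and auto-bootstrap behaviour can be sketched with a simple match. `Generator`, `route`, and `Dispatcher` below are hypothetical stand-ins covering only a subset of the table; the sketch also assumes that any command other than an engagement command counts as substantive.

```rust
/// A subset of the generators named in the mapping table, plus the generic fallback.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Generator {
    AuditEngagement,
    Materiality,
    RiskAssessment,
    Finding,
    AuditOpinion,
    GenericWorkpaper,
}

/// Maps a step command to a generator; unknown commands fall through
/// to the generic workpaper generator.
pub fn route(command: &str) -> Generator {
    match command {
        "evaluate_client_acceptance" | "conduct_opening_meeting" => Generator::AuditEngagement,
        "determine_overall_materiality" => Generator::Materiality,
        "identify_risks" | "assess_engagement_risks" => Generator::RiskAssessment,
        "identify_condition" | "draft_finding" => Generator::Finding,
        "form_audit_opinion" | "finalize_audit_report" => Generator::AuditOpinion,
        _ => Generator::GenericWorkpaper,
    }
}

/// Tracks how many engagements exist so one can be bootstrapped on demand.
pub struct Dispatcher {
    pub engagements: u32,
}

impl Dispatcher {
    pub fn dispatch(&mut self, command: &str) -> Generator {
        let generator = route(command);
        match generator {
            Generator::AuditEngagement => self.engagements += 1,
            // No engagement yet (e.g. IA blueprints): bootstrap one first.
            _ if self.engagements == 0 => self.engagements = 1,
            _ => {}
        }
        generator
    }
}
```

The catch-all arm is what guarantees that every step yields at least one artifact, even for commands the dispatcher does not recognise.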
Streaming Execution
The streaming module enables event-by-event emission during engagement execution, rather than collecting the full event log in memory. Two modes are provided:
- Callback mode (run_engagement_streaming): accepts an EventCallback closure invoked for each AuditEvent as it is produced.
- Channel mode (run_engagement_to_channel): spawns the engine on a background thread and returns an mpsc::Receiver<AuditEvent> plus a JoinHandle for the full EngagementResult.
Both modes accept a BlueprintWithPreconditions, a GenerationOverlay, an EngagementContext, and a seed.
use datasynth_audit_fsm::context::EngagementContext;
use datasynth_audit_fsm::loader::{default_overlay, BlueprintWithPreconditions};
use datasynth_audit_fsm::streaming::run_engagement_streaming;

fn main() {
    // Load the built-in FSA blueprint, the default overlay, and a demo context.
    let bwp = BlueprintWithPreconditions::load_builtin_fsa().unwrap();
    let overlay = default_overlay();
    let ctx = EngagementContext::demo();

    // The callback is invoked once per event; 42 is the seed.
    let _result = run_engagement_streaming(
        &bwp, &overlay, &ctx, 42,
        Box::new(|_event| { /* forward to WebSocket, dashboard, etc. */ }),
    ).unwrap();
}
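Channel mode follows the usual producer/consumer pattern from the standard library. The sketch below shows that pattern in isolation with hypothetical stand-in types (`StreamedEvent`, `Summary`, `run_to_channel`); it does not call the crate and is not its implementation.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread::{self, JoinHandle};

/// Minimal stand-in for the crate's `AuditEvent`.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamedEvent {
    pub procedure_id: String,
    pub command: String,
}

/// Minimal stand-in for `EngagementResult`.
#[derive(Debug)]
pub struct Summary {
    pub events_emitted: usize,
}

/// Runs a (fake) engagement on a background thread, sending each event as it
/// is produced and returning the summary through the join handle.
pub fn run_to_channel(
    commands: Vec<(String, String)>,
) -> (Receiver<StreamedEvent>, JoinHandle<Summary>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut emitted = 0;
        for (procedure_id, command) in commands {
            // Stop early if the consumer dropped the receiver.
            if tx.send(StreamedEvent { procedure_id, command }).is_err() {
                break;
            }
            emitted += 1;
        }
        // `tx` is dropped when the closure returns, which ends the receiver's iteration.
        Summary { events_emitted: emitted }
    });
    (rx, handle)
}
```

A consumer iterates over the receiver until the sender is dropped, then joins the handle to obtain the final result.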
Live Anomaly Injection
The live_injection module injects anomalies into an already-generated event log, simulating emerging risks at runtime rather than only at generation time. Each LiveInjectionConfig specifies an anomaly type, an optional target procedure filter, an injection probability, and a severity level.
use datasynth_audit_fsm::event::{AnomalySeverity, AuditAnomalyType};
use datasynth_audit_fsm::live_injection::{inject_live_anomalies, LiveInjectionConfig};

// Target late-posting anomalies at substantive_testing events with 10% probability.
let configs = vec![LiveInjectionConfig {
    anomaly_type: AuditAnomalyType::LatePosting,
    target_procedure: Some("substantive_testing".into()),
    injection_probability: 0.10,
    severity: AnomalySeverity::Medium,
}];

// `result` is an EngagementResult from a prior engine run. It must be bound
// with `let mut`, because the event log is modified in place.
let injected = inject_live_anomalies(&mut result.event_log, &configs, 99);
Events already flagged as anomalous by the engine’s build-time injection are skipped to avoid double-labeling.
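The skip rule and the per-event probability check can be sketched as follows. `LoggedEvent` and `inject` are hypothetical simplifications, and the `draw` closure stands in for whatever seeded RNG the crate uses.

```rust
/// Minimal stand-in for the event fields that injection touches.
#[derive(Debug, Clone, PartialEq)]
pub struct LoggedEvent {
    pub procedure_id: String,
    pub is_anomaly: bool,
    pub anomaly_type: Option<String>,
}

/// Labels eligible events as anomalous and returns how many were changed.
///
/// `draw` yields uniform values in [0, 1).
pub fn inject(
    events: &mut [LoggedEvent],
    anomaly_type: &str,
    target_procedure: Option<&str>,
    probability: f64,
    mut draw: impl FnMut() -> f64,
) -> usize {
    let mut injected = 0;
    for event in events.iter_mut() {
        // Skip events already flagged at build time to avoid double-labeling.
        if event.is_anomaly {
            continue;
        }
        // Honour the optional procedure filter.
        if let Some(target) = target_procedure {
            if event.procedure_id != target {
                continue;
            }
        }
        if draw() < probability {
            event.is_anomaly = true;
            event.anomaly_type = Some(anomaly_type.to_string());
            injected += 1;
        }
    }
    injected
}
```

Because flagged events are skipped before any random draw, an existing build-time label is never overwritten.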
Analytics Inventory
The analytics_inventory module provides data requirement and analytical procedure mappings for each audit step. Five inventories are embedded at compile time:
| Inventory | Framework | Loader Function |
|---|---|---|
| FSA | ISA | load_fsa_inventory() |
| IA | IIA-GIAS | load_ia_inventory() |
| SOC 2 | AICPA-TSC | load_soc2_inventory() |
| PCAOB | PCAOB AS | load_pcaob_inventory() |
| Regulatory | FFIEC/OCC | load_regulatory_inventory() |
Each step entry (StepInventory) contains data requirements (input data sources, fields, scope) and analytical procedures (technique, data points, thresholds). A convenience function load_inventory_for_framework(framework) dispatches to the appropriate loader based on the blueprint’s framework string.
Event Trail
Each event in the trail captures a state transition or procedure step:
pub struct AuditEvent {
    pub event_id: Uuid,
    pub timestamp: NaiveDateTime,
    pub event_type: String, // "state_transition" or "procedure_step"
    pub procedure_id: String,
    pub step_id: Option<String>,
    pub phase_id: String,
    pub from_state: Option<String>,
    pub to_state: Option<String>,
    pub actor_id: String,
    pub command: String,
    pub evidence_refs: Vec<String>,
    pub standards_refs: Vec<String>,
    pub is_anomaly: bool,
    pub anomaly_type: Option<AuditAnomalyType>,
}
Events are exported as flat JSON via export_events_to_json() and can be projected to OCEL 2.0 format via project_to_ocel() for use with process mining tools (PM4Py, Celonis, OCPA).
Configuration
audit:
  enabled: true
  fsm:
    enabled: true
    blueprint: builtin:fsa     # builtin:fsa, builtin:ia, builtin:kpmg, builtin:pwc,
                               # builtin:deloitte, builtin:ey_gam_lite, builtin:soc2,
                               # builtin:pcaob, builtin:regulatory, or path to custom YAML
    overlay: builtin:default   # builtin:default, builtin:thorough, builtin:rushed
Example Output
Sample events from an FSA engagement event trail:
[
  {
    "event_id": "a1b2c3d4-...",
    "timestamp": "2025-01-15T09:00:00",
    "event_type": "state_transition",
    "procedure_id": "accept_engagement",
    "phase_id": "planning",
    "from_state": "not_started",
    "to_state": "in_progress",
    "actor_id": "engagement_partner",
    "command": "evaluate_client_acceptance",
    "is_anomaly": false
  },
  {
    "event_id": "e5f6a7b8-...",
    "timestamp": "2025-01-15T10:23:00",
    "event_type": "procedure_step",
    "procedure_id": "accept_engagement",
    "step_id": "evaluate_acceptance",
    "phase_id": "planning",
    "actor_id": "engagement_partner",
    "command": "evaluate_client_acceptance",
    "evidence_refs": ["wp_client_assessment"],
    "standards_refs": ["ISA-220"],
    "is_anomaly": false
  }
]
With the default overlay, the FSA blueprint produces 51 events and 1,916 artifacts. The IA blueprint produces 368 events and 1,891 artifacts. The Big 4 and domain-specific blueprints produce correspondingly larger event trails and artifact sets.
Key Types
// Blueprint root
pub struct AuditBlueprint {
    pub id: String,
    pub methodology: BlueprintMethodology,
    pub discriminators: HashMap<String, Vec<String>>,
    pub actors: Vec<BlueprintActor>,
    pub phases: Vec<BlueprintPhase>,
}

// FSM engine
pub struct AuditFsmEngine { /* blueprint, overlay, rng, preconditions, dispatcher */ }

// Engagement output
pub struct EngagementResult {
    pub event_log: Vec<AuditEvent>,
    pub procedure_states: HashMap<String, String>,
    pub anomalies: Vec<AuditAnomalyRecord>,
    pub artifacts: ArtifactBag,
    pub total_duration_hours: f64,
}

// Artifact accumulator (20 typed collections)
pub struct ArtifactBag {
    pub engagements: Vec<AuditEngagement>,
    pub materiality_calculations: Vec<MaterialityCalculation>,
    pub risk_assessments: Vec<RiskAssessment>,
    pub workpapers: Vec<Workpaper>,
    pub findings: Vec<AuditFinding>,
    pub audit_opinions: Vec<AuditOpinion>,
    // ... 14 more artifact types
}