Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

datasynth-audit-fsm

YAML-driven audit FSM engine for methodology-based audit trail and artifact generation.

Overview

datasynth-audit-fsm separates audit methodology definition from data generation. Audit workflows are declared as YAML blueprints that describe procedures, state machines, phases, and standards references. A separate generation overlay controls runtime behaviour: revision probabilities, timing distributions, artifact volumes, and anomaly injection rates. This two-layer architecture means the same blueprint can produce a thorough engagement, a rushed engagement, or anything in between by swapping a single overlay file.

The engine walks each procedure’s finite state machine in topological (DAG) order, emitting a deterministic, event-sourced audit trail. Every event carries a UUID generated from a ChaCha8 RNG seed, so identical inputs always produce identical outputs. Alongside the event trail, the engine dispatches step commands to 14 concrete audit generators via the StepDispatcher, producing typed artifacts (engagements, materiality calculations, risk assessments, workpapers, findings, opinions, and more).

Nine built-in blueprints ship with the crate, covering ISA, IIA-GIAS, Big 4 firm methodologies, PCAOB, SOC 2, and regulatory examination workflows. Additional methodology blueprints are available at SyntheticDataBlueprints.

The crate also provides streaming execution (event-by-event emission via callbacks or channels), live anomaly injection into already-generated event logs, and analytics inventories that map audit steps to data requirements and analytical procedures.

Architecture

Blueprint YAML ──► Loader ──► Validation ──► Topological Sort
                                                    │
                              EngagementContext ─────┤
                                                    ▼
                                             AuditFsmEngine
                                                    │
                                     ┌──────────────┼──────────────┐
                                     ▼              ▼              ▼
                              FSM Walk       StepDispatcher    Anomaly Injection
                            (per procedure)   (14 generators)   (build-time + live)
                                     │              │              │
                                     ▼              ▼              ▼
                              AuditEvent[]    ArtifactBag    AnomalyRecord[]
                                     │              │
                            ┌────────┴────────┐     │
                            ▼                 ▼     ▼
                       Flat JSON         OCEL 2.0   Orchestrator
                       Event Trail      Projection  AuditSnapshot
                            │
                            ▼
                     Streaming / Channel
                     (callback or mpsc)

Blueprints

Nine built-in blueprints are included:

BlueprintFrameworkProceduresPhasesStandards
Financial Statement Audit (FSA)ISA9314 ISA
Internal Audit (IA)IIA-GIAS34952 IIA-GIAS
KPMG ISA CompleteISA44737 ISA
PwC ISA CompleteISA44737 ISA
Deloitte ISA CompleteISA46737 ISA
EY GAM LiteISA52737 ISA
SOC 2 Type IIAICPA-TSC12317 AICPA
PCAOB IntegratedPCAOB14517 PCAOB AS
Regulatory ExamRegulatory15610 FFIEC/OCC

Blueprints are loaded via builtin:fsa, builtin:ia, builtin:kpmg, builtin:pwc, builtin:deloitte, builtin:ey_gam_lite, builtin:soc2, builtin:pcaob, builtin:regulatory, or from custom YAML paths.

A blueprint YAML defines:

  • methodology – framework identifier, default depth, and description
  • discriminators – top-level dimensions (categories, risk_ratings, engagement_types) used to scope procedure execution
  • actors – role definitions (engagement_partner, senior_auditor, audit_staff)
  • standards – referenced standards with binding level (requirement vs guidance)
  • evidence_templates – reusable evidence types that steps can reference
  • phases – ordered audit phases containing procedures
  • procedures – each with its own FSM aggregate and ordered steps

Additional blueprints are maintained in the SyntheticDataBlueprints repository.

Procedure State Machines

Each procedure defines a ProcedureAggregate with an initial state, valid states, and directed transitions. Transitions carry optional commands, emitted events, and guard predicates.

The standard 4-state lifecycle used by most procedures:

not_started ──begin──► in_progress ──submit──► under_review ──approve──► completed
                            ▲                        │
                            └────────revise───────────┘

The under_review -> in_progress revision loop is governed by the overlay’s revision_probability. The engine bounds self-loops via max_self_loop_iterations (default 5).

The develop_findings procedure in the IA blueprint uses an expanded 8-state C2CE (Condition-Criteria-Cause-Effect) lifecycle:

not_started → in_progress → condition_identified → criteria_mapped →
cause_analyzed → effect_assessed → under_review → completed

Phase Gates and Preconditions

Procedures are executed in topological order determined by a precondition DAG. Kahn’s algorithm produces the execution sequence. Phase gates use all_of conditions (e.g., procedure.risk_assessment.completed) to control phase entry and exit.

Continuous phases have order < 0 and run in parallel with sequential phases. They are never marked as “completed” in the output. The IA blueprint uses this for ethics, governance, and quality assurance phases.

Generation Overlays

An overlay customises how a blueprint is instantiated without modifying the canonical YAML. It controls:

  • Transition timing – log-normal delay distribution (mu/sigma in hours)
  • Revision probability – chance that a completed step returns to in_progress
  • Artifact volumes – workpapers per step, evidence items per workpaper
  • Anomaly injection – per-type probabilities (skipped approval, late posting, missing evidence, out of sequence)
  • Actor profiles – per-role multipliers for revision rate, evidence volume, and guidance step skipping
  • Discriminator filters – restrict which procedures execute based on category dimensions

Three built-in presets:

Parameterdefaultthoroughrushed
Revision probability0.150.300.05
Timing mu (hours)24.040.08.0
Timing sigma (hours)8.012.04.0
Workpapers per step1-32-51-2
Evidence per workpaper2-54-81-3
Skipped approval0.020.0050.08
Late posting0.050.020.15
Missing evidence0.030.010.10
Out of sequence0.010.0020.05

StepDispatcher and Artifact Generation

The StepDispatcher bridges FSM step commands to concrete generators. Each step command is routed to the appropriate generator, and the resulting artifacts are accumulated in an ArtifactBag. Unknown commands fall through to a generic workpaper generator so every step produces at least one artifact.

Key command-to-generator mappings:

CommandsGeneratorArtifact Types
evaluate_client_acceptance, conduct_opening_meetingAuditEngagementGeneratorAuditEngagement
agree_engagement_terms, draft_ia_charterEngagementLetterGeneratorEngagementLetter
determine_overall_materialityMaterialityGeneratorMaterialityCalculation
identify_risks, assess_engagement_risksRiskAssessmentGeneratorRiskAssessment
assess_risks, evaluate_control_effectivenessCraGeneratorCombinedRiskAssessment
design_test_procedures, design_work_programWorkpaperGenerator + EvidenceGeneratorWorkpaper, AuditEvidence
perform_tests_of_details, perform_controls_testsSamplingPlanGeneratorSamplingPlan, SampledItem
perform_analytical_proceduresAnalyticalProcedureGeneratorAnalyticalProcedureResult
send_confirmationsConfirmationGeneratorExternalConfirmation, ConfirmationResponse
evaluate_management_assessmentGoingConcernGeneratorGoingConcernAssessment
perform_subsequent_events_reviewSubsequentEventGeneratorSubsequentEvent
identify_condition, draft_findingFindingGeneratorAuditFinding
form_audit_opinion, finalize_audit_reportAuditOpinionGeneratorAuditOpinion, KeyAuditMatter

For IA blueprints (which lack evaluate_client_acceptance), the dispatcher auto-bootstraps an engagement the first time a substantive command runs.

Streaming Execution

The streaming module enables event-by-event emission during engagement execution, rather than collecting the full event log in memory. Two modes are provided:

  • Callback mode (run_engagement_streaming): accepts an EventCallback closure invoked for each AuditEvent as it is produced.
  • Channel mode (run_engagement_to_channel): spawns the engine on a background thread and returns an mpsc::Receiver<AuditEvent> plus a JoinHandle for the full EngagementResult.

Both modes accept a BlueprintWithPreconditions, a GenerationOverlay, an EngagementContext, and a seed.

#![allow(unused)]
fn main() {
use datasynth_audit_fsm::streaming::run_engagement_streaming;
use datasynth_audit_fsm::loader::{BlueprintWithPreconditions, default_overlay};
use datasynth_audit_fsm::context::EngagementContext;

let bwp = BlueprintWithPreconditions::load_builtin_fsa().unwrap();
let overlay = default_overlay();
let ctx = EngagementContext::demo();

let result = run_engagement_streaming(
    &bwp, &overlay, &ctx, 42,
    Box::new(|event| { /* forward to WebSocket, dashboard, etc. */ }),
).unwrap();
}

Live Anomaly Injection

The live_injection module injects anomalies into an already-generated event log, simulating emerging risks at runtime rather than only at generation time. Each LiveInjectionConfig specifies an anomaly type, an optional target procedure filter, an injection probability, and a severity level.

#![allow(unused)]
fn main() {
use datasynth_audit_fsm::live_injection::{inject_live_anomalies, LiveInjectionConfig};
use datasynth_audit_fsm::event::{AuditAnomalyType, AnomalySeverity};

let configs = vec![LiveInjectionConfig {
    anomaly_type: AuditAnomalyType::LatePosting,
    target_procedure: Some("substantive_testing".into()),
    injection_probability: 0.10,
    severity: AnomalySeverity::Medium,
}];

// `result` is an EngagementResult from a prior engine run
let injected = inject_live_anomalies(&mut result.event_log, &configs, 99);
}

Events already flagged as anomalous by the engine’s build-time injection are skipped to avoid double-labeling.

Analytics Inventory

The analytics_inventory module provides data requirement and analytical procedure mappings for each audit step. Five inventories are embedded at compile time:

InventoryFrameworkLoader Function
FSAISAload_fsa_inventory()
IAIIA-GIASload_ia_inventory()
SOC 2AICPA-TSCload_soc2_inventory()
PCAOBPCAOB ASload_pcaob_inventory()
RegulatoryFFIEC/OCCload_regulatory_inventory()

Each step entry (StepInventory) contains data requirements (input data sources, fields, scope) and analytical procedures (technique, data points, thresholds). A convenience function load_inventory_for_framework(framework) dispatches to the appropriate loader based on the blueprint’s framework string.

Event Trail

Each event in the trail captures a state transition or procedure step:

#![allow(unused)]
fn main() {
pub struct AuditEvent {
    pub event_id: Uuid,
    pub timestamp: NaiveDateTime,
    pub event_type: String,        // "state_transition" or "procedure_step"
    pub procedure_id: String,
    pub step_id: Option<String>,
    pub phase_id: String,
    pub from_state: Option<String>,
    pub to_state: Option<String>,
    pub actor_id: String,
    pub command: String,
    pub evidence_refs: Vec<String>,
    pub standards_refs: Vec<String>,
    pub is_anomaly: bool,
    pub anomaly_type: Option<AuditAnomalyType>,
}
}

Events are exported as flat JSON via export_events_to_json() and can be projected to OCEL 2.0 format via project_to_ocel() for use with process mining tools (PM4Py, Celonis, OCPA).

Configuration

audit:
  enabled: true
  fsm:
    enabled: true
    blueprint: builtin:fsa    # builtin:fsa, builtin:ia, builtin:kpmg, builtin:pwc,
                               # builtin:deloitte, builtin:ey_gam_lite, builtin:soc2,
                               # builtin:pcaob, builtin:regulatory, or path to custom YAML
    overlay: builtin:default   # builtin:default, builtin:thorough, builtin:rushed

Example Output

Sample events from an FSA engagement event trail:

[
  {
    "event_id": "a1b2c3d4-...",
    "timestamp": "2025-01-15T09:00:00",
    "event_type": "state_transition",
    "procedure_id": "accept_engagement",
    "phase_id": "planning",
    "from_state": "not_started",
    "to_state": "in_progress",
    "actor_id": "engagement_partner",
    "command": "evaluate_client_acceptance",
    "is_anomaly": false
  },
  {
    "event_id": "e5f6g7h8-...",
    "timestamp": "2025-01-15T10:23:00",
    "event_type": "procedure_step",
    "procedure_id": "accept_engagement",
    "step_id": "evaluate_acceptance",
    "phase_id": "planning",
    "actor_id": "engagement_partner",
    "command": "evaluate_client_acceptance",
    "evidence_refs": ["wp_client_assessment"],
    "standards_refs": ["ISA-220"],
    "is_anomaly": false
  }
]

With the default overlay, the FSA blueprint produces 51 events and 1,916 artifacts. The IA blueprint produces 368 events and 1,891 artifacts. The Big 4 and domain-specific blueprints produce correspondingly larger event trails and artifact sets.

Key Types

#![allow(unused)]
fn main() {
// Blueprint root
pub struct AuditBlueprint {
    pub id: String,
    pub methodology: BlueprintMethodology,
    pub discriminators: HashMap<String, Vec<String>>,
    pub actors: Vec<BlueprintActor>,
    pub phases: Vec<BlueprintPhase>,
}

// FSM engine
pub struct AuditFsmEngine { /* blueprint, overlay, rng, preconditions, dispatcher */ }

// Engagement output
pub struct EngagementResult {
    pub event_log: Vec<AuditEvent>,
    pub procedure_states: HashMap<String, String>,
    pub anomalies: Vec<AuditAnomalyRecord>,
    pub artifacts: ArtifactBag,
    pub total_duration_hours: f64,
}

// Artifact accumulator (20 typed collections)
pub struct ArtifactBag {
    pub engagements: Vec<AuditEngagement>,
    pub materiality_calculations: Vec<MaterialityCalculation>,
    pub risk_assessments: Vec<RiskAssessment>,
    pub workpapers: Vec<Workpaper>,
    pub findings: Vec<AuditFinding>,
    pub audit_opinions: Vec<AuditOpinion>,
    // ... 14 more artifact types
}
}

See Also