datasynth-graph

Graph/network export for synthetic accounting data with ML-ready formats.

Overview

datasynth-graph provides graph construction and export capabilities:

  • Graph Builders: Transaction, approval, entity relationship, compliance, and multi-layer hypergraph builders
  • Graph Property Mapping: ToNodeProperties trait bridge via GraphNode::from_entity() for typed model→graph node conversion with 55+ entity types
  • Hypergraph: 3-layer hypergraph (Governance, Process Events, Accounting Network) spanning 10 process families with 55+ entity type codes, OCPM event hyperedges, and compliance regulation nodes
  • Compliance Graph: Cross-domain linking of standards to GL accounts, internal controls, companies, and business processes
  • ML Export: PyTorch Geometric, Neo4j, DGL, RustGraph, and RustGraph Hypergraph formats
  • Feature Engineering: Temporal, amount, structural, and categorical features
  • Data Splits: Train/validation/test split generation

Graph Types

| Graph | Nodes | Edges | Use Case |
|---|---|---|---|
| Transaction Network | Accounts, Entities | Transactions | Anomaly detection |
| Approval Network | Users | Approvals | SoD analysis |
| Entity Relationship | Legal Entities | Ownership | Consolidation analysis |
| Compliance Network | Standards, Findings, Filings, Jurisdictions | GovernedByStandard, ImplementsStandard, FiledByCompany, FindingAffects* | Compliance coverage, risk propagation |

Export Formats

PyTorch Geometric

graphs/transaction_network/pytorch_geometric/
├── node_features.pt    # [num_nodes, num_features]
├── edge_index.pt       # [2, num_edges]
├── edge_attr.pt        # [num_edges, num_edge_features]
├── labels.pt           # [num_nodes] or [num_edges]
├── train_mask.pt       # Boolean mask
├── val_mask.pt
└── test_mask.pt
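
The `[2, num_edges]` shape of `edge_index.pt` is the coordinate (COO) layout that PyTorch Geometric expects: row 0 holds source node ids and row 1 holds target node ids, stored as 64-bit integers. The sketch below illustrates that layout only. `to_edge_index` is a hypothetical helper written for this page, not a function of datasynth-graph.

```rust
/// Builds a COO edge index: row 0 = source node ids, row 1 = target node ids.
/// Hypothetical helper for illustration; not part of the datasynth-graph API.
fn to_edge_index(edges: &[(usize, usize)], num_nodes: usize) -> Result<[Vec<i64>; 2], String> {
    let mut sources = Vec::with_capacity(edges.len());
    let mut targets = Vec::with_capacity(edges.len());
    for &(src, dst) in edges {
        // Every endpoint must be a valid row of the node feature matrix.
        if src >= num_nodes || dst >= num_nodes {
            return Err(format!("edge ({src}, {dst}) references a node outside 0..{num_nodes}"));
        }
        sources.push(src as i64);
        targets.push(dst as i64);
    }
    Ok([sources, targets])
}
```

Column `k` of the result describes edge `k`, so row `k` of `edge_attr.pt` must describe the same edge.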

Neo4j

graphs/entity_relationship/neo4j/
├── nodes_account.csv
├── nodes_entity.csv
├── edges_transaction.csv
├── edges_ownership.csv
└── import.cypher

DGL (Deep Graph Library)

graphs/approval_network/dgl/
├── graph.bin           # DGL graph object
├── node_feats.npy      # Node features
├── edge_feats.npy      # Edge features
└── labels.npy          # Labels

Feature Categories

| Category | Features |
|---|---|
| Temporal | weekday, period, is_month_end, is_quarter_end, is_year_end |
| Amount | log(amount), benford_probability, is_round_number |
| Structural | line_count, unique_accounts, has_intercompany |
| Categorical | business_process (one-hot), source_type (one-hot) |
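
As a rough illustration of the Amount row, the sketch below computes the three listed features for an amount given in cents. The exact definitions used by the crate are not stated here, so these are assumptions: the log feature is taken as ln(1 + amount), `benford_probability` is the Benford's-law probability of the amount's leading digit, and "round" means a whole multiple of 100.00 currency units. All function names are hypothetical.

```rust
/// Benford's law: the expected probability of leading digit d (1..=9) is log10(1 + 1/d).
fn benford_probability(leading_digit: u32) -> Option<f64> {
    if (1..=9).contains(&leading_digit) {
        Some((1.0 + 1.0 / leading_digit as f64).log10())
    } else {
        None
    }
}

/// Leading non-zero digit of an amount expressed in minor units (cents).
fn leading_digit(amount_cents: u64) -> Option<u32> {
    if amount_cents == 0 {
        return None;
    }
    let mut n = amount_cents;
    while n >= 10 {
        n /= 10;
    }
    Some(n as u32)
}

/// Assumption: a "round" amount is a positive whole multiple of 100.00.
fn is_round_number(amount_cents: u64) -> bool {
    amount_cents > 0 && amount_cents % 10_000 == 0
}

/// Returns [log feature, benford_probability, is_round_number as 0/1].
fn amount_features(amount_cents: u64) -> [f64; 3] {
    let amount = amount_cents as f64 / 100.0;
    let log_amount = (1.0 + amount).ln(); // ln(1 + x), so a zero amount maps to 0
    let benford = leading_digit(amount_cents)
        .and_then(benford_probability)
        .unwrap_or(0.0);
    let round = if is_round_number(amount_cents) { 1.0 } else { 0.0 };
    [log_amount, benford, round]
}
```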

Key Types

Graph Models

pub struct Graph {
    pub nodes: Vec<Node>,
    pub edges: Vec<Edge>,
    pub node_features: Option<Array2<f32>>,
    pub edge_features: Option<Array2<f32>>,
}

pub enum Node {
    Account(AccountNode),
    Entity(EntityNode),
    User(UserNode),
    Transaction(TransactionNode),
}

pub enum Edge {
    Transaction(TransactionEdge),
    Approval(ApprovalEdge),
    Ownership(OwnershipEdge),
}

Split Configuration

pub struct SplitConfig {
    pub train_ratio: f64,     // e.g., 0.7
    pub val_ratio: f64,       // e.g., 0.15
    pub test_ratio: f64,      // e.g., 0.15
    pub stratify_by: Option<String>,
    pub random_seed: u64,
}
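
The three ratios are only meaningful if each lies in [0, 1] and they sum to 1. Whether the crate validates this is not stated on this page, so the following is a sketch of one way to check the ratios and turn them into sample counts. `split_sizes` is a hypothetical helper, and giving the rounding remainder to the test set is an assumption.

```rust
/// Validates split ratios and converts them to sample counts for `n` items.
/// Hypothetical helper; the test set absorbs any rounding remainder.
fn split_sizes(n: usize, train: f64, val: f64, test: f64) -> Result<(usize, usize, usize), String> {
    for (name, ratio) in [("train", train), ("val", val), ("test", test)] {
        if !(0.0..=1.0).contains(&ratio) {
            return Err(format!("{name} ratio {ratio} is outside [0, 1]"));
        }
    }
    let sum = train + val + test;
    if (sum - 1.0).abs() > 1e-9 {
        return Err(format!("ratios sum to {sum}, expected 1.0"));
    }
    // Round rather than truncate so that 100 * 0.7 yields 70 despite float error.
    let train_n = ((n as f64 * train).round() as usize).min(n);
    let val_n = ((n as f64 * val).round() as usize).min(n - train_n);
    let test_n = n - train_n - val_n;
    Ok((train_n, val_n, test_n))
}
```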

Usage Examples

Building Transaction Graph

use datasynth_graph::{TransactionGraphBuilder, GraphConfig};

// `journal_entries` is the collection of generated journal entries.
let builder = TransactionGraphBuilder::new(GraphConfig::default());
let graph = builder.build(&journal_entries)?;

println!("Nodes: {}", graph.nodes.len());
println!("Edges: {}", graph.edges.len());

PyTorch Geometric Export

use datasynth_graph::{PyTorchGeometricExporter, SplitConfig};

let exporter = PyTorchGeometricExporter::new("output/graphs");

// 70/15/15 split, stratified so each subset keeps the anomaly rate.
let split = SplitConfig {
    train_ratio: 0.7,
    val_ratio: 0.15,
    test_ratio: 0.15,
    stratify_by: Some("is_anomaly".to_string()),
    random_seed: 42,
};

exporter.export(&graph, split)?;
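
Stratifying by `is_anomaly` means each label class is split separately, so the rare anomalous class appears in train, validation, and test in the same proportions. The sketch below shows that idea with a seeded shuffle per class. It is not the exporter's implementation: the `Split` enum, `stratified_split`, and the SplitMix64 generator are all stand-ins chosen to keep the example free of dependencies.

```rust
/// Small deterministic PRNG (SplitMix64) so the example needs no external crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E3779B97F4A7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
        z ^ (z >> 31)
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Split {
    Train,
    Val,
    Test,
}

/// Assigns every item to a split, shuffling and cutting each label class separately.
fn stratified_split(labels: &[bool], train: f64, val: f64, seed: u64) -> Vec<Split> {
    let mut rng = SplitMix64(seed);
    let mut assignment = vec![Split::Test; labels.len()];
    for class in [false, true] {
        let mut idx: Vec<usize> = (0..labels.len()).filter(|&i| labels[i] == class).collect();
        // Fisher-Yates shuffle (modulo bias is negligible for a sketch).
        for i in (1..idx.len()).rev() {
            let j = (rng.next() % (i as u64 + 1)) as usize;
            idx.swap(i, j);
        }
        let n = idx.len();
        let train_n = ((n as f64 * train).round() as usize).min(n);
        let val_n = ((n as f64 * val).round() as usize).min(n - train_n);
        for (pos, &item) in idx.iter().enumerate() {
            assignment[item] = if pos < train_n {
                Split::Train
            } else if pos < train_n + val_n {
                Split::Val
            } else {
                Split::Test
            };
        }
    }
    assignment
}
```

The boolean masks in `train_mask.pt`, `val_mask.pt`, and `test_mask.pt` correspond to testing each entry of such an assignment against one variant.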

Neo4j Export

use datasynth_graph::Neo4jExporter;

let exporter = Neo4jExporter::new("output/graphs/neo4j");
exporter.export(&graph)?;

// The export also generates an import script (import.cypher), for example:
// LOAD CSV WITH HEADERS FROM 'file:///nodes_account.csv' AS row
// CREATE (:Account {id: row.id, name: row.name, ...})

Feature Engineering

use datasynth_graph::features::{FeatureExtractor, FeatureConfig};

// Enable all four feature categories.
let extractor = FeatureExtractor::new(FeatureConfig {
    temporal: true,
    amount: true,
    structural: true,
    categorical: true,
});

let node_features = extractor.extract_node_features(&entries)?;
let edge_features = extractor.extract_edge_features(&entries)?;
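
To make the effect of the category toggles concrete, here is a small sketch of how a feature row might be assembled: enabled categories contribute columns, disabled ones contribute none, and a categorical field becomes a one-hot vector over a fixed vocabulary. `FeatureToggles`, `one_hot`, and `feature_row` are hypothetical names, and only two of the four categories are modeled.

```rust
/// One-hot encodes `value` against a fixed vocabulary; unknown values give all zeros.
fn one_hot(value: &str, vocabulary: &[&str]) -> Vec<f32> {
    vocabulary
        .iter()
        .map(|&v| if v == value { 1.0 } else { 0.0 })
        .collect()
}

/// Hypothetical, reduced stand-in for a feature configuration.
struct FeatureToggles {
    amount: bool,
    categorical: bool,
}

/// Builds one feature row; each enabled category appends its columns in a fixed order.
fn feature_row(amount: f64, business_process: &str, vocab: &[&str], cfg: &FeatureToggles) -> Vec<f32> {
    let mut row = Vec::new();
    if cfg.amount {
        row.push((1.0 + amount).ln() as f32); // assumed log transform: ln(1 + amount)
    }
    if cfg.categorical {
        row.extend(one_hot(business_process, vocab));
    }
    row
}
```

Keeping the column order fixed matters because every row of the exported feature matrix must have the same width and meaning.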

Graph Construction

Transaction Network

Accounts and entities become nodes; transactions become edges.

// Nodes:
// - Each GL account is a node
// - Each vendor/customer is a node

// Edges:
// - Each journal entry line creates an edge
// - Edge connects account to entity
// - Edge features: amount, date, fraud flag
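
The construction rules above can be sketched as follows. This is an illustration under simplifying assumptions, not the `TransactionGraphBuilder` implementation: `JournalLine` and `TxEdge` are hypothetical reduced types, the date feature is omitted, and node ids are assigned in order of first appearance.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified journal entry line (not the crate's model type).
struct JournalLine {
    account: String, // GL account code
    entity: String,  // vendor or customer id
    amount_cents: i64,
    is_fraud: bool,
}

#[derive(Debug, PartialEq)]
struct TxEdge {
    source: usize, // account node id
    target: usize, // entity node id
    amount_cents: i64,
    is_fraud: bool,
}

/// Returns the node id for `key`, creating a new node on first sight.
fn intern(ids: &mut HashMap<String, usize>, nodes: &mut Vec<String>, key: String) -> usize {
    if let Some(&id) = ids.get(&key) {
        return id;
    }
    let id = nodes.len();
    nodes.push(key.clone());
    ids.insert(key, id);
    id
}

/// One node per distinct account or entity, one edge per journal entry line.
fn build_transaction_graph(lines: &[JournalLine]) -> (Vec<String>, Vec<TxEdge>) {
    let mut ids = HashMap::new();
    let mut nodes = Vec::new();
    let mut edges = Vec::with_capacity(lines.len());
    for line in lines {
        let source = intern(&mut ids, &mut nodes, format!("account:{}", line.account));
        let target = intern(&mut ids, &mut nodes, format!("entity:{}", line.entity));
        edges.push(TxEdge {
            source,
            target,
            amount_cents: line.amount_cents,
            is_fraud: line.is_fraud,
        });
    }
    (nodes, edges)
}
```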

Approval Network

Users become nodes; approval relationships become edges.

// Nodes:
// - Each user/employee is a node
// - Node features: approval_limit, department, role

// Edges:
// - Approval actions create edges
// - Edge features: amount, threshold, escalation
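
One use of this graph is segregation-of-duties (SoD) analysis. As a sketch of what such a check could look like on approval edges, the code below flags self-approvals and approvals above the approver's limit. The `Approval` and `Violation` types and `check_approvals` are hypothetical, and treating an approver with no recorded limit as having a limit of zero is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical approval edge: requester -> approver, carrying the approved amount.
struct Approval {
    requester: String,
    approver: String,
    amount_cents: i64,
}

#[derive(Debug, PartialEq)]
enum Violation {
    SelfApproval,
    OverLimit,
}

/// Returns (edge index, violation) pairs; `limits` maps user id to approval_limit in cents.
fn check_approvals(approvals: &[Approval], limits: &HashMap<String, i64>) -> Vec<(usize, Violation)> {
    let mut found = Vec::new();
    for (i, a) in approvals.iter().enumerate() {
        if a.requester == a.approver {
            found.push((i, Violation::SelfApproval));
        }
        // Assumption: an approver without a recorded limit may approve nothing.
        let limit = limits.get(&a.approver).copied().unwrap_or(0);
        if a.amount_cents > limit {
            found.push((i, Violation::OverLimit));
        }
    }
    found
}
```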

Entity Relationship Network

Legal entities become nodes; ownership and intercompany (IC) relationships become edges.

// Nodes:
// - Each company/legal entity is a node
// - Node features: currency, country, parent_flag

// Edges:
// - Ownership relationships (parent → subsidiary)
// - IC transaction relationships
// - Edge features: ownership_percent, transaction_volume
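
For consolidation analysis, the ownership edges can be walked from a parent to derive its effective stake in indirect subsidiaries by multiplying ownership along each path. The sketch below assumes ownership is given as a fraction (0.8 for 80%) and that the ownership graph has no cycles. `effective_ownership` is a hypothetical helper, not a crate function.

```rust
use std::collections::HashMap;

/// Computes the effective ownership of `root` in every entity reachable from it.
/// Edges are (parent, subsidiary, fraction). Assumes an acyclic ownership graph;
/// stakes reached over several paths are summed.
fn effective_ownership(edges: &[(&str, &str, f64)], root: &str) -> HashMap<String, f64> {
    let mut children: HashMap<&str, Vec<(&str, f64)>> = HashMap::new();
    for &(parent, sub, frac) in edges {
        children.entry(parent).or_default().push((sub, frac));
    }
    let mut result: HashMap<String, f64> = HashMap::new();
    // Each stack entry is one ownership path: (entity reached, stake along that path).
    let mut stack = vec![(root, 1.0_f64)];
    while let Some((entity, owned)) = stack.pop() {
        if let Some(subs) = children.get(entity) {
            for &(sub, frac) in subs {
                let stake = owned * frac;
                *result.entry(sub.to_string()).or_insert(0.0) += stake;
                stack.push((sub, stake));
            }
        }
    }
    result
}
```

For example, a parent holding 80% of a subsidiary that in turn holds 50% of another entity has an effective 40% stake in that entity.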

ML Integration

Loading in PyTorch

import torch
from torch_geometric.data import Data

# Load exported data
node_features = torch.load('node_features.pt')
edge_index = torch.load('edge_index.pt')
edge_attr = torch.load('edge_attr.pt')
labels = torch.load('labels.pt')
train_mask = torch.load('train_mask.pt')

data = Data(
    x=node_features,
    edge_index=edge_index,
    edge_attr=edge_attr,
    y=labels,
    train_mask=train_mask,
)

Loading in Neo4j

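# Bulk-import the exported CSVs (an alternative to running the generated import.cypher).
# Neo4j 4.x syntax; Neo4j 5 uses `neo4j-admin database import full`.
neo4j-admin import \
    --nodes=nodes_account.csv \
    --nodes=nodes_entity.csv \
    --relationships=edges_transaction.csv \
    --relationships=edges_ownership.csv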

Configuration

graph_export:
  enabled: true
  formats:
    - pytorch_geometric
    - neo4j
  graphs:
    - transaction_network
    - approval_network
    - entity_relationship
  split:
    train: 0.7
    val: 0.15
    test: 0.15
    stratify: is_anomaly
  features:
    temporal: true
    amount: true
    structural: true
    categorical: true

Graph Property Mapping (v0.9.4)

The GraphNode::from_entity() bridge converts any ToNodeProperties implementor into a graph node:

use datasynth_graph::models::GraphNode;
use datasynth_core::models::ToNodeProperties;

// Convert any model struct to a graph node
let node = GraphNode::from_entity(node_id, &tax_return);
// node.properties contains all camelCase property keys from the model

From<GraphPropertyValue> for NodeProperty handles automatic type conversion between the core property enum and the graph module’s property types.
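
The general shape of such a bridge can be sketched with a trait that every model implements and a node constructor that accepts any implementor. Everything in the block is a hypothetical stand-in written for illustration (`ToProps`, `PropValue`, `SketchNode`, and the `TaxReturn` fields); the real `ToNodeProperties` trait and `GraphNode` type may differ in names and signatures.

```rust
use std::collections::BTreeMap;

/// Hypothetical property value type (stand-in for the crate's property enums).
#[derive(Debug, Clone, PartialEq)]
enum PropValue {
    Text(String),
    Number(f64),
    Flag(bool),
}

/// Hypothetical stand-in for a trait like `ToNodeProperties`.
trait ToProps {
    fn type_name(&self) -> &'static str;
    fn to_props(&self) -> BTreeMap<String, PropValue>;
}

/// Hypothetical stand-in for a graph node built from any `ToProps` implementor.
struct SketchNode {
    id: u64,
    type_name: &'static str,
    properties: BTreeMap<String, PropValue>,
}

impl SketchNode {
    fn from_entity(id: u64, entity: &dyn ToProps) -> Self {
        SketchNode {
            id,
            type_name: entity.type_name(),
            properties: entity.to_props(),
        }
    }
}

/// Illustrative model struct; the field set is invented for this example.
struct TaxReturn {
    jurisdiction: String,
    tax_year: u16,
    is_amended: bool,
}

impl ToProps for TaxReturn {
    fn type_name(&self) -> &'static str {
        "tax_return"
    }

    fn to_props(&self) -> BTreeMap<String, PropValue> {
        // Keys use camelCase, matching the convention described above.
        let mut props = BTreeMap::new();
        props.insert("jurisdiction".to_string(), PropValue::Text(self.jurisdiction.clone()));
        props.insert("taxYear".to_string(), PropValue::Number(self.tax_year as f64));
        props.insert("isAmended".to_string(), PropValue::Flag(self.is_amended));
        props
    }
}
```

The benefit of this pattern is that adding a new entity type only requires one trait implementation; the graph code itself does not change.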

51 entity types across 10 process families and 28 typed edge variants with EdgeConstraint validation are available. See Graph Export for the full entity type code table and edge registry.

Multi-Layer Hypergraph (v0.6.2+)

The hypergraph builder supports all enterprise process families:

| Method | Family | Node Types |
|---|---|---|
| add_p2p_documents() | P2P | PurchaseOrder, GoodsReceipt, VendorInvoice, Payment |
| add_o2c_documents() | O2C | SalesOrder, Delivery, CustomerInvoice |
| add_s2c_documents() | S2C | SourcingProject, RfxEvent, SupplierBid, ProcurementContract |
| add_h2r_documents() | H2R | PayrollRun, TimeEntry, ExpenseReport, BenefitEnrollment |
| add_mfg_documents() | MFG | ProductionOrder, QualityInspection, CycleCount, BomComponent, InventoryMovement |
| add_bank_documents() | BANK | BankingCustomer, BankAccount, BankTransaction |
| add_audit_documents() | AUDIT | AuditEngagement, Workpaper, AuditFinding, AuditEvidence |
| add_bank_recon_documents() | Bank Recon | BankReconciliation, BankStatementLine, ReconcilingItem |
| add_ocpm_events() | OCPM | Events as hyperedges (entity type 400) |
| add_compliance_regulations() | Compliance | ComplianceStandard (Layer 1), ComplianceFinding, RegulatoryFiling (Layer 2) |
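
Unlike an ordinary edge, a hyperedge can connect any number of nodes, which is why a single OCPM event can tie together every document it touches. The sketch below shows a minimal hypergraph with that property. The `Hypergraph`, `Hyperedge`, and `Layer` types, the event names, and the assignment of P2P documents to the process-event layer are all illustrative assumptions, not the crate's data model.

```rust
use std::collections::HashMap;

/// The three layers named in the overview.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Layer {
    Governance,
    ProcessEvents,
    AccountingNetwork,
}

/// A hyperedge: one event connecting any number of participating nodes.
struct Hyperedge {
    activity: String,
    participants: Vec<String>,
}

#[derive(Default)]
struct Hypergraph {
    nodes: HashMap<String, (String, Layer)>, // node id -> (node type, layer)
    hyperedges: Vec<Hyperedge>,
}

impl Hypergraph {
    fn add_node(&mut self, id: &str, node_type: &str, layer: Layer) {
        self.nodes.insert(id.to_string(), (node_type.to_string(), layer));
    }

    /// Adds an event as a hyperedge; every participant must already be a node.
    fn add_event(&mut self, activity: &str, participants: &[&str]) -> Result<(), String> {
        if let Some(missing) = participants.iter().find(|p| !self.nodes.contains_key(**p)) {
            return Err(format!("unknown node id: {missing}"));
        }
        self.hyperedges.push(Hyperedge {
            activity: activity.to_string(),
            participants: participants.iter().map(|p| p.to_string()).collect(),
        });
        Ok(())
    }

    /// Lists the activities of all events in which `node_id` participates.
    fn events_touching(&self, node_id: &str) -> Vec<&str> {
        self.hyperedges
            .iter()
            .filter(|e| e.participants.iter().any(|p| p.as_str() == node_id))
            .map(|e| e.activity.as_str())
            .collect()
    }
}
```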

Compliance Graph Builder (v1.1.0)

ComplianceGraphBuilder creates a standalone compliance network with cross-domain edges:

use datasynth_graph::{ComplianceGraphBuilder, ComplianceGraphConfig, AccountLinkInput, ControlLinkInput, FilingNodeInput};

let config = ComplianceGraphConfig {
    include_account_links: true,
    include_control_links: true,
    include_company_links: true,
    ..Default::default()
};
let mut builder = ComplianceGraphBuilder::new(config);

// Add standards, jurisdictions, and findings
builder.add_standards(&standard_inputs);
builder.add_jurisdictions(&jurisdiction_inputs);
builder.add_findings(&finding_inputs);

// Cross-domain: link standards to GL accounts
builder.add_account_links(&account_links);

// Cross-domain: link standards to internal controls
builder.add_control_links(&control_links);

// Cross-domain: link filings to companies
builder.add_filings(&filing_inputs);

let graph = builder.build();

Traversal Paths

The compliance graph enables traversal across the full enterprise:

Company → Filing → Jurisdiction → Standard → Account → JournalEntry
                                           → Control → Finding
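
A traversal along these paths can be sketched as a breadth-first search over a directed adjacency list. The node labels and the `build_adjacency` and `shortest_path` helpers below are hypothetical and independent of the graph type returned by `ComplianceGraphBuilder`; they only illustrate how a company can be connected to a finding through the chain shown above.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Builds a directed adjacency list from (from, to) pairs.
fn build_adjacency<'a>(edges: &[(&'a str, &'a str)]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut adj: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
    }
    adj
}

/// Breadth-first search returning the shortest directed path from `start` to `goal`.
fn shortest_path<'a>(
    adj: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    goal: &str,
) -> Option<Vec<String>> {
    let mut prev: HashMap<&str, &str> = HashMap::new();
    let mut seen: HashSet<&str> = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(start);
    queue.push_back(start);
    while let Some(node) = queue.pop_front() {
        if node == goal {
            // Walk the predecessor links back to the start, then reverse.
            let mut path = vec![node.to_string()];
            let mut cur = node;
            while let Some(&p) = prev.get(cur) {
                path.push(p.to_string());
                cur = p;
            }
            path.reverse();
            return Some(path);
        }
        for &next in adj.get(node).into_iter().flatten() {
            if seen.insert(next) {
                prev.insert(next, node);
                queue.push_back(next);
            }
        }
    }
    None
}
```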

See Also