Graph Export

Export transaction data as ML-ready graphs.

Overview

Graph export transforms financial data into network representations:

  • Accounting Network (GL accounts as nodes, transactions as edges) - New in v0.2.1
  • Transaction networks (accounts and entities)
  • Approval networks (users and approvals)
  • Entity relationship graphs (ownership)

Accounting Network Graph Export

The accounting network represents money flows between GL accounts and is designed for network reconstruction and anomaly detection algorithms.

Quick Start

# Generate with graph export enabled
datasynth-data generate --config config.yaml --output ./output --graph-export

Graph Structure

| Element   | Description                                       |
|-----------|---------------------------------------------------|
| Nodes     | GL Accounts from Chart of Accounts                |
| Edges     | Money flows FROM credit accounts TO debit accounts |
| Direction | Directed graph (source→target)                    |

     ┌──────────────┐
     │ Credit Acct  │
     │   (2000)     │
     └──────┬───────┘
            │ $1,000
            ▼
     ┌──────────────┐
     │ Debit Acct   │
     │   (1100)     │
     └──────────────┘
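
As a rough illustration of the credit-to-debit direction shown above, the sketch below turns the lines of one balanced journal entry into directed edges. The `JournalLine` type and the proportional allocation used when an entry has several debit or credit lines are assumptions made for this example; the exporter's own allocation rule is not described here.

```python
from dataclasses import dataclass


@dataclass
class JournalLine:  # hypothetical type, for illustration only
    account: str
    debit: float = 0.0
    credit: float = 0.0


def entry_to_edges(lines):
    """Return (source, target, amount) edges flowing credit -> debit.

    Assumption: each credit line funds each debit line in proportion
    to that debit line's share of the entry total.
    """
    debits = [(line.account, line.debit) for line in lines if line.debit > 0]
    credits = [(line.account, line.credit) for line in lines if line.credit > 0]
    total = sum(amount for _, amount in debits)
    if total == 0:
        return []
    edges = []
    for credit_account, credit_amount in credits:
        for debit_account, debit_amount in debits:
            share = credit_amount * debit_amount / total
            edges.append((credit_account, debit_account, share))
    return edges


edges = entry_to_edges([
    JournalLine("1100", debit=1000.0),
    JournalLine("2000", credit=1000.0),
])
print(edges)  # [('2000', '1100', 1000.0)]
```

The single edge runs from the credited account (2000) to the debited account (1100), matching the diagram.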

Edge Features (8 dimensions)

| Feature          | Index | Description                     |
|------------------|-------|---------------------------------|
| log_amount       | F0    | log10(transaction amount)       |
| benford_prob     | F1    | Expected first-digit probability |
| weekday          | F2    | Day of week (normalized 0-1)    |
| period           | F3    | Fiscal period (normalized 0-1)  |
| is_month_end     | F4    | Last 3 days of month            |
| is_year_end      | F5    | Last month of year              |
| is_anomaly       | F6    | Anomaly flag (0 or 1)           |
| business_process | F7    | Encoded business process        |
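
The first two features can be reproduced by hand when checking exported values. The sketch below is one plausible reading of F0 and F1, assuming `benford_prob` is the Benford's-law probability log10(1 + 1/d) of the amount's leading digit d; the function names are illustrative and how the exporter treats zero or negative amounts is not stated here.

```python
import math


def log_amount(amount):
    """F0: base-10 logarithm of a positive transaction amount."""
    return math.log10(amount)


def benford_prob(amount):
    """F1: Benford's-law probability of the amount's leading digit."""
    # Scientific notation puts the leading significant digit first.
    leading_digit = int(f"{abs(amount):.15e}"[0])
    if leading_digit == 0:  # amount is zero, so there is no leading digit
        return 0.0
    return math.log10(1 + 1 / leading_digit)


print(log_amount(1000.0))               # 3.0
print(round(benford_prob(1000.0), 4))   # 0.301
```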

Output Files

output/graphs/accounting_network/pytorch_geometric/
├── edge_index.npy      # [2, E] source→target node indices
├── node_features.npy   # [N, 4] node feature vectors
├── edge_features.npy   # [E, 8] edge feature vectors
├── edge_labels.npy     # [E] anomaly labels (0=normal, 1=anomaly)
├── node_labels.npy     # [N] node labels
├── train_mask.npy      # [N] boolean training mask
├── val_mask.npy        # [N] boolean validation mask
├── test_mask.npy       # [N] boolean test mask
├── metadata.json       # Graph statistics and configuration
└── load_graph.py       # Auto-generated Python loader script

Loading in Python

import json

import numpy as np
import torch
from torch_geometric.data import Data

# Load metadata
with open('metadata.json') as f:
    meta = json.load(f)
print(f"Nodes: {meta['num_nodes']}, Edges: {meta['num_edges']}")

# Load arrays
edge_index = np.load('edge_index.npy')       # [2, E]
node_features = np.load('node_features.npy') # [N, 4]
edge_features = np.load('edge_features.npy') # [E, 8]
edge_labels = np.load('edge_labels.npy')     # [E]

# Build a PyTorch Geometric Data object
data = Data(
    x=torch.from_numpy(node_features).float(),
    edge_index=torch.from_numpy(edge_index).long(),
    edge_attr=torch.from_numpy(edge_features).float(),
    y=torch.from_numpy(edge_labels).long(),  # edge-level anomaly labels, one per edge
)

Configuration

graph_export:
  enabled: true
  formats:
    - pytorch_geometric
  train_ratio: 0.7
  validation_ratio: 0.15
  # test_ratio is automatically 1 - train - val = 0.15
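
The implied test share and the resulting split sizes are simple arithmetic. The sketch below shows it under the assumption that counts are rounded and the test set takes the remainder; both helper names are made up for this example and the exporter's rounding may differ.

```python
def implied_test_ratio(train_ratio, validation_ratio):
    """Test share left over after the train and validation shares."""
    test_ratio = 1.0 - train_ratio - validation_ratio
    if train_ratio < 0 or validation_ratio < 0 or test_ratio < -1e-9:
        raise ValueError("train_ratio + validation_ratio must not exceed 1")
    return max(test_ratio, 0.0)


def split_counts(n_items, train_ratio, validation_ratio):
    """Item counts per split; the test split takes whatever remains."""
    n_train = round(n_items * train_ratio)
    n_val = round(n_items * validation_ratio)
    return n_train, n_val, n_items - n_train - n_val


print(round(implied_test_ratio(0.7, 0.15), 2))  # 0.15
print(split_counts(200, 0.7, 0.15))             # (140, 30, 30)
```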

Use Cases

  1. Anomaly Detection: Train GNNs to detect anomalous transaction patterns
  2. Network Reconstruction: Validate accounting network recovery algorithms
  3. Fraud Detection: Identify suspicious money flow patterns
  4. Link Prediction: Predict likely transaction relationships

Configuration

graph_export:
  enabled: true

  formats:
    - pytorch_geometric
    - neo4j
    - dgl

  graphs:
    - transaction_network
    - approval_network
    - entity_relationship

  split:
    train: 0.7
    val: 0.15
    test: 0.15
    stratify: is_anomaly

  features:
    temporal: true
    amount: true
    structural: true
    categorical: true

Graph Types

Transaction Network

Accounts and entities as nodes, transactions as edges.

     ┌──────────┐
     │ Account  │
     │  1100    │
     └────┬─────┘
          │ $1000
          ▼
     ┌──────────┐
     │ Customer │
     │  C-001   │
     └──────────┘

Nodes:

  • GL accounts
  • Vendors
  • Customers
  • Cost centers

Edges:

  • Journal entry lines
  • Payments
  • Invoices

Approval Network

Users as nodes, approval relationships as edges.

     ┌──────────┐
     │  Clerk   │
     │  U-001   │
     └────┬─────┘
          │ approved
          ▼
     ┌──────────┐
     │ Manager  │
     │  U-002   │
     └──────────┘

Nodes: Employees/users

Edges: Approval actions

Entity Relationship Network

Legal entities with ownership relationships.

     ┌──────────┐
     │  Parent  │
     │  1000    │
     └────┬─────┘
          │ 100%
          ▼
     ┌──────────┐
     │   Sub    │
     │  2000    │
     └──────────┘

Nodes: Companies

Edges: Ownership, IC transactions

Export Formats

PyTorch Geometric

output/graphs/transaction_network/pytorch_geometric/
├── node_features.pt    # [num_nodes, num_features]
├── edge_index.pt       # [2, num_edges]
├── edge_attr.pt        # [num_edges, num_edge_features]
├── labels.pt           # Labels
├── train_mask.pt       # Boolean training mask
├── val_mask.pt         # Boolean validation mask
└── test_mask.pt        # Boolean test mask

Loading in Python:

import torch
from torch_geometric.data import Data

# Load tensors
node_features = torch.load('node_features.pt')
edge_index = torch.load('edge_index.pt')
edge_attr = torch.load('edge_attr.pt')
labels = torch.load('labels.pt')
train_mask = torch.load('train_mask.pt')

# Create PyG Data object
data = Data(
    x=node_features,
    edge_index=edge_index,
    edge_attr=edge_attr,
    y=labels,
    train_mask=train_mask,
)

print(f"Nodes: {data.num_nodes}")
print(f"Edges: {data.num_edges}")

Neo4j

output/graphs/transaction_network/neo4j/
├── nodes_account.csv
├── nodes_vendor.csv
├── nodes_customer.csv
├── edges_transaction.csv
├── edges_payment.csv
└── import.cypher

Import script (import.cypher):

// Load accounts
LOAD CSV WITH HEADERS FROM 'file:///nodes_account.csv' AS row
CREATE (:Account {
    id: row.id,
    name: row.name,
    type: row.type
});

// Load transactions
LOAD CSV WITH HEADERS FROM 'file:///edges_transaction.csv' AS row
MATCH (from:Account {id: row.from_id})
MATCH (to:Account {id: row.to_id})
CREATE (from)-[:TRANSACTION {
    amount: toFloat(row.amount),
    date: date(row.posting_date),
    is_anomaly: toBoolean(row.is_anomaly)
}]->(to);

DGL (Deep Graph Library)

output/graphs/transaction_network/dgl/
├── graph.bin           # Serialized DGL graph
├── node_feats.npy      # Node features
├── edge_feats.npy      # Edge features
└── labels.npy          # Labels

Loading in Python:

import dgl
import numpy as np
import torch

# Load graph (load_graphs returns a list of graphs and a label dict)
graph = dgl.load_graphs('graph.bin')[0][0]

# Load features
graph.ndata['feat'] = torch.tensor(np.load('node_feats.npy'))
graph.edata['feat'] = torch.tensor(np.load('edge_feats.npy'))
graph.ndata['label'] = torch.tensor(np.load('labels.npy'))

Features

Temporal Features

features:
  temporal: true

| Feature        | Description          |
|----------------|----------------------|
| weekday        | Day of week (0-6)    |
| period         | Fiscal period (1-12) |
| is_month_end   | Last 3 days of month |
| is_quarter_end | Last week of quarter |
| is_year_end    | Last month of year   |
| hour           | Hour of posting      |
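
To make the definitions in the table concrete, the sketch below derives the same six values from a posting timestamp. It assumes Monday is weekday 0, a fiscal year that may start in any calendar month, and "last week of quarter" meaning the final seven days; none of these conventions is confirmed by the table, and `temporal_features` is an illustrative name.

```python
import calendar
from datetime import datetime


def temporal_features(posted_at, fiscal_year_start_month=1):
    """Derive the temporal features listed above from a posting timestamp."""
    days_in_month = calendar.monthrange(posted_at.year, posted_at.month)[1]
    days_left = days_in_month - posted_at.day
    # Fiscal period 1-12, counted from the first month of the fiscal year.
    period = (posted_at.month - fiscal_year_start_month) % 12 + 1
    return {
        "weekday": posted_at.weekday(),  # 0 = Monday ... 6 = Sunday
        "period": period,
        "is_month_end": days_left < 3,
        "is_quarter_end": period % 3 == 0 and days_left < 7,
        "is_year_end": period == 12,
        "hour": posted_at.hour,
    }


features = temporal_features(datetime(2024, 12, 30, 14, 5))
print(features["period"], features["is_month_end"], features["is_year_end"])
```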

Amount Features

features:
  amount: true

| Feature         | Description                      |
|-----------------|----------------------------------|
| log_amount      | log10(amount)                    |
| benford_prob    | Expected first-digit probability |
| is_round_number | Ends in 00, 000, etc.            |
| amount_zscore   | Standard deviations from mean    |
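
The last two amount features are easy to misread, so here is a small sketch of one possible interpretation. It assumes "ends in 00" refers to whole currency units with no cents, and that the z-score uses the population standard deviation over the amounts being exported; both are assumptions, as are the function names.

```python
import statistics


def is_round_number(amount):
    """True when the amount has no cents and its whole part ends in 00."""
    cents = round(amount * 100)
    return cents != 0 and cents % 10_000 == 0


def amount_zscores(amounts):
    """Standard deviations from the mean for each amount."""
    mean = statistics.fmean(amounts)
    std = statistics.pstdev(amounts)
    return [0.0 if std == 0 else (amount - mean) / std for amount in amounts]


print(is_round_number(1500.0), is_round_number(1250.0))  # True False
print([round(z, 3) for z in amount_zscores([100.0, 200.0, 300.0])])
```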

Structural Features

features:
  structural: true

| Feature            | Description            |
|--------------------|------------------------|
| line_count         | Number of JE lines     |
| unique_accounts    | Distinct accounts used |
| has_intercompany   | IC transaction flag    |
| debit_credit_ratio | Total debits / credits |
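
The structural features summarize a whole journal entry rather than a single line. The sketch below computes them from a list of line dictionaries; the dictionary layout is hypothetical, and treating an entry as intercompany when its lines span more than one company code is an assumption, not the exporter's documented rule.

```python
def structural_features(lines):
    """Summarize one journal entry.

    Each line is a dict with 'account', 'company', 'debit' and 'credit'
    keys (a hypothetical layout chosen for this example).
    """
    total_debit = sum(line["debit"] for line in lines)
    total_credit = sum(line["credit"] for line in lines)
    return {
        "line_count": len(lines),
        "unique_accounts": len({line["account"] for line in lines}),
        # Assumption: more than one company code marks an IC entry.
        "has_intercompany": len({line["company"] for line in lines}) > 1,
        "debit_credit_ratio": total_debit / total_credit if total_credit else 0.0,
    }


entry = [
    {"account": "1100", "company": "1000", "debit": 1000.0, "credit": 0.0},
    {"account": "4000", "company": "1000", "debit": 0.0, "credit": 800.0},
    {"account": "2200", "company": "1000", "debit": 0.0, "credit": 200.0},
]
print(structural_features(entry))
```

A balanced entry yields a `debit_credit_ratio` of 1.0, so values far from 1.0 point at incomplete or unbalanced input.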

Categorical Features

features:
  categorical: true

One-hot encoded:

  • business_process: Manual, P2P, O2C, etc.
  • source_type: System, User, Recurring
  • account_type: Asset, Liability, etc.
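
One-hot encoding maps each category to its own 0/1 column. The sketch below shows the idea using only the category values named in the list above; the actual category order in the exported arrays and the handling of unknown values (all zeros here) are assumptions.

```python
# Category lists taken from the bullets above; the real lists may be longer.
BUSINESS_PROCESSES = ["Manual", "P2P", "O2C"]
SOURCE_TYPES = ["System", "User", "Recurring"]


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the category's position."""
    vector = [0.0] * len(categories)
    if value in categories:  # unknown values stay all-zero (assumption)
        vector[categories.index(value)] = 1.0
    return vector


row = one_hot("P2P", BUSINESS_PROCESSES) + one_hot("User", SOURCE_TYPES)
print(row)  # [0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
```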

Train/Val/Test Splits

split:
  train: 0.7                         # 70% training
  val: 0.15                          # 15% validation
  test: 0.15                         # 15% test
  stratify: is_anomaly               # Maintain anomaly ratio
  random_seed: 42                    # Reproducible splits

Stratification options:

  • is_anomaly: Balanced anomaly detection
  • is_fraud: Balanced fraud detection
  • account_type: Balanced by account type
  • null: Random (no stratification)
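
Stratification means the split is drawn separately within each label group so that every split keeps roughly the same label mix. The sketch below is a generic stratified split written for illustration, not the exporter's implementation; the function name and the rounding rule are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train=0.7, val=0.15, seed=42):
    """Return train/val/test index lists that preserve each label's share."""
    rng = random.Random(seed)  # fixed seed gives reproducible splits
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    splits = {"train": [], "val": [], "test": []}
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        splits["train"] += indices[:n_train]
        splits["val"] += indices[n_train:n_train + n_val]
        splits["test"] += indices[n_train + n_val:]  # remainder
    return splits


labels = [0] * 80 + [1] * 20  # 20% anomalies
splits = stratified_split(labels)
for name, indices in splits.items():
    anomalies = sum(labels[i] for i in indices)
    print(name, len(indices), anomalies)
```

With 20% anomalies overall, each split also ends up with 20% anomalies, which is the property `stratify: is_anomaly` is meant to preserve.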

GNN Training Example

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class AnomalyGNN(torch.nn.Module):
    def __init__(self, num_features, hidden_dim):
        super().__init__()
        self.conv1 = GCNConv(num_features, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, 2)  # Binary classification

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x  # [num_nodes, 2] logits

# Train (data is a PyG Data object with node-level y and train_mask)
model = AnomalyGNN(data.num_features, 64)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    out = model(data)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

Graph Property Mapping (v0.9.4)

The ToNodeProperties trait provides a standardized way to convert typed Rust model structs into graph node property maps with camelCase keys, suitable for Neo4j, AssureTwin, and other graph consumers.

ToNodeProperties Trait

use std::collections::HashMap;

pub trait ToNodeProperties {
    fn node_type_name(&self) -> &'static str;  // e.g. "uncertain_tax_position"
    fn node_type_code(&self) -> u16;           // e.g. 416
    fn to_node_properties(&self) -> HashMap<String, GraphPropertyValue>;
}

GraphPropertyValue Enum

pub enum GraphPropertyValue {
    String(String),
    Int(i64),
    Float(f64),
    Decimal(Decimal),
    Bool(bool),
    Date(NaiveDate),
    StringList(Vec<String>),
}

GraphNode::from_entity()

Bridge method for converting any ToNodeProperties implementor into a graph node:

let node = GraphNode::from_entity(node_id, &tax_return);
// node.properties contains all camelCase property keys

Implemented Entity Types (51 types across 10 process families)

All model structs implement ToNodeProperties, mapping their fields to camelCase property keys. Boolean flags (isApproved, isPassed, isActive, treatyApplied, billable, etc.) are derived from status fields or probability-based generation for graph query convenience.
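
For consumers reading the exported properties from Python, the key convention can be illustrated as follows. This is a sketch of the naming idea only, not a port of the Rust trait: the helper names, the sample record, and the rule that `isApproved` is true when `status` equals "approved" are all assumptions made for the example.

```python
def to_camel_case(name):
    """Convert a snake_case field name to a camelCase property key."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_node_properties(record):
    """Map a flat record to camelCase keys and add a derived boolean flag."""
    properties = {to_camel_case(key): value for key, value in record.items()}
    # Derived convenience flag; the "approved" status value is hypothetical.
    if "status" in record:
        properties["isApproved"] = record["status"] == "approved"
    return properties


props = to_node_properties({
    "filing_type": "annual",
    "company_code": "1000",
    "status": "approved",
})
print(props)
```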

Multi-Layer Hypergraph (v0.6.2+)

The RustGraph Hypergraph exporter supports all enterprise process families with 51 entity type codes:

Entity Type Codes

| Range   | Family | Types |
|---------|--------|-------|
| 100-106 | Core | Company, Vendor, Material, Customer, Employee, GlAccount |
| 300-303 | P2P | PurchaseOrder, GoodsReceipt, VendorInvoice, Payment |
| 310-312 | O2C | SalesOrder, Delivery, CustomerInvoice |
| 320-325 | S2C | SourcingProject, RfxEvent, SupplierBid, BidEvaluation, ProcurementContract, SupplierQualification |
| 330-333 | H2R | PayrollRun, TimeEntry, ExpenseReport, BenefitEnrollment |
| 340-345 | MFG | ProductionOrder, RoutingOperation, QualityInspection, CycleCount, BomComponent, InventoryMovement |
| 350-352 | BANK | BankingCustomer, BankAccount, BankTransaction |
| 360-365 | AUDIT | AuditEngagement, Workpaper, AuditFinding, AuditEvidence, RiskAssessment, ProfessionalJudgment |
| 370-372 | Bank Recon | BankReconciliation, BankStatementLine, ReconcilingItem |
| 400 | OCPM | OcpmEvent (events as hyperedges) |
| 410-416 | TAX | TaxJurisdiction, TaxCode, TaxLine, TaxReturn, TaxProvision, WithholdingTaxRecord, UncertainTaxPosition |
| 420-427 | Treasury | CashPosition, CashForecast, CashPool, CashPoolSweep, HedgingInstrument, HedgeRelationship, DebtInstrument, DebtCovenant |
| 430-442 | ESG | EmissionRecord, EnergyConsumption, WaterUsage, WasteRecord, WorkforceDiversityMetric, PayEquityMetric, SafetyIncident, SafetyMetric, GovernanceMetric, SupplierEsgAssessment, MaterialityAssessment, EsgDisclosure, ClimateScenario |
| 450-455 | Project | Project, ProjectCostLine, ProjectRevenue, EarnedValueMetric, ChangeOrder, ProjectMilestone |
| 500-504 | GOV | CosoComponent, CosoPrinciple, SoxAssertion, AuditEngagement, ProfessionalJudgment |
| 505-508 | Compliance | ComplianceStandard, Jurisdiction, RegulatoryFiling, ComplianceFinding |
| 510-513 | Compliance (ToNodeProperties) | ComplianceStandard, ComplianceFinding, RegulatoryFiling, JurisdictionProfile |
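
When working with exported nodes, it can help to map a numeric type code back to its process family. The sketch below hard-codes the ranges from the table above; the lookup helper itself is illustrative and not part of the exporter.

```python
# (first code, last code, family), copied from the table above.
TYPE_CODE_RANGES = [
    (100, 106, "Core"), (300, 303, "P2P"), (310, 312, "O2C"),
    (320, 325, "S2C"), (330, 333, "H2R"), (340, 345, "MFG"),
    (350, 352, "BANK"), (360, 365, "AUDIT"), (370, 372, "Bank Recon"),
    (400, 400, "OCPM"), (410, 416, "TAX"), (420, 427, "Treasury"),
    (430, 442, "ESG"), (450, 455, "Project"), (500, 504, "GOV"),
    (505, 508, "Compliance"), (510, 513, "Compliance"),
]


def family_for_code(type_code):
    """Return the process family for an entity type code, or None."""
    for first, last, family in TYPE_CODE_RANGES:
        if first <= type_code <= last:
            return family
    return None


print(family_for_code(416))  # TAX
print(family_for_code(999))  # None
```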

Edge Type Registry (v0.9.4)

28 typed relationship variants with source→target entity constraints:

| Family | Edge Type | Source → Target |
|--------|-----------|-----------------|
| P2P | PlacedWith | PurchaseOrder → Vendor |
| P2P | MatchesOrder | VendorInvoice → PurchaseOrder |
| P2P | PaysInvoice | Payment → VendorInvoice |
| O2C | PlacedBy | SalesOrder → Customer |
| O2C | BillsOrder | CustomerInvoice → SalesOrder |
| S2C | RfxBelongsToProject | RfxEvent → SourcingProject |
| S2C | RespondsTo | SupplierBid → RfxEvent |
| S2C | AwardedFrom | ProcurementContract → BidEvaluation |
| H2R | RecordedBy | TimeEntry → Employee |
| H2R | PayrollIncludes | PayrollRun → Employee |
| H2R | SubmittedBy | ExpenseReport → Employee |
| H2R | EnrolledBy | BenefitEnrollment → Employee |
| MFG | Produces | ProductionOrder → Material |
| MFG | Inspects | QualityInspection → ProductionOrder |
| MFG | PartOf | BomComponent → Material |
| TAX | TaxLineBelongsTo | TaxLine → TaxReturn |
| TAX | ProvisionAppliesTo | TaxProvision → TaxJurisdiction |
| TAX | WithheldFrom | WithholdingTaxRecord → Vendor |
| Treasury | SweepsTo | CashPoolSweep → CashPool |
| Treasury | HedgesInstrument | HedgeRelationship → HedgingInstrument |
| Treasury | GovernsInstrument | DebtCovenant → DebtInstrument |
| ESG | EmissionReportedBy | EmissionRecord → Company |
| ESG | AssessesSupplier | SupplierEsgAssessment → Vendor |
| Project | CostChargedTo | ProjectCostLine → Project |
| Project | MilestoneOf | ProjectMilestone → Project |
| Project | ModifiesProject | ChangeOrder → Project |
| GOV | PrincipleUnder | CosoPrinciple → CosoComponent |
| GOV | AssertionCovers | SoxAssertion → GlAccount |
| GOV | JudgmentWithin | ProfessionalJudgment → AuditEngagement |
| Compliance | StandardToControl | ComplianceStandard → InternalControl |
| Compliance | FindingOnControl | ComplianceFinding → InternalControl |
| Compliance | StandardToAccount | ComplianceStandard → GlAccount |
| Compliance | FiledByCompany | RegulatoryFiling → Company |
| Compliance | GovernedByStandard | GlAccount → ComplianceStandard |
| Compliance | ImplementsStandard | InternalControl → ComplianceStandard |
| Compliance | FindingAffectsControl | ComplianceFinding → InternalControl |
| Compliance | FindingAffectsAccount | ComplianceFinding → GlAccount |

Each edge has a typed EdgeConstraint with Cardinality (OneToOne, OneToMany, ManyToMany) and optional edge properties.
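
A source→target constraint of this kind can be checked before an edge is written. The sketch below models the idea in Python with two rows from the registry; the class layout is an assumption, and the cardinality values are placeholders chosen for the example, since the registry's actual cardinalities are not listed here.

```python
from dataclasses import dataclass
from enum import Enum


class Cardinality(Enum):
    ONE_TO_ONE = "OneToOne"
    ONE_TO_MANY = "OneToMany"
    MANY_TO_MANY = "ManyToMany"


@dataclass(frozen=True)
class EdgeConstraint:  # illustrative stand-in, not the Rust type
    source_type: str
    target_type: str
    cardinality: Cardinality


# Source/target pairs come from the table; cardinalities are placeholders.
CONSTRAINTS = {
    "PlacedWith": EdgeConstraint("PurchaseOrder", "Vendor", Cardinality.MANY_TO_MANY),
    "PaysInvoice": EdgeConstraint("Payment", "VendorInvoice", Cardinality.MANY_TO_MANY),
}


def is_valid_edge(edge_type, source_type, target_type):
    """True when the edge type is registered for this source/target pair."""
    constraint = CONSTRAINTS.get(edge_type)
    if constraint is None:
        return False
    return (constraint.source_type, constraint.target_type) == (source_type, target_type)


print(is_valid_edge("PlacedWith", "PurchaseOrder", "Vendor"))  # True
print(is_valid_edge("PlacedWith", "Vendor", "PurchaseOrder"))  # False
```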

OCPM Events as Hyperedges

When events_as_hyperedges: true, each OCPM event becomes a hyperedge connecting all its participating objects. This enables cross-process analysis via the hypergraph structure.
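
Unlike an ordinary edge, a hyperedge can join any number of nodes at once. The sketch below shows one way to represent events as hyperedges together with an object-to-event incidence index; the event dictionary layout and the identifiers are invented for this example and do not reflect the exporter's file format.

```python
from collections import defaultdict


def events_to_hyperedges(events):
    """Build hyperedges (event -> objects) and the reverse incidence index."""
    hyperedges = {event["event_id"]: set(event["objects"]) for event in events}
    incidence = defaultdict(set)
    for event_id, members in hyperedges.items():
        for object_id in members:
            incidence[object_id].add(event_id)
    return hyperedges, dict(incidence)


# Hypothetical events: each one touches several business objects.
events = [
    {"event_id": "E1", "activity": "Create PO", "objects": ["PO-1", "VEND-1"]},
    {"event_id": "E2", "activity": "Post Invoice", "objects": ["INV-1", "PO-1", "VEND-1"]},
]
hyperedges, incidence = events_to_hyperedges(events)
print(sorted(hyperedges["E2"]))   # ['INV-1', 'PO-1', 'VEND-1']
print(sorted(incidence["PO-1"]))  # ['E1', 'E2']
```

Following the incidence index from one object to its events and on to the other objects in those events is what makes traversal across process boundaries possible.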

Per-Family Toggles

graph_export:
  hypergraph:
    enabled: true
    process_layer:
      include_p2p: true
      include_o2c: true
      include_s2c: true
      include_h2r: true
      include_mfg: true
      include_bank: true
      include_audit: true
      include_r2r: true
      events_as_hyperedges: true

Compliance Graph Integration (v1.1.0)

The compliance regulations framework integrates with both the standalone ComplianceGraphBuilder and the multi-layer hypergraph, enabling full enterprise graph traversal between regulatory standards, accounting data, and process documents.

Cross-Domain Edges

When compliance regulations are enabled, the graph includes cross-domain edges:

Company ──FiledByCompany──▶ RegulatoryFiling
                                  │
                           Jurisdiction
                                  │
                        ComplianceStandard
                         ╱              ╲
          GovernedByStandard      ImplementsStandard
               ╱                          ╲
         GlAccount                 InternalControl
             │                           │
       JournalEntry              ComplianceFinding

Hypergraph Placement

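| Compliance Type    | Hypergraph Layer             | Type Code |
|--------------------|------------------------------|-----------|
| ComplianceStandard | Layer 1 (GovernanceControls) | 505       |
| Jurisdiction       | Layer 1 (GovernanceControls) | 506       |
| RegulatoryFiling   | Layer 2 (ProcessEvents)      | 507       |
| ComplianceFinding  | Layer 2 (ProcessEvents)      | 508       |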

Configuration

compliance_regulations:
  graph:
    enabled: true
    include_account_links: true     # Standard → Account edges
    include_control_links: true     # Standard → Control edges
    include_company_links: true     # Filing → Company edges

ToNodeProperties

All four compliance models implement ToNodeProperties for typed graph node conversion:

| Model | Type Code | Key Properties |
|-------|-----------|----------------|
| ComplianceStandard | 510 | standardId, title, issuingBody, category, domain, applicableAccountTypes, applicableProcesses |
| ComplianceFinding | 511 | findingId, severity, deficiencyLevel, controlId, affectedAccounts, remediationStatus |
| RegulatoryFiling | 512 | filingType, companyCode, jurisdiction, status, deadline |
| JurisdictionProfile | 513 | countryCode, accountingFramework, auditFramework, corporateTaxRate |

See Also