State Machine Best Practices

This guide covers best practices, patterns, and techniques for building robust state machines with mkunion. Whether you're building simple state machines or complex distributed systems, these practices will help you create maintainable and scalable solutions.

Best Practices

When building state machines with mkunion, following these practices will help you create maintainable and robust systems:

File Organization

Organize your state machine code across files for better maintainability:

model.go: State and command definitions, plus supporting model types such as value objects.

example/state/model.go

//
//go:tag mkunion:"Command"
type (
 CreateOrderCMD struct {
     OrderID OrderID
     Attr    OrderAttr
 }
 MarkAsProcessingCMD struct {
     OrderID  OrderID
     WorkerID WorkerID
 }
 CancelOrderCMD struct {
     OrderID OrderID
     Reason  string
 }
 MarkOrderCompleteCMD struct {
     OrderID  OrderID
     WorkerID WorkerID
 }
 // TryRecoverErrorCMD is a special command used to recover from the error state.
 // You can apply different "self-healing" rules based on the error code, or even return to the previous healthy state.
 TryRecoverErrorCMD struct {
     OrderID OrderID
 }
)

//
//go:tag mkunion:"State"
type (
 OrderPending struct {
     Order Order
 }
 OrderProcessing struct {
     Order Order
 }
 OrderCompleted struct {
     Order Order
 }
 OrderCancelled struct {
     Order Order
 }
 // OrderError is a special state that represents an error
 // during order processing. You can run different "self-healing jobs" based on the error code,
 // such as retrying the order or cancelling it.
 //
 // This pattern enables:
 // 1. Perfect reproduction of the failure
 // 2. Automatic retry with the same command
 // 3. Debugging with full context
 // 4. Recovery to previous valid state
 OrderError struct {
     // error information
     Retried   int
     RetriedAt *time.Time

     ProblemCode ProblemCode

     ProblemCommand Command
     ProblemState   State
 }
)

type (
 // OrderID, Price, and Quantity are placeholders for value objects that give the data clearer semantics and better type safety
 OrderID  = string
 Price    = float64
 Quantity = int

 OrderAttr struct {
     // placeholder for order attributes
     // like customer name, address, etc.
     // like product name, price, etc.
     // for simplicity we only have Price and Quantity
     Price    Price
     Quantity Quantity
 }

 // WorkerID represents the person who processes the order
 WorkerID = string

 // Order holds everything we know about an order
 Order struct {
     ID               OrderID
     OrderAttr        OrderAttr
     WorkerID         WorkerID
     StockRemovedAt   *time.Time
     PaymentChargedAt *time.Time
     DeliveredAt      *time.Time
     CancelledAt      *time.Time
     CancelledReason  string
 }
)

type ProblemCode int

const (
 ProblemWarehouseAPIUnreachable ProblemCode = iota
 ProblemPaymentAPIUnreachable
)

machine.go: Core state machine initialization and, most importantly, the transition logic:

example/state/machine.go

//go:generate moq -with-resets -stub -out machine_mock.go . Dependency
type Dependency interface {
 TimeNow() *time.Time
 WarehouseRemoveStock(ctx context.Context, quantity Quantity) error
 PaymentCharge(ctx context.Context, price Price) error
}

func NewMachine(di Dependency, init State) *machine.Machine[Dependency, Command, State] {
 return machine.NewMachine(di, Transition, init)
}


func Transition(ctx context.Context, di Dependency, cmd Command, state State) (State, error) {
 return MatchCommandR2(
     cmd,
     func(x *CreateOrderCMD) (State, error) {
         // 1. Structural validation as simple checks and explicit error type
         if x.OrderID == "" {
             return nil, ErrOrderIDRequired
         }

         switch state.(type) {
         case nil:
             o := Order{
                 ID:        x.OrderID,
                 OrderAttr: x.Attr,
             }
             return &OrderPending{
                 Order: o,
             }, nil
         }

         return nil, ErrOrderAlreadyExist
     },
// ... and so on

Naming Conventions

States: Use descriptive nouns that clearly indicate the state (e.g., OrderPending, PaymentProcessing)
Commands: Suffix with CMD for clarity (e.g., CreateOrderCMD, CancelOrderCMD)
Packages: Keep state machines in dedicated packages named after the domain (e.g., order, payment)

State Design

Keep States Focused: Each state should represent one clear condition
Immutable Data: States should contain immutable data; create new states instead of modifying
Minimal State Data: Only store data that's essential for the state's identity
Use Zero Values: Design states so Go's zero values are meaningful defaults
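The "create new states instead of modifying" guideline can be sketched as follows. This is a minimal illustration, not code from the example package: the cancel helper is hypothetical, and Order is trimmed to two fields.

```go
package main

import "fmt"

// Order mirrors the guide's Order type, trimmed to the fields this sketch needs.
type Order struct {
	ID              string
	CancelledReason string
}

type OrderPending struct{ Order Order }
type OrderCancelled struct{ Order Order }

// cancel is a hypothetical helper: it builds a new state and leaves the input untouched.
// Order is a plain struct, so assigning it copies the value.
func cancel(s *OrderPending, reason string) *OrderCancelled {
	next := s.Order // copy, not an alias
	next.CancelledReason = reason
	return &OrderCancelled{Order: next}
}

func main() {
	pending := &OrderPending{Order: Order{ID: "123"}}
	cancelled := cancel(pending, "out of stock")
	fmt.Printf("%q %q\n", pending.Order.CancelledReason, cancelled.Order.CancelledReason)
}
```

A struct assignment copies only one level deep. Pointer fields such as *time.Time would still be shared between the old and the new state, so they should be replaced rather than written through.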

Command Validation

Centralizing validation in the Transition function provides significant benefits:

Single source of truth: All business rules and validation logic live in one place
Atomic validation: Commands are validated together with state checks, preventing invalid transitions
Testability: Easy to test all validation rules through the state machine tests
Maintainability: When rules change, you only update one location

Basic Validation

example/state/machine.go

        func(x *CreateOrderCMD) (State, error) {
            // 1. Structural validation as simple checks and explicit error type
            if x.OrderID == "" {
                return nil, ErrOrderIDRequired
            }

            switch state.(type) {
            case nil:
                o := Order{
                    ID:        x.OrderID,
                    OrderAttr: x.Attr,
                }
                return &OrderPending{
                    Order: o,
                }, nil
            }

            return nil, ErrOrderAlreadyExist
        },

Advanced Validation with go-validate

For complex validation requirements, the handler below demonstrates these points:

Structural validation is declarative (struct tags)
Business rules are explicit and testable
External validations are isolated in dependencies
State validations ensure valid transitions
All validation happens before any state change

example/state/machine.go

        func(x *MarkOrderCompleteCMD) (State, error) {
            //  1. Structural validation of commands (you could use go-validate library):
            //
            //     if err := di.Validator().Struct(x); err != nil {
            //        return nil, fmt.Errorf("validation failed: %w. %s", err, ErrValidationFailed)
            //     }
            //
            //    or do it manually like in this example:
            if x.OrderID == "" {
                return nil, ErrOrderIDRequired
            }
            if x.WorkerID == "" {
                return nil, ErrWorkerIDRequired
            }

            // 2. Ensure valid transitions
            s, ok := state.(*OrderProcessing)
            if !ok {
                return nil, ErrCannotCompleteNonProcessingOrder
            }

            // 3. Business rule validation:
            //    Worker cannot approve its own order
            if s.Order.WorkerID == x.WorkerID {
                return nil, ErrWorkerSelfApprove
            }

            // 4. External validation or mutations:
            if s.Order.StockRemovedAt == nil {
                // We need to remove stock first
                // We can retry this operation (assuming warehouse is idempotent, see TryRecoverErrorCMD)
                // OrderID could be used to deduplicate operation
                // it's not required in this example
                err := di.WarehouseRemoveStock(ctx, s.Order.OrderAttr.Quantity)
                if err != nil {
                    return &OrderError{
                        ProblemCode:    ProblemWarehouseAPIUnreachable,
                        ProblemCommand: x,
                        ProblemState:   s,
                    }, nil
                }

                s.Order.StockRemovedAt = di.TimeNow()
            }

            if s.Order.PaymentChargedAt == nil {
                // We need to charge payment first
                // We can retry this operation (assuming payment gateway is idempotent, see TryRecoverErrorCMD)
                // OrderID could be used to deduplicate operation
                // it's not required in this example
                err := di.PaymentCharge(ctx, s.Order.OrderAttr.Price)
                if err != nil {
                    return &OrderError{
                        ProblemCode:    ProblemPaymentAPIUnreachable,
                        ProblemCommand: x,
                        ProblemState:   s,
                    }, nil
                }

                s.Order.PaymentChargedAt = di.TimeNow()
            }

            s.Order.DeliveredAt = di.TimeNow()

            return &OrderCompleted{
                Order: s.Order,
            }, nil
        },

This approach scales well because of the separation of state from IO and business logic.

Dependency Management

Define Clear Interfaces: Dependencies should be interfaces, not concrete types
Keep Dependencies Minimal: Only inject what's absolutely necessary
Generate Mocks with moq: Use //go:generate moq to automatically generate mocks

example/state/machine.go

//go:generate moq -with-resets -stub -out machine_mock.go . Dependency
type Dependency interface {
    TimeNow() *time.Time
    WarehouseRemoveStock(ctx context.Context, quantity Quantity) error
    PaymentCharge(ctx context.Context, price Price) error
}

Running mkunion watch -g ./... creates machine_mock.go with a DependencyMock type. This mock can then be used in tests:

example/state/machine_test.go

func TestSuite(t *testing.T) {
    now := time.Now()
    var di Dependency = &DependencyMock{
        TimeNowFunc: func() *time.Time {
            return &now
        },
    }

// ... and some time later in assertion functions
                                ForkCase(t, "successfully recover", func(t *testing.T, c *machine.Case[Dependency, Command, State]) {
                                    c.
                                        GivenCommand(&TryRecoverErrorCMD{OrderID: "123"}).
                                        BeforeCommand(func(t testing.TB, di Dependency) {
                                            di.(*DependencyMock).ResetCalls()
                                        }).
                                        AfterCommand(func(t testing.TB, di Dependency) {
                                            dep := di.(*DependencyMock)
                                            if assert.Len(t, dep.WarehouseRemoveStockCalls(), 1) {
                                                assert.Equal(t, order.Quantity, dep.WarehouseRemoveStockCalls()[0].Quantity)
                                            }
                                            if assert.Len(t, dep.PaymentChargeCalls(), 1) {
                                                assert.Equal(t, order.Price, dep.PaymentChargeCalls()[0].Price)
                                            }
                                        }).
                                        ThenState(t, &OrderCompleted{
                                            Order: Order{
                                                ID:               "123",
                                                OrderAttr:        order,
                                                WorkerID:         "worker-1",
                                                DeliveredAt:      &now,
                                                StockRemovedAt:   &now,
                                                PaymentChargedAt: &now,
                                            },
                                        })
                                })

Benefits of generating mocks:

Reduces boilerplate: No need to manually write mock implementations
Type safety: Generated mocks always match the interface
Easy maintenance: Mocks automatically update when interface changes
Better test readability: Focus on behavior, not mock implementation

Testing Philosophy

When testing state machines, mkunion's test suite enforces an important principle: states can only be created through command sequences. This design philosophy ensures:

Reachability Verification: Every state used in tests is provably reachable through valid command sequences
Self-Documentation: Tests document exactly how to reach each state, serving as executable documentation
Invariant Preservation: Prevents testing impossible states that violate business rules
Realistic Testing: Tests mirror real-world usage patterns

Command-Only State Creation

Instead of allowing direct state instantiation in tests:

// ❌ Not supported - direct state creation
c.InitState = &OrderProcessing{Order: Order{ID: "123"}}

Tests must build states through command sequences:

// ✅ Correct - states created through commands
suite.Case(t, "order lifecycle", func(t *testing.T, c *Case[...]) {
    c.GivenCommand(&CreateOrderCMD{...}).
      ThenState(t, &OrderPending{...}).
      ForkCase(t, "process order", func(t *testing.T, c *Case[...]) {
          c.GivenCommand(&MarkAsProcessingCMD{...}).
            ThenState(t, &OrderProcessing{...})
          // Now we have OrderProcessing state created through valid commands
      })
})

This constraint is intentional and powerful - if you cannot reach a state through commands, it likely shouldn't exist or indicates a missing command in your domain model.

Testing Error States

Error states require special consideration. If an error state seems unreachable through normal commands, consider:

Is this error state actually possible in production?
Should this be modeled as an explicit error state rather than just an error return?
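One way to reach an error state through commands alone is to make a dependency fail. A minimal sketch, assuming a trimmed-down Warehouse interface and a complete function that stands in for the MarkOrderCompleteCMD handler; both are hypothetical simplifications:

```go
package main

import (
	"errors"
	"fmt"
)

type State interface{}

type OrderProcessing struct{ Quantity int }
type OrderCompleted struct{ Quantity int }

// OrderError keeps the state that failed, as in the guide's model.
type OrderError struct{ ProblemState State }

// Warehouse is a hypothetical, trimmed-down dependency.
type Warehouse interface {
	RemoveStock(quantity int) error
}

// failingWarehouse simulates an unreachable warehouse API.
type failingWarehouse struct{}

func (failingWarehouse) RemoveStock(int) error { return errors.New("warehouse unreachable") }

// complete models the failure as a state instead of returning an error.
func complete(w Warehouse, s *OrderProcessing) State {
	if err := w.RemoveStock(s.Quantity); err != nil {
		return &OrderError{ProblemState: s}
	}
	return &OrderCompleted{Quantity: s.Quantity}
}

func main() {
	next := complete(failingWarehouse{}, &OrderProcessing{Quantity: 3})
	fmt.Printf("%T\n", next)
}
```

With a generated mock, the same effect comes from setting the corresponding function field to return an error before issuing the command.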

Benefits of This Approach

Prevents Invalid Test Scenarios: You can't accidentally test states that are impossible to reach in production
Forces Complete Command Design: If you need to test a state, you must provide a way to reach it
Living Documentation: Test cases become a guide for how to use the state machine
Catches Design Issues Early: Unreachable states are identified during test writing

State Machine Composition

For complex systems, compose multiple state machines as a service layer:

type OrderService struct {
    repo schemaless.Repository[State]
    deps Dependency
}

type ECommerceService struct {
    orderService   *OrderService
    paymentService *PaymentService
}

func NewECommerceService(orderSvc *OrderService, paymentSvc *PaymentService) *ECommerceService {
    return &ECommerceService{
        orderService:   orderSvc,
        paymentService: paymentSvc,
    }
}

func (s *ECommerceService) ProcessOrder(ctx context.Context, orderCmd Command) error {
    // 1. Handle order command through order service
    newOrderState, err := s.orderService.HandleCommand(ctx, orderCmd)
    if err != nil {
        return fmt.Errorf("order processing failed: %w", err)
    }

    // 2. If the order is now being processed, trigger payment through the payment service
    if processing, ok := newOrderState.(*OrderProcessing); ok {
        paymentCmd := &InitiatePaymentCMD{
            OrderID: processing.Order.ID,
            Amount:  processing.Order.OrderAttr.Price,
        }
        _, err := s.paymentService.HandleCommand(ctx, paymentCmd)
        if err != nil {
            return fmt.Errorf("payment initiation failed: %w", err)
        }
    }

    return nil
}

Key principles:

Domain services: Each domain encapsulates its repository, dependencies, and machine logic
Schemaless repositories: Use schemaless.Repository[StateType] for type-safe state storage
Service composition: Compose domain services, avoiding direct repository/machine access
Single responsibility: Each service handles one domain's state machine lifecycle
Optimistic concurrency: Built-in through schemaless.Repository version handling
No duplication: State loading, machine creation, and saving logic exists once per domain

Common Pitfalls

Avoid these common mistakes when implementing state machines:

1. State Explosion

Problem: Creating too many states for every minor variation

// Bad: Too granular
type (
    OrderPendingWithOneItem struct{}
    OrderPendingWithTwoItems struct{}
    OrderPendingWithThreeItems struct{}
    // ... and so on
)

Solution: Use state data instead

// Good: Single state with data
type OrderPending struct {
    Items []OrderItem
}

2. Circular Dependencies

Problem: States that can transition in circles without progress

// Problematic: A -> B -> C -> A without any business value

Solution: Ensure each transition represents meaningful progress or explicitly document allowed cycles
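One way to make cycles visible is to describe the allowed transitions as data and check that graph in a test. The sketch below uses a depth-first search. The transitions table is a hypothetical, hand-maintained description, not something mkunion generates.

```go
package main

import "fmt"

// transitions is a hypothetical description of allowed moves between states.
var transitions = map[string][]string{
	"OrderPending":    {"OrderProcessing", "OrderCancelled"},
	"OrderProcessing": {"OrderCompleted", "OrderError", "OrderCancelled"},
	"OrderError":      {"OrderProcessing"}, // documented cycle: retry after recovery
}

// hasCycle reports whether a cycle is reachable from start, using depth-first search.
func hasCycle(graph map[string][]string, start string) bool {
	const (
		visiting = 1
		done     = 2
	)
	color := map[string]int{}
	var visit func(n string) bool
	visit = func(n string) bool {
		switch color[n] {
		case visiting:
			return true // back edge: n is already on the current path
		case done:
			return false
		}
		color[n] = visiting
		for _, next := range graph[n] {
			if visit(next) {
				return true
			}
		}
		color[n] = done
		return false
	}
	return visit(start)
}

func main() {
	fmt.Println(hasCycle(transitions, "OrderPending"))
}
```

A test built on this can fail whenever a new cycle appears, which forces each cycle to be either removed or documented next to the table.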

3. Missing Error States

Problem: Not modeling error conditions as explicit states

// Bad: Errors only in transition function
return nil, fmt.Errorf("payment failed")

Solution: Model error conditions as states when they need handling. Crucially, store both the command that failed and the previous valid state to enable recovery or debugging:

example/state/model.go

//
//go:tag mkunion:"State"
type (
    OrderPending struct {
        Order Order
    }
    OrderProcessing struct {
        Order Order
    }
    OrderCompleted struct {
        Order Order
    }
    OrderCancelled struct {
        Order Order
    }
    // OrderError is a special state that represents an error
    // during order processing. You can run different "self-healing jobs" based on the error code,
    // such as retrying the order or cancelling it.
    //
    // This pattern enables:
    // 1. Perfect reproduction of the failure
    // 2. Automatic retry with the same command
    // 3. Debugging with full context
    // 4. Recovery to previous valid state
    OrderError struct {
        // error information
        Retried   int
        RetriedAt *time.Time

        ProblemCode ProblemCode

        ProblemCommand Command
        ProblemState   State
    }
)

The error state pattern enables recovery:

example/state/machine.go

        func(x *TryRecoverErrorCMD) (State, error) {
            if x.OrderID == "" {
                return nil, ErrOrderIDRequired
            }

            switch s := state.(type) {
            case *OrderError:
                s.Retried += 1
                s.RetriedAt = di.TimeNow()

                switch s.ProblemCode {
                case ProblemWarehouseAPIUnreachable,
                    ProblemPaymentAPIUnreachable:
                    // we can retry this operation
                    newState, err := Transition(ctx, di, s.ProblemCommand, s.ProblemState)
                    if err != nil {
                        return s, err
                    }

                    // make sure that error retries are preserved
                    if es, ok := newState.(*OrderError); ok {
                        es.Retried = s.Retried
                        es.RetriedAt = s.RetriedAt
                        return es, nil
                    }

                    return newState, nil

                default:
                    // we don't know how to recover, so stay in the error state
                    return s, nil
                }
            }

            return nil, ErrCannotRecoverNonErrorState
        },

This approach preserves the information needed for recovery without losing the context of what failed. Note the call Transition(ctx, di, s.ProblemCommand, s.ProblemState), which replays the original command against the state it failed on.

4. Ignoring Concurrency

Problem: Misunderstanding the state machine concurrency model

// Wrong: Sharing a machine instance across goroutines
sharedMachine := NewMachine(deps, currentState)
go sharedMachine.Handle(ctx, cmd1) // Goroutine 1
go sharedMachine.Handle(ctx, cmd2) // Goroutine 2 - DON'T DO THIS!

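Solution: Create a fresh machine instance from the current state for each command instead of sharing one across goroutines. For handling concurrent updates to the same entity, see the Optimistic Concurrency Control section below.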

5. Overloading Transitions

Problem: Putting too much business logic in transition functions

// Bad: Transition function doing too much
func Transition(...) (State, error) {
    // Send emails
    // Update inventory
    // Calculate prices
    // Log to external systems
    // ... 200 lines later
}

Solution: Keep transitions focused on state changes; delegate side effects to dependencies

Debugging and Observability

State History Tracking

The mkunion state machine pattern leverages Change Data Capture (CDC) for automatic state history tracking. Since every state transition is persisted with versioning through optimistic concurrency control, you get a complete audit trail without modifying your state machine logic.

The schemaless.Repository creates an append log of all state changes with version numbers, providing ordering guarantees and enabling powerful history tracking capabilities. CDC processors consume this stream asynchronously to build history aggregates, analytics, and debugging tools - all without impacting state machine performance. The system automatically handles failures through persistent, replayable streams that survive crashes and allow processors to resume from their last position.

This approach integrates seamlessly with other mkunion patterns like retry processors and timeout handlers, creating a unified system where every state change is tracked, queryable, and analyzable.
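The "resume from their last position" behavior can be illustrated with a toy processor. Change and HistoryProcessor are hypothetical names for this sketch. It does not use the real schemaless append log or CDC API, and it assumes the log is append-only.

```go
package main

import "fmt"

// Change is one entry of a hypothetical append-only log of state changes.
type Change struct {
	Version int
	State   string
}

// HistoryProcessor remembers how far it has read so it can resume later.
type HistoryProcessor struct {
	offset  int
	History []string
}

// Consume processes only entries it has not seen yet, then advances the offset.
// It assumes the log only grows, so offset never exceeds len(log).
func (p *HistoryProcessor) Consume(log []Change) {
	for _, c := range log[p.offset:] {
		p.History = append(p.History, fmt.Sprintf("v%d:%s", c.Version, c.State))
	}
	p.offset = len(log)
}

func main() {
	log := []Change{{1, "OrderPending"}, {2, "OrderProcessing"}}
	p := &HistoryProcessor{}
	p.Consume(log)

	// Later, more changes arrive; the processor continues where it stopped.
	log = append(log, Change{3, "OrderCompleted"})
	p.Consume(log)
	fmt.Println(p.History)
}
```

A production processor would persist the offset so that it survives restarts. Here it lives in memory only.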

Real Implementation

The example app demonstrates CDC integration with taskRetry.RunCDC(ctx) and store.AppendLog(). Detailed examples of building history processors, analytics pipelines, and debugging tools will be added in future updates.

Metrics and Monitoring

Currently, metrics collection is the responsibility of the user. If you need Prometheus metrics or other monitoring, include them in your dependency interface and use them within your Transition function:

type Dependencies interface {
    // Your business dependencies
    StockService() StockService

    // Metrics dependencies - user's responsibility to provide
    Metrics() *prometheus.Registry
    TransitionCounter() prometheus.Counter
}

func Transition(ctx context.Context, deps Dependencies, cmd Command, state State) (State, error) {
    // Manual metrics collection
    startTime := time.Now()
    defer func() {
        deps.TransitionCounter().Inc()
        // Record duration, state types, etc.
        _ = time.Since(startTime) // e.g. observe this in a histogram you add to Dependencies
    }()

    // Your transition logic here
}

There's no automatic metrics injection - you must explicitly add metrics to your dependencies and instrument your transitions manually.

Future Enhancement

Automatic metrics collection would be a valuable addition to machine.Machine. This could include built-in counters for transitions, error rates, and timing histograms without requiring manual instrumentation.

Evolution and Versioning

Backward Compatible Changes

When evolving state machines, maintain compatibility:

// Version 1
//go:tag mkunion:"OrderState"
type (
    OrderCreated struct {
        ID    string
        Items []Item
    }
)

// Version 2 - Added field with default
//go:tag mkunion:"OrderState"
type (
    OrderCreated struct {
        ID       string
        Items    []Item
        Discount float64 `json:"discount,omitempty"` // New field
    }
)
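Adding a field is backward compatible because data written by version 1 still decodes, with the new field left at its zero value. A minimal sketch using plain encoding/json; mkunion's generated serialization for unions may use a different wire format, and the Item type here is a placeholder:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a placeholder for the element type used above.
type Item struct{ Name string }

// OrderCreated is the version 2 shape from above.
type OrderCreated struct {
	ID       string
	Items    []Item
	Discount float64 `json:"discount,omitempty"` // New field
}

// decode parses stored JSON into the current struct shape.
func decode(data []byte) (OrderCreated, error) {
	var s OrderCreated
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	// Payload written before the Discount field existed.
	v1 := []byte(`{"ID":"123","Items":[{"Name":"book"}]}`)
	s, err := decode(v1)
	fmt.Println(s.ID, len(s.Items), s.Discount, err)
}
```

This only works when the zero value is an acceptable default. If "no discount" and "discount unknown" must be distinguished, a pointer field or an explicit migration is needed.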

State Migration Strategies

Handle state structure changes:

// Migration function
func MigrateOrderState(old []byte) (OrderState, error) {
    // Try to unmarshal as current version
    current, err := shared.JSONUnmarshal[OrderState](old)
    if err == nil {
        return current, nil
    }

    // Try older version
    v1, err := shared.JSONUnmarshal[OrderStateV1](old)
    if err == nil {
        // Convert v1 to current
        return convertV1ToCurrent(v1), nil
    }

    return nil, fmt.Errorf("unknown state version")
}
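Trying each version in turn can misfire with plain encoding/json, which ignores unknown fields, so an old payload may decode "successfully" as the current version. An alternative sketch, assuming you control the stored format, wraps each state in an envelope with an explicit version number. The envelope, orderV1, and orderV2 types are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope is a hypothetical storage wrapper that records the schema version.
type envelope struct {
	Version int             `json:"version"`
	Data    json.RawMessage `json:"data"`
}

type orderV1 struct {
	ID string `json:"id"`
}

type orderV2 struct {
	ID       string  `json:"id"`
	Discount float64 `json:"discount"`
}

// migrate decodes any known version and returns the current shape.
func migrate(raw []byte) (orderV2, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return orderV2{}, fmt.Errorf("decode envelope: %w", err)
	}
	switch env.Version {
	case 1:
		var old orderV1
		if err := json.Unmarshal(env.Data, &old); err != nil {
			return orderV2{}, fmt.Errorf("decode v1: %w", err)
		}
		return orderV2{ID: old.ID}, nil // Discount defaults to 0
	case 2:
		var cur orderV2
		if err := json.Unmarshal(env.Data, &cur); err != nil {
			return orderV2{}, fmt.Errorf("decode v2: %w", err)
		}
		return cur, nil
	default:
		return orderV2{}, fmt.Errorf("unknown state version %d", env.Version)
	}
}

func main() {
	s, err := migrate([]byte(`{"version":1,"data":{"id":"123"}}`))
	fmt.Println(s.ID, s.Discount, err)
}
```

The explicit version makes the dispatch unambiguous, at the cost of a storage format change for existing records.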

Deprecating States and Commands

Gracefully phase out old states:

//go:tag mkunion:"OrderState"
type (
    // Deprecated: Use OrderPending instead
    OrderCreated struct {
        // ... fields
    }

    OrderPending struct {
        // New state structure
    }
)

func Transition(ctx context.Context, deps Dependencies, cmd Command, state State) (State, error) {
    // Handle deprecated state
    if old, ok := state.(*OrderCreated); ok {
        // Automatically migrate to new state
        state = &OrderPending{
            // Map old fields to new
        }
    }

    // Continue with normal processing
    // ...
}

Performance Considerations

Memory Optimization

Reuse State Instances: For states without data, use singletons

var (
    pendingState = &Pending{}
    activeState  = &Active{}
)

Lazy Loading: Don't load unnecessary data in states

type OrderDetails struct {
    ID       string
    // Don't embed full customer, just reference
    CustomerID string `json:"customer_id"`
}

Optimistic Concurrency Control

The x/storage/schemaless package provides built-in optimistic concurrency control using version fields. This ensures data consistency when multiple processes work with the same state.

example/state/machine_test.go

    storage := schemaless.NewInMemoryRepository[State]()

    // 1. Load current state from storage
    records, err := storage.FindingRecords(schemaless.FindingRecords[schemaless.Record[State]]{
        RecordType: recordType,
        Where: predicate.MustWhere("ID = :id", predicate.ParamBinds{
            ":id": schema.MkString(orderId),
        }, nil),
        Limit: 1,
    })
    assert.NoError(t, err)
    assert.Len(t, records.Items, 0)

    // 2. Create a fresh machine instance with the current state
    var state State
    m := NewMachine(dep, state)

    // 3. Handle the command
    cmd := &CreateOrderCMD{OrderID: "123", Attr: OrderAttr{Price: 100, Quantity: 3}}
    err = m.Handle(ctx, cmd)
    assert.NoError(t, err)

    // 4. Save the new state (with optimistic concurrency control)
    result, err := storage.UpdateRecords(schemaless.Save(schemaless.Record[State]{
        ID:   orderId,
        Type: recordType,
        Data: m.State(),
    }))
    assert.NoError(t, err)
    assert.Len(t, result.Saved, 1)

    if errors.Is(err, schemaless.ErrVersionConflict) {
        // handle error conflicts, usually retry from step 1.
    }

    assert.Equal(t,
        &OrderPending{
            Order: Order{
                ID:        "123",
                OrderAttr: OrderAttr{Price: 100, Quantity: 3},
            },
        }, m.State(),
    )

How It Works:

Each record has a Version field that increments on updates
Updates specify the expected version in the record
If versions don't match, ErrVersionConflict is returned
Applications retry with the latest version