State Machine Best Practices

This guide covers best practices, patterns, and techniques for building robust state machines with mkunion. Whether you're building simple state machines or complex distributed systems, these practices will help you create maintainable and scalable solutions.

Best Practices

When building state machines with mkunion, following these practices will help you create maintainable and robust systems:

File Organization

Organize your state machine code across files for better maintainability:

  1. model.go: State and command definitions, plus other model types such as value objects.

    example/state/model.go
    //
    //go:tag mkunion:"Command"
    type (
     CreateOrderCMD struct {
         OrderID OrderID
         Attr    OrderAttr
     }
     MarkAsProcessingCMD struct {
         OrderID  OrderID
         WorkerID WorkerID
     }
     CancelOrderCMD struct {
         OrderID OrderID
         Reason  string
     }
     MarkOrderCompleteCMD struct {
         OrderID  OrderID
         WorkerID WorkerID
     }
     // TryRecoverErrorCMD is a special command that can be used to recover from error state
     // you can have different "self-healing" rules based on the error code or even return to previous healthy state
     TryRecoverErrorCMD struct {
         OrderID OrderID
     }
    )
    
    //
    //go:tag mkunion:"State"
    type (
     OrderPending struct {
         Order Order
     }
     OrderProcessing struct {
         Order Order
     }
     OrderCompleted struct {
         Order Order
     }
     OrderCancelled struct {
         Order Order
     }
     // OrderError is a special state that represents an error
     // during order processing. You can run different "self-healing jobs" based on the error code,
     // such as retrying the order or cancelling it.
     //
     // This pattern enables:
     // 1. Perfect reproduction of the failure
     // 2. Automatic retry with the same command
     // 3. Debugging with full context
     // 4. Recovery to previous valid state
     OrderError struct {
         // error information
         Retried   int
         RetriedAt *time.Time
    
         ProblemCode ProblemCode
    
         ProblemCommand Command
         ProblemState   State
     }
    )
    
    type (
     // OrderID, Price, and Quantity are placeholders for value objects that give the data clearer semantics and better type safety
     OrderID  = string
     Price    = float64
     Quantity = int
    
     OrderAttr struct {
         // placeholder for order attributes
         // like customer name, address, etc.
         // like product name, price, etc.
         // for simplicity we only have Price and Quantity
         Price    Price
         Quantity Quantity
     }
    
     // WorkerID represents the human who processes the order
     WorkerID = string
    
     // Order holds everything we know about an order
     Order struct {
         ID               OrderID
         OrderAttr        OrderAttr
         WorkerID         WorkerID
         StockRemovedAt   *time.Time
         PaymentChargedAt *time.Time
         DeliveredAt      *time.Time
         CancelledAt      *time.Time
         CancelledReason  string
     }
    )
    
    type ProblemCode int
    
    const (
     ProblemWarehouseAPIUnreachable ProblemCode = iota
     ProblemPaymentAPIUnreachable
    )
    

  2. machine.go: Core state machine initialization, and most importantly transition logic:

    example/state/machine.go
    //go:generate moq -with-resets -stub -out machine_mock.go . Dependency
    type Dependency interface {
     TimeNow() *time.Time
     WarehouseRemoveStock(ctx context.Context, quantity Quantity) error
     PaymentCharge(ctx context.Context, price Price) error
    }
    
    func NewMachine(di Dependency, init State) *machine.Machine[Dependency, Command, State] {
     return machine.NewMachine(di, Transition, init)
    }
    
    
    func Transition(ctx context.Context, di Dependency, cmd Command, state State) (State, error) {
     return MatchCommandR2(
         cmd,
         func(x *CreateOrderCMD) (State, error) {
             // 1. Structural validation as simple checks and explicit error type
             if x.OrderID == "" {
                 return nil, ErrOrderIDRequired
             }
    
             switch state.(type) {
             case nil:
                 o := Order{
                     ID:        x.OrderID,
                     OrderAttr: x.Attr,
                 }
                 return &OrderPending{
                     Order: o,
                 }, nil
             }
    
             return nil, ErrOrderAlreadyExist
         },
    // ... and so on
    

Naming Conventions

  1. States: Use descriptive nouns that clearly indicate the state (e.g., OrderPending, PaymentProcessing)
  2. Commands: Suffix with CMD for clarity (e.g., CreateOrderCMD, CancelOrderCMD)
  3. Packages: Keep state machines in dedicated packages named after the domain (e.g., order, payment)

State Design

  1. Keep States Focused: Each state should represent one clear condition
  2. Immutable Data: States should contain immutable data; create new states instead of modifying
  3. Minimal State Data: Only store data that's essential for the state's identity
  4. Use Zero Values: Design states so Go's zero values are meaningful defaults
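The "Immutable Data" guideline can be sketched as follows. This is a minimal illustration with simplified, hypothetical types (Order, OrderPending, OrderProcessing, and the helper markAsProcessing are stand-ins, not the example package itself): the transition copies the order into a new state value instead of modifying the state it received.

```go
package main

import "fmt"

// Simplified, hypothetical versions of the model types used in this guide.
type Order struct {
	ID       string
	WorkerID string
}

type OrderPending struct{ Order Order }
type OrderProcessing struct{ Order Order }

// markAsProcessing builds a new state from a copy of the order.
// The input state is left untouched, so callers holding it still see the old data.
func markAsProcessing(s *OrderPending, workerID string) *OrderProcessing {
	o := s.Order // Order is a plain struct, so assignment copies it
	o.WorkerID = workerID
	return &OrderProcessing{Order: o}
}

func main() {
	pending := &OrderPending{Order: Order{ID: "123"}}
	processing := markAsProcessing(pending, "worker-1")

	fmt.Printf("pending worker: %q\n", pending.Order.WorkerID)
	fmt.Printf("processing worker: %q\n", processing.Order.WorkerID)
}
```

Note that a struct copy is shallow: pointer fields such as *time.Time would still be shared between the old and new state, so assign a new pointer instead of writing through the existing one.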

Command Validation

Centralizing validation in the Transition function provides significant benefits:

  1. Single source of truth: All business rules and validation logic live in one place
  2. Atomic validation: Commands are validated together with state checks, preventing invalid transitions
  3. Testability: Easy to test all validation rules through the state machine tests
  4. Maintainability: When rules change, you only update one location

Basic Validation

example/state/machine.go
        func(x *CreateOrderCMD) (State, error) {
            // 1. Structural validation as simple checks and explicit error type
            if x.OrderID == "" {
                return nil, ErrOrderIDRequired
            }

            switch state.(type) {
            case nil:
                o := Order{
                    ID:        x.OrderID,
                    OrderAttr: x.Attr,
                }
                return &OrderPending{
                    Order: o,
                }, nil
            }

            return nil, ErrOrderAlreadyExist
        },

Advanced Validation with go-validate

For complex validation requirements, the example below demonstrates the following:

  • Structural validation is declarative (struct tags)
  • Business rules are explicit and testable
  • External validations are isolated in dependencies
  • State validations ensure valid transitions
  • All validation happens before any state change

example/state/machine.go
        func(x *MarkOrderCompleteCMD) (State, error) {
            //  1. Structural validation of commands (you could use go-validate library):
            //
            //     if err := di.Validator().Struct(x); err != nil {
            //        return nil, fmt.Errorf("validation failed: %w. %s", err, ErrValidationFailed)
            //     }
            //
            //    or do it manually like in this example:
            if x.OrderID == "" {
                return nil, ErrOrderIDRequired
            }
            if x.WorkerID == "" {
                return nil, ErrWorkerIDRequired
            }

            // 2. Ensure valid transitions
            s, ok := state.(*OrderProcessing)
            if !ok {
                return nil, ErrCannotCompleteNonProcessingOrder
            }

            // 3. Business rule validation:
            //    Worker cannot approve its own order
            if s.Order.WorkerID == x.WorkerID {
                return nil, ErrWorkerSelfApprove
            }

            // 4. External validation or mutations:
            if s.Order.StockRemovedAt == nil {
                // We need to remove stock first
                // We can retry this operation (assuming warehouse is idempotent, see TryRecoverErrorCMD)
                // OrderID could be used to deduplicate operation
                // it's not required in this example
                err := di.WarehouseRemoveStock(ctx, s.Order.OrderAttr.Quantity)
                if err != nil {
                    return &OrderError{
                        ProblemCode:    ProblemWarehouseAPIUnreachable,
                        ProblemCommand: x,
                        ProblemState:   s,
                    }, nil
                }

                s.Order.StockRemovedAt = di.TimeNow()
            }

            if s.Order.PaymentChargedAt == nil {
                // We need to charge payment first
                // We can retry this operation (assuming payment gateway is idempotent, see TryRecoverErrorCMD)
                // OrderID could be used to deduplicate operation
                // it's not required in this example
                err := di.PaymentCharge(ctx, s.Order.OrderAttr.Price)
                if err != nil {
                    return &OrderError{
                        ProblemCode:    ProblemPaymentAPIUnreachable,
                        ProblemCommand: x,
                        ProblemState:   s,
                    }, nil
                }

                s.Order.PaymentChargedAt = di.TimeNow()
            }

            s.Order.DeliveredAt = di.TimeNow()

            return &OrderCompleted{
                Order: s.Order,
            }, nil
        },

This approach scales well because state, IO (hidden behind the Dependency interface), and business rules stay separate, so each can be tested and changed independently.

Dependency Management

  1. Define Clear Interfaces: Dependencies should be interfaces, not concrete types
  2. Keep Dependencies Minimal: Only inject what's absolutely necessary
  3. Generate Mocks with moq: Use //go:generate moq to automatically generate mocks

example/state/machine.go
//go:generate moq -with-resets -stub -out machine_mock.go . Dependency
type Dependency interface {
    TimeNow() *time.Time
    WarehouseRemoveStock(ctx context.Context, quantity Quantity) error
    PaymentCharge(ctx context.Context, price Price) error
}

Running mkunion watch -g ./... creates machine_mock.go with a DependencyMock type. This mock can then be used in tests:

example/state/machine_test.go
func TestSuite(t *testing.T) {
    now := time.Now()
    var di Dependency = &DependencyMock{
        TimeNowFunc: func() *time.Time {
            return &now
        },
    }

// ... and some time later in assertion functions
                                ForkCase(t, "successfully recover", func(t *testing.T, c *machine.Case[Dependency, Command, State]) {
                                    c.
                                        GivenCommand(&TryRecoverErrorCMD{OrderID: "123"}).
                                        BeforeCommand(func(t testing.TB, di Dependency) {
                                            di.(*DependencyMock).ResetCalls()
                                        }).
                                        AfterCommand(func(t testing.TB, di Dependency) {
                                            dep := di.(*DependencyMock)
                                            if assert.Len(t, dep.WarehouseRemoveStockCalls(), 1) {
                                                assert.Equal(t, order.Quantity, dep.WarehouseRemoveStockCalls()[0].Quantity)
                                            }
                                            if assert.Len(t, dep.PaymentChargeCalls(), 1) {
                                                assert.Equal(t, order.Price, dep.PaymentChargeCalls()[0].Price)
                                            }
                                        }).
                                        ThenState(t, &OrderCompleted{
                                            Order: Order{
                                                ID:               "123",
                                                OrderAttr:        order,
                                                WorkerID:         "worker-1",
                                                DeliveredAt:      &now,
                                                StockRemovedAt:   &now,
                                                PaymentChargedAt: &now,
                                            },
                                        })
                                })

Benefits of generating mocks:

  • Reduces boilerplate: No need to manually write mock implementations
  • Type safety: Generated mocks always match the interface
  • Easy maintenance: Mocks automatically update when interface changes
  • Better test readability: Focus on behavior, not mock implementation

Testing Philosophy

When testing state machines, mkunion's test suite enforces an important principle: states can only be created through command sequences. This design philosophy ensures:

  1. Reachability Verification: Every state used in tests is provably reachable through valid command sequences
  2. Self-Documentation: Tests document exactly how to reach each state, serving as executable documentation
  3. Invariant Preservation: Prevents testing impossible states that violate business rules
  4. Realistic Testing: Tests mirror real-world usage patterns

Command-Only State Creation

Instead of allowing direct state instantiation in tests:

// ❌ Not supported - direct state creation
c.InitState = &OrderProcessing{Order: Order{ID: "123"}}

Tests must build states through command sequences:

// ✅ Correct - states created through commands
suite.Case(t, "order lifecycle", func(t *testing.T, c *Case[...]) {
    c.GivenCommand(&CreateOrderCMD{...}).
      ThenState(t, &OrderPending{...}).
      ForkCase(t, "process order", func(t *testing.T, c *Case[...]) {
          c.GivenCommand(&MarkAsProcessingCMD{...}).
            ThenState(t, &OrderProcessing{...})
          // Now we have OrderProcessing state created through valid commands
      })
})

This constraint is intentional and powerful: if you cannot reach a state through commands, the state likely shouldn't exist, or your domain model is missing a command.

Testing Error States

Error states require special consideration. If an error state seems unreachable through normal commands, consider:

  • Is this error state actually possible in production?
  • Should this be modeled as an explicit error state rather than just an error return?

Benefits of This Approach

  1. Prevents Invalid Test Scenarios: You can't accidentally test states that are impossible to reach in production
  2. Forces Complete Command Design: If you need to test a state, you must provide a way to reach it
  3. Living Documentation: Test cases become a guide for how to use the state machine
  4. Catches Design Issues Early: Unreachable states are identified during test writing

State Machine Composition

For complex systems, compose multiple state machines as a service layer:

type OrderService struct {
    repo schemaless.Repository[State]
    deps Dependency
}
type ECommerceService struct {
    orderService   *OrderService
    paymentService *PaymentService
}

func NewECommerceService(orderSvc *OrderService, paymentSvc *PaymentService) *ECommerceService {
    return &ECommerceService{
        orderService:   orderSvc,
        paymentService: paymentSvc,
    }
}

func (s *ECommerceService) ProcessOrder(ctx context.Context, orderCmd Command) error {
    // 1. Handle order command through order service
    newOrderState, err := s.orderService.HandleCommand(ctx, orderCmd)
    if err != nil {
        return fmt.Errorf("order processing failed: %w", err)
    }

    // 2. If the order is now processing, trigger payment through payment service
    if processing, ok := newOrderState.(*OrderProcessing); ok {
        paymentCmd := &InitiatePaymentCMD{
            OrderID: processing.Order.ID,
            Amount:  processing.Order.OrderAttr.Price,
        }
        _, err := s.paymentService.HandleCommand(ctx, paymentCmd)
        if err != nil {
            return fmt.Errorf("payment initiation failed: %w", err)
        }
    }

    return nil
}

Key principles:

  • Domain services: Each domain encapsulates its repository, dependencies, and machine logic
  • Schemaless repositories: Use schemaless.Repository[StateType] for type-safe state storage
  • Service composition: Compose domain services, avoiding direct repository/machine access
  • Single responsibility: Each service handles one domain's state machine lifecycle
  • Optimistic concurrency: Built-in through schemaless.Repository version handling
  • No duplication: State loading, machine creation, and saving logic exists once per domain

Common Pitfalls

Avoid these common mistakes when implementing state machines:

1. State Explosion

Problem: Creating too many states for every minor variation

// Bad: Too granular
type (
    OrderPendingWithOneItem struct{}
    OrderPendingWithTwoItems struct{}
    OrderPendingWithThreeItems struct{}
    // ... and so on
)

Solution: Use state data instead

// Good: Single state with data
type OrderPending struct {
    Items []OrderItem
}

2. Circular Dependencies

Problem: States that can transition in circles without progress

// Problematic: A -> B -> C -> A without any business value

Solution: Ensure each transition represents meaningful progress or explicitly document allowed cycles
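One way to document allowed cycles explicitly is to keep a small transition table next to the machine and check it in a test. The sketch below rests on assumptions: the transitions table is hand-written and hypothetical (it is not derived from or generated by mkunion), and hasCycle is an ordinary depth-first search.

```go
package main

import "fmt"

// transitions is a hypothetical, hand-written summary of which state may follow which.
// The OrderError -> OrderProcessing edge is the one intentional cycle (retry).
var transitions = map[string][]string{
	"OrderPending":    {"OrderProcessing", "OrderCancelled"},
	"OrderProcessing": {"OrderCompleted", "OrderCancelled", "OrderError"},
	"OrderError":      {"OrderProcessing"},
}

// hasCycle reports whether a cycle is reachable from start, using depth-first search.
func hasCycle(graph map[string][]string, start string) bool {
	const (
		unvisited = iota // zero value, so unseen map keys count as unvisited
		inProgress
		done
	)
	color := map[string]int{}

	var visit func(n string) bool
	visit = func(n string) bool {
		color[n] = inProgress
		for _, next := range graph[n] {
			switch color[next] {
			case inProgress:
				return true // back edge: we returned to a state on the current path
			case unvisited:
				if visit(next) {
					return true
				}
			}
		}
		color[n] = done
		return false
	}
	return visit(start)
}

func main() {
	fmt.Println("cycle reachable from OrderPending:", hasCycle(transitions, "OrderPending"))
}
```

A test can then assert that the only cycles present are the ones you have decided to allow, for example by removing the documented retry edge and expecting hasCycle to report false.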

3. Missing Error States

Problem: Not modeling error conditions as explicit states

// Bad: Errors only in transition function
return nil, fmt.Errorf("payment failed")

Solution: Model error conditions as states when they need handling. Crucially, store both the command that failed and the previous valid state to enable recovery or debugging:

example/state/model.go
//
//go:tag mkunion:"State"
type (
    OrderPending struct {
        Order Order
    }
    OrderProcessing struct {
        Order Order
    }
    OrderCompleted struct {
        Order Order
    }
    OrderCancelled struct {
        Order Order
    }
    // OrderError is a special state that represents an error
    // during order processing. You can run different "self-healing jobs" based on the error code,
    // such as retrying the order or cancelling it.
    //
    // This pattern enables:
    // 1. Perfect reproduction of the failure
    // 2. Automatic retry with the same command
    // 3. Debugging with full context
    // 4. Recovery to previous valid state
    OrderError struct {
        // error information
        Retried   int
        RetriedAt *time.Time

        ProblemCode ProblemCode

        ProblemCommand Command
        ProblemState   State
    }
)

The error state pattern enables recovery:

example/state/machine.go
        func(x *TryRecoverErrorCMD) (State, error) {
            if x.OrderID == "" {
                return nil, ErrOrderIDRequired
            }

            switch s := state.(type) {
            case *OrderError:
                s.Retried += 1
                s.RetriedAt = di.TimeNow()

                switch s.ProblemCode {
                case ProblemWarehouseAPIUnreachable,
                    ProblemPaymentAPIUnreachable:
                    // we can retry this operation
                    newState, err := Transition(ctx, di, s.ProblemCommand, s.ProblemState)
                    if err != nil {
                        return s, err
                    }

                    // make sure that error retries are preserved
                    if es, ok := newState.(*OrderError); ok {
                        es.Retried = s.Retried
                        es.RetriedAt = s.RetriedAt
                        return es, nil
                    }

                    return newState, nil

                default:
                    // we don't know how to recover from this problem, so stay in the error state
                    return s, nil
                }
            }

            return nil, ErrCannotRecoverNonErrorState
        },

This approach preserves the information needed for recovery without losing the context of what failed: the handler re-runs the original command against the last valid state with Transition(ctx, di, s.ProblemCommand, s.ProblemState).

4. Ignoring Concurrency

Problem: Misunderstanding the state machine concurrency model

// Wrong: Sharing a machine instance across goroutines
sharedMachine := NewMachine(deps, currentState)
go sharedMachine.Handle(ctx, cmd1) // Goroutine 1
go sharedMachine.Handle(ctx, cmd2) // Goroutine 2 - DON'T DO THIS!

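Solution: Create a fresh machine instance per request from the currently stored state, and rely on versioned saves to detect concurrent updates to the same entity. See the Optimistic Concurrency Control section below.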

5. Overloading Transitions

Problem: Putting too much business logic in transition functions

// Bad: Transition function doing too much
func Transition(...) (State, error) {
    // Send emails
    // Update inventory
    // Calculate prices
    // Log to external systems
    // ... 200 lines later
}

Solution: Keep transitions focused on state changes; delegate side effects to dependencies

Debugging and Observability

State History Tracking

The mkunion state machine pattern leverages Change Data Capture (CDC) for automatic state history tracking. Since every state transition is persisted with versioning through optimistic concurrency control, you get a complete audit trail without modifying your state machine logic.

The schemaless.Repository creates an append log of all state changes with version numbers, which provides ordering guarantees and enables history tracking. CDC processors consume this stream asynchronously to build history aggregates, analytics, and debugging tools, without slowing down the state machine itself. Because the streams are persistent and replayable, they survive crashes, and processors can resume from their last position.

This approach integrates seamlessly with other mkunion patterns like retry processors and timeout handlers, creating a unified system where every state change is tracked, queryable, and analyzable.
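The idea of a versioned append log with a resumable consumer can be illustrated with a small in-memory model. This is a conceptual sketch only: Change, Log, and History are hypothetical types written for this guide and do not reflect the schemaless or CDC APIs.

```go
package main

import "fmt"

// Change is one entry in a hypothetical append-only log of state changes.
type Change struct {
	RecordID string
	Version  int
	State    string
}

// Log is an in-memory stand-in for a persistent, replayable stream.
type Log struct{ entries []Change }

func (l *Log) Append(c Change) { l.entries = append(l.entries, c) }

// ReadFrom returns the entries starting at offset, so a consumer can resume where it stopped.
func (l *Log) ReadFrom(offset int) []Change {
	if offset >= len(l.entries) {
		return nil
	}
	return l.entries[offset:]
}

// History is a consumer that builds a per-record audit trail and remembers its position.
type History struct {
	offset int
	trail  map[string][]string
}

func NewHistory() *History { return &History{trail: map[string][]string{}} }

// Catchup processes every entry this consumer has not seen yet.
func (h *History) Catchup(l *Log) {
	for _, c := range l.ReadFrom(h.offset) {
		h.trail[c.RecordID] = append(h.trail[c.RecordID], fmt.Sprintf("v%d:%s", c.Version, c.State))
		h.offset++
	}
}

func main() {
	log := &Log{}
	h := NewHistory()

	log.Append(Change{RecordID: "123", Version: 1, State: "OrderPending"})
	log.Append(Change{RecordID: "123", Version: 2, State: "OrderProcessing"})
	h.Catchup(log)

	// Later changes are picked up from the stored offset, not from the beginning.
	log.Append(Change{RecordID: "123", Version: 3, State: "OrderCompleted"})
	h.Catchup(log)

	fmt.Println(h.trail["123"])
}
```

In a real deployment the offset would be persisted alongside the aggregate so that a restarted processor continues from its last position.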

Real Implementation

The example app demonstrates CDC integration with taskRetry.RunCDC(ctx) and store.AppendLog(). Detailed examples of building history processors, analytics pipelines, and debugging tools will be added in future updates.

Metrics and Monitoring

Currently, metrics collection is the responsibility of the user. If you need Prometheus metrics or other monitoring, include them in your dependency interface and use them within your Transition function:

type Dependencies interface {
    // Your business dependencies
    StockService() StockService

    // Metrics dependencies - user's responsibility to provide
    Metrics() *prometheus.Registry
    TransitionCounter() prometheus.Counter
}

func Transition(ctx context.Context, deps Dependencies, cmd Command, state State) (State, error) {
    // Manual metrics collection
    startTime := time.Now()
    defer func() {
        deps.TransitionCounter().Inc()
        // Record duration, state types, etc.
        _ = time.Since(startTime) // elapsed time, ready to feed into a histogram
    }()

    // Your transition logic here
    return state, nil // placeholder so the sketch compiles
}

There is no automatic metrics injection: you must explicitly add metrics to your dependencies and instrument your transitions manually.

Future Enhancement

Automatic metrics collection would be a valuable addition to machine.Machine. This could include built-in counters for transitions, error rates, and timing histograms without requiring manual instrumentation.

Evolution and Versioning

Backward Compatible Changes

When evolving state machines, maintain compatibility:

// Version 1
//go:tag mkunion:"OrderState"
type (
    OrderCreated struct {
        ID    string
        Items []Item
    }
)

// Version 2 - Added an optional field; payloads written by version 1 decode with its zero value
//go:tag mkunion:"OrderState"
type (
    OrderCreated struct {
        ID       string
        Items    []Item
        Discount float64 `json:"discount,omitempty"` // New field
    }
)
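To see why adding a field is backward compatible, consider how a payload written before the field existed decodes into the new struct. The sketch below uses the standard encoding/json package on a standalone struct, which is an assumption made for illustration: the serialization mkunion generates for union types may wrap payloads differently. Item's SKU field and the helper decodeV2 are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Item is a hypothetical placeholder; the guide does not define its fields.
type Item struct{ SKU string }

// OrderCreatedV2 is the "Version 2" shape from above, as a standalone struct.
type OrderCreatedV2 struct {
	ID       string
	Items    []Item
	Discount float64 `json:"discount,omitempty"`
}

// decodeV2 reads a stored payload into the current struct.
func decodeV2(payload string) (OrderCreatedV2, error) {
	var s OrderCreatedV2
	err := json.Unmarshal([]byte(payload), &s)
	return s, err
}

func main() {
	// A payload written before Discount existed.
	old := `{"ID":"123","Items":[{"SKU":"a-1"}]}`

	s, err := decodeV2(old)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// The missing field is left at Go's zero value.
	fmt.Printf("id=%s items=%d discount=%v\n", s.ID, len(s.Items), s.Discount)
}
```

This only works when the zero value is an acceptable default. If "no discount" and "discount unknown" must be distinguished, a pointer field is one way to keep that difference.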

State Migration Strategies

Handle state structure changes:

// Migration function
func MigrateOrderState(old []byte) (OrderState, error) {
    // Try to unmarshal as current version
    current, err := shared.JSONUnmarshal[OrderState](old)
    if err == nil {
        return current, nil
    }

    // Try older version
    v1, err := shared.JSONUnmarshal[OrderStateV1](old)
    if err == nil {
        // Convert v1 to current
        return convertV1ToCurrent(v1), nil
    }

    // Keep the last decoding error so callers can see why migration failed
    return nil, fmt.Errorf("unknown state version: %w", err)
}
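When payloads from different versions are hard to tell apart by shape alone, an alternative is to store an explicit version number next to the data. The following sketch shows that different technique using only encoding/json. It is not something this guide's example or mkunion provides: envelope, OrderStateV1, OrderState (here a plain struct, not the union), and Migrate are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope is a hypothetical wrapper that stores a schema version next to the payload.
type envelope struct {
	Version int             `json:"version"`
	Data    json.RawMessage `json:"data"`
}

type OrderStateV1 struct {
	ID string `json:"id"`
}

type OrderState struct {
	ID       string  `json:"id"`
	Discount float64 `json:"discount"`
}

// Migrate decodes the envelope and upgrades older versions to the current shape.
func Migrate(raw []byte) (OrderState, error) {
	var env envelope
	if err := json.Unmarshal(raw, &env); err != nil {
		return OrderState{}, fmt.Errorf("decode envelope: %w", err)
	}
	switch env.Version {
	case 1:
		var v1 OrderStateV1
		if err := json.Unmarshal(env.Data, &v1); err != nil {
			return OrderState{}, fmt.Errorf("decode v1: %w", err)
		}
		return OrderState{ID: v1.ID}, nil // new fields start at their zero value
	case 2:
		var cur OrderState
		if err := json.Unmarshal(env.Data, &cur); err != nil {
			return OrderState{}, fmt.Errorf("decode v2: %w", err)
		}
		return cur, nil
	default:
		return OrderState{}, fmt.Errorf("unknown state version %d", env.Version)
	}
}

func main() {
	s, err := Migrate([]byte(`{"version":1,"data":{"id":"123"}}`))
	fmt.Println(s, err)
}
```

The trade-off is that every writer must set the version field, but in exchange the migration path is chosen deterministically instead of by trial decoding.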

Deprecating States and Commands

Gracefully phase out old states:

//go:tag mkunion:"OrderState"
type (
    // Deprecated: Use OrderPending instead
    OrderCreated struct {
        // ... fields
    }

    OrderPending struct {
        // New state structure
    }
)

func Transition(ctx context.Context, deps Dependencies, cmd Command, state State) (State, error) {
    // Handle deprecated state
    if old, ok := state.(*OrderCreated); ok {
        // Automatically migrate to new state
        state = &OrderPending{
            // Map old fields to new
        }
    }

    // Continue with normal processing
    // ...
}

Performance Considerations

Memory Optimization

  1. Reuse State Instances: For states without data, use singletons

    var (
        pendingState = &Pending{}
        activeState  = &Active{}
    )
    

  2. Lazy Loading: Don't load unnecessary data in states

    type OrderDetails struct {
        ID       string
        // Don't embed full customer, just reference
        CustomerID string `json:"customer_id"`
    }
    

Optimistic Concurrency Control

The x/storage/schemaless package provides built-in optimistic concurrency control using version fields. This ensures data consistency when multiple processes work with the same state.

example/state/machine_test.go
    storage := schemaless.NewInMemoryRepository[State]()

    // 1. Load current state from storage
    records, err := storage.FindingRecords(schemaless.FindingRecords[schemaless.Record[State]]{
        RecordType: recordType,
        Where: predicate.MustWhere("ID = :id", predicate.ParamBinds{
            ":id": schema.MkString(orderId),
        }, nil),
        Limit: 1,
    })
    assert.NoError(t, err)
    assert.Len(t, records.Items, 0)

    // 2. Create a fresh machine instance with the current state
    var state State
    m := NewMachine(dep, state)

    // 3. Handle the command
    cmd := &CreateOrderCMD{OrderID: "123", Attr: OrderAttr{Price: 100, Quantity: 3}}
    err = m.Handle(ctx, cmd)
    assert.NoError(t, err)

    // 4. Save the new state (with optimistic concurrency control)
    result, err := storage.UpdateRecords(schemaless.Save(schemaless.Record[State]{
        ID:   orderId,
        Type: recordType,
        Data: m.State(),
    }))
    assert.NoError(t, err)
    assert.Len(t, result.Saved, 1)

    if errors.Is(err, schemaless.ErrVersionConflict) {
        // handle error conflicts, usually retry from step 1.
    }

    assert.Equal(t,
        &OrderPending{
            Order: Order{
                ID:        "123",
                OrderAttr: OrderAttr{Price: 100, Quantity: 3},
            },
        }, m.State(),
    )

How It Works:

  1. Each record has a Version field that increments on updates
  2. Updates specify the expected version in the record
  3. If versions don't match, ErrVersionConflict is returned
  4. Applications retry with the latest version
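The four steps above can be sketched with a small in-memory store. This illustrates the general technique only: Store, Record, Update, and the local ErrVersionConflict are hypothetical stand-ins and do not reproduce the schemaless API. The store is also not safe for concurrent use, whereas a real repository performs the version check and the write atomically.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionConflict is a local stand-in; it is not schemaless.ErrVersionConflict.
var ErrVersionConflict = errors.New("version conflict")

type Record struct {
	ID      string
	Version int
	Data    string
}

// Store is a hypothetical in-memory repository that checks versions on save.
type Store struct{ records map[string]Record }

func NewStore() *Store { return &Store{records: map[string]Record{}} }

// Load returns the stored record, or a version-0 record if none exists yet.
func (s *Store) Load(id string) Record {
	if r, ok := s.records[id]; ok {
		return r
	}
	return Record{ID: id}
}

// Save succeeds only if the caller's version matches the stored one, then increments it.
func (s *Store) Save(r Record) (Record, error) {
	if cur := s.Load(r.ID); cur.Version != r.Version {
		return cur, ErrVersionConflict
	}
	r.Version++
	s.records[r.ID] = r
	return r, nil
}

// Update loads, applies a change, and saves, retrying from the load step on a conflict.
func Update(s *Store, id string, maxRetries int, apply func(data string) string) (Record, error) {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		r := s.Load(id)
		r.Data = apply(r.Data)
		saved, err := s.Save(r)
		if errors.Is(err, ErrVersionConflict) {
			continue // someone else saved first; reload and try again
		}
		return saved, err
	}
	return Record{}, fmt.Errorf("giving up after %d retries: %w", maxRetries, ErrVersionConflict)
}

func main() {
	s := NewStore()
	saved, _ := Update(s, "123", 3, func(string) string { return "OrderPending" })
	fmt.Println(saved.Version, saved.Data)

	// A writer still holding version 0 is rejected.
	_, err := s.Save(Record{ID: "123", Version: 0, Data: "OrderCancelled"})
	fmt.Println(errors.Is(err, ErrVersionConflict))
}
```

In the state machine setting, the apply step corresponds to creating a fresh machine from the loaded state and handling the command, so a retry always re-runs the transition against the latest stored state.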