State Machine Best Practices
This guide covers best practices, patterns, and techniques for building robust state machines with mkunion. Whether you're building simple state machines or complex distributed systems, these practices will help you create maintainable and scalable solutions.
Best Practices
When building state machines with mkunion, following these practices will help you create maintainable and robust systems:
File Organization
Organize your state machine code across files for better maintainability:
- model.go: State and command definitions with other model types like value objects, etc.
example/state/model.go
//
//go:tag mkunion:"Command"
type (
	CreateOrderCMD struct {
		OrderID OrderID
		Attr    OrderAttr
	}
	MarkAsProcessingCMD struct {
		OrderID  OrderID
		WorkerID WorkerID
	}
	CancelOrderCMD struct {
		OrderID OrderID
		Reason  string
	}
	MarkOrderCompleteCMD struct {
		OrderID  OrderID
		WorkerID WorkerID
	}
	// TryRecoverErrorCMD is a special command that can be used to recover from error state
	// you can have different "self-healing" rules based on the error code or even return to previous healthy state
	TryRecoverErrorCMD struct {
		OrderID OrderID
	}
)

//
//go:tag mkunion:"State"
type (
	OrderPending struct {
		Order Order
	}
	OrderProcessing struct {
		Order Order
	}
	OrderCompleted struct {
		Order Order
	}
	OrderCancelled struct {
		Order Order
	}
	// OrderError is a special state that represent an error
	// during order processing, you can have different "self-healing jobs" based on the error code
	// like retrying the order, cancel the order, etc.
	//
	// This pattern enables:
	// 1. Perfect reproduction of the failure
	// 2. Automatic retry with the same command
	// 3. Debugging with full context
	// 4. Recovery to previous valid state
	OrderError struct {
		// error information
		Retried   int
		RetriedAt *time.Time

		ProblemCode    ProblemCode
		ProblemCommand Command
		ProblemState   State
	}
)

type (
	// OrderID Price, Quantity are placeholders for value objects,
	// to ensure better data semantic and type safety
	OrderID  = string
	Price    = float64
	Quantity = int

	OrderAttr struct {
		// placeholder for order attributes
		// like customer name, address, etc.
		// like product name, price, etc.
		// for simplicity we only have Price and Quantity
		Price    Price
		Quantity Quantity
	}

	// WorkerID represent human that process the order
	WorkerID = string

	// Order everything we know about order
	Order struct {
		ID               OrderID
		OrderAttr        OrderAttr
		WorkerID         WorkerID
		StockRemovedAt   *time.Time
		PaymentChargedAt *time.Time
		DeliveredAt      *time.Time
		CancelledAt      *time.Time
		CancelledReason  string
	}
)

type ProblemCode int

const (
	ProblemWarehouseAPIUnreachable ProblemCode = iota
	ProblemPaymentAPIUnreachable
)
- machine.go: Core state machine initialization, and most importantly transition logic:
example/state/machine.go
//go:generate moq -with-resets -stub -out machine_mock.go . Dependency
type Dependency interface {
	TimeNow() *time.Time
	WarehouseRemoveStock(ctx context.Context, quantity Quantity) error
	PaymentCharge(ctx context.Context, price Price) error
}

func NewMachine(di Dependency, init State) *machine.Machine[Dependency, Command, State] {
	return machine.NewMachine(di, Transition, init)
}

func Transition(ctx context.Context, di Dependency, cmd Command, state State) (State, error) {
	return MatchCommandR2(
		cmd,
		func(x *CreateOrderCMD) (State, error) {
			// 1. Structural validation as simple checks and explicit error type
			if x.OrderID == "" {
				return nil, ErrOrderIDRequired
			}

			switch state.(type) {
			case nil:
				o := Order{
					ID:        x.OrderID,
					OrderAttr: x.Attr,
				}
				return &OrderPending{
					Order: o,
				}, nil
			}

			return nil, ErrOrderAlreadyExist
		},
		// ... and so on
Naming Conventions
- States: Use descriptive nouns that clearly indicate the state (e.g., OrderPending, PaymentProcessing)
- Commands: Suffix with CMD for clarity (e.g., CreateOrderCMD, CancelOrderCMD)
- Packages: Keep state machines in dedicated packages named after the domain (e.g., order, payment)
State Design
- Keep States Focused: Each state should represent one clear condition
- Immutable Data: States should contain immutable data; create new states instead of modifying
- Minimal State Data: Only store data that's essential for the state's identity
- Use Zero Values: Design states so Go's zero values are meaningful defaults
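The "Immutable Data" guideline can be sketched as follows. This is a minimal illustration with simplified stand-ins for the guide's Order and state types, and the helper markProcessing is hypothetical rather than part of mkunion. The transition works on a copy of the order and returns a new state instead of modifying the one it received.

```go
package main

import "fmt"

// Simplified stand-ins for the guide's types; the field set is illustrative.
type Order struct {
	ID       string
	WorkerID string
}

type OrderPending struct{ Order Order }
type OrderProcessing struct{ Order Order }

// markProcessing is a hypothetical helper: it builds the next state from a
// copy of the order, so the caller's OrderPending value is left untouched.
func markProcessing(s *OrderPending, workerID string) *OrderProcessing {
	o := s.Order // struct assignment copies the value
	o.WorkerID = workerID
	return &OrderProcessing{Order: o}
}

func main() {
	pending := &OrderPending{Order: Order{ID: "123"}}
	processing := markProcessing(pending, "worker-1")

	fmt.Printf("before: %+v\n", pending.Order)
	fmt.Printf("after:  %+v\n", processing.Order)
}
```

Note that a struct copy is shallow: pointer fields such as *time.Time would still be shared between the old and new state, so they are best treated as read-only once set.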
Command Validation
Centralizing validation in the Transition function provides significant benefits:
- Single source of truth: All business rules and validation logic live in one place
- Atomic validation: Commands are validated together with state checks, preventing invalid transitions
- Testability: Easy to test all validation rules through the state machine tests
- Maintainability: When rules change, you only update one location
Basic Validation
func(x *CreateOrderCMD) (State, error) {
// 1. Structural validation as simple checks and explicit error type
if x.OrderID == "" {
return nil, ErrOrderIDRequired
}
switch state.(type) {
case nil:
o := Order{
ID: x.OrderID,
OrderAttr: x.Attr,
}
return &OrderPending{
Order: o,
}, nil
}
return nil, ErrOrderAlreadyExist
},
Advanced Validation with go-validate
For more complex validation requirements, the example below demonstrates the following principles:
- Structural validation is declarative (struct tags)
- Business rules are explicit and testable
- External validations are isolated in dependencies
- State validations ensure valid transitions
- All validation happens before any state change:
func(x *MarkOrderCompleteCMD) (State, error) {
// 1. Structural validation of commands (you could use go-validate library):
//
// if err := di.Validator().Struct(x); err != nil {
// return nil, fmt.Errorf("validation failed: %w. %s", err, ErrValidationFailed)
// }
//
// or do it manually like in this example:
if x.OrderID == "" {
return nil, ErrOrderIDRequired
}
if x.WorkerID == "" {
return nil, ErrWorkerIDRequired
}
// 2. Ensure valid transitions
s, ok := state.(*OrderProcessing)
if !ok {
return nil, ErrCannotCompleteNonProcessingOrder
}
// 3. Business rule validation:
// Worker cannot approve their own order
if s.Order.WorkerID == x.WorkerID {
return nil, ErrWorkerSelfApprove
}
// 4. External validation or mutations:
if s.Order.StockRemovedAt == nil {
// We need to remove stock first
// We can retry this operation (assuming warehouse is idempotent, see TryRecoverErrorCMD)
// OrderID could be used to deduplicate operation
// it's not required in this example
err := di.WarehouseRemoveStock(ctx, s.Order.OrderAttr.Quantity)
if err != nil {
return &OrderError{
ProblemCode: ProblemWarehouseAPIUnreachable,
ProblemCommand: x,
ProblemState: s,
}, nil
}
s.Order.StockRemovedAt = di.TimeNow()
}
if s.Order.PaymentChargedAt == nil {
// We need to charge payment first
// We can retry this operation (assuming payment gateway is idempotent, see TryRecoverErrorCMD)
// OrderID could be used to deduplicate operation
// it's not required in this example
err := di.PaymentCharge(ctx, s.Order.OrderAttr.Price)
if err != nil {
return &OrderError{
ProblemCode: ProblemPaymentAPIUnreachable,
ProblemCommand: x,
ProblemState: s,
}, nil
}
s.Order.PaymentChargedAt = di.TimeNow()
}
s.Order.DeliveredAt = di.TimeNow()
return &OrderCompleted{
Order: s.Order,
}, nil
},
This approach scales well because of the separation of state from IO and business logic.
Dependency Management
- Define Clear Interfaces: Dependencies should be interfaces, not concrete types
- Keep Dependencies Minimal: Only inject what's absolutely necessary
- Generate Mocks with moq: Use //go:generate moq to automatically generate mocks
//go:generate moq -with-resets -stub -out machine_mock.go . Dependency
type Dependency interface {
TimeNow() *time.Time
WarehouseRemoveStock(ctx context.Context, quantity Quantity) error
PaymentCharge(ctx context.Context, price Price) error
}
Running mkunion watch -g ./... creates machine_mock.go with a DependencyMock type. This mock can then be used in tests:
func TestSuite(t *testing.T) {
now := time.Now()
var di Dependency = &DependencyMock{
TimeNowFunc: func() *time.Time {
return &now
},
}
// ... and some time later in assertion functions
ForkCase(t, "successfully recover", func(t *testing.T, c *machine.Case[Dependency, Command, State]) {
c.
GivenCommand(&TryRecoverErrorCMD{OrderID: "123"}).
BeforeCommand(func(t testing.TB, di Dependency) {
di.(*DependencyMock).ResetCalls()
}).
AfterCommand(func(t testing.TB, di Dependency) {
dep := di.(*DependencyMock)
if assert.Len(t, dep.WarehouseRemoveStockCalls(), 1) {
assert.Equal(t, order.Quantity, dep.WarehouseRemoveStockCalls()[0].Quantity)
}
if assert.Len(t, dep.PaymentChargeCalls(), 1) {
assert.Equal(t, order.Price, dep.PaymentChargeCalls()[0].Price)
}
}).
ThenState(t, &OrderCompleted{
Order: Order{
ID: "123",
OrderAttr: order,
WorkerID: "worker-1",
DeliveredAt: &now,
StockRemovedAt: &now,
PaymentChargedAt: &now,
},
})
})
Benefits of generating mocks:
- Reduces boilerplate: No need to manually write mock implementations
- Type safety: Generated mocks always match the interface
- Easy maintenance: Mocks automatically update when interface changes
- Better test readability: Focus on behavior, not mock implementation
Testing Philosophy
When testing state machines, mkunion's test suite enforces an important principle: states can only be created through command sequences. This design philosophy ensures:
- Reachability Verification: Every state used in tests is provably reachable through valid command sequences
- Self-Documentation: Tests document exactly how to reach each state, serving as executable documentation
- Invariant Preservation: Prevents testing impossible states that violate business rules
- Realistic Testing: Tests mirror real-world usage patterns
Command-Only State Creation
Instead of allowing direct state instantiation in tests:
// ❌ Not supported - direct state creation
c.InitState = &OrderProcessing{Order: Order{ID: "123"}}
Tests must build states through command sequences:
// ✅ Correct - states created through commands
suite.Case(t, "order lifecycle", func(t *testing.T, c *Case[...]) {
c.GivenCommand(&CreateOrderCMD{...}).
ThenState(t, &OrderPending{...}).
ForkCase(t, "process order", func(t *testing.T, c *Case[...]) {
c.GivenCommand(&MarkAsProcessingCMD{...}).
ThenState(t, &OrderProcessing{...})
// Now we have OrderProcessing state created through valid commands
})
})
This constraint is intentional and powerful - if you cannot reach a state through commands, it likely shouldn't exist or indicates a missing command in your domain model.
Testing Error States
Error states require special consideration. If an error state seems unreachable through normal commands, consider:
- Is this error state actually possible in production?
- Should this be modeled as an explicit error state rather than just an error return?
Benefits of This Approach
- Prevents Invalid Test Scenarios: You can't accidentally test states that are impossible to reach in production
- Forces Complete Command Design: If you need to test a state, you must provide a way to reach it
- Living Documentation: Test cases become a guide for how to use the state machine
- Catches Design Issues Early: Unreachable states are identified during test writing
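The idea of reaching states only through commands can be sketched without the test suite itself. The example below is not mkunion's testing API: it uses a toy transition function with two commands, and a hypothetical replay helper that folds a command sequence over the empty state. A state that replay cannot produce is, by construction, unreachable.

```go
package main

import (
	"errors"
	"fmt"
)

// Toy model, not the guide's full example: two commands and two states.
type Command interface{}
type State interface{}

type CreateOrderCMD struct{ OrderID string }
type MarkAsProcessingCMD struct{ WorkerID string }

type OrderPending struct{ OrderID string }
type OrderProcessing struct{ OrderID, WorkerID string }

var ErrInvalidTransition = errors.New("invalid transition")

func transition(cmd Command, state State) (State, error) {
	switch c := cmd.(type) {
	case *CreateOrderCMD:
		if state == nil {
			return &OrderPending{OrderID: c.OrderID}, nil
		}
	case *MarkAsProcessingCMD:
		if s, ok := state.(*OrderPending); ok {
			return &OrderProcessing{OrderID: s.OrderID, WorkerID: c.WorkerID}, nil
		}
	}
	return nil, ErrInvalidTransition
}

// replay is a hypothetical helper that reaches a state only by applying
// commands in order, starting from the empty (nil) state.
func replay(cmds ...Command) (State, error) {
	var state State
	for _, cmd := range cmds {
		next, err := transition(cmd, state)
		if err != nil {
			return state, err
		}
		state = next
	}
	return state, nil
}

func main() {
	state, err := replay(
		&CreateOrderCMD{OrderID: "123"},
		&MarkAsProcessingCMD{WorkerID: "worker-1"},
	)
	fmt.Printf("%+v %v\n", state, err)
}
```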
State Machine Composition
For complex systems, compose multiple state machines as a service layer:
type ECommerceService struct {
orderService *OrderService
paymentService *PaymentService
}
func NewECommerceService(orderSvc *OrderService, paymentSvc *PaymentService) *ECommerceService {
return &ECommerceService{
orderService: orderSvc,
paymentService: paymentSvc,
}
}
func (s *ECommerceService) ProcessOrder(ctx context.Context, orderCmd Command) error {
// 1. Handle order command through order service
newOrderState, err := s.orderService.HandleCommand(ctx, orderCmd)
if err != nil {
return fmt.Errorf("order processing failed: %w", err)
}
// 2. If the order has moved to processing, trigger payment through payment service
if processing, ok := newOrderState.(*OrderProcessing); ok {
paymentCmd := &InitiatePaymentCMD{
OrderID: processing.Order.ID,
Amount: processing.Order.OrderAttr.Price,
}
_, err := s.paymentService.HandleCommand(ctx, paymentCmd)
if err != nil {
return fmt.Errorf("payment initiation failed: %w", err)
}
}
return nil
}
Key principles:
- Domain services: Each domain encapsulates its repository, dependencies, and machine logic
- Schemaless repositories: Use schemaless.Repository[StateType] for type-safe state storage
- Service composition: Compose domain services, avoiding direct repository/machine access
- Single responsibility: Each service handles one domain's state machine lifecycle
- Optimistic concurrency: Built-in through schemaless.Repository version handling
- No duplication: State loading, machine creation, and saving logic exists once per domain
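A domain service of the kind these principles describe might look like the sketch below. It is an assumption-laden illustration: the Repository interface, memoryRepo, and the toy transition function are hypothetical stand-ins, and the real schemaless.Repository API is not reproduced here. What it shows is that loading, transitioning, and saving live in one place per domain.

```go
package main

import (
	"errors"
	"fmt"
)

type State interface{}
type Command interface{}

type CreateOrderCMD struct{ OrderID string }
type OrderPending struct{ OrderID string }

// Repository is a hypothetical storage interface for this sketch; the guide
// uses schemaless.Repository, whose API is not reproduced here.
type Repository interface {
	Load(id string) (State, error)
	Save(id string, s State) error
}

type memoryRepo struct{ data map[string]State }

func (r *memoryRepo) Load(id string) (State, error) { return r.data[id], nil }
func (r *memoryRepo) Save(id string, s State) error { r.data[id] = s; return nil }

// transition is a toy stand-in for the domain's Transition function.
func transition(cmd Command, state State) (State, error) {
	if c, ok := cmd.(*CreateOrderCMD); ok && state == nil {
		return &OrderPending{OrderID: c.OrderID}, nil
	}
	return nil, errors.New("invalid transition")
}

// OrderService owns the load, transition, save cycle for the order domain,
// so callers never touch the repository or the transition function directly.
type OrderService struct{ repo Repository }

func (s *OrderService) HandleCommand(id string, cmd Command) (State, error) {
	current, err := s.repo.Load(id)
	if err != nil {
		return nil, fmt.Errorf("loading order %s: %w", id, err)
	}
	next, err := transition(cmd, current)
	if err != nil {
		return nil, err
	}
	if err := s.repo.Save(id, next); err != nil {
		return nil, fmt.Errorf("saving order %s: %w", id, err)
	}
	return next, nil
}

func main() {
	svc := &OrderService{repo: &memoryRepo{data: map[string]State{}}}
	state, err := svc.HandleCommand("123", &CreateOrderCMD{OrderID: "123"})
	fmt.Printf("%+v %v\n", state, err)
}
```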
Common Pitfalls
Avoid these common mistakes when implementing state machines:
1. State Explosion
Problem: Creating too many states for every minor variation
// Bad: Too granular
type (
OrderPendingWithOneItem struct{}
OrderPendingWithTwoItems struct{}
OrderPendingWithThreeItems struct{}
// ... and so on
)
Solution: Use state data instead
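As a minimal sketch of that solution, the three granular states collapse into one state whose data carries the variation. The Item type and the itemCount helper are hypothetical; the guide's example model does not define them.

```go
package main

import "fmt"

// Item is a hypothetical line item; the guide's example does not define one.
type Item struct {
	SKU      string
	Quantity int
}

// OrderPending is a single state; the number of items is data, not identity.
type OrderPending struct {
	Items []Item
}

// itemCount derives what the granular states tried to encode in their names.
func itemCount(s *OrderPending) int {
	return len(s.Items)
}

func main() {
	one := &OrderPending{Items: []Item{{SKU: "A", Quantity: 1}}}
	three := &OrderPending{Items: []Item{
		{SKU: "A", Quantity: 1},
		{SKU: "B", Quantity: 2},
		{SKU: "C", Quantity: 1},
	}}
	fmt.Println(itemCount(one), itemCount(three))
}
```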
2. Circular Dependencies
Problem: States that can transition in circles without progress
Solution: Ensure each transition represents meaningful progress or explicitly document allowed cycles
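One way to keep an allowed cycle from looping forever is to bound it. The sketch below assumes a retry counter like the Retried field used elsewhere in this guide; the maxRetries constant, ErrRetryLimitReached, and the nextRetry helper are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRetries is a hypothetical limit; pick a value that suits the domain.
const maxRetries = 3

var ErrRetryLimitReached = errors.New("retry limit reached")

// OrderError is reduced to the one field this sketch needs.
type OrderError struct {
	Retried int
}

// nextRetry allows the error -> retry -> error cycle, but only a bounded
// number of times, so the machine cannot loop forever without progress.
func nextRetry(s *OrderError) (*OrderError, error) {
	if s.Retried >= maxRetries {
		return s, ErrRetryLimitReached
	}
	return &OrderError{Retried: s.Retried + 1}, nil
}

func main() {
	s := &OrderError{}
	var err error
	for err == nil {
		s, err = nextRetry(s)
	}
	fmt.Println("stopped after", s.Retried, "retries:", err)
}
```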
3. Missing Error States
Problem: Not modeling error conditions as explicit states
Solution: Model error conditions as states when they need handling. Crucially, store both the command that failed and the previous valid state to enable recovery or debugging:
//
//go:tag mkunion:"State"
type (
OrderPending struct {
Order Order
}
OrderProcessing struct {
Order Order
}
OrderCompleted struct {
Order Order
}
OrderCancelled struct {
Order Order
}
// OrderError is a special state that represent an error
// during order processing, you can have different "self-healing jobs" based on the error code
// like retrying the order, cancel the order, etc.
//
// This pattern enables:
// 1. Perfect reproduction of the failure
// 2. Automatic retry with the same command
// 3. Debugging with full context
// 4. Recovery to previous valid state
OrderError struct {
// error information
Retried int
RetriedAt *time.Time
ProblemCode ProblemCode
ProblemCommand Command
ProblemState State
}
)
The error state pattern enables recovery:
func(x *TryRecoverErrorCMD) (State, error) {
if x.OrderID == "" {
return nil, ErrOrderIDRequired
}
switch s := state.(type) {
case *OrderError:
s.Retried += 1
s.RetriedAt = di.TimeNow()
switch s.ProblemCode {
case ProblemWarehouseAPIUnreachable,
ProblemPaymentAPIUnreachable:
// we can retry this operation
newState, err := Transition(ctx, di, s.ProblemCommand, s.ProblemState)
if err != nil {
return s, err
}
// make sure that error retries are preserved
if es, ok := newState.(*OrderError); ok {
es.Retried = s.Retried
es.RetriedAt = s.RetriedAt
return es, nil
}
return newState, nil
default:
// we don't know how to recover from this problem, so stay in the error state
return s, nil
}
}
return nil, ErrCannotRecoverNonErrorState
},
This approach preserves the information needed for recovery without losing the context of what failed: the recovery handler replays the stored command against the stored state by calling Transition(ctx, di, s.ProblemCommand, s.ProblemState).
4. Ignoring Concurrency
Problem: Misunderstanding the state machine concurrency model
// Wrong: Sharing a machine instance across goroutines
sharedMachine := NewMachine(deps, currentState)
go sharedMachine.Handle(ctx, cmd1) // Goroutine 1
go sharedMachine.Handle(ctx, cmd2) // Goroutine 2 - DON'T DO THIS!
Solution: For handling concurrent updates to the same entity, see the Optimistic Concurrency Control section below.
5. Overloading Transitions
Problem: Putting too much business logic in transition functions
// Bad: Transition function doing too much
func Transition(...) (State, error) {
// Send emails
// Update inventory
// Calculate prices
// Log to external systems
// ... 200 lines later
}
Solution: Keep transitions focused on state changes; delegate side effects to dependencies
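A focused transition might look like the following sketch. The Dependency interface here is hypothetical (its single method is not part of the guide's example), and the context parameter is omitted for brevity. The transition decides the state change and makes one call through the interface; how the email is built and delivered stays outside the state machine.

```go
package main

import "fmt"

// Dependency is a hypothetical interface for this sketch; the method name is
// illustrative and not taken from the guide's example.
type Dependency interface {
	SendCancellationEmail(orderID string) error
}

type OrderPending struct{ OrderID string }
type OrderCancelled struct{ OrderID, Reason string }

// cancel keeps the transition about the state change. How the email is
// rendered and delivered lives behind the Dependency interface.
func cancel(di Dependency, s *OrderPending, reason string) (*OrderCancelled, error) {
	if err := di.SendCancellationEmail(s.OrderID); err != nil {
		return nil, fmt.Errorf("sending cancellation email: %w", err)
	}
	return &OrderCancelled{OrderID: s.OrderID, Reason: reason}, nil
}

// recordingDeps is a hand-written fake that records calls.
type recordingDeps struct{ sent []string }

func (d *recordingDeps) SendCancellationEmail(orderID string) error {
	d.sent = append(d.sent, orderID)
	return nil
}

func main() {
	deps := &recordingDeps{}
	next, err := cancel(deps, &OrderPending{OrderID: "123"}, "out of stock")
	fmt.Printf("%+v %v %v\n", next, err, deps.sent)
}
```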
Debugging and Observability
State History Tracking
The mkunion state machine pattern leverages Change Data Capture (CDC) for automatic state history tracking. Since every state transition is persisted with versioning through optimistic concurrency control, you get a complete audit trail without modifying your state machine logic.
The schemaless.Repository creates an append log of all state changes with version numbers, providing ordering guarantees and enabling powerful history tracking capabilities. CDC processors consume this stream asynchronously to build history aggregates, analytics, and debugging tools - all without impacting state machine performance. The system automatically handles failures through persistent, replayable streams that survive crashes and allow processors to resume from their last position.
This approach integrates seamlessly with other mkunion patterns like retry processors and timeout handlers, creating a unified system where every state change is tracked, queryable, and analyzable.
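The mechanics can be illustrated with a toy model. Everything in the sketch below (Change, AppendLog, HistoryProcessor) is a hypothetical in-memory stand-in, not the schemaless or CDC API: it only shows a versioned append log and a consumer that remembers the last version it processed.

```go
package main

import "fmt"

// Change is a toy change record; schemaless has its own record types.
type Change struct {
	Version int
	State   string
}

// AppendLog is an in-memory stand-in for a persistent, replayable stream.
type AppendLog struct{ entries []Change }

// Append stores a change with the next version number, starting at 1.
func (l *AppendLog) Append(state string) {
	l.entries = append(l.entries, Change{Version: len(l.entries) + 1, State: state})
}

// ReadFrom returns every change after the given version, in order.
func (l *AppendLog) ReadFrom(version int) []Change {
	if version >= len(l.entries) {
		return nil
	}
	return l.entries[version:]
}

// HistoryProcessor builds a history aggregate and remembers the last version
// it processed, so a later call resumes from there.
type HistoryProcessor struct {
	lastVersion int
	History     []string
}

func (p *HistoryProcessor) Consume(l *AppendLog) {
	for _, c := range l.ReadFrom(p.lastVersion) {
		p.History = append(p.History, c.State)
		p.lastVersion = c.Version
	}
}

func main() {
	log := &AppendLog{}
	log.Append("OrderPending")
	log.Append("OrderProcessing")

	p := &HistoryProcessor{}
	p.Consume(log)
	log.Append("OrderCompleted")
	p.Consume(log) // picks up only the new change
	fmt.Println(p.History)
}
```

In a real deployment the processor's position would be persisted alongside the stream, which is what lets it resume after a crash.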
Real Implementation
The example app demonstrates CDC integration with taskRetry.RunCDC(ctx) and store.AppendLog(). Detailed examples of building history processors, analytics pipelines, and debugging tools will be added in future updates.
Metrics and Monitoring
Currently, metrics collection is the responsibility of the user. If you need Prometheus metrics or other monitoring, include them in your dependency interface and use them within your Transition function:
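type Dependencies interface {
	// Your business dependencies
	StockService() StockService
	// Metrics dependencies - user's responsibility to provide
	Metrics() *prometheus.Registry
	TransitionCounter() prometheus.Counter
	// TransitionDuration is an illustrative addition so the timing below is actually recorded
	TransitionDuration() prometheus.Observer
}

func Transition(ctx context.Context, deps Dependencies, cmd Command, state State) (State, error) {
	// Manual metrics collection
	startTime := time.Now()
	defer func() {
		deps.TransitionCounter().Inc()
		// Record duration; state types and other labels can be recorded the same way
		deps.TransitionDuration().Observe(time.Since(startTime).Seconds())
	}()
	// Your transition logic here
	return state, nil // placeholder: return the state produced by your logic
}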
There's no automatic metrics injection - you must explicitly add metrics to your dependencies and instrument your transitions manually.
Future Enhancement
Automatic metrics collection would be a valuable addition to machine.Machine. This could include built-in counters for transitions, error rates, and timing histograms without requiring manual instrumentation.
Evolution and Versioning
Backward Compatible Changes
When evolving state machines, maintain compatibility:
// Version 1
//go:tag mkunion:"OrderState"
type (
OrderCreated struct {
ID string
Items []Item
}
)
// Version 2 - Added field with default
//go:tag mkunion:"OrderState"
type (
OrderCreated struct {
ID string
Items []Item
Discount float64 `json:"discount,omitempty"` // New field
}
)
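Why an added field is backward compatible can be checked with plain encoding/json. The sketch below is an assumption-based illustration: it mirrors the "Version 2" shape without mkunion tags, drops Items for brevity, and does not reproduce whatever envelope mkunion's generated serialization may add. A payload written before the field existed still decodes, and the new field takes Go's zero value.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// OrderCreatedV2 mirrors the "Version 2" shape above, without mkunion tags;
// Items is omitted to keep the sketch short.
type OrderCreatedV2 struct {
	ID       string
	Discount float64 `json:"discount,omitempty"`
}

// decodeOrderCreated reads a payload written by either version.
func decodeOrderCreated(data []byte) (OrderCreatedV2, error) {
	var s OrderCreatedV2
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	// A payload persisted by version 1 has no "discount" key.
	old, err := decodeOrderCreated([]byte(`{"ID":"123"}`))
	fmt.Println(old.ID, old.Discount, err)
}
```

This only works when the zero value is a sensible default, which is one more reason to design states around meaningful zero values.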
State Migration Strategies
Handle state structure changes:
// Migration function
func MigrateOrderState(old []byte) (OrderState, error) {
// Try to unmarshal as current version
current, err := shared.JSONUnmarshal[OrderState](old)
if err == nil {
return current, nil
}
// Try older version
v1, err := shared.JSONUnmarshal[OrderStateV1](old)
if err == nil {
// Convert v1 to current
return convertV1ToCurrent(v1), nil
}
return nil, fmt.Errorf("unknown state version")
}
Deprecating States and Commands
Gracefully phase out old states:
//go:tag mkunion:"OrderState"
type (
// Deprecated: Use OrderPending instead
OrderCreated struct {
// ... fields
}
OrderPending struct {
// New state structure
}
)
func Transition(ctx context.Context, deps Dependencies, cmd Command, state State) (State, error) {
// Handle deprecated state
if old, ok := state.(*OrderCreated); ok {
// Automatically migrate to new state
state = &OrderPending{
// Map old fields to new
}
}
// Continue with normal processing
// ...
}
Performance Considerations
Memory Optimization
- Reuse State Instances: For states without data, use singletons
- Lazy Loading: Don't load unnecessary data in states
Optimistic Concurrency Control
The x/storage/schemaless package provides built-in optimistic concurrency control using version fields. This ensures data consistency when multiple processes work with the same state.
storage := schemaless.NewInMemoryRepository[State]()
// 1. Load current state from storage
records, err := storage.FindingRecords(schemaless.FindingRecords[schemaless.Record[State]]{
RecordType: recordType,
Where: predicate.MustWhere("ID = :id", predicate.ParamBinds{
":id": schema.MkString(orderId),
}, nil),
Limit: 1,
})
assert.NoError(t, err)
assert.Len(t, records.Items, 0)
// 2. Create a fresh machine instance with the current state
var state State
m := NewMachine(dep, state)
// 3. Handle the command
cmd := &CreateOrderCMD{OrderID: "123", Attr: OrderAttr{Price: 100, Quantity: 3}}
err = m.Handle(ctx, cmd)
assert.NoError(t, err)
// 4. Save the new state (with optimistic concurrency control)
result, err := storage.UpdateRecords(schemaless.Save(schemaless.Record[State]{
ID: orderId,
Type: recordType,
Data: m.State(),
}))
if errors.Is(err, schemaless.ErrVersionConflict) {
// handle version conflicts, usually by retrying from step 1.
}
assert.NoError(t, err)
assert.Len(t, result.Saved, 1)
assert.Equal(t,
&OrderPending{
Order: Order{
ID: "123",
OrderAttr: OrderAttr{Price: 100, Quantity: 3},
},
}, m.State(),
)
How It Works:
- Each record has a Version field that increments on updates
- Updates specify the expected version in the record
- If versions don't match, ErrVersionConflict is returned
- Applications retry with the latest version
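The retry step can be sketched with a toy versioned store. Nothing below is the schemaless API: Store, Record, the local ErrVersionConflict, and updateWithRetry are hypothetical stand-ins that only illustrate the load, apply, save, retry-on-conflict loop.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionConflict plays the role of schemaless.ErrVersionConflict in this
// toy store; the real repository API is not reproduced here.
var ErrVersionConflict = errors.New("version conflict")

type Record struct {
	Data    string
	Version int
}

// Store is a minimal in-memory versioned store (not safe for concurrent use).
type Store struct{ records map[string]Record }

func (s *Store) Load(id string) Record { return s.records[id] }

// Save succeeds only if the caller saw the latest version.
func (s *Store) Save(id string, r Record) error {
	if s.records[id].Version != r.Version {
		return ErrVersionConflict
	}
	r.Version++
	s.records[id] = r
	return nil
}

// updateWithRetry reloads and reapplies the change when another writer won.
func updateWithRetry(s *Store, id string, maxAttempts int, apply func(string) string) error {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		rec := s.Load(id)
		rec.Data = apply(rec.Data)
		err := s.Save(id, rec)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrVersionConflict) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, ErrVersionConflict)
}

func main() {
	store := &Store{records: map[string]Record{}}
	err := updateWithRetry(store, "123", 3, func(data string) string {
		return data + "created"
	})
	fmt.Printf("%+v %v\n", store.Load("123"), err)
}
```

Bounding the number of attempts is a design choice of this sketch; under heavy contention an unbounded loop could spin indefinitely.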