In the previous post, we saw some of the challenges of implementing distributed transactions and how to implement the Saga pattern using the Event/Choreography approach. In this article, let’s talk about how to address some of its problems, such as complex transactions and cyclic dependencies between events, by using another type of Saga implementation called Command or Orchestration.
Saga’s Command/Orchestration Sequencing Logic
In the orchestration approach, we define a new service with the sole responsibility of telling each participant what to do and when. The saga orchestrator communicates with each service in a command/reply style, telling it which operation to perform.
Let’s see what it looks like using our previous e-commerce example:
- Order Service saves a pending order and asks Order Saga Orchestrator (OSO) to start a create order transaction.
- OSO sends an Execute Payment command to Payment Service, and it replies with a Payment Executed message.
- OSO sends a Prepare Order command to Stock Service, and it replies with an Order Prepared message.
- OSO sends a Deliver Order command to Delivery Service, and it replies with an Order Delivered message.
In the case above, the Order Saga Orchestrator knows the flow needed to execute a “create order” transaction. If anything fails, it is also responsible for coordinating the rollback by sending commands to each participant to undo the previous operation.
A standard way to model a saga orchestrator is a state machine in which each transition corresponds to a command or message. State machines are an excellent pattern for structuring well-defined behavior, as they are easy to implement and particularly well suited to testing.
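To make the state machine idea more concrete, here is a minimal sketch of the “create order” flow as a transition table. The state names, the `OrderSaga` class, and the `send` callback are assumptions made for illustration; they do not come from any particular framework, and a real orchestrator would also persist its state and talk to a message broker.

```python
from enum import Enum


class State(Enum):
    ORDER_PENDING = "ORDER_PENDING"
    PAYMENT_REQUESTED = "PAYMENT_REQUESTED"
    STOCK_REQUESTED = "STOCK_REQUESTED"
    DELIVERY_REQUESTED = "DELIVERY_REQUESTED"
    COMPLETED = "COMPLETED"


# (current state, incoming message) -> (next state, command to send or None)
TRANSITIONS = {
    (State.ORDER_PENDING, "StartCreateOrder"): (State.PAYMENT_REQUESTED, "ExecutePayment"),
    (State.PAYMENT_REQUESTED, "PaymentExecuted"): (State.STOCK_REQUESTED, "PrepareOrder"),
    (State.STOCK_REQUESTED, "OrderPrepared"): (State.DELIVERY_REQUESTED, "DeliverOrder"),
    (State.DELIVERY_REQUESTED, "OrderDelivered"): (State.COMPLETED, None),
}


class OrderSaga:
    def __init__(self, send):
        self.state = State.ORDER_PENDING
        self.send = send  # hypothetical callable that delivers a command to a participant

    def handle(self, message):
        key = (self.state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {self.state.name}")
        self.state, command = TRANSITIONS[key]
        if command is not None:
            self.send(command)


# Drive the saga through the happy path, collecting the commands it sends.
sent = []
saga = OrderSaga(sent.append)
for msg in ["StartCreateOrder", "PaymentExecuted", "OrderPrepared", "OrderDelivered"]:
    saga.handle(msg)
print(saga.state.name, sent)
```

Because the whole flow lives in one table, a test only needs to feed messages in and check the resulting state and the commands sent, which is a large part of why state machines are pleasant to test.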
Rolling Back in Saga’s Command/Orchestration
Rollbacks are a lot easier when you have an orchestrator to coordinate everything:
- Stock Service replies to OSO with an Out-Of-Stock message.
- OSO recognizes that the transaction has failed and starts the rollback.
- In this case, only a single operation was executed successfully before the failure, so OSO sends a Refund Client command to Payment Service and sets the order state to failed.
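The rollback above can be sketched as a list of steps, each paired with a compensating command that the orchestrator sends in reverse order for the steps that already succeeded. This is a simplified illustration: `run_saga` and `fake_participant` are hypothetical, the compensation names other than Refund Client are assumed, and the participant is called synchronously here only to keep the example short.

```python
# Hypothetical mapping from each forward command to the command that undoes it.
COMPENSATIONS = {
    "ExecutePayment": "RefundClient",
    "PrepareOrder": "ReleaseStock",    # assumed name, not from the article
    "DeliverOrder": "CancelDelivery",  # assumed name, not from the article
}

STEPS = ["ExecutePayment", "PrepareOrder", "DeliverOrder"]


def run_saga(execute):
    """Run the steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in STEPS:
        if execute(step):
            completed.append(step)
        else:
            for done in reversed(completed):
                execute(COMPENSATIONS[done])
            return "FAILED"
    return "COMPLETED"


log = []


def fake_participant(command):
    """Stand-in for the real services: records the command, fails on PrepareOrder."""
    log.append(command)
    return command != "PrepareOrder"  # simulates the Out-Of-Stock reply


result = run_saga(fake_participant)
print(result, log)
```

Only the payment had completed when Stock Service failed, so the only compensation sent is the refund, which matches the scenario described above.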
Benefits and Drawbacks of Using Saga’s Command/Orchestration Design
Orchestration-based sagas have a variety of benefits:
- Avoid cyclic dependencies between services, as the saga orchestrator invokes the saga participants but the participants do not invoke the orchestrator
- Centralize the orchestration of the distributed transaction
- Reduce participants’ complexity, as they only need to execute commands and reply.
- Easier to implement and test
- The transaction complexity remains linear when new steps are added
- Rollbacks are easier to manage
- If a second transaction wants to change the same target object, you can easily put it on hold in the orchestrator until the first transaction ends.
However, this approach still has some drawbacks. One of them is the risk of concentrating too much logic in the orchestrator and ending up with an architecture where a smart orchestrator tells dumb services what to do.
Another downside of orchestration-based sagas is that they slightly increase your infrastructure complexity, as you will need to manage an extra service.
Saga Pattern Tips
Create a Unique Id per Transaction
Having a unique identifier for each transaction is a common technique for traceability, but it also gives participants a standard way to request data from each other. By using a transaction Id, for instance, Delivery Service could ask Stock Service where to pick up the products and double-check with Payment Service whether the order was paid.
Add the Reply Address Within the Command
Instead of designing your participants to reply to a fixed address, consider sending the reply address within the message. This way, your participants can reply to multiple orchestrators.
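As a rough sketch of this tip, the command below carries a `reply_to` field and the participant publishes its reply to whatever channel the command names. The `Command` fields, the `InMemoryBroker` class, and the channel names are all made up for the example; a real setup would use your broker’s own client library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Command:
    name: str
    transaction_id: str
    reply_to: str  # channel the participant should answer on
    payload: dict = field(default_factory=dict)


class InMemoryBroker:
    """Toy stand-in for a message broker: one list of messages per channel."""

    def __init__(self):
        self.channels = defaultdict(list)

    def publish(self, channel, message):
        self.channels[channel].append(message)


def payment_service(broker, command):
    # The service never hard-codes where to reply; it reads the address from the command.
    reply = {"name": "PaymentExecuted", "transaction_id": command.transaction_id}
    broker.publish(command.reply_to, reply)


broker = InMemoryBroker()
payment_service(broker, Command("ExecutePayment", "tx-1", "order-saga-replies"))
payment_service(broker, Command("ExecutePayment", "tx-2", "subscription-saga-replies"))
print(dict(broker.channels))
```

The same Payment Service code serves two different orchestrators here without knowing anything about either of them.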
Idempotent Operations
If you are using queues for communication between services (like SQS, Kafka, RabbitMQ, etc.), I recommend making your operations idempotent. Most of those systems might deliver the same message more than once.
Idempotency can also increase the fault tolerance of your service. Quite often, a bug in a client triggers or replays unwanted messages and messes up your database.
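One common way to get idempotency is to remember which transaction Ids have already been handled and return the stored reply for a duplicate instead of repeating the side effect. The sketch below assumes a hypothetical `PaymentService` class and keeps the processed Ids in memory; a real service would store them in its database, ideally in the same local transaction as the charge.

```python
class PaymentService:
    def __init__(self):
        self.processed = {}  # transaction_id -> reply already sent (in-memory for the sketch)
        self.charges = []    # stands in for the real side effect of charging the client

    def handle_execute_payment(self, transaction_id, amount):
        # A redelivered or replayed command gets the original reply and no second charge.
        if transaction_id in self.processed:
            return self.processed[transaction_id]
        self.charges.append((transaction_id, amount))
        reply = {"name": "PaymentExecuted", "transaction_id": transaction_id}
        self.processed[transaction_id] = reply
        return reply


service = PaymentService()
first = service.handle_execute_payment("tx-1", 100)
second = service.handle_execute_payment("tx-1", 100)  # duplicate delivery
print(len(service.charges), first == second)
```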
Avoiding Synchronous Communications
As the transaction progresses, don’t forget to add to the message all the data each operation needs in order to execute. The goal is to avoid synchronous calls between services just to request more data. This enables your services to execute their local transactions even when other services are offline.
The downside is that your orchestrator will be slightly more complex as you will need to manipulate the requests/responses of each step, so be aware of the tradeoffs.
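A small sketch of what that extra orchestrator work might look like: the orchestrator keeps a context per transaction, merges each reply into it, and builds the next command from that context. The `SagaContext` class and every field name here are assumptions chosen for the example.

```python
class SagaContext:
    """Accumulates the data returned by each step of one transaction."""

    def __init__(self, transaction_id, order):
        self.data = {"transaction_id": transaction_id, "order": order}

    def record_reply(self, step, reply):
        self.data[step] = reply

    def deliver_order_command(self):
        # Everything Delivery Service needs travels inside the command itself,
        # so it does not have to call Stock Service or Payment Service.
        return {
            "name": "DeliverOrder",
            "transaction_id": self.data["transaction_id"],
            "shipping_address": self.data["order"]["shipping_address"],
            "pickup_location": self.data["stock"]["pickup_location"],
            "payment_status": self.data["payment"]["status"],
        }


context = SagaContext("tx-42", {"shipping_address": "1 Example Street"})
context.record_reply("payment", {"status": "PAID"})
context.record_reply("stock", {"pickup_location": "Warehouse A"})
command = context.deliver_order_command()
print(command)
```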
If you have any questions, feel free to ask me at @deniswsrosa
Very good!
Is there an implementation?
Here’s an alternative to a state machine as implementation: https://github.com/bertilmuth/requirementsascode. I’m glad if you let me know what you think.
Hello, in your visualization you used a message broker and channels. So it looks like the Saga publishes to a broker channel, and, for example, the Payment Service subscribes to that channel and receives the message. After that, it should send the result to another channel for the Saga. So it’s a PUB/SUB MQ. If the Payment Service is offline, the Saga has to provide some timeout logic for the response. So why not use REQ/REP (the request/response MQ model) for that case, or a direct request to the Payment Service? That way we would immediately know the response and the service’s availability.