'Distributed' circuit breakers: co-ordinate or not?
Oliver Kamps
2 posts
|
Hello everybody,

Assume a redundantly deployed component (e.g. the same EJB deployed to a number of Java EE server instances) has an integration point to an external system and I’d want to make this more robust using a Circuit Breaker. Would you have the different instances of the Circuit Breaker communicate with each other (i.e. if one flips open, they all do)? Or would this be overkill as they’d probably all flip open within a short period of time anyway if there’s a systemic problem?

Thanks a lot, Oliver |
Michael Nygard
5 posts
|
Oliver,

The prospect of coordinating circuit breakers makes me very nervous, for a couple of reasons. Coordinating across the circuit breakers would either be done via point-to-point connections, which suggests a scalability problem, or via broadcast messages. (Broadcasts would not present a scalability problem.)

I see a second, slightly more subtle problem. Suppose a farm of 10 servers is calling a back end system. An external event (maybe load-related, maybe human error, whatever) disrupts the back end. If the first circuit breaker to trip causes all 10 to stop making calls, then when the time limit passes, all 10 will start making calls at the same time. This can cause a surge in load on the newly restored back end system, exactly as when the power company restores service after a summertime blackout, only to have thousands of air conditioners turn on immediately. The surge in demand can destabilize the back end.

Finally, coordinating actions across the entire tier will tend to make the whole cluster react, even to a problem that is local to one host. If you assume that the circuit breaker trips only because of a problem with the back end, then communication between the circuit breakers would save a little bit of time. On the other hand, I have seen problems affect one host, disrupting its connection to the back end, thus causing circuit breakers to trip. If the problem is confined to that one host, you would not want the other circuit breakers to also flip open. (Imagine, for example, a failed NIC on that host. It still makes sense to protect the application by aborting the calls, even though it’s not strictly a problem with the back end system.)

So, in summary, I would choose not to have the circuit breakers communicate their status with each other.

Cheers, |
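To make the recommendation above concrete, here is a minimal sketch of a breaker whose state lives entirely inside one server instance, so a trip on one host (say, the one with the failed NIC) leaves the breakers on the other hosts untouched. Everything in it is an assumption for illustration and not taken from the thread: the class name `LocalCircuitBreaker`, the consecutive-failure threshold, the fixed reset timeout, and the injectable clock. It is not a production implementation; for instance, it lets more than one trial call through while half-open.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

/** Hypothetical per-instance breaker: its state is local and never shared with other hosts. */
class LocalCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures before tripping (assumed policy)
    private final long resetTimeoutMillis; // time to stay open before allowing a trial call
    private final LongSupplier clock;      // injectable time source, e.g. System::currentTimeMillis

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    LocalCircuitBreaker(int failureThreshold, long resetTimeoutMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.resetTimeoutMillis = resetTimeoutMillis;
        this.clock = clock;
    }

    /** Returns the current state, moving OPEN to HALF_OPEN once the timeout has elapsed. */
    public synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= resetTimeoutMillis) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    /** Runs the integration-point call, or aborts immediately while the breaker is open. */
    public <T> T call(Callable<T> integrationPoint) throws Exception {
        if (state() == State.OPEN) {
            throw new IllegalStateException("circuit open: call aborted");
        }
        try {
            T result = integrationPoint.call(); // the lock is not held during the remote call
            onSuccess();
            return result;
        } catch (Exception e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call re-opens at once; otherwise trip at the threshold.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

Each server would create its own instance, for example `new LocalCircuitBreaker(5, 30_000L, System::currentTimeMillis)`, and wrap every call to the external system in `call(...)`. Because each breaker opens at the moment its own host sees enough failures, the reset timers on different hosts are not forced to expire at the same instant the way a coordinated trip would make them.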
Oliver Kamps
2 posts
|
Hi Mike, Thanks for sharing your thoughts so quickly on this. Your reasoning makes a lot of sense to me; I was leaning in the same direction but hadn’t thought things through that far. Cheers, |
