Coupling in Distributed Systems

Coupling and cohesion are key quality indicators. We strive for highly cohesive and loosely coupled systems, but high doesn't mean pure. The same goes for functional programming: we aim to isolate and reduce side effects, but we need them unless we want a useless system. It's good to modularise our systems, and whenever those modules need to talk to each other they effectively couple themselves. Our job is to create cohesive modules and minimize coupling as much as possible.
Let's provide an example. Our system has the following structure:
  1. Different deployables, i.e., a microservices architecture.
  2. Communication between services through Kafka (pub-sub messaging); no HTTP involved.
  3. 1 Producer to N Consumers scenarios.
  4. JSON for data serialization.
The messages that are published and consumed in this system have a schema, and it's our choice whether to make that schema implicit or explicit, and whether to validate it at compile time or at runtime. Before analysing the trade-offs of each approach, let's say a few words about compile-time versus runtime approaches.

Proving Software Correctness as Soon as Possible

I've been a user of statically typed languages for most of my career, so I'm really biased on this topic. I strongly believe in Lean concepts such as the importance of minimizing waste. At the same time, I love the therapeutic ideas behind Agile, TDD, or BDD about exposing the truth as soon as possible. Static types, and ultimately the compiler, help me achieve those goals.
I would rather spend my time writing tests that provide living documentation, ease future refactors, and help drive the design than tests that catch bugs the type system should take care of. Writing a test that checks the behaviour of a method when receiving null is a waste of time if we can make it impossible to write a line of code that passes a null.
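As a minimal illustration in Scala (the Email type is hypothetical), modelling absence with Option moves the null check out of the test suite and into the type signature:

```scala
// Hypothetical example: model a possibly-missing value with Option instead
// of a nullable reference, so callers must state absence explicitly.
final case class Email(value: String)

def notify(recipient: Option[Email]): String =
  recipient match {
    case Some(email) => s"sending to ${email.value}"
    case None        => "no address on file, skipping"
  }

notify(Some(Email("dev@example.com"))) // "sending to dev@example.com"
notify(None)                           // "no address on file, skipping"
```

Strictly speaking, Scala still admits null for any AnyRef, so teams often pair this style with linting; the point is that the happy path no longer needs a null test.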
The compile-time world is not perfect though: development is definitely slower, and it constrains developers (although someone could argue that less freedom is a nice-to-have in this context).

Runtime Approaches

Now that I've been honest about my compile-time bias, I can explain the different approaches to the schema validation problem and their trade-offs.

Implicit Schemas

The first runtime approach is the loosest one: using implicit schemas and trusting in the good will of producers. Since nobody checks the validity of messages before they are published into Kafka, consumers could blow up.
The first corrective measure is ensuring that only the processing of the poisoned message blows up, and not the whole consumer. An example would be providing a resume supervision strategy in Akka Streams for when a message doesn't hold the expected implicit schema.
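A minimal sketch of that resume strategy, assuming Akka 2.6+ (where an implicit ActorSystem provides the stream materializer) and using a stand-in parse function in place of real JSON decoding:

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorAttributes, Supervision}
import akka.stream.scaladsl.Source

object ResumingConsumer extends App {
  implicit val system: ActorSystem = ActorSystem("consumer")

  // Stand-in for real JSON deserialization; throws on a poisoned message.
  def parse(raw: String): Int = raw.toInt

  // Resume drops the failing element and keeps the stream alive;
  // anything unexpected still stops the stream.
  val decider: Supervision.Decider = {
    case _: NumberFormatException => Supervision.Resume
    case _                        => Supervision.Stop
  }

  Source(List("1", "2", "boom", "4")) // "boom" plays the poisoned message
    .map(parse)
    .withAttributes(ActorAttributes.supervisionStrategy(decider))
    .runForeach(n => println(s"processed $n")) // prints 1, 2, 4
}
```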
The second corrective measure is not simply swallowing those crashes but communicating them to the proper actors (be they humans or software). A good practice is to provide dead-letter queues for poisoned messages, in case we want to manipulate and retry the processing of those messages at that level.
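One way to sketch that, treating failures as data instead of exceptions: wrap parsing in Try and divert failed elements to a dead-letter sink (the println sink below stands in for a real Kafka producer writing to a dead-letter topic):

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.util.{Success, Try}

object DeadLetterConsumer extends App {
  implicit val system: ActorSystem = ActorSystem("consumer")

  // Keep the raw payload alongside the parse result so the dead-letter
  // queue receives the original message, not a half-decoded one.
  val parsed = Source(List("1", "2", "boom", "4"))
    .map(raw => raw -> Try(raw.toInt))

  // Stand-in for a Kafka producer publishing to a dead-letter topic.
  val deadLetters = Sink.foreach[(String, Try[Int])] {
    case (raw, failure) => println(s"to dead-letter topic: $raw ($failure)")
  }

  parsed
    .divertTo(deadLetters, { case (_, result) => result.isFailure })
    .collect { case (_, Success(n)) => n }
    .runForeach(n => println(s"processed $n"))
}
```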
Before getting into explicit schemas, I would say that these measures are usually not enough on their own, but they are a good safety net: shit happens, and we need to be prepared.

Explicit Schemas

If we want to prevent poisoned messages from getting into our topics, we can provide a middle-man service that intercepts messages and validates them against explicit schemas. Schema Registry is an example of that for Kafka, and its documentation is full of insights about how to implement this in a distributed, highly available, and scalable way.
That is an integration service that could become a single point of failure, but at the same time it can be valuable to have a centralised repository of schemas when we have many consumers and the complexity of the system would be hard to grasp in a decentralised fashion. The service is stateless, so to avoid a single point of failure we can make it redundant behind a farm of services and achieve high availability.
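In practice this role is usually played by Confluent's Schema Registry together with Avro; purely as a toy illustration, here is what a middle-man's validation step could look like against a hand-rolled explicit JSON schema (the field names are hypothetical, and spray-json is assumed for parsing):

```scala
import scala.util.Try
import spray.json._

// Toy explicit schema: an order message must be a JSON object carrying
// at least these fields. The middle-man rejects anything else before
// it reaches the topic.
object OrderSchema {
  private val requiredFields = Set("id", "amount")

  def validate(raw: String): Either[String, JsObject] =
    Try(raw.parseJson.asJsObject).toOption
      .toRight("payload is not a JSON object")
      .flatMap { obj =>
        val missing = requiredFields -- obj.fields.keySet
        if (missing.isEmpty) Right(obj)
        else Left(s"missing fields: ${missing.mkString(", ")}")
      }
}

// OrderSchema.validate("""{"id": "42", "amount": 9.99}""") // Right(...)
// OrderSchema.validate("""{"id": "42"}""")                 // Left("missing fields: amount")
```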

Compile-Time Approaches

The last approach is to create software that makes it impossible to build messages that do not hold the expected schema. Assuming Scala, we could create a jar that contains case classes that are object materializations of a schema.
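A minimal sketch of what that shared jar could contain (the event and field names are made up, and spray-json is assumed for the wire format):

```scala
package com.example.events

import spray.json._

// Version this artifact and depend on it from both producer and consumers.
// The compiler now rejects any message that does not match the schema.
final case class UserRegistered(userId: String, email: String, timestamp: Long)

object EventProtocol extends DefaultJsonProtocol {
  implicit val userRegisteredFormat: RootJsonFormat[UserRegistered] =
    jsonFormat3(UserRegistered)
}

// Producer side:
//   import EventProtocol._
//   val payload = UserRegistered("42", "dev@example.com", 1464134400L).toJson.compactPrint
// Consumer side:
//   val event = payload.parseJson.convertTo[UserRegistered]
```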
What are the benefits of this approach?
  1. Fail early. We don't have to wait until testing or production to verify that the messages published by some producer are correct.
  2. Centralized knowledge.
What is the solvable problem?
  • Cascade updates. If our microservices live in different repos, then we need to make sure that updates to that common binary are applied to both the producer and the consumers. That's cumbersome, and if it's not done it can generate unexpected bugs, as we've introduced a false sense of security with that library. This can be solved using a monorepo.
What is the biggest problem?
  • Breaking the isolation of deployables. One of the points of microservices is being able to deploy each service independently. If you're forced to redeploy N consumer services every time the schema library changes in a non-backward-compatible way, then you're losing that perk. Being able to do small releases is a big enabler of Continuous Delivery, so it's a remarkable loss.
You could argue that only non-backward-compatible changes force a redeploy of consumers, and that we should design our schemas in a way that anticipates and minimizes those kinds of changes.
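For instance, purely additive changes can be kept backward compatible by making new fields optional with a default (the referrer field below is a hypothetical v2 addition), so only renames and removals force the lock-step redeploy:

```scala
// v2 of the schema: the new field is optional with a default, so messages
// produced against v1 still deserialize fine, and consumers that only read
// the fields they know about keep working against v2 messages.
final case class UserRegistered(
  userId: String,
  email: String,
  timestamp: Long,
  referrer: Option[String] = None // added in v2; absent in old messages
)
```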

Generalizing the Coupling Problem

If we generalize the problem, we'll see that there are two kinds of coupling: avoidable and mandatory.
Avoidable coupling comes when we strive to reduce duplication in our codebase. Let's say we want to extract a requestId from the header of an HTTP request and put it into an MDC in order to trace logs across different threads or services (see the sketch after this list). That code will hardly vary from service to service, so it's a good candidate to be extracted, even though doing so adds some coupling between services. Before doing that, it's good to think about the following:
  1. Coupling is the enemy of microservices, and its future effects are not easily visible.
  2. Following Conway's law, breaking the isolation of your services breaks the isolation of your teams, so be sure that your organization is able to cope with that level of communication and integration.
  3. The key measure is the rate of change. A library that is constantly updated (as your schema library could be) will be far more painful to manage as a common dependency than a fairly static one.
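As an illustration of the requestId example above, here is a sketch of such a shared helper (the header and MDC key names are assumptions; SLF4J's MDC API is real). It is exactly the kind of small, slowly changing utility that makes a reasonable common dependency:

```scala
import org.slf4j.MDC

object Tracing {
  val RequestIdHeader = "X-Request-Id" // assumed header name

  // Run `block` with the request id in the MDC so every log line emitted
  // on this thread during the block carries it; clean up afterwards.
  def withRequestId[T](headers: Map[String, String])(block: => T): T = {
    val requestId =
      headers.getOrElse(RequestIdHeader, java.util.UUID.randomUUID().toString)
    MDC.put("requestId", requestId)
    try block
    finally MDC.remove("requestId")
  }
}

// Tracing.withRequestId(Map("X-Request-Id" -> "abc-123")) {
//   logger.info("handling request") // log pattern can include %X{requestId}
// }
```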
Mandatory coupling comes when some information needs to reside in a third entity, because it doesn't make sense for one of the integrating entities to hold it, or because it's not worth sharing and duplicating that information across every single entity.

Conclusion

Even though I am a strong supporter of compiled languages, I think that sharing code through binaries in a distributed environment deserves a deep analysis of the structure and needs of your system. I hope this post has provided some valuable insights into the topic.
