This page describes how cluster discovery and refresh will work.
At this stage, documenting the desired solution is essential: it lets the project reach this design gradually and incrementally, without having to stop and rethink or re-adapt everything every few months.
The essence of Replicante Core is a reactive engine: when a state change is detected, actions are taken to return the system to a desired state.
The idea of orchestration built on a reactive engine is not new.
Additionally, tracking state changes tells us what is happening to our system, what we need to change, and what our actions on the system lead to.
The cluster state refresh process continuously evaluates the state of clusters so that decisions can be made, progress tracked, and (re)actions taken.
The cluster discovery process aims to keep administration and management overheads to a minimum and to take advantage of the highly dynamic platforms and tools available today (“the cloud”).
So how do cluster discovery and refresh work?
- A `discovery` component periodically runs at fixed intervals.
  - The interval should be short as it determines the delay between discoveries needing to run and them being scheduled.
- A `discovery` run looks for any discoveries with an expected next schedule time in the past.
  - If no cluster discovery needs to run, the `discovery` run does nothing.
- A `discovery` run schedules a discovery task for each discovery that needs to be performed.
  - The next expected schedule time is set to `now() + discovery interval`.
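A minimal sketch of what a single `discovery` run could look like, assuming a simple in-memory list of discoveries; the `Discovery` type and the task-scheduling step are illustrative stand-ins, not Replicante's actual interfaces:

```rust
use std::time::{Duration, SystemTime};

/// A configured discovery with its expected next schedule time.
/// Illustrative model only, not Replicante's real data type.
struct Discovery {
    id: String,
    next_schedule: SystemTime,
}

/// One `discovery` run: schedule a task for every discovery whose
/// expected next schedule time is in the past, then push that time
/// forward to `now() + discovery interval`.
fn discovery_run(discoveries: &mut [Discovery], interval: Duration) {
    let now = SystemTime::now();
    for discovery in discoveries.iter_mut() {
        if discovery.next_schedule > now {
            continue;
        }
        // Stand-in for submitting a task to the task subsystem.
        println!("scheduling discovery task for {}", discovery.id);
        discovery.next_schedule = now + interval;
    }
    // If no discovery was due, the loop body never ran and this run did nothing.
}

fn main() {
    let mut discoveries = vec![Discovery {
        id: "example-platform".to_string(),
        // Long overdue, so the run below schedules it immediately.
        next_schedule: SystemTime::UNIX_EPOCH,
    }];
    discovery_run(&mut discoveries, Duration::from_secs(60));
}
```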
- A `refresh` component periodically runs at fixed intervals.
  - The interval should be short as it determines the delay between clusters needing to refresh and the refresh being scheduled.
- A `refresh` run looks for any cluster with an expected next refresh time in the past.
  - If no cluster needs to be refreshed, the `refresh` run does nothing.
- A `refresh` run schedules a refresh task for each cluster that needs to run.
  - The next expected refresh time is set to `now() + refresh interval`.
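The `refresh` component can be driven the same way; below is a sketch of the outer fixed-interval loop, with the run body left as a placeholder (the names are hypothetical, not taken from the codebase):

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical fixed-interval driver for the `refresh` component.
/// The poll interval bounds how long a due cluster waits before its
/// refresh task is actually scheduled, so it should be short.
fn run_refresh_component(poll_interval: Duration, mut refresh_run: impl FnMut()) {
    loop {
        refresh_run();
        thread::sleep(poll_interval);
    }
}

fn main() {
    run_refresh_component(Duration::from_secs(10), || {
        // Placeholder for the real run: find clusters whose expected next
        // refresh time is in the past, schedule a refresh task for each,
        // and set their next refresh time to now() + refresh interval.
        println!("refresh run");
    });
}
```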
Because events are generated from differences in observed states, refreshing the state of a node from multiple processes at once may lead to duplicate and/or missing events as well as confused and inconsistent aggregations.
Distributed locks are used to ensure a cluster is refreshed by only one process at a time. Any cluster refresh operation attempted while another operation is already running will be discarded.
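A sketch of this try-and-discard pattern, using an in-process `Mutex::try_lock` as a stand-in for the distributed lock (a real deployment needs a lock that spans processes, which a `std` mutex cannot provide):

```rust
use std::sync::Mutex;

/// Attempt to refresh a cluster, discarding the operation if another
/// refresh already holds the lock. The `Mutex<()>` stands in for the
/// distributed lock used across processes.
fn refresh_cluster(lock: &Mutex<()>, cluster_id: &str) {
    match lock.try_lock() {
        Ok(_guard) => {
            // Exclusive access: fetch node states, diff views, emit events.
            println!("refreshing cluster {}", cluster_id);
            // The lock is released when `_guard` drops at the end of this scope.
        }
        Err(_) => {
            // Another process is already refreshing this cluster: discard
            // this operation rather than queue it, to avoid duplicate events.
            println!("refresh of {} already in progress, discarding", cluster_id);
        }
    }
}

fn main() {
    let lock = Mutex::new(());
    refresh_cluster(&lock, "example-cluster");
}
```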
When clusters and nodes are automatically discovered, they can also automatically go away.
This feature is not currently available and is yet to be designed.
The newly fetched agent information is aggregated to generate an approximate cluster view. This cluster view is compared to the last known view to generate events describing changes in the views of the system.
Because the cluster view is approximate, node events are always based on reports from the nodes themselves (we do not report a node as down if we see it up, even if another node in the cluster thinks it is down).
Only cluster-level events are generated from these views. Actions will have to check that the state of the system matches expectations before they are applied.
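As an illustration of the view comparison, the sketch below diffs two approximate views reduced to node-membership sets; the `ClusterView` model and event names are invented for the example, and only cluster-level changes are emitted:

```rust
use std::collections::HashSet;

/// An approximate cluster view reduced to the set of node IDs seen in
/// the latest aggregation. Real views carry far more state.
struct ClusterView {
    nodes: HashSet<String>,
}

/// Compare the last known view with the freshly aggregated one and emit
/// cluster-level events describing the differences.
fn diff_views(previous: &ClusterView, current: &ClusterView) -> Vec<String> {
    let mut events = Vec::new();
    for node in current.nodes.difference(&previous.nodes) {
        events.push(format!("NODE_JOINED: {}", node));
    }
    for node in previous.nodes.difference(&current.nodes) {
        // A node missing from the approximate view is not reported as down;
        // up/down events come only from the nodes' own reports.
        events.push(format!("NODE_LEFT_VIEW: {}", node));
    }
    events
}

fn main() {
    let previous = ClusterView { nodes: ["a", "b"].iter().map(|s| s.to_string()).collect() };
    let current = ClusterView { nodes: ["b", "c"].iter().map(|s| s.to_string()).collect() };
    for event in diff_views(&previous, &current) {
        println!("{}", event);
    }
}
```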
The primary use case for Replicante is as part of an automated, distributed, dynamic infrastructure that scales from a small number of small clusters to a large number of large clusters.
It is assumed that manually managing a list of nodes is at best impractical and may even be impossible in combination with tools such as auto-scaling groups and automated instance provisioners.
The idea of cluster discovery was inspired by Prometheus. Cluster discovery has several advantages, chief among them that clusters and nodes are found dynamically instead of being maintained by hand.
Kafka itself isn’t really the point; the point is introspection and its implications for the ability to debug the system.
Kafka keeps messages for a configurable retention period even after they have been consumed. This feature is a requirement for the events stream, but even without it, having access to the full history of messages can help in understanding what is happening in the system and why.
In a distributed system, where complexity runs high by nature, any access to system introspection is extremely valuable.
Additional distributed queueing systems may be supported in the future, but those that have this extra introspection property will be favoured (at the time of writing, this means NATS Streaming).