Actions

Actions describe a task that some component of the larger system needs to perform.

Actions progress from start to finish through a series of states.
More details about the states an action goes through are documented in the Developers Notebook.

Actions are scheduled by applying a YAML object with the replictl tool (yes, this intentionally mirrors the Kubernetes approach):

$ replictl apply -f path/to/action.yaml
Object applied successfully

The YAML object depends on what kind of action you are scheduling.

Node Actions

Node actions (also known as Agent actions) are defined in the specification and are executed on a specific node, generally by Replicante Agents.

The YAML object for agent actions has the following specification:

apiVersion: replicante.io/v0
kind: NodeAction

metadata:
  # Can override the namespace with --namespace=test-namespace
  namespace: default
  # Can override the cluster with --cluster=test-cluster
  cluster: target-cluster-id
  # Can override the node with --node=test-node
  node: target-node-id

spec:
  # Trigger a debug action that executes two dummy steps and then successfully completes.
  action: agent.replicante.io/debug.progress
  # Pass additional arguments as structured data.
  args:
    options: 'available options change based on the action'
    format: 'any structured YAML object is fine'
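
When many node actions need to be applied, the object above can also be generated by a script. The following is a minimal sketch, not part of Replicante itself: the helper name node_action_manifest is hypothetical, and it only uses the fields shown in the specification above. It relies on the fact that JSON documents are valid YAML, so the Python standard library is enough to serialise the object; whether replictl accepts JSON-style YAML is an assumption you should verify.

```python
import json


def node_action_manifest(namespace, cluster, node, action, args=None):
    """Build a NodeAction object using only the fields from the specification above.

    Hypothetical helper for illustration; it is not part of replictl.
    """
    return {
        "apiVersion": "replicante.io/v0",
        "kind": "NodeAction",
        "metadata": {"namespace": namespace, "cluster": cluster, "node": node},
        "spec": {"action": action, "args": args or {}},
    }


manifest = node_action_manifest(
    namespace="default",
    cluster="target-cluster-id",
    node="target-node-id",
    action="agent.replicante.io/debug.progress",
    args={"options": "available options change based on the action"},
)

# JSON is a subset of YAML, so this text can be saved as a YAML file.
document = json.dumps(manifest, indent=2)
print(document)
```

The printed document can be saved to a file and passed to replictl apply -f as shown earlier.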

Node Actions Approval

By default, node actions require approval before they are scheduled.

This means that actions will NOT be scheduled until they are approved using replictl or the API. Requiring approval means that an action can be created without being executed, so that someone else can approve it after review. The action executed on approval is exactly the one that was applied; no changes are allowed.

To approve an action with replictl:

# Approve a node action for execution.
# It will be scheduled in the next orchestration cycle for the cluster.
$ replictl action approve-node-action UUID
Action approved for scheduling

# Node actions can also be disapproved and thus cancelled.
$ replictl action disapprove-node-action UUID
Action disapproved and will not be scheduled

To skip the approval step and schedule an action as soon as possible after it is applied, set the approval metadata attribute to granted:

metadata:
  # Don't require explicit approval before the action is scheduled.
  approval: granted
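
The approval flow described in this section can be summarised as a small state model. This is only a sketch under stated assumptions: PENDING_APPROVE is the state name that appears in replictl listings in this document, while PENDING_SCHEDULE and CANCELLED are hypothetical names chosen for illustration, and the function names are not part of any Replicante API.

```python
from enum import Enum


class State(Enum):
    PENDING_APPROVE = "PENDING_APPROVE"    # state name seen in replictl output
    PENDING_SCHEDULE = "PENDING_SCHEDULE"  # hypothetical name: approved, waiting to be scheduled
    CANCELLED = "CANCELLED"                # hypothetical name: disapproved, never scheduled


def initial_state(metadata):
    """State of a freshly applied action, based on its approval metadata attribute."""
    if metadata.get("approval") == "granted":
        return State.PENDING_SCHEDULE
    return State.PENDING_APPROVE


def approve(state):
    """Approve an action that is waiting for approval."""
    if state is not State.PENDING_APPROVE:
        raise ValueError(f"cannot approve an action in state {state.name}")
    return State.PENDING_SCHEDULE


def disapprove(state):
    """Disapprove (and thus cancel) an action that is waiting for approval."""
    if state is not State.PENDING_APPROVE:
        raise ValueError(f"cannot disapprove an action in state {state.name}")
    return State.CANCELLED


print(initial_state({"namespace": "default"}).name)
print(initial_state({"namespace": "default", "approval": "granted"}).name)
```

The model only captures the choice made at apply time and the two review outcomes; it says nothing about the states an action goes through once it is running.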

Orchestrator Actions

Sometimes actions operate on or impact multiple nodes or the full cluster. These actions are generally about orchestrating changes to the cluster or day-to-day operations. They do not have to be about orchestration, but that is the most common case.

To support these use cases, Replicante Core provides Orchestrator Actions. These are actions that are executed outside of the datastore they target and at the control plane level (either as part of Replicante Core or as a stateless service invoked by Core).

The YAML object for orchestrator actions has the following specification:

apiVersion: replicante.io/v0
kind: OrchestratorAction

metadata:
  # Can override the namespace with --namespace=test-namespace
  namespace: default
  # Can override the cluster with --cluster=test-cluster
  cluster: target-cluster-id

spec:
  # Trigger a debug action that executes two dummy steps and then successfully completes.
  action: core.replicante.io/debug.ping
  # Pass additional arguments as structured data.
  args:
    options: 'available options change based on the action'
    format: 'any structured YAML object is fine'

Orchestrator Actions Approval

Scheduling approval for orchestrator actions works the same way as for node actions. The only difference is the replictl command (and API endpoint) used to approve actions:

# List orchestrator actions to see which ones still need approval.
$ replictl action list-orchestrator-actions
CLUSTER ID            ACTION ID                              KIND                            STATE             CREATED                       FINISHED  
dev-agent-zookeeper   f3bab556-d25f-4d06-90e9-63a5793dd083   core.replicante.io/debug.ping   PENDING_APPROVE   2022-06-19 11:34:45.256 UTC

# Approve an orchestrator action for execution.
# It will be scheduled in the next orchestration cycle for the cluster.
$ replictl action approve-orchestrator-action UUID
Orchestrator action approved for scheduling

# Orchestrator actions can also be disapproved and thus cancelled.
$ replictl action disapprove-orchestrator-action UUID
Orchestrator action disapproved and will not be scheduled

Actions concurrency and scheduling priorities

When Replicante Core schedules actions it follows defined rules around which actions can be scheduled, when and where.

The aim of actions is to change the state of the system. Running multiple actions at the same time is therefore risky, as different changes may pull the system in different directions. On the other hand, many activities can be safely performed while other changes are happening.

Replicante Core defines a strict set of rules around action scheduling to ensure things behave as expected:

  1. Any running (node or orchestrator) action runs to completion once it is started, even if that means violating the other rules.
  2. Node actions can be scheduled in parallel across different nodes, but execute serially on each node.
  3. Orchestrator actions have a scheduling mode that determines when actions can be scheduled, as shown in the table below.
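
Rule 2 can be illustrated with a short sketch: at most one action per node is picked, and nodes that already have a running action are skipped. The function name next_node_actions and the (action ID, node ID) tuples are hypothetical and chosen for illustration; this is not how Replicante Core is implemented.

```python
def next_node_actions(pending, running_nodes):
    """Pick at most one pending action per node, skipping nodes that are busy.

    pending: list of (action_id, node_id) tuples in creation order.
    running_nodes: set of node IDs that already have a running action.
    """
    picked = {}
    for action_id, node_id in pending:
        # Serial per node: skip nodes with a running or already picked action.
        if node_id in running_nodes or node_id in picked:
            continue
        picked[node_id] = action_id
    return list(picked.values())


pending = [
    ("a1", "node-1"),
    ("a2", "node-1"),  # waits for a1 because it targets the same node
    ("a3", "node-2"),
    ("a4", "node-3"),  # waits because node-3 already runs an action
]
print(next_node_actions(pending, running_nodes={"node-3"}))  # prints ['a1', 'a3']
```

Actions a1 and a3 target different idle nodes and can proceed in parallel, while a2 and a4 wait for their node to become free.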

Rule 1 exists mainly for safety and simplicity:

  • While clashing actions should not happen, stopping one after the fact may be more harmful than letting it finish.
  • And even if safe to do, which action should we stop?
  • And can actions even stop and resume? If an index is being built it has to finish …

As for orchestrator action scheduling modes: no, you cannot choose the mode. Scheduling modes are a property of actions and not of action invocations. If a task is not safe to perform in parallel with others, it is never safe to do so, not just sometimes.

The exception to this would be running actions in more restrictive modes, which may be supported in the future.

Scheduling Priority

To enforce the above rules Replicante Core will schedule actions only when no higher priority action is waiting to be scheduled. Additionally, running actions are taken into account to decide if scheduling is allowed.

In the table below:

  • Rows indicate the presence of one or more actions in the named state.
  • Columns indicate the existence of actions of the given class waiting to be scheduled.
  • Cells in the table indicate scheduling of actions is blocked (if the cell is marked) or allowed (if the cell is empty).

                                                      | Node | Orchestrator (Exclusive) | Orchestrator (ClusterExclusiveNodeParallel)
[Running] Node                                        |      | X                        |
[Running] Orchestrator (Exclusive)                    | X    | X                        | X
[Running] Orchestrator (ClusterExclusiveNodeParallel) |      | X                        | X
[Pending] Node                                        |      | X                        |
[Pending] Orchestrator (Exclusive)                    |      |                          | X
[Pending] Orchestrator (ClusterExclusiveNodeParallel) |      |                          |

The ClusterExclusiveNodeParallel orchestrator action mode is planned but not currently in use.
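
The blocking table can also be read as a lookup from the actions present in a cluster to the classes of actions that may not be scheduled. The sketch below is illustrative only. The names BLOCKS and can_schedule are hypothetical, and the placement of the marks in each row is an assumption inferred from the mode names and the rules above (node actions never block other node actions, and ClusterExclusiveNodeParallel runs alongside node actions); check it against the table before relying on it.

```python
NODE = "Node"
EXCLUSIVE = "Orchestrator (Exclusive)"
NODE_PARALLEL = "Orchestrator (ClusterExclusiveNodeParallel)"

# (state, class of an action present in the cluster) -> classes blocked from scheduling.
# ASSUMPTION: mark placement is inferred from the mode names, not confirmed by the source.
BLOCKS = {
    ("running", NODE): {EXCLUSIVE},
    ("running", EXCLUSIVE): {NODE, EXCLUSIVE, NODE_PARALLEL},
    ("running", NODE_PARALLEL): {EXCLUSIVE, NODE_PARALLEL},
    ("pending", NODE): {EXCLUSIVE},
    ("pending", EXCLUSIVE): {NODE_PARALLEL},
    ("pending", NODE_PARALLEL): set(),
}


def can_schedule(candidate, present):
    """Return True if no row matching the actions present blocks the candidate class.

    candidate: class of the action we want to schedule.
    present: iterable of (state, action class) pairs describing the cluster.
    """
    return all(candidate not in BLOCKS[row] for row in present)


# A running node action does not stop another node action from being scheduled.
print(can_schedule(NODE, [("running", NODE)]))
# A running exclusive orchestrator action blocks everything else.
print(can_schedule(NODE, [("running", EXCLUSIVE)]))
```

Keeping the rules in a single table-like structure makes it easy to compare the code with the documentation when modes are added or changed.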