VTOrc

VTOrc is the automated fault detection and repair tool of Vitess. It started as a fork of Orchestrator, which was then adapted to the Vitess use case and now runs as a Vitess component. An overview of the architecture of VTOrc can be found on this page.

Setting up VTOrc lets you skip the InitShardPrimary step: VTOrc automatically detects that a new shard has no primary and elects one for you. It also detects configuration problems in the cluster and fixes them. The recoveries VTOrc can perform are listed below:

| Recovery Name | Description | Fix that VTOrc does |
|---|---|---|
| ClusterHasNoPrimary | VTOrc detects when a shard doesn't have any primary tablet elected | VTOrc runs PlannedReparentShard to elect a new primary |
| DeadPrimary | VTOrc detects when the primary tablet is dead | VTOrc runs EmergencyReparentShard to elect a different primary |
| IncapacitatedPrimary | VTOrc detects when the primary tablet is consistently failing health checks but is still network-reachable | VTOrc runs PlannedReparentShard, falling back to EmergencyReparentShard if that fails |
| PrimaryIsReadOnly, PrimarySemiSyncMustBeSet, PrimarySemiSyncMustNotBeSet | VTOrc detects when the primary tablet has configuration issues like being read-only, semi-sync being set or not being set | VTOrc fixes the configurations on the primary |
| NotConnectedToPrimary, ConnectedToWrongPrimary, ReplicationStopped, ReplicaIsWritable, ReplicaSemiSyncMustBeSet, ReplicaSemiSyncMustNotBeSet | VTOrc detects when a replica has configuration issues like not being connected to the primary, connected to the wrong primary, replication stopped, replica being writable, semi-sync being set or not being set | VTOrc fixes the configurations on the replica |
| StaleTopoPrimary | VTOrc detects when a tablet still has type PRIMARY in the topology but a newer primary has already been elected. This can happen if a topology update fails during an emergency reparent operation. | VTOrc demotes the stale primary to a read-only replica, updates its type to REPLICA in the topology, and configures it to replicate from the current primary |
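
As a quick reference, the mapping from recovery name to fix can be sketched as a small shell lookup. This is an illustration of the list above only, not VTOrc code; the function name `recovery_fix` and the wording of its output are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the fix described in this document for a given
# recovery name. It only restates the documented mapping.
recovery_fix() {
  case "$1" in
    ClusterHasNoPrimary)
      echo "PlannedReparentShard" ;;
    DeadPrimary)
      echo "EmergencyReparentShard" ;;
    IncapacitatedPrimary)
      echo "PlannedReparentShard, then EmergencyReparentShard on failure" ;;
    PrimaryIsReadOnly|PrimarySemiSyncMustBeSet|PrimarySemiSyncMustNotBeSet)
      echo "fix configuration on the primary" ;;
    NotConnectedToPrimary|ConnectedToWrongPrimary|ReplicationStopped|ReplicaIsWritable|ReplicaSemiSyncMustBeSet|ReplicaSemiSyncMustNotBeSet)
      echo "fix configuration on the replica" ;;
    StaleTopoPrimary)
      echo "demote stale primary to read-only replica" ;;
    *)
      # Anything not listed in this document is reported as unknown.
      echo "unknown recovery: $1" >&2
      return 1 ;;
  esac
}
```

For example, `recovery_fix DeadPrimary` prints `EmergencyReparentShard`.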

Flags

For a full list of supported flags, see the VTOrc reference page.

UI, API and Metrics

For information about the UI, API and metrics that VTOrc exports, see this page.

Example invocation of VTOrc

You can bring up VTOrc using the following invocation:

vtorc --topo-implementation etcd2 \
  --topo-global-server-address "localhost:2379" \
  --topo-global-root /vitess/global \
  --cell zone1 \
  --port 15000 \
  --log-dir=${VTDATAROOT}/tmp \
  --recovery-period-block-duration "10m" \
  --instance-poll-time "1s" \
  --topo-information-refresh-duration "30s" \
  --alsologtostderr

Cell Awareness

Starting in v24, VTOrc supports the --cell flag to specify which cell the VTOrc instance is running in. This flag is optional in v24 but will become required in v25 and later versions.

The --cell flag enables VTOrc to be cell-aware, which will be used in future releases for cross-cell problem validation. When specified, VTOrc validates that the cell exists in the topology. If the cell doesn't exist, VTOrc will fail to start. If the flag is not provided in v24, VTOrc will log a warning but continue to operate normally.
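
The startup behavior described above can be summarized as a three-way decision. The sketch below models it in shell for the v24 case only. The function name `cell_startup_outcome` and the outcome labels `start`, `warn`, and `fail` are hypothetical, and the list of known cells is passed in by hand rather than read from a real topology.

```shell
#!/usr/bin/env bash
# Hypothetical model of the v24 startup decision for --cell.
# $1            value passed to --cell (empty string if the flag is omitted)
# remaining     cell names assumed to exist in the topology
cell_startup_outcome() {
  local cell="$1"
  shift
  if [ -z "$cell" ]; then
    # Flag omitted in v24: a warning is logged and VTOrc keeps running.
    echo "warn"
    return 0
  fi
  local known
  for known in "$@"; do
    if [ "$known" = "$cell" ]; then
      # Cell exists in the topology: normal startup.
      echo "start"
      return 0
    fi
  done
  # Cell is not in the topology: VTOrc fails to start.
  echo "fail"
  return 1
}
```

For example, `cell_startup_outcome zone3 zone1 zone2` prints `fail` and returns a non-zero status, while `cell_startup_outcome "" zone1 zone2` prints `warn`.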

Filtering Tablets

By default, VTOrc monitors all tablets across all cells. You can restrict which tablets it watches using the --clusters-to-watch and --cells-to-watch flags.

Filtering by Keyspace/Shard

The --clusters-to-watch flag accepts a comma-separated list of keyspaces or keyspace/shard combinations:

# Watch all shards in keyspace1 and keyspace2
vtorc --clusters-to-watch "keyspace1,keyspace2" ...

# Watch specific shards
vtorc --clusters-to-watch "keyspace1,keyspace2/-80" ...
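
To make the value format concrete, the following sketch splits a --clusters-to-watch value into its entries and separates each keyspace from its optional shard. It illustrates the syntax only and is not how VTOrc parses the flag. The function name `parse_clusters_to_watch` is hypothetical, and printing `*` for "all shards" is a convention of this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "keyspace shard" pair per line for a
# comma-separated --clusters-to-watch value. Entries without a shard are
# printed with "*" to mean "all shards in the keyspace".
parse_clusters_to_watch() {
  local entry
  local IFS=','            # split the unquoted value on commas only
  for entry in $1; do
    case "$entry" in
      */*)
        # keyspace/shard: text before the first "/" and text after it
        printf '%s %s\n' "${entry%%/*}" "${entry#*/}" ;;
      *)
        # bare keyspace
        printf '%s %s\n' "$entry" "*" ;;
    esac
  done
}
```

Calling `parse_clusters_to_watch "keyspace1,keyspace2/-80"` prints `keyspace1 *` followed by `keyspace2 -80`.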

Filtering by Cell

The --cells-to-watch flag accepts a comma-separated list of cells. VTOrc will only monitor tablets in those cells:

# Only watch tablets in zone1 and zone2
vtorc --cells-to-watch "zone1,zone2" ...

VTOrc validates that each specified cell exists in the topology. If any cell doesn't exist, VTOrc will fail to start.

Combining Filters

When both flags are set, a tablet must match both filters to be monitored:

vtorc --clusters-to-watch "keyspace1" --cells-to-watch "zone1,zone2" ...

This configuration makes VTOrc monitor only tablets in keyspace1 that are located in zone1 or zone2.

When neither flag is set, VTOrc monitors all tablets in the topology.
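
The combined rule (an unset filter matches everything, and a tablet must pass every filter that is set) can be sketched as a shell predicate. This is a simplified model, not VTOrc's implementation. The function name `tablet_is_watched` is hypothetical, and comparing shard names exactly is an assumption of this sketch; VTOrc's own shard matching may differ, for example by considering key ranges.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: returns 0 if a tablet passes both filters.
# $1 keyspace  $2 shard  $3 cell
# $4 --clusters-to-watch value (empty if unset)
# $5 --cells-to-watch value (empty if unset)
tablet_is_watched() {
  local keyspace="$1" shard="$2" cell="$3" clusters="$4" cells="$5"
  local item match
  local IFS=','            # split filter values on commas only

  if [ -n "$clusters" ]; then
    match=no
    for item in $clusters; do
      # An entry matches as a whole keyspace or as an exact keyspace/shard.
      if [ "$item" = "$keyspace" ] || [ "$item" = "$keyspace/$shard" ]; then
        match=yes
      fi
    done
    if [ "$match" != yes ]; then
      return 1
    fi
  fi

  if [ -n "$cells" ]; then
    match=no
    for item in $cells; do
      if [ "$item" = "$cell" ]; then
        match=yes
      fi
    done
    if [ "$match" != yes ]; then
      return 1
    fi
  fi

  return 0
}
```

With the filters from the example above, `tablet_is_watched keyspace1 -80 zone1 "keyspace1" "zone1,zone2"` succeeds, while the same tablet in `zone3` is rejected.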

Durability Policies

All failovers that VTOrc performs honor the configured durability policy. Choose the durability policy for your keyspace carefully, because it determines which situations VTOrc can recover from and which require manual intervention.

Running VTOrc using the Vitess Operator

For information about deploying VTOrc using the Vitess Operator, see this page.