Replace a failed YB-Master
To replace a failed YB-Master server in a YugabyteDB cluster, follow these steps:
For the steps below, the examples use the following scenario:
- The cluster includes three yb-master servers: M1, M2, M3.
- YB-Master server M1 failed and needs to be replaced.
- A new YB-Master server (M4) will replace M1.
- The default master RPC port is 7100.
Note
If the master to be replaced is already dead (for example, the VM was terminated), perform the REMOVE step first, and then the ADD step.
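The ordering rule in the note above can be sketched as a small dry-run helper. This is only an illustration: `plan_master_replacement` is a hypothetical function name, it prints the `change_master_config` operations in the order they would be run, and it never contacts a cluster.

```shell
# Dry-run sketch: print the change_master_config operations in order.
# plan_master_replacement is a hypothetical helper, not part of yb-admin.
#
# Usage: plan_master_replacement FAILED_IS_DEAD OLD_HOST NEW_HOST [PORT]
#   FAILED_IS_DEAD  "yes" if the failed master is already gone
#                   (for example, its VM was terminated)
plan_master_replacement() {
  local failed_is_dead="$1" old_host="$2" new_host="$3" port="${4:-7100}"
  local add_op="ADD_SERVER ${new_host} ${port}"
  local remove_op="REMOVE_SERVER ${old_host} ${port}"

  if [ "$failed_is_dead" = "yes" ]; then
    # Dead master: REMOVE first, then ADD.
    printf '%s\n' "$remove_op" "$add_op"
  else
    # Default flow of this page: ADD first, then REMOVE.
    printf '%s\n' "$add_op" "$remove_op"
  fi
}

# Scenario from this page, assuming M1's VM was terminated.
plan_master_replacement yes M1 M4
```

Each printed line is the argument list that would follow `change_master_config` in a `yb-admin` call. Choosing the `-master_addresses` value for each call is left to the operator.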
Start the replacement YB-Master server in standby mode
To start yb-master in standby mode, set the --master_addresses flag to an empty string (""). When the --master_addresses flag is "", this master server starts without joining any existing master quorum. The node will be added to the master quorum in a later step.
./bin/yb-master --master_addresses="" --fs_data_dirs=<your data directories> [any other flags you would normally pass to this master process]
Add the replacement YB-Master server into the existing cluster
To add the new YB-Master server, run the yb-admin change_master_config ADD_SERVER command.
./bin/yb-admin -master_addresses M1:7100,M2:7100,M3:7100 change_master_config ADD_SERVER M4 7100
Remove the failed YB-Master server from the cluster
To remove the failed YB-Master server from the cluster, use the yb-admin change_master_config REMOVE_SERVER command.
./bin/yb-admin -master_addresses M1:7100,M2:7100,M3:7100,M4:7100 change_master_config REMOVE_SERVER M1 7100
Note
Specify all YB-Master addresses, including M4, so that yb-admin can find M4 if it becomes the leader.
Validate cluster
Validate that your set of masters is now M2, M3, and M4 using:
./bin/yb-admin -master_addresses M2:7100,M3:7100,M4:7100 list_all_masters
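The validation can also be scripted. The sketch below assumes only that the `list_all_masters` output contains each master's `host:port` string somewhere in its text; it does not depend on any particular column layout. `check_masters` is a hypothetical helper name.

```shell
# Sketch: confirm that every expected master address appears in the
# text read from stdin. check_masters is a hypothetical helper.
#
# Usage: <command producing the master list> | check_masters ADDR...
# Returns 0 if all addresses were found, 1 otherwise.
check_masters() {
  local output addr missing=0
  output="$(cat)"   # capture stdin once so it can be searched repeatedly

  for addr in "$@"; do
    # -F: fixed-string match, -q: quiet. This is a substring match,
    # so M2:7100 would also match inside M2:71000.
    if ! printf '%s\n' "$output" | grep -qF -- "$addr"; then
      echo "missing master: $addr" >&2
      missing=1
    fi
  done

  return "$missing"
}
```

A possible invocation for the scenario on this page is `./bin/yb-admin -master_addresses M2:7100,M3:7100,M4:7100 list_all_masters | check_masters M2:7100 M3:7100 M4:7100`, which exits non-zero and names any address that was not found.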
Until #1542 is implemented, each YB-TServer (TS) by default knows only the masters listed in the --tserver_master_addrs flag it was started with.
If any one of those masters is still part of the active quorum, it propagates the new master quorum to the TS via heartbeats. If, however, none of the current masters is present in the TS flag, the TS will not be able to join the cluster.
It is therefore important to update --tserver_master_addrs on every TS to the new set of master addresses, M2:7100,M3:7100,M4:7100.
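The new flag value can be derived from the old one rather than typed by hand. The following is a minimal sketch with a hypothetical helper, `swap_master`, that drops the failed master from a comma-separated address list and appends the replacement. It only manipulates strings and does not touch any running process.

```shell
# Sketch: compute the new --tserver_master_addrs value.
# swap_master is a hypothetical helper, not a YugabyteDB tool.
#
# Usage: swap_master ADDR_LIST OLD_ADDR NEW_ADDR
swap_master() {
  local list="$1" old="$2" new="$3" result="" addr
  local IFS=','          # split the list on commas inside this function only

  for addr in $list; do
    # Keep every address except the failed master.
    if [ "$addr" != "$old" ]; then
      result="${result:+$result,}$addr"
    fi
  done

  # Append the replacement master at the end of the list.
  printf '%s\n' "${result:+$result,}$new"
}

swap_master "M1:7100,M2:7100,M3:7100" "M1:7100" "M4:7100"
# prints M2:7100,M3:7100,M4:7100
```

The printed value is what --tserver_master_addrs would be set to on every TS in this scenario.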