Manual remote bootstrap when a majority of peers fail
When a RAFT peer fails, YugabyteDB executes an automatic remote bootstrap to create a new peer from the remaining ones.
If a majority of RAFT peers fail for a given tablet, we have to manually execute the equivalent of a remote bootstrap. We can get a list of tablets from the yb-master admin UI at yb-master-ip:7000/tablet-replication.
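For example, that page can be pulled from the command line as a quick check (a sketch; yb-master-ip is a placeholder for one of your master hosts, and 7000 is the default admin UI port):

# Fetch the tablet replication page from the yb-master admin UI.
curl -s http://yb-master-ip:7000/tablet-replication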
Assuming we have a cluster where:
- Replication factor is 3
- A given tablet with UUID TABLET1
- 3 tablet peers: 1 in good working order, referred to as NODE_GOOD, and two broken peers, referred to as NODE_BAD1 and NODE_BAD2
- We will be copying some tablet-related data from the good peer to each of the bad peers until we have restored a majority of them
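The command sketches below refer to these placeholders as shell variables; the values are purely hypothetical examples, so substitute your own tablet UUID and hostnames:

# Hypothetical placeholder values; replace with your own tablet UUID and hostnames.
TABLET1=c08596d5820a4683a96893e092088c39   # tablet UUID (matches the example in the Note below)
NODE_GOOD=node-good.example.com            # tserver hosting the healthy peer
NODE_BAD1=node-bad1.example.com            # broken peer we repair manually
NODE_BAD2=node-bad2.example.com            # broken peer expected to recover automatically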
These are the steps to follow in such a scenario (a combined sketch of the commands follows the list):
- On the NODE_GOOD TS, create an archive of the wals (raft data), rocksdb (regular rocksdb), intents (transactions data), and snapshots directories for TABLET1.
- Copy these archives over to NODE_BAD1, onto the same drive where TABLET1 currently has its raft and rocksdb data.
- Stop the bad TS, say NODE_BAD1, as we will be changing file system data underneath it.
- Remove the old wals, rocksdb, intents, and snapshots data for TABLET1 from NODE_BAD1.
- Unpack the data we copied over from NODE_GOOD into the corresponding (now empty) directories on NODE_BAD1.
- Restart NODE_BAD1, so it can bootstrap TABLET1 using this new data.
- Restart NODE_GOOD, so it can properly observe the changed state and data on NODE_BAD1.
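A minimal sketch of these steps, using the placeholder variables defined above and assuming a single data directory /mnt/d0 (as in the Note below), passwordless SSH to the nodes, and a tserver managed as a systemd service named yb-tserver; the SSH/systemd details are assumptions, so adjust them to your deployment:

# 1. On NODE_GOOD: archive the wals, rocksdb, intents, and snapshots data for TABLET1.
ssh "$NODE_GOOD" "cd /mnt/d0 && tar czf /tmp/tablet-${TABLET1}.tgz \
  \$(find yb-data/tserver/wals yb-data/tserver/data/rocksdb -name \"*${TABLET1}*\")"

# 2. Copy the archive to NODE_BAD1, onto the same drive that holds its tablet data.
scp "$NODE_GOOD:/tmp/tablet-${TABLET1}.tgz" "$NODE_BAD1:/mnt/d0/"

# 3. Stop the bad tserver before touching its file system data.
ssh "$NODE_BAD1" "sudo systemctl stop yb-tserver"

# 4. Remove the old wals, rocksdb, intents, and snapshots data for TABLET1 from NODE_BAD1.
ssh "$NODE_BAD1" "cd /mnt/d0 && find yb-data/tserver/wals yb-data/tserver/data/rocksdb \
  -name \"*${TABLET1}*\" -prune -exec rm -rf {} +"

# 5. Unpack the data copied from NODE_GOOD into the now-empty locations.
ssh "$NODE_BAD1" "cd /mnt/d0 && tar xzf tablet-${TABLET1}.tgz"

# 6. Restart NODE_BAD1 so it bootstraps TABLET1 from the new data,
#    then restart NODE_GOOD so it observes the changed state on NODE_BAD1.
ssh "$NODE_BAD1" "sudo systemctl start yb-tserver"
ssh "$NODE_GOOD" "sudo systemctl restart yb-tserver"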
At this point, NODE_BAD2 should be automatically fixed and removed from the quorum, as the tablet now has a majority of healthy peers.
Note
Normally, when we try to find tablet data, we use a find command across the --fs_data_dirs paths.
In this example, assume that is set to /mnt/d0 and our tablet UUID is c08596d5820a4683a96893e092088c39:
$ find /mnt/d0/ -name '*c08596d5820a4683a96893e092088c39*'
/mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/tablet-meta/c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/consensus-meta/c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots
The data we are interested in here is:
For the raft wals:
/mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
For the rocksdb regular DB:
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
For the intents files:
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents
For the snapshot files:
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots
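Putting the first step together with these paths, archiving them on the good node might look like the following (a sketch; the archive name and the /tmp location are arbitrary choices, not prescribed here):

# Archive the wals, rocksdb, intents, and snapshots paths for the example tablet.
cd /mnt/d0
tar czf /tmp/tablet-c08596d5820a4683a96893e092088c39.tgz \
  yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39 \
  yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39 \
  yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents \
  yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots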