Fault tolerance
YugabyteDB can automatically handle failures and therefore provides high availability. You will create YCQL tables with a replication factor (RF) of 3, which allows a fault tolerance of 1. This means the cluster remains available for both reads and writes even if one node fails. However, if another node fails, bringing the number of failures to two, writes become unavailable on the cluster in order to preserve data consistency.
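Once the cluster in step 1 is up, you can optionally confirm the replication factor it is actually using with the yb-admin CLI run from inside a master pod. This is a hedged sketch: the yb-masters service name, the 7100 master RPC port, and the binary path below reflect the defaults of the yugabytedb/yugabyte Helm chart and may differ in your setup.
$ # Path and service name are chart defaults; adjust if your deployment differs.
$ kubectl -n yb-demo exec -it yb-master-0 -- \
  /home/yugabyte/bin/yb-admin \
  -master_addresses yb-master-0.yb-masters.yb-demo.svc.cluster.local:7100 \
  get_universe_config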
If you haven't installed YugabyteDB yet, you can create a local YugabyteDB cluster within five minutes by following the Quick Start guide.
1. Create universe
If you have a previously running local universe, destroy it using the following command.
$ helm uninstall yb-demo -n yb-demo
$ kubectl delete pvc --namespace yb-demo --all
Create a new YugabyteDB cluster.
$ helm install yb-demo yugabytedb/yugabyte \
--set resource.master.requests.cpu=0.5,resource.master.requests.memory=0.5Gi,\
resource.tserver.requests.cpu=0.5,resource.tserver.requests.memory=0.5Gi --namespace yb-demo
Check the Kubernetes dashboard to see the three YB-Master and three YB-TServer pods representing the three nodes of the cluster.
$ minikube dashboard
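If you prefer the command line to the dashboard, the same information is available with kubectl. The pod and service names shown are the ones created by the Helm chart in this step.
$ kubectl get pods -n yb-demo
$ kubectl get services -n yb-demo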
2. Check cluster status with Admin UI
To check the cluster status, you need to access the Admin UI on port 7000 exposed by the yb-master-ui service. To do so, forward the port to your local machine.
$ kubectl --namespace yb-demo port-forward svc/yb-master-ui 7000:7000
The yb-master-0 Admin UI is now available at http://localhost:7000.
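With the port-forward still running, you can optionally verify from a second terminal that the UI responds before opening it in a browser. This is just a plain HTTP check against the forwarded port; a 200 status code indicates the page is being served.
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7000/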
3. Connect to YugabyteDB Shell
Connect to ycqlsh on node 1.
$ kubectl -n yb-demo exec -it yb-tserver-0 -- ycqlsh yb-tserver-0
Connected to local cluster at 127.0.0.1:9042.
[ycqlsh 5.0.1 | Cassandra 3.9-SNAPSHOT | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.
ycqlsh>
Create a keyspace and a table.
ycqlsh> CREATE KEYSPACE users;
ycqlsh> CREATE TABLE users.profile (id bigint PRIMARY KEY,
email text,
password text,
profile frozen<map<text, text>>);
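To confirm that the keyspace and table were created as expected, you can optionally describe them. DESCRIBE is inherited from cqlsh, on which ycqlsh is based; its availability in your ycqlsh build is an assumption here.
ycqlsh> DESCRIBE KEYSPACE users;
ycqlsh> DESCRIBE TABLE users.profile;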
4. Insert data through a node
Now insert some data by typing the following into the ycqlsh shell.
ycqlsh> INSERT INTO users.profile (id, email, password, profile) VALUES
(1000, 'james.bond@yugabyte.com', 'licensed2Kill',
{'firstname': 'James', 'lastname': 'Bond', 'nickname': '007'}
);
ycqlsh> INSERT INTO users.profile (id, email, password, profile) VALUES
(2000, 'sherlock.holmes@yugabyte.com', 'itsElementary',
{'firstname': 'Sherlock', 'lastname': 'Holmes'}
);
Query all the rows.
ycqlsh> SELECT email, profile FROM users.profile;
email | profile
------------------------------+---------------------------------------------------------------
james.bond@yugabyte.com | {'firstname': 'James', 'lastname': 'Bond', 'nickname': '007'}
sherlock.holmes@yugabyte.com | {'firstname': 'Sherlock', 'lastname': 'Holmes'}
(2 rows)
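Because id is the primary key, you can also fetch a single row directly; such a point lookup is served by the tablet that owns the key.
ycqlsh> SELECT email, profile FROM users.profile WHERE id = 1000;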
5. Read data through another node
Let us now query the data from node 3.
$ kubectl -n yb-demo exec -it yb-tserver-2 -- ycqlsh yb-tserver-2
ycqlsh> SELECT email, profile FROM users.profile;
email | profile
------------------------------+---------------------------------------------------------------
james.bond@yugabyte.com | {'firstname': 'James', 'lastname': 'Bond', 'nickname': '007'}
sherlock.holmes@yugabyte.com | {'firstname': 'Sherlock', 'lastname': 'Holmes'}
(2 rows)
ycqlsh> exit;
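Before exiting, you can also confirm which node a shell session is connected to. SHOW HOST comes from cqlsh; treat its availability in ycqlsh as an assumption.
ycqlsh> SHOW HOST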
6. Verify that one node failure has no impact
This cluster was created with a replication factor of 3 and therefore needs only two replicas to reach consensus. As a result, it is resilient to one node failure without any data loss. Let us simulate the failure of node 3.
$ kubectl -n yb-demo delete pod yb-tserver-2
Running the status command now shows that the yb-tserver-2 pod is Terminating.
$ kubectl -n yb-demo get pods
NAME           READY   STATUS        RESTARTS   AGE
yb-master-0    1/1     Running       0          33m
yb-master-1    1/1     Running       0          33m
yb-master-2    1/1     Running       0          33m
yb-tserver-0   1/1     Running       1          33m
yb-tserver-1   1/1     Running       1          33m
yb-tserver-2   1/1     Terminating   0          33m
Now connect to node 2.
$ kubectl -n yb-demo exec -it yb-tserver-1 -- ycqlsh yb-tserver-1
Let us insert some data to ensure that the loss of a node hasn't impacted the ability of the universe to take writes.
ycqlsh> INSERT INTO users.profile (id, email, password, profile) VALUES
(3000, 'austin.powers@yugabyte.com', 'imGroovy',
{'firstname': 'Austin', 'lastname': 'Powers'});
Now query the data. All the rows inserted so far are returned, confirming that the loss of a node has no impact on data integrity.
ycqlsh> SELECT email, profile FROM users.profile;
email | profile
------------------------------+---------------------------------------------------------------
james.bond@yugabyte.com | {'firstname': 'James', 'lastname': 'Bond', 'nickname': '007'}
sherlock.holmes@yugabyte.com | {'firstname': 'Sherlock', 'lastname': 'Holmes'}
austin.powers@yugabyte.com | {'firstname': 'Austin', 'lastname': 'Powers'}
(3 rows)
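While still connected to node 2, you can optionally double-check the row count. YCQL supports the COUNT aggregate, so this full-table check should return 3.
ycqlsh> SELECT COUNT(*) FROM users.profile;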
7. Verify that Kubernetes brought back the failed node
We can now check the cluster status to verify that Kubernetes has indeed brought back the yb-tserver-2 node that failed earlier. This happens because the replica count for the yb-tserver StatefulSet is still 3, and only two pods were running after the node failure.
$ kubectl -n yb-demo get pods
NAME           READY   STATUS    RESTARTS   AGE
yb-master-0    1/1     Running   0          34m
yb-master-1    1/1     Running   0          34m
yb-master-2    1/1     Running   0          34m
yb-tserver-0   1/1     Running   1          34m
yb-tserver-1   1/1     Running   1          34m
yb-tserver-2   1/1     Running   0          7s
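As a final check, you can reconnect to the recovered node and repeat the earlier read to confirm that it serves the same data.
$ kubectl -n yb-demo exec -it yb-tserver-2 -- ycqlsh yb-tserver-2
ycqlsh> SELECT email, profile FROM users.profile;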
YugabyteDB's fault tolerance, combined with Kubernetes' automated operations, lets you run distributed applications with ease while maintaining extreme data resilience.
8. Clean up (optional)
Optionally, you can shut down the local cluster created in Step 1.
$ helm uninstall yb-demo -n yb-demo
$ kubectl delete pvc --namespace yb-demo --all