Recover a WeSQL Cluster After All Nodes Fail

If all nodes have been lost, including their local storage (such as EBS volumes), the cluster can still be recovered because WeSQL persists all critical state in object storage. Recovery consists of rebuilding the entire cluster from the object store.

note

This tutorial assumes that the IP addresses or domain names of all nodes remain unchanged during the rebuilding process. If the IP addresses have changed, the configuration files will need to be updated accordingly. (TODO: Provide instructions for IP changes)

Step 1: Rebuild The Data Node

Step 1.1: Prepare the my.cnf configuration file

Recreate the configuration file to match the settings used when the Data Node was initially set up.

Now, create and edit the my.cnf file:

mkdir -p /u01/mysql_data_leader
vim /u01/mysql_data_leader/my.cnf

Add the following content:

[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log-bin=master-bin
log-bin-index=master-bin.index

# Raft settings
consensus_replication_auto_leader_transfer=ON

# serverless settings
objectstore_provider=aws
objectstore_region=cn-northwest-1
objectstore_bucket=wesql-cluster-1
cluster_objectstore_id=FD2D88FB-7994-4318-AD3F-2E37ADF52DBC

#server
port=3006
datadir=/u01/mysql_data_leader
tmpdir=/u01/mysql_data_leader_tmp
socket=/u01/mysql_data_leader_tmp/mysqld.sock
pid-file=/u01/mysql_data_leader_run/mysqld.pid
log-error=/u01/mysql_data_leader_log/mysqld.err
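
The configuration above points at tmp, run, and log directories next to the data directory, while the earlier mkdir command only creates the data directory itself. The following is a minimal sketch that creates all four, assuming mysqld does not create them on its own and that the directories follow the naming used in this tutorial. The function name prepare_node_dirs is hypothetical and not part of WeSQL.

```shell
# Create the data directory plus the _tmp, _run and _log directories
# that the my.cnf in this tutorial refers to.
# Arguments: $1 = base path (for example /u01)
#            $2 = node directory name (for example mysql_data_leader)
# NOTE: prepare_node_dirs is a hypothetical helper, not a WeSQL command.
prepare_node_dirs() {
    pnd_base=$1
    pnd_name=$2
    for pnd_suffix in "" _tmp _run _log; do
        mkdir -p "$pnd_base/$pnd_name$pnd_suffix" || return 1
    done
}

# Example for the Data Node (uncomment to run):
# prepare_node_dirs /u01 mysql_data_leader
```

The same helper can be reused for the logger nodes by passing mysql_data_logger1 or mysql_data_logger2 as the second argument.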

Step 1.2: Start the Data Node

Start the data node as a single-node cluster using the following command:

mysqld \
--defaults-file=/u01/mysql_data_leader/my.cnf \
--consensus-replication-force-change-meta=ON \
--consensus-replication-cluster-info='192.168.0.2:13006@1' &

This starts the Data Node with the same settings as before, including AWS S3 as the object store, and brings it up as the only member of the Raft cluster.
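
Because mysqld is started in the background, later steps that run SQL on the leader should wait until the server accepts connections. The following is a minimal sketch of a generic retry loop. The function name wait_until is hypothetical, and the choice of mysqladmin ping against the socket path from my.cnf as the probe is an assumption about how readiness is checked in your environment.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# Arguments: $1 = maximum number of attempts
#            $2 = seconds to sleep between attempts
#            remaining arguments = the probe command and its arguments
# NOTE: wait_until is a hypothetical helper, not a WeSQL command.
wait_until() {
    wu_attempts=$1
    wu_delay=$2
    shift 2
    wu_i=0
    while [ "$wu_i" -lt "$wu_attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        wu_i=$((wu_i + 1))
        sleep "$wu_delay"
    done
    return 1
}

# Example (assumption: mysqladmin is on PATH); 30 attempts, 2 seconds apart:
# wait_until 30 2 mysqladmin --socket=/u01/mysql_data_leader_tmp/mysqld.sock ping
```

If the function returns non-zero, the error log configured in my.cnf (log-error) is the first place to look.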

Step 2: Rebuild The First Logger Node

Step 2.1: Initialize the Data Directory

To initialize the data directory for the first Logger Node, use the following command:

mysqld \
--no-defaults \
--initialize-insecure \
--datadir=/u01/mysql_data_logger1 \
--consensus-replication-log-type-node=ON \
--table_on_objectstore=false \
--objectstore-provider=aws \
--objectstore-region=cn-northwest-1 \
--objectstore-bucket=wesql-cluster-1 \
--cluster-objectstore-id=FD2D88FB-7994-4318-AD3F-2E37ADF52DBC \
--consensus-replication-cluster-id=1 \
--consensus-replication-cluster-info='192.168.0.3:13006'
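
In stock MySQL, initialization aborts if the target data directory already contains files, which can happen when an earlier attempt was interrupted. The following is a minimal sketch of a pre-check, assuming WeSQL keeps this MySQL behavior. The function name datadir_is_clean is hypothetical.

```shell
# Succeed only if the directory does not exist yet or exists and is empty.
# Arguments: $1 = data directory path
# NOTE: datadir_is_clean is a hypothetical helper, not a WeSQL command.
datadir_is_clean() {
    dic_dir=$1
    # A path that does not exist yet is fine: mysqld will create it.
    [ ! -e "$dic_dir" ] && return 0
    # An existing path must be a directory...
    [ -d "$dic_dir" ] || return 1
    # ...and must contain no entries, hidden ones included.
    [ -z "$(ls -A "$dic_dir")" ]
}

# Example (uncomment to run):
# datadir_is_clean /u01/mysql_data_logger1 || echo "data directory is not empty"
```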

Step 2.2: Prepare the my.cnf configuration file

Recreate the my.cnf file for Logger Node 1 so that it matches the one used during the initial setup.

Now, create and edit the my.cnf file:

vim /u01/mysql_data_logger1/my.cnf

Add the following content:

[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log-bin=slave-bin
log-bin-index=slave-bin.index

# raft settings
consensus_replication_auto_leader_transfer=ON
consensus-replication-log-type-node=ON

# serverless settings
consistent_snapshot_archive=false
table_on_objectstore=false
objectstore-provider=aws
objectstore_region=cn-northwest-1
objectstore_bucket=wesql-cluster-1
cluster_objectstore_id=FD2D88FB-7994-4318-AD3F-2E37ADF52DBC

# server
port=3007
datadir=/u01/mysql_data_logger1
tmpdir=/u01/mysql_data_logger1_tmp
socket=/u01/mysql_data_logger1_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger1_run/mysqld.pid
log-error=/u01/mysql_data_logger1_log/mysqld.err

Step 2.3: Start the Logger Node

Once the configuration is ready, start Logger Node 1 using the following command:

mysqld --defaults-file=/u01/mysql_data_logger1/my.cnf &

Step 2.4: Rejoin the Logger Node to the Cluster

Once the logger node is successfully running as a learner, it needs to be rejoined to the cluster. The learner role allows the node to sync logs from the cluster without participating in consensus initially.

Execute the following commands on the leader node to add the logger node as a learner and then upgrade it to a follower:

CALL dbms_consensus.add_learner('192.168.0.3:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.3:13006');
  • add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
  • upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in consensus and replication.

Step 3: Rebuild The Second Logger Node

Rebuilding the second Logger Node is similar to rebuilding the first. Follow these steps to set it up.

Step 3.1: Initialize the Data Directory

To initialize the data directory for Logger Node 2, use the following command:

mysqld \
--no-defaults \
--initialize-insecure \
--datadir=/u01/mysql_data_logger2 \
--consensus-replication-log-type-node=ON \
--table_on_objectstore=false \
--objectstore-provider=aws \
--objectstore-region=cn-northwest-1 \
--objectstore-bucket=wesql-cluster-1 \
--cluster-objectstore-id=FD2D88FB-7994-4318-AD3F-2E37ADF52DBC \
--consensus-replication-cluster-id=1 \
--consensus-replication-cluster-info='192.168.0.4:13006'

Step 3.2: Prepare the my.cnf configuration file

Recreate the my.cnf file for Logger Node 2 so that it matches the one used during the initial setup.

Now, create and edit the my.cnf file:

vim /u01/mysql_data_logger2/my.cnf

Add the following content:

[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log-bin=slave-bin
log-bin-index=slave-bin.index

# raft settings
consensus_replication_auto_leader_transfer=ON
consensus-replication-log-type-node=ON

# serverless settings
consistent_snapshot_archive=false
table_on_objectstore=false
objectstore-provider=aws
objectstore_region=cn-northwest-1
objectstore_bucket=wesql-cluster-1
cluster_objectstore_id=FD2D88FB-7994-4318-AD3F-2E37ADF52DBC

# server
port=3007
datadir=/u01/mysql_data_logger2
tmpdir=/u01/mysql_data_logger2_tmp
socket=/u01/mysql_data_logger2_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger2_run/mysqld.pid
log-error=/u01/mysql_data_logger2_log/mysqld.err
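
The two logger configurations in this tutorial differ only in the directory name, so both can be produced from a single template. The following is a minimal sketch that prints the configuration to standard output, with the option values copied from the files above. The function name logger_my_cnf is hypothetical, and the bucket, region, and cluster ID must match your own deployment.

```shell
# Print a logger node my.cnf to standard output.
# Arguments: $1 = node directory name (for example mysql_data_logger2)
# NOTE: logger_my_cnf is a hypothetical helper; the values below are the
# ones used in this tutorial and must be adapted to your cluster.
logger_my_cnf() {
    lmc_node=$1
    cat <<EOF
[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log-bin=slave-bin
log-bin-index=slave-bin.index

# raft settings
consensus_replication_auto_leader_transfer=ON
consensus-replication-log-type-node=ON

# serverless settings
consistent_snapshot_archive=false
table_on_objectstore=false
objectstore-provider=aws
objectstore_region=cn-northwest-1
objectstore_bucket=wesql-cluster-1
cluster_objectstore_id=FD2D88FB-7994-4318-AD3F-2E37ADF52DBC

# server
port=3007
datadir=/u01/${lmc_node}
tmpdir=/u01/${lmc_node}_tmp
socket=/u01/${lmc_node}_tmp/mysqld.sock
pid-file=/u01/${lmc_node}_run/mysqld.pid
log-error=/u01/${lmc_node}_log/mysqld.err
EOF
}

# Example (uncomment to run):
# logger_my_cnf mysql_data_logger2 > /u01/mysql_data_logger2/my.cnf
```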

Step 3.3: Start the Logger Node

Once the configuration is ready, start Logger Node 2 using the following command:

mysqld --defaults-file=/u01/mysql_data_logger2/my.cnf &

Step 3.4: Rejoin the Logger Node to the Cluster

Once the logger node is successfully running as a learner, it needs to be rejoined to the cluster. The learner role allows the node to sync logs from the cluster without participating in consensus initially.

Execute the following commands on the leader node to add the logger node as a learner and then upgrade it to a follower:

CALL dbms_consensus.add_learner('192.168.0.4:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.4:13006');
  • add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
  • upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in consensus and replication.
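
The rejoin step is the same pair of calls for each logger node, so the statements can be generated from a list of addresses and piped to a client connected to the leader. The following is a minimal sketch. The function name rejoin_sql is hypothetical, and the mysql client invocation in the comment (socket path and root user) is an assumption about how you connect to the leader.

```shell
# Print the add_learner / upgrade_learner statements for each address given.
# Arguments: one or more addresses in host:port form
# NOTE: rejoin_sql is a hypothetical helper; it only prints SQL and does
# not contact the cluster itself.
rejoin_sql() {
    for rs_addr in "$@"; do
        printf "CALL dbms_consensus.add_learner('%s');\n" "$rs_addr"
        printf "CALL dbms_consensus.upgrade_learner('%s');\n" "$rs_addr"
    done
}

# Example (assumption: root can connect through the leader's socket):
# rejoin_sql 192.168.0.3:13006 192.168.0.4:13006 \
#     | mysql --socket=/u01/mysql_data_leader_tmp/mysqld.sock -uroot
```

Generating the statements separately from running them makes it possible to review the output before anything is sent to the leader.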