Recover a WeSQL Cluster After All Nodes Fail

If every node has been lost, including local storage such as EBS volumes, the cluster can still be recovered because WeSQL persists all critical state in object storage. This tutorial rebuilds the entire cluster from the object store.

Note

This tutorial assumes that the IP addresses or domain names of all nodes remain unchanged during the rebuild. If they have changed, the configuration files will need to be updated accordingly. (TODO: Provide instructions for IP changes)

Step 1: Rebuild The Data Node

Step 1.1: Prepare the `my.cnf` configuration file

Recreate the configuration file to match the settings used when the Data Node was initially set up.

Now, create and edit the my.cnf file:

mkdir -p /u01/mysql_data_leader
vim /u01/mysql_data_leader/my.cnf

Add the following content:

[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log_bin=master-bin
log_bin_index=master-bin.index

# Raft settings
raft_replication_auto_leader_transfer=ON

# serverless settings
objectstore_provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
branch_objectstore_id=main

# server
port=3006
datadir=/u01/mysql_data_leader
tmpdir=/u01/mysql_data_leader_tmp
socket=/u01/mysql_data_leader_tmp/mysqld.sock
pid-file=/u01/mysql_data_leader_run/mysqld.pid
log-error=/u01/mysql_data_leader_log/mysqld.err
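The configuration above points `tmpdir`, `socket`, `pid-file`, and `log-error` at directories next to the data directory, while the earlier `mkdir` only creates the data directory itself. On a freshly provisioned host those sibling directories are probably missing. The following is a minimal sketch, assuming `mysqld` does not create them on its own; `prepare_node_dirs` is a hypothetical helper name, and the `_tmp`, `_run`, and `_log` suffixes simply mirror the paths in the config.

```shell
# Create the data directory plus the sibling directories the my.cnf refers to.
# prepare_node_dirs is a hypothetical helper, not part of WeSQL.
prepare_node_dirs() {
    datadir="$1"
    # "" is the data directory itself; the suffixes match tmpdir/socket,
    # pid-file, and log-error in the configuration.
    for suffix in "" _tmp _run _log; do
        mkdir -p "${datadir}${suffix}" || return 1
    done
}

# Example (run as the user that will start mysqld):
#   prepare_node_dirs /u01/mysql_data_leader
```

The same helper can be reused for the logger nodes by passing their data directory path.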

Step 1.2: Start the Data Node

Start the data node as a single-node cluster using the following command:

mysqld \
    --defaults-file=/u01/mysql_data_leader/my.cnf \
    --raft-replication-force-change-meta=ON \
    --raft-replication-cluster-info='192.168.0.2:13006@1' &

This will start the data node with the same settings as before, including using AWS S3 for object storage and participating in the Raft protocol.
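Because the command above runs `mysqld` in the background, the later steps should not begin until the server actually accepts connections. One way to check this is to poll it with a small retry loop. This is a sketch only: `wait_until` is a hypothetical helper, and the example assumes the stock `mysqladmin` client is installed and that the socket path matches the config from Step 1.1.

```shell
# Retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper, not part of WeSQL or MySQL.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example: poll the data node for up to about 60 seconds.
#   wait_until 30 2 mysqladmin --socket=/u01/mysql_data_leader_tmp/mysqld.sock ping
```

If the loop gives up, the error log configured in `log-error` is the first place to look.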

Step 2: Rebuild The First Logger Node

Step 2.1: Initialize the Data Directory

Initialize the data directory for the logger node with the following command:

mysqld \
    --no-defaults \
    --initialize-insecure \
    --datadir=/u01/mysql_data_logger1 \
    --raft-replication-log-type-node=ON \
    --table-on-objectstore=false \
    --objectstore-provider=aws \
    --objectstore-region=us-west-1 \
    --objectstore-bucket=wesql-storage \
    --repo-objectstore-id=sysbench \
    --branch-objectstore-id=main \
    --raft-replication-cluster-id=1 \
    --raft-replication-cluster-info='192.168.0.3:13006'
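Stock MySQL refuses to run `--initialize` or `--initialize-insecure` against a data directory that already contains files, so it can help to check the target first, especially since the next step places `my.cnf` inside that same directory. The sketch below assumes WeSQL behaves like upstream MySQL here; `datadir_is_empty` is a hypothetical helper name.

```shell
# Succeed if the directory does not exist yet or exists and has no entries.
# datadir_is_empty is a hypothetical helper, not part of WeSQL.
datadir_is_empty() {
    dir="$1"
    if [ ! -e "$dir" ]; then
        return 0            # nothing there yet: safe to initialize
    fi
    if [ ! -d "$dir" ]; then
        return 1            # path exists but is not a directory
    fi
    # ls -A lists hidden entries too, but omits "." and ".."
    [ -z "$(ls -A "$dir")" ]
}

# Example:
#   datadir_is_empty /u01/mysql_data_logger1 || echo "data directory is not empty" >&2
```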

Step 2.2: Prepare the `my.cnf` configuration file

Recreate the configuration file for Logger Node 1 so that it matches the one used during the initial setup.

Now, create and edit the my.cnf file:

vim /u01/mysql_data_logger1/my.cnf

Add the following content:

[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log_bin=slave-bin
log_bin_index=slave-bin.index

# raft settings
raft_replication_auto_leader_transfer=ON
raft_replication_log_type_node=ON

# serverless settings
snapshot_archive=false
table_on_objectstore=false
objectstore_provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
branch_objectstore_id=main

# server
port=3007
datadir=/u01/mysql_data_logger1
tmpdir=/u01/mysql_data_logger1_tmp
socket=/u01/mysql_data_logger1_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger1_run/mysqld.pid
log-error=/u01/mysql_data_logger1_log/mysqld.err

Step 2.3: Start the Logger Node

Once the configuration is ready, start Logger Node 1 using the following command:

mysqld --defaults-file=/u01/mysql_data_logger1/my.cnf &

Step 2.4: Rejoin the Logger Node to the Cluster

Once the logger node is running, it needs to be rejoined to the cluster, first as a learner. The learner role allows the node to sync logs from the cluster without participating in the Raft protocol.

Execute the following commands on the leader node to add the logger node as a learner and then upgrade it to a follower:

CALL dbms_consensus.add_learner('192.168.0.3:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.3:13006');

add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in Raft and replication.
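The two calls differ between logger nodes only in the address, so they can be generated from one place and piped into a client session on the leader. The following is a minimal sketch: `rejoin_sql` is a hypothetical helper, and the example assumes the stock `mysql` client, a `root` account that can connect through the socket, and the data node's socket path from Step 1.1.

```shell
# Print the statements that rejoin one logger node, given its Raft address.
# rejoin_sql is a hypothetical helper; the procedure names are the ones
# used in this tutorial.
rejoin_sql() {
    addr="$1"
    printf "CALL dbms_consensus.add_learner('%s');\n" "$addr"
    printf "CALL dbms_consensus.upgrade_learner('%s');\n" "$addr"
}

# Example, run on the leader (connection options are assumptions):
#   rejoin_sql 192.168.0.3:13006 | mysql --socket=/u01/mysql_data_leader_tmp/mysqld.sock -uroot
```

Printing the statements before piping them into the client makes it easy to review the address first.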

Step 3: Rebuild The Second Logger Node

Rebuilding the second Logger Node is similar to rebuilding Logger Node 1. Follow these steps to set it up.

Step 3.1: Initialize the Data Directory

Initialize the data directory for Logger Node 2 with the following command:

mysqld \
    --no-defaults \
    --initialize-insecure \
    --datadir=/u01/mysql_data_logger2 \
    --raft-replication-log-type-node=ON \
    --table-on-objectstore=false \
    --objectstore-provider=aws \
    --objectstore-region=us-west-1 \
    --objectstore-bucket=wesql-storage \
    --repo-objectstore-id=sysbench \
    --branch-objectstore-id=main \
    --raft-replication-cluster-id=1 \
    --raft-replication-cluster-info='192.168.0.4:13006'

Step 3.2: Prepare the `my.cnf` configuration file

Recreate the configuration file for Logger Node 2 so that it matches the one used during the initial setup.

Now, create and edit the my.cnf file:

vim /u01/mysql_data_logger2/my.cnf

Add the following content:

[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log_bin=slave-bin
log_bin_index=slave-bin.index

# raft settings
raft_replication_auto_leader_transfer=ON
raft_replication_log_type_node=ON

# serverless settings
snapshot_archive=false
table_on_objectstore=false
objectstore_provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
branch_objectstore_id=main

# server
port=3007
datadir=/u01/mysql_data_logger2
tmpdir=/u01/mysql_data_logger2_tmp
socket=/u01/mysql_data_logger2_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger2_run/mysqld.pid
log-error=/u01/mysql_data_logger2_log/mysqld.err

Step 3.3: Start the Logger Node

Once the configuration is ready, start Logger Node 2 using the following command:

mysqld --defaults-file=/u01/mysql_data_logger2/my.cnf &

Step 3.4: Rejoin the Logger Node to the Cluster

Execute the following commands on the leader node to add the logger node as a learner and then upgrade it to a follower:

CALL dbms_consensus.add_learner('192.168.0.4:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.4:13006');

add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in Raft and replication.
