Recover a WeSQL Cluster After All Nodes Fail
If all nodes, including their local storage (such as EBS volumes), have been lost, the cluster can still be recovered, because WeSQL persists all critical state in object storage. In that case, you rebuild the entire cluster from the object store.
This tutorial assumes that the IP addresses or domain names of all nodes remain unchanged during the rebuilding process. If the IP addresses have changed, the configuration files will need to be updated accordingly. (TODO: Provide instructions for IP changes)
Step 1: Rebuild The Data Node
Step 1.1: Prepare the my.cnf configuration file
Recreate the configuration file to match the settings used when the Data Node was initially set up.
Now, create and edit the my.cnf file:
mkdir -p /u01/mysql_data_leader
vim /u01/mysql_data_leader/my.cnf
Add the following content:
[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log_bin=master-bin
log_bin_index=master-bin.index
# Raft settings
raft_replication_auto_leader_transfer=ON
# serverless settings
objectstore_provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
branch_objectstore_id=main
# server
port=3006
datadir=/u01/mysql_data_leader
tmpdir=/u01/mysql_data_leader_tmp
socket=/u01/mysql_data_leader_tmp/mysqld.sock
pid-file=/u01/mysql_data_leader_run/mysqld.pid
log-error=/u01/mysql_data_leader_log/mysqld.err
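The configuration above points at `_tmp`, `_run` and `_log` sibling directories that were lost along with the node's local storage, and mysqld typically expects such directories to exist before it starts. A minimal sketch for recreating them, assuming the same `/u01` layout; `prepare_node_dirs` is a hypothetical helper name, not part of WeSQL:

```shell
# Create the data directory plus the _tmp, _run and _log siblings
# referenced by my.cnf. prepare_node_dirs is a hypothetical helper.
prepare_node_dirs() {
  local base="$1"
  local suffix
  for suffix in "" _tmp _run _log; do
    mkdir -p "${base}${suffix}" || return 1
  done
}

# Usage on the data node (run as the user that will own the mysqld process):
#   prepare_node_dirs /u01/mysql_data_leader
```

The same helper can be reused for the logger nodes by passing their data directory paths.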
Step 1.2: Start the Data Node
Start the data node as a single-node cluster using the following command:
mysqld \
--defaults-file=/u01/mysql_data_leader/my.cnf \
--raft-replication-force-change-meta=ON \
--raft-replication-cluster-info='192.168.0.2:13006@1' &
This will start the data node with the same settings as before, including using AWS S3 for object storage and participating in the Raft protocol.
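Because the command runs in the background, it can help to wait until the server accepts connections before moving on to the logger nodes. The sketch below is one way to do that. It assumes the `mysqladmin` client is available alongside `mysqld`, and `wait_until` is a hypothetical helper name:

```shell
# Run COMMAND up to ATTEMPTS times, sleeping DELAY seconds after each failure.
# Usage: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage on the data node (socket path taken from the my.cnf in Step 1.1):
#   wait_until 30 2 mysqladmin --socket=/u01/mysql_data_leader_tmp/mysqld.sock ping
```

Recovery from object storage may take longer than a normal restart, so the attempt count and delay here are placeholders to adjust.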
Step 2: Rebuild The First Logger Node
Step 2.1: Initialize the Data Directory
To initialize the data directory for the logger node, use the following command:
mysqld \
--no-defaults \
--initialize-insecure \
--datadir=/u01/mysql_data_logger1 \
--raft-replication-log-type-node=ON \
--table-on-objectstore=false \
--objectstore-provider=aws \
--objectstore-region=us-west-1 \
--objectstore-bucket=wesql-storage \
--repo-objectstore-id=sysbench \
--raft-replication-cluster-id=1 \
--raft-replication-cluster-info='192.168.0.3:13006'
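In MySQL, `--initialize-insecure` expects the data directory to be missing or empty, and this sketch assumes WeSQL inherits that behavior. A small pre-check can catch leftover files before the initialization fails; `datadir_is_empty` is a hypothetical helper name:

```shell
# Succeed only when the directory is missing or has no entries
# (dotfiles included). datadir_is_empty is a hypothetical helper.
datadir_is_empty() {
  local dir="$1"
  if [ ! -e "$dir" ]; then
    return 0
  fi
  if [ ! -d "$dir" ]; then
    return 1
  fi
  [ -z "$(ls -A "$dir")" ]
}

# Usage before running the initialization command:
#   datadir_is_empty /u01/mysql_data_logger1 || echo "datadir is not empty" >&2
```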
Step 2.2: Prepare the my.cnf configuration file
Create the my.cnf configuration file for Logger Node 1, and make sure it matches the one used during the initial setup.
Now, create and edit the my.cnf file:
vim /u01/mysql_data_logger1/my.cnf
Add the following content:
[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log_bin=slave-bin
log_bin_index=slave-bin.index
# raft settings
raft_replication_auto_leader_transfer=ON
raft_replication_log_type_node=ON
# serverless settings
snapshot_archive=false
table_on_objectstore=false
objectstore_provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
branch_objectstore_id=main
# server
port=3007
datadir=/u01/mysql_data_logger1
tmpdir=/u01/mysql_data_logger1_tmp
socket=/u01/mysql_data_logger1_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger1_run/mysqld.pid
log-error=/u01/mysql_data_logger1_log/mysqld.err
Step 2.3: Start the Logger Node
Once the configuration is ready, start Logger Node 1 using the following command:
mysqld --defaults-file=/u01/mysql_data_logger1/my.cnf &
Step 2.4: Rejoin the Logger Node to the Cluster
Once the logger node is running as a learner, it needs to be rejoined to the cluster. The learner role allows the node to sync logs from the cluster without participating in the Raft protocol.
Execute the following commands on the leader node to add the logger node as a learner and then upgrade it to a follower:
CALL dbms_consensus.add_learner('192.168.0.3:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.3:13006');
- add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
- upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in Raft and replication.
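Since the same two statements are needed for every logger node, they can be generated from the node's Raft address. The sketch below only wraps the procedures shown above. `rejoin_sql` is a hypothetical helper name, and the `mysql` client invocation in the usage comment (socket path, `-uroot`) is an assumption about how you connect to the leader:

```shell
# Print the two membership statements for one logger node.
# rejoin_sql is a hypothetical helper; it does not contact the server itself.
rejoin_sql() {
  local addr="$1"
  printf "CALL dbms_consensus.add_learner('%s');\n" "$addr"
  printf "CALL dbms_consensus.upgrade_learner('%s');\n" "$addr"
}

# Usage on the leader node:
#   rejoin_sql 192.168.0.3:13006 | mysql --socket=/u01/mysql_data_leader_tmp/mysqld.sock -uroot
```

Printing the statements rather than executing them keeps the helper easy to review before anything changes cluster membership.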
Step 3: Rebuild The Second Logger Node
Rebuilding the second Logger Node is similar to rebuilding Logger Node 1. Follow these steps to set it up.
Step 3.1: Initialize the Data Directory
To initialize the data directory for Logger Node 2, use the following command:
mysqld \
--no-defaults \
--initialize-insecure \
--datadir=/u01/mysql_data_logger2 \
--raft-replication-log-type-node=ON \
--table-on-objectstore=false \
--objectstore-provider=aws \
--objectstore-region=us-west-1 \
--objectstore-bucket=wesql-storage \
--repo-objectstore-id=sysbench \
--branch-objectstore-id=main \
--raft-replication-cluster-id=1 \
--raft-replication-cluster-info='192.168.0.4:13006'
Step 3.2: Prepare the my.cnf configuration file
Create the my.cnf configuration file for Logger Node 2, and make sure it matches the one used during the initial setup.
Now, create and edit the my.cnf file:
vim /u01/mysql_data_logger2/my.cnf
Add the following content:
[mysqld]
# binlog
sync_binlog=1
sync_relay_log=1
log_bin=slave-bin
log_bin_index=slave-bin.index
# raft settings
raft_replication_auto_leader_transfer=ON
raft_replication_log_type_node=ON
# serverless settings
snapshot_archive=false
table_on_objectstore=false
objectstore_provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
# server
port=3007
datadir=/u01/mysql_data_logger2
tmpdir=/u01/mysql_data_logger2_tmp
socket=/u01/mysql_data_logger2_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger2_run/mysqld.pid
log-error=/u01/mysql_data_logger2_log/mysqld.err
Step 3.3: Start the Logger Node
Once the configuration is ready, start Logger Node 2 using the following command:
mysqld --defaults-file=/u01/mysql_data_logger2/my.cnf &
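Before rejoining the node, it is worth confirming that it started cleanly. This sketch counts error entries in the log file named by `log-error`, assuming the log uses the standard MySQL `[ERROR]` label; `count_log_errors` is a hypothetical helper name:

```shell
# Print the number of lines tagged [ERROR] in a mysqld error log.
# Returns 2 if the log file cannot be read.
count_log_errors() {
  local logfile="$1"
  if [ ! -r "$logfile" ]; then
    return 2
  fi
  # grep -c prints the count but exits 1 when the count is zero,
  # so that status is masked here.
  grep -c -F '[ERROR]' "$logfile" || true
}

# Usage on Logger Node 2:
#   count_log_errors /u01/mysql_data_logger2_log/mysqld.err
```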
Step 3.4: Rejoin the Logger Node to the Cluster
Once the logger node is running as a learner, it needs to be rejoined to the cluster. The learner role allows the node to sync logs from the cluster without participating in the Raft protocol.
Execute the following commands on the leader node to add the logger node as a learner and then upgrade it to a follower:
CALL dbms_consensus.add_learner('192.168.0.4:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.4:13006');
- add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
- upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in Raft and replication.