
Rebuild a WeSQL Cluster Logger Node

In the event of a logger node failure, including the loss of local storage (e.g., EBS volumes), the logger node can be reinitialized and re-added to the WeSQL cluster. This restores the node without disrupting the rest of the cluster, preserving availability and data consistency throughout the recovery.

Step 1: Retrieve the Cluster Leader Node

To perform operations on the cluster, such as removing and rejoining nodes, you must first identify the current leader node. The leader coordinates replication and maintains overall cluster consistency, and all membership changes, such as adding or removing nodes, must be executed through it.

Use the following SQL query from any active node to identify the leader:

SELECT CURRENT_LEADER FROM INFORMATION_SCHEMA.WESQL_CLUSTER_LOCAL;

In the query result, locate the CURRENT_LEADER field, which contains the address of the leader node. This leader node will be used to execute commands to modify the cluster configuration.
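
The CURRENT_LEADER value is a host:port pair, so when scripting the recovery it can be captured with the mysql client and split with shell parameter expansion. A minimal sketch; the leader address shown is an example value, and the client options in the comment are assumptions about your setup:

```shell
# Hypothetical: capture the leader address with the mysql client, e.g.:
#   LEADER=$(mysql -h 192.168.0.2 -uroot -N -e \
#     "SELECT CURRENT_LEADER FROM INFORMATION_SCHEMA.WESQL_CLUSTER_LOCAL;")
LEADER='192.168.0.2:13005'   # example value for illustration

# Split host:port with shell parameter expansion.
LEADER_HOST=${LEADER%:*}
LEADER_PORT=${LEADER#*:}
echo "leader host=$LEADER_HOST port=$LEADER_PORT"
```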

Step 2: Demote and Remove the Logger Node From the Cluster

Before removing the failed logger node from the cluster, it must first be demoted from a follower to a learner. This ensures that the node can be removed safely without disrupting the Raft protocol or affecting replication.

Execute the following command on the leader node to demote the logger node:

CALL dbms_consensus.downgrade_follower('192.168.0.3:13006');

You can verify the demotion before dropping the node:

SELECT ROLE FROM INFORMATION_SCHEMA.WESQL_CLUSTER_GLOBAL WHERE IP_PORT='192.168.0.3:13006';
+---------+
| ROLE    |
+---------+
| Learner |
+---------+

Then remove it from the cluster:

CALL dbms_consensus.drop_learner('192.168.0.3:13006');
  • downgrade_follower: Demotes the node from a follower to a learner. A learner replicates data but does not vote in the Raft protocol.
  • drop_learner: Removes the learner from the cluster, making it ready to be reinitialized and re-added later.
note

If the logger node’s IP address remains unchanged during the recovery process, you may skip the drop_learner step, as the node’s identity in the cluster configuration remains intact. In this case, you only need to re-upgrade the node once it is reinitialized and started.
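
When automating Step 2, the statements above can be driven from a script. In the sketch below, run_sql is a hypothetical helper that only echoes each statement (a dry run); swap in a real client call such as `mysql -h "$LEADER_HOST" -uroot -e "$1"` for your environment:

```shell
LEADER_HOST='192.168.0.2'   # example leader address from Step 1
NODE='192.168.0.3:13006'    # the failed logger node

# Hypothetical dry-run helper; replace echo with a real mysql client call.
run_sql() {
  echo "on $LEADER_HOST: $1"
}

run_sql "CALL dbms_consensus.downgrade_follower('$NODE');"
run_sql "SELECT ROLE FROM INFORMATION_SCHEMA.WESQL_CLUSTER_GLOBAL WHERE IP_PORT='$NODE';"
run_sql "CALL dbms_consensus.drop_learner('$NODE');"
```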

Step 3: Initialize and Start the New Logger Node as a Learner

Once the logger node has been removed (or demoted), it needs to be reinitialized. This involves configuring the logger node’s data directory and starting it as a learner to allow it to synchronize logs from the cluster.

Step 3.1: Initialize the Data Directory

To initialize the data directory for the logger node, use the following command:

mysqld \
--no-defaults \
--initialize-insecure \
--datadir=/u01/mysql_data_logger1 \
--raft-replication-log-type-node=ON \
--table_on_objectstore=false \
--objectstore-provider=aws \
--objectstore-region=us-west-1 \
--objectstore-bucket=wesql-storage \
--repo-objectstore-id=sysbench \
--raft-replication-cluster-id=1 \
--raft-replication-cluster-info='192.168.0.3:13006'
  • table_on_objectstore=false indicates that the logger node's SmartEngine data is not persisted in object storage.
  • raft-replication-log-type-node indicates that this node is a Logger Node in the Raft group.
  • raft-replication-cluster-id indicates the replication cluster that this Logger Node is a part of.
  • raft-replication-cluster-info provides the IP:PORT information for the Logger Node to join the cluster.
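
mysqld's --initialize-insecure step expects an empty (or nonexistent) data directory, so any stale files left by the failed node should be moved aside first. A sketch of that cleanup, using a scratch path under /tmp for illustration; in practice the path is /u01/mysql_data_logger1:

```shell
# Demo path for illustration; in production this would be /u01/mysql_data_logger1.
DATADIR=/tmp/mysql_data_logger1_demo
mkdir -p "$DATADIR" && touch "$DATADIR/stale.ibd"   # simulate leftover files

# Move the old directory aside rather than deleting it outright,
# so the stale files remain available for inspection.
if [ -d "$DATADIR" ]; then
  rm -rf "$DATADIR.bak"
  mv "$DATADIR" "$DATADIR.bak"
fi
mkdir -p "$DATADIR"   # fresh, empty directory ready for --initialize-insecure
```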

Step 3.2: Prepare the my.cnf configuration file

Create the my.cnf configuration file for the logger node, and ensure it matches the one used during the initial setup:

vim /u01/mysql_data_logger1/my.cnf

Add the following content:

[mysqld]
#binlog
sync_binlog=1
sync_relay_log=1
log_bin=slave-bin
log_bin_index=slave-bin.index

# raft settings
raft_replication_auto_leader_transfer=ON
raft_replication_log_type_node=ON

# serverless settings
snapshot_archive=false
table_on_objectstore=false
objectstore-provider=aws
objectstore_region=us-west-1
objectstore_bucket=wesql-storage
repo_objectstore_id=sysbench
branch_objectstore_id=main

# server
port=3007
datadir=/u01/mysql_data_logger1
tmpdir=/u01/mysql_data_logger1_tmp
socket=/u01/mysql_data_logger1_tmp/mysqld.sock
pid-file=/u01/mysql_data_logger1_run/mysqld.pid
log-error=/u01/mysql_data_logger1_log/mysqld.err
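
The config above points tmpdir, the pid-file, and the error log at sibling directories that mysqld does not create on its own, so they must exist before startup. A sketch, using a /tmp base for illustration; in practice the base directory is /u01:

```shell
# Illustrative base path; the config above uses /u01.
BASE=/tmp/wesql_logger_demo

# Create the data, tmp, run, and log directories referenced by my.cnf.
mkdir -p "$BASE/mysql_data_logger1" \
         "$BASE/mysql_data_logger1_tmp" \
         "$BASE/mysql_data_logger1_run" \
         "$BASE/mysql_data_logger1_log"
```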

Step 3.3: Start the Logger Node

Once the data directory is initialized, the Logger Node can be started. Use the following command to launch the node:

mysqld --defaults-file=/u01/mysql_data_logger1/my.cnf &

This command starts the logger node in the background, using the configuration file specified by my.cnf.
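
Since the server starts in the background, it helps to wait until it is actually up before running the Step 4 commands; polling for the Unix socket defined in my.cnf is one simple check. A sketch, where the socket path matches the example config above and the timeout is an arbitrary choice:

```shell
# Poll until the mysqld socket appears, or give up after a timeout (seconds).
wait_for_socket() {
  sock=$1
  timeout=${2:-60}
  i=0
  while [ "$i" -lt "$timeout" ]; do
    [ -S "$sock" ] && return 0   # -S: path exists and is a socket
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# usage: wait_for_socket /u01/mysql_data_logger1_tmp/mysqld.sock 60
```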

Step 4: Rejoin the Logger Node to the Cluster

Once the logger node is up and running, it needs to be re-added to the cluster as a learner. The learner role allows the node to sync logs from the cluster without participating in the Raft protocol.

Execute the following commands on the leader node to re-add the logger node as a learner and then upgrade it to a follower:

CALL dbms_consensus.add_learner('192.168.0.3:13006');
CALL dbms_consensus.upgrade_learner('192.168.0.3:13006');
  • add_learner: Adds the node back to the cluster as a learner, allowing it to start receiving log updates from the cluster.
  • upgrade_learner: Promotes the node from learner to follower, enabling it to fully participate in raft and replication.
note

If the IP address of the logger node has remained unchanged, and the drop_learner command was skipped in Step 2, simply run the upgrade_learner command to reintegrate the node as a follower.
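
When scripting the rejoin, the only branch is whether drop_learner was actually run in Step 2. A dry-run sketch that prints the statements to execute on the leader; the rejoin_statements helper and its dropped flag are hypothetical names for illustration:

```shell
# Print the statements to run on the leader for rejoining the node.
rejoin_statements() {
  node=$1
  dropped=$2   # "true" if drop_learner was run in Step 2
  if [ "$dropped" = true ]; then
    echo "CALL dbms_consensus.add_learner('$node');"
  fi
  echo "CALL dbms_consensus.upgrade_learner('$node');"
}

rejoin_statements '192.168.0.3:13006' true
# To apply, pipe the output to the mysql client on the leader, e.g.:
#   rejoin_statements '192.168.0.3:13006' true | mysql -h <leader-host> -uroot
```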

By following these steps, you can safely recover and reintegrate a failed logger node in the WeSQL cluster, maintaining cluster integrity and avoiding disruption.