WeSQL Disaster Recovery across Regions - WeSQL Replication Solution with S3 storage

January 21, 2025 · 7 min read

WeSQL Contributor

This blog introduces how to build a disaster recovery cluster across AWS Regions using WeSQL Replication. WeSQL Replication is a feature of WeSQL that provides a simple and efficient replication solution based on S3 Storage. For more information about WeSQL Replication, please refer to WeSQL Replication.

WeSQL Disaster Recovery Cluster Deployment Architecture

We use AWS as an example for deployment, deploying WeSQL disaster recovery cluster in two different regions. The Primary Database is deployed in Region us-west-1, and the Replica Database is deployed in Region us-west-2. The Replica server synchronizes Binlog from the Primary Server's S3 Bucket to the local RelayLog. The Replica Server replays the RelayLog and applies it to its own database, then persists it to its own S3 Bucket. images/wesql_replication.png

Deploying the Disaster Recovery Cluster

Create Source database in Region `us-west-1`

Start a EC2 instance, create a WeSQL Database, set the public IP source_public_ip, and open port 3306 in the security group.

Refer to Start A WeSQL-Server Container

Set S3 Access Key and Secret Key

cat wesql.env
WESQL_OBJECTSTORE_ACCESS_KEY=your-access-key
WESQL_OBJECTSTORE_SECRET_KEY=your-access-key

Start WeSQL Source Server as Primary Database.

docker run -itd --network host --name wesql-source \
  -e MYSQL_CUSTOM_CONFIG="[mysqld]\n\
  port=3306\n\
  log-bin=binlog\n\
  gtid_mode=ON\n\
  enforce_gtid_consistency=ON\n\
  log_replica_updates=ON\n\
  binlog_format=ROW\n\
  objectstore_provider=aws\n\
  objectstore_region=us-west-1\n\
  objectstore_bucket=wesql-storage\n\
  repo_objectstore_id=tutorial\n\
  branch_objectstore_id=main\n\
  smartengine_persistent_cache_size=1G" \
    -v ~/wesql-local-dir:/data/mysql \
    -e WESQL_DATA_DIR=/data/mysql/data \
    -e WESQL_LOG_DIR=/data/mysql/log \
    -e WESQL_CLUSTER_MEMBER=127.0.0.1:13306 \
    -e MYSQL_ROOT_PASSWORD=passwd \
    --env-file=./wesql.env \
    apecloud/wesql-server:8.0.35-0.1.0_beta5.40

Create Replica database in Region `us-west-2`

Start a EC2 instance, create a WeSQL Replica Database, set the public IP replica_public_ip, and open port 3306 in the security group.

Refer to Clone A WeSQL-Server Container, clone the Source Database to the Replica Database and enable the replication feature binlog_archive_replica.

Set the Replica node S3 and Source S3 Access Key and Secret Key. your_source_access_key and your_source_secret_key are the Source Database S3 Access Key and Secret Key.

cat wesql-local.env
WESQL_OBJECTSTORE_ACCESS_KEY=your_access_key
WESQL_OBJECTSTORE_SECRET_KEY=your_secret_key
WESQL_SOURCE_OBJECTSTORE_ACCESS_KEY=your_source_access_key
WESQL_SOURCE_OBJECTSTORE_SECRET_KEY=your_source_secret_key

Start WeSQL Replica Server as Secondary Database.

docker run -itd --network host --name wesql-secondary \
  -e MYSQL_CUSTOM_CONFIG="[mysqld]\n\
  port=3306\n\
  server_id=2\n\
  read_only=ON\n\
  log_bin=binlog\n\
  gtid_mode=ON\n\
  enforce_gtid_consistency=ON\n\
  log_replica_updates=ON\n\
  binlog_format=ROW\n\
  sync_relay_log=10000\n\
  relay_log_recovery=ON\n\
  relay_log=relay-bin\n\
  objectstore_provider='aws'\n\
  objectstore_region='us-west-2'\n\
  objectstore_bucket='wesql-storage'\n\
  repo_objectstore_id='tutorial'\n\
  branch_objectstore_id='main'\n\
  initialize_from_source_objectstore=true\n\
  binlog_archive_replica=true\n\
  source_objectstore_smartengine_data=true\n\
  source_objectstore_provider='aws'\n\
  source_objectstore_region='us-west-1'\n\
  source_objectstore_bucket='wesql-storage'\n\
  source_objectstore_repo_id='tutorial'\n\
  source_objectstore_branch_id='main'" \
    -v ~/wesql-local-dir:/data/mysql \
    -e WESQL_DATA_DIR=/data/mysql/data \
    -e WESQL_LOG_DIR=/data/mysql/log \
    -e WESQL_CLUSTER_MEMBER='127.0.0.1:13306' \
    -e MYSQL_ROOT_PASSWORD=passwd \
    --env-file=./wesql-local.env \
    apecloud/wesql-server:8.0.35-0.1.0_beta5.40

Create schema, user, and the required permissions.

mysql> CREATE SCHEMA sbtest;
mysql> CREATE USER sbtest@'%' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON sbtest.* to sbtest@'%';

Sysbench Test

Start a EC2 instance, start sysbench test, and connect to the Source Database.

Prepare sysbench data.

docker run --network host --rm \
apecloud/customsuites:latest \
sysbench \
--db-driver=mysql \
--report-interval=2 \
--tables=4 \
--table-size=250000 \
--threads=8 \
--time=300 \
--mysql-host=source_public_ip \
--mysql-port=3306 \
--mysql-user=sbtest \
--mysql-password=password \
/usr/share/sysbench/oltp_read_write.lua \
prepar

Run sysbench test.

docker run --network host --rm \
apecloud/customsuites:latest \
sysbench \
--db-driver=mysql \
--report-interval=2 \
--tables=4 \
--table-size=250000 \
--threads=4 \
--time=60 \
--mysql-host=source_public_ip \
--mysql-port=3306 \
--mysql-user=sbtest \
--mysql-password=password \
/usr/share/sysbench/oltp_read_write.lua \
run

Verify Replication Status.

mysql> show processlist;
+-----+-----------------+-----------------+------+---------+-------+---------------------------------------+------------------+
| Id  | User            | Host            | db   | Command | Time  | State                                 | Info             |
+-----+-----------------+-----------------+------+---------+-------+---------------------------------------+------------------+
|  12 | event_scheduler | localhost       | NULL | Daemon  | 80791 | Waiting on empty queue                | NULL             |
|  16 | system user     |                 | NULL | Query   |     0 | waiting for handler commit            | NULL             |
|  17 | system user     |                 | NULL | Query   |    10 | Waiting for an event from Coordinator | NULL             |
|  18 | system user     |                 | NULL | Query   | 82103 | Waiting for an event from Coordinator | NULL             |
|  19 | system user     |                 | NULL | Connect | 80791 | Waiting for an event from Coordinator | NULL             |
|  20 | system user     |                 | NULL | Connect | 80791 | Waiting for an event from Coordinator | NULL             |
| 409 | root            | 127.0.0.1:34106 | NULL | Query   |     0 | init                                  | show processlist |
+-----+-----------------+-----------------+------+---------+-------+---------------------------------------+------------------+
7 rows in set, 1 warning (0.00 sec)

mysql> select * from mysql.slave_relay_log_info where Channel_name='binlog_archive_replica'\G
*************************** 1. row ***************************
                             Number_of_lines: 14
                              Relay_log_name: ./relay-bin-binlog_archive_replica.000017
                               Relay_log_pos: 45745870
                             Master_log_name: binlog.000001
                              Master_log_pos: 777933242
                                   Sql_delay: 0
                           Number_of_workers: 4
                                          Id: 1
                                Channel_name: binlog_archive_replica
                   Privilege_checks_username: NULL
                   Privilege_checks_hostname: NULL
                          Require_row_format: 0
             Require_table_primary_key_check: STREAM
 Assign_gtids_to_anonymous_transactions_type: OFF
Assign_gtids_to_anonymous_transactions_value:
1 row in set (0.00 sec)

mysql> select * from  mysql.slave_worker_info where Channel_name='binlog_archive_replica'\G
*************************** 1. row ***************************
                        Id: 1
            Relay_log_name: ./relay-bin-binlog_archive_replica.000007
             Relay_log_pos: 51240185
           Master_log_name: binlog.000001
            Master_log_pos: 438066490
 Checkpoint_relay_log_name: ./relay-bin-binlog_archive_replica.000007
  Checkpoint_relay_log_pos: 51188714
Checkpoint_master_log_name: binlog.000001
 Checkpoint_master_log_pos: 438015019
          Checkpoint_seqno: 95
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap: 0x40401008080404028200818000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
              Channel_name: binlog_archive_replica
*************************** 2. row ***************************
                        Id: 2
            Relay_log_name: ./relay-bin-binlog_archive_replica.000007
             Relay_log_pos: 51240721
           Master_log_name: binlog.000001
            Master_log_pos: 438067026
 Checkpoint_relay_log_name: ./relay-bin-binlog_archive_replica.000007
  Checkpoint_relay_log_pos: 51188714
Checkpoint_master_log_name: binlog.000001
 Checkpoint_master_log_pos: 438015019
          Checkpoint_seqno: 96
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap: 0x80802010101008080808020101000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
              Channel_name: binlog_archive_replica

Disaster Recovery

When the primary database is unavailable, disaster recovery can be achieved by manually switching to the replica database.

Simulate primary database unavailability

Stop the primary WeSQL Server.

docker stop wesql-source

Replica Server Consistency Check

Compare the last slice object key of the primary database's persisted Binlog with the Master_log_name and Master_log_pos columns of the replica database's slave_relay_log_info. Ensure that the replica database has caught up to the last slice of the primary database.

Query the list of Binlog files in the Primary Server's S3 Bucket and check the key of the last Binlog file. {binlog file name}.{raft term}.{log end position} is the key format of the Binlog slice in the S3 Bucket.

$ aws s3 ls s3://wesql-dev-snapshot/tutorial/main/binlog/binlog. --region cn-northwest-1 | tail
2025-01-21 08:08:25     559153 binlog.000001.0000000002.0433507487
2025-01-21 08:08:26     619207 binlog.000001.0000000002.0434126694
2025-01-21 08:08:27     615463 binlog.000001.0000000002.0434742157
2025-01-21 08:08:28     562365 binlog.000001.0000000002.0435304522
2025-01-21 08:08:29     567711 binlog.000001.0000000002.0435872233
2025-01-21 08:08:30     567725 binlog.000001.0000000002.0436439958
2025-01-21 08:08:31     566661 binlog.000001.0000000002.0437006619
2025-01-21 08:08:32     547891 binlog.000001.0000000002.0437554510
2025-01-21 08:08:33     496433 binlog.000001.0000000002.0438050943
2025-01-21 08:08:34      18764 binlog.000001.0000000002.0438069707

Query the Master_log_name and Master_log_pos fields from the mysql.slave_master_info table on the Replica Server. This indicates the position of the Binlog synchronized from the Primary Server's S3 and applied to the local RelayLog. Note that synchronizing the Binlog to the local RelayLog does not mean it has been applied to the local database.

mysql> select Master_log_name, Master_log_pos from mysql.slave_master_info where channel_name='binlog_archive_replica'\G
*************************** 1. row ***************************
Master_log_name: binlog.000001
 Master_log_pos: 438069707
1 row in set (0.01 sec)

Query the Master_log_name and Master_log_pos fields from the mysql.slave_relay_log_info table on the Replica Server. This indicates the position of the Binlog synchronized from the Primary Server's S3 and applied to the local RelayLog. The Replica Server can only become the new Primary Server and ensure data consistency with the Primary Server after all RelayLogs have been applied to the local database.

mysql> select Relay_log_name, Relay_log_pos, Master_log_name, Master_log_pos from mysql.slave_relay_log_info where Channel_name='binlog_archive_replica'\G
*************************** 1. row ***************************
 Relay_log_name: ./relay-bin-binlog_archive_replica.000007
  Relay_log_pos: 51243402
Master_log_name: binlog.000001
 Master_log_pos: 438069707
1 row in set (0.00 sec)

Switchover

Disable the BINLOG_ARCHIVE_REPLICATION parameter of the Replica Server to stop replication. Disable the READ_ONLY parameter to allow the Replica Server to write.

mysql> SET persist binlog_archive_replica=OFF;
mysql> SET persist read_only=OFF;

Restart the Replica Server to become the new primary database.

docker restart wesql-secondary

Sysbench Test Recovery by Connecting to the New Primary Server

Modify the sysbench connection database replica_public_ip and 3306 to the new Primary Server ip/port.

docker run --network host --rm \
apecloud/customsuites:latest \
sysbench \
--db-driver=mysql \
--report-interval=2 \
--tables=4 \
--table-size=250000 \
--threads=4 \
--time=60 \
--mysql-host=replica_public_ip \
--mysql-port=3306 \
--mysql-user=sbtest \
--mysql-password=password \
/usr/share/sysbench/oltp_read_write.lua \
run

Scaling Database Connections

January 13, 2025 · 6 min read

earayu

WeScale Contributor

Have you ever wonder why do databases limit the maximum number of connections?

Each connection consumes valuable resources like memory and file descriptors, potentially affecting performance. As systems grow and demand more connections, these limits hinder scalability. Connection pooling offers a solution.

A database connection pool addresses two critical issues in high-performance systems:

Connection Creation Overhead: Establishing new database connections is expensive due to TCP handshakes, authentication, encryption, and setup tasks. Frequent connection creation adds latency, becoming a bottleneck under high load. Pooling connections mitigates this issue by reusing established connections, reducing overhead.
Limited Database Connections: Each connection consumes memory and server-side resources. As applications scale, the number of connections can quickly exceed what the database can efficiently handle. Connection multiplexing allows multiple clients to share fewer physical connections, alleviating the database server's load.

When discussing how to implement a Connection Pool, some find it straightforward, while others see it as challenging. Both perspectives are valid. Pooling is simple—it involves maintaining an efficient data structure to store and retrieve connections. Multiplexing is hard because it requires efficient management of connection states.

Client-Side Connection Pool vs. Server-Side Connection Pool

Choosing between a client-side connection pool and a server-side connection pool is critical. For large systems, server-side pooling is more scalable and controlled.

Why Client-Side Connection Pools Often Fail at Scale?

Client-side connection pools may suffice for smaller systems but become liabilities as client numbers increase. Since clients are typically stateless, scaling out leads to an exponential rise in connections, resulting in:

Excessive Resource Usage: Each client manages its own pool, consuming more connections than necessary.
Uneven Load Distribution: Persistent, long-lived client connections create load imbalances, with some database servers overwhelmed while others remain idle.

The clear conclusion: Server-side connection pooling is essential for scalability. Centralizing connection management on the server provides better control, efficient connection sharing, and optimal performance.

Connection Multiplexing: The Heart of Scalability

The core of connection multiplexing is effective state management. A single connection should serve multiple clients, but only if their states are properly isolated. Otherwise, the state left by one client could affect another, leading to data corruption or security breaches.

Connection state involves:

User variables
Session variables
Transaction status
Temporary tables
Current connected user
Current used database

Efficiently managing these states is key to enabling connection multiplexing at scale, directly addressing the challenge of limited connections.

Multiplexing Granularity: Balancing Efficiency and Isolation

The granularity of multiplexing determines how connections are shared. Finer granularity increases reuse and reduces the number of needed connections but complicates state management.

The multiplexing granularity can be categorized into three types:

Session Pooling: A connection is checked out when a client session begins and returned when the session ends. The connection remains dedicated to that client for the entire session, maintaining all session states.
Transaction Pooling: A connection is checked out at the start of a transaction and returned immediately after the transaction commits or rolls back. This ensures transaction integrity while allowing connections to be reused between transactions.
Statement Pooling: A connection is checked out for the execution of a single SQL statement and returned immediately after. This maximizes connection reuse but requires resetting the connection state between statements.

WeScale balances efficiency and isolation by adapting the pooling strategy based on the query type—using Statement Pooling for stateless, read-only queries and Transaction Pooling for write-heavy or transactional queries.

How WeScale Handles Connection Multiplexing

In WeScale, when a client checks out a connection, any changes to the connection’s state (like session variables or transaction states) are recorded. Upon returning the connection to the pool, WeScale resets the state, ensuring a clean state for the next client. All the connections in the pool are identical, allowing them to be shared across multiple clients freely.

However, certain states—such as the current used database and current connected user—are more challenging to manage and are almost always modified during checkout. These states are crucial because they are specified by the application at connection time.

Current Used Database: Clients typically specify a database context before executing queries. Switching databases within a connection requires sending a command to the database (e.g., USE <database>), introducing additional round-trip latency.
Current Connected User: The connected user also needs to be specified by client at connection time. Changing the user for a connection from the pool involves authentication checks and permissions, and frequent changes can degrade performance.

Optimizing Connection's Database State: DatabaseName Rewrite

A key optimization in WeScale's design is the DatabaseName rewrite. This technique parses SQL queries to prepend the current used database name to table names within the query itself. This allows the same connection to execute queries across multiple databases without frequent state resets or context switches.

Example:

Input Query:

SELECT (SELECT d FROM t2 WHERE d > a), t1.* FROM t1;

Transformed Query:

-- Suppose `test` is the current used database
SELECT (SELECT d FROM test.t2 WHERE d > a), t1.* FROM test.t1;

By transforming the query to include the database name explicitly, WeScale avoids relying on the connection's current used database. This reduces overhead and enhances the flexibility and reusability of connections across different client sessions.

Optimizing Connection's User State: Authentication and Authorization in Proxy

One challenge of connection pooling is dealing with varying user credentials and permissions. Changing users on an active connection is resource-intensive and requires state resets. WeScale simplifies this by maintaining a uniform user across all pooled connections. This allows the connection’s user state to remain constant, even as different client requests arrive, reducing unnecessary state changes and optimizing performance.

But how about user authentication and authorization?

WeScale ensures MySQL compatibility by supporting standard authentication methods like mysql_native_password and caching_sha2_password. The proxy handles user authentication and authorization directly against MySQL’s internal user database, eliminating the need for separate authentication systems or custom credentials management.

Unlike other proxies that require users to configure additional credentials or rely on external systems, WeScale leverages MySQL’s native authentication system, ensuring better security and compatibility.

Summary

Scaling database connections is challenging due to resource limits and connection overhead. Client-side connection pools often fail at scale, leading to excessive resource usage and uneven load. WeScale tackles this by implementing server-side connection pooling and efficient connection multiplexing, allowing multiple clients to share fewer connections without state conflicts. Key strategies include:

Adaptive Pooling: Using statement pooling for read-only queries and transaction pooling for transactional ones.
State Management: Resetting connection states upon return to the pool to ensure isolation, making sure that connections are clean for the next client.
DatabaseName Rewrite: Modifying queries to include explicit database names, reducing the need for resetting the connection’s current used database.
Proxy Authentication: Handling user authentication within the proxy, reducing the need for changing the connection's current connected user.

These approaches reduce overhead, optimize resource usage, and enhance scalability and performance in high-demand database environments.

Host Database Serverlessly On AWS Fargate - Just Like Your Applications

December 26, 2024 · 4 min read

earayu

WeScale Contributor

Running a database in the cloud often means paying for idle capacity and dealing with persistent storage overhead. AWS ECS + Fargate changes that equation by letting you pay only for actual compute time and seamlessly integrate with S3 for storage, effectively separating compute from data. This guide shows you how to host a MySQL-compatible database (WeSQL) using Fargate’s pay-as-you-go model while storing your data durably on S3—so you don’t pay for idle compute.

What We’ll Build

By following this guide, you’ll launch a MySQL-compatible database that:

Runs on AWS Fargate (no dedicated servers)
Uses S3 for permanent data storage
Starts/stops on demand, ensuring you only pay when it’s actually running

Here’s the architecture and resource dependencies:

images/ecs-dependencies.svg

Prerequisites

An AWS account with appropriate permissions
Basic understanding of AWS VPC, ECS, IAM
A VPC with public subnets and security groups configured

Networking Requirements

A VPC with a public subnet
An Internet Gateway attached
A security group allowing:
- Inbound: TCP port 3306 (MySQL)
- Outbound: All traffic

Create S3 Bucket

Create an S3 bucket for database files—this is where WeSQL will store all data.

images/create_s3_bucket.png

Create IAM Roles

Next, create an IAM role to grant ECS tasks permission to access the S3 bucket. Attach this role to your task definition.

images/create_iam_role.png

Create Log Group

Set up a CloudWatch Log Group to capture container logs:

images/log_group.png

Create ECS Cluster

Create an ECS cluster to organize Fargate tasks:

images/create_cluster.png

Create ECS Task Definition

Define how your WeSQL container runs in the ECS Task Definition:

images/create_task_definition.png

Below is an example JSON for WeSQL:

{
    "family": "wesql-fargate-task-def",
    "taskRoleArn": "<your-task-role-with-s3-permission>",
    "executionRoleArn": "<ecs-default-ecsTaskExecutionRole>",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "1024",
    "memory": "2048",
    "containerDefinitions": [
        {
            "name": "wesql-server",
            "image": "apecloud/wesql-server:8.0.35-0.1.0_beta2.37",
            "cpu": 1024,
            "memory": 2048,
            "portMappings": [
                {
                    "containerPort": 3306,
                    "hostPort": 3306,
                    "protocol": "tcp"
                }
            ],
            "essential": true,
            "environment": [
                {
                    "name": "MYSQL_CUSTOM_CONFIG",
                    "value": "[mysqld]\nport=3306\nlog-bin=binlog\ngtid_mode=ON\nenforce_gtid_consistency=ON\nlog_slave_updates=ON\nbinlog_format=ROW\nobjectstore_provider=aws\nrepo_objectstore_id=tutorial\nbranch_objectstore_id=main\nsmartengine_persistent_cache_size=1G"
                },
                {
                    "name": "WESQL_OBJECTSTORE_ACCESS_KEY",
                    "value": "<REPLACE_ME>"
                },
                {
                    "name": "WESQL_DATA_DIR",
                    "value": "/data/mysql/data"
                },
                {
                    "name": "WESQL_OBJECTSTORE_SECRET_KEY",
                    "value": "<REPLACE_ME>"
                },
                {
                    "name": "MYSQL_ROOT_PASSWORD",
                    "value": "passwd"
                },
                {
                    "name": "WESQL_CLUSTER_MEMBER",
                    "value": "127.0.0.1:13306"
                },
                {
                    "name": "WESQL_OBJECTSTORE_REGION",
                    "value": "us-west-2"
                },
                {
                    "name": "WESQL_LOG_DIR",
                    "value": "/data/mysql/log"
                },
                {
                    "name": "WESQL_OBJECTSTORE_BUCKET",
                    "value": "wesql-fargate-test"
                }
            ],
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/wesql-fargate-task-def",
                    "awslogs-region": "us-west-2",
                    "awslogs-stream-prefix": "ecs"
                }
            }
        }
    ]
}

Create ECS Service

Now create an ECS Service to keep the task running:

images/create_service2.png

Select the right subnets and security group:

images/create_service3.png

Check logs in CloudWatch to confirm the database is running:

images/create_service5.png

Connect to the Database

With the service active, you can connect to WeSQL using any MySQL client:

mysql -h <FARGATE_PUBLIC_IP> -P 3306 -uroot -ppasswd

images/connect_to_wesql.png

Pause and Resume the Database

WeSQL uses S3 for storage, so the database container in Fargate is stateless and can be stopped and restarted without losing data. This is ideal for saving costs—simply stop the database when not in use:

# Pause the database (set desired count to 0)
aws ecs update-service \
    --cluster <your-cluster-name> \
    --service <your-service-name> \
    --desired-count 0

# Resume the database (set desired count to 1)
aws ecs update-service \
    --cluster <your-cluster-name> \
    --service <your-service-name> \
    --desired-count 1

Or automate this with AWS Lambda:

import boto3

ecs = boto3.client('ecs')

def lambda_handler(event, context):
    action = event.get('action')
    cluster = event.get('cluster')
    service = event.get('service')
    desired_count = 0 if action == 'pause' else 1

    ecs.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count
    )

Other Considerations

Service Discovery: Use AWS Service Discovery for a stable DNS endpoint.
Secrets Management: Store credentials in AWS Secrets Manager.
Scheduled Start/Stop: Use AWS EventBridge to automate cost-saving start/stop schedules.

Automation with Python Scripts

Check out this simple Python script using AWS SDK (boto3) to automate resource creation. You can also adapt it or use Terraform, CloudFormation, etc.

By combining Fargate’s pay-as-you-go model with S3-based storage, you get a flexible “serverless” MySQL experience—no idle compute charges, no bulky EBS volumes, just on-demand database hosting that fits your actual usage.

WeSQL Outperforms AWS RDS MySQL Single AZ -- 4-6x the Speed, 1/2 the cost

December 17, 2024 · 7 min read

Dongsheng Zhao

SmartEngine:The Next-Generation Cloud-Native Storage Engine

When people hear "S3" and "OLTP database" in the same sentence, skepticism is a common reaction. S3 is known for its low cost, but its performance and latency are often seen as unsuitable for the demands of OLTP workloads.

At WeSQL, we’ve previously explained our approach in articles like Why S3 and Persistent Cache. We use S3 for its low cost, reliability, and scalability as durable storage, while addressing performance and latency challenges with efficient write and read caching mechanisms.

Still, questions remain: Can an S3-based OLTP database perform well in practical use cases?

In this blog, we’ll show how WeSQL achieves significant storage cost savings while delivering several times the computational efficiency of AWS RDS. By combining low cost with strong performance, WeSQL challenges the traditional expectations of OLTP databases with S3-based storage.

Test Explanations

Sysbench is a widely used tool for testing OLTP performance, but its final metrics can vary greatly depending on factors like instance specifications, data volume, concurrency levels, and access patterns.

In the following Sysbench tests, we have designed the testing scenarios to closely resemble real-world business use scenarios.

Instance size

We chose the 4-core 16GB specification because This specification is widely used for small to medium-sized database instances in production, making the test results more relevant to typical workloads. And the 4-core 16GB setup provides a well-balanced combination of CPU and memory, making it suitable for handling most OLTP workloads efficiently without over-provisioning resources.

Data volume and Random type

In typical online database environments, the total data scale usually ranges from 200GB to 300GB,but only a portion of this data is actively accessed. Following the "80/20 rule," the actively accessed data typically amounts to 40GB to 60GB. To simulate real-world business scenarios, we chose a test data volume of 48GB (100 tables with 2 million rows each), which falls within this active data range. The data is accessed using a uniform pattern to ensure all parts of the dataset are evenly accessed, reflecting common usage patterns. This setup creates a realistic test environment for accurate performance evaluation.

With 16GB of memory available, the 48GB data volume exceeds the memory capacity by a large margin. This forces the system to rely on disk-based operations, effectively testing the storage engine’s performance in areas such as I/O efficiency and caching strategies.

Test Environment

Compute Instances:
- AWS RDS Single-AZ:
  - Instance type: db.m5d.xlarge (4vCPU, 16GB RAM)
  - Equipped with a 150GB local NVMe SSD for temporary storage. Persistent storage relies on EBS volumes.
- WeSQL EC2:
  - Instance type: m5d.xlarge (4vCPU, 16GB RAM)
  - Also equipped with a 150GB local NVMe SSD, which WeSQL uses for caching to optimize read & update performance.
Storage Backend:
- AWS RDS:
  - EBS gp3 volumes (200GB, 125MB/s, 3000 IOPS) for persistent storage.
- WeSQL:
  - EBS gp3 volumes (100GB, 125MB/s, 3000 IOPS) for logs and metadata.
  - Primary data storage is offloaded to S3.

WeSQL is designed to minimize dependency on expensive EBS storage by leveraging S3 for data storage, so it uses a small EBS volume to store logs and metadata.

Client:
- Sysbench 1.0.20
- EC2: t3a.2xlarge (8vCPU, 32GB RAM)
Server:
- Database Version:
  - AWS RDS: 8.0.35
  - WeSQL: Built on MySQL 8.0.35
- Deployment:
  - Both systems were deployed as single-node instances for a direct performance comparison.
Network Configuration:
- Availability Zone: All components—including AWS RDS, WeSQL EC2 instances, and the Sysbench client—were deployed within the same AWS availability zone to reduce network latency and ensure consistent test conditions.

Test Method

We used the Sysbench oltp_read_write workload to evaluate performance. The test configuration was as follows:

DataSet: Prepared 100 tables, each containing 2 million rows of data.
Concurrency: Tests were conducted with concurrency levels of 2, 4, 8, 16, 32, 64, 96, and 128.
Duration: Each concurrency level ran for 300 seconds.
Interval: A 60-second interval was applied before starting the next concurrency level.

Results

We tested both AWS RDS and WeSQL under the oltp_read_write workload using varying levels of concurrency.

throughput comparison

images/wesql_rds_throughput.png

95th Percentile Latency comparison

images/wesql_rds_rt.png

Conclusions

Performance perspective

Based on the test results, WeSQL demonstrates peak performance that is nearly 4 times higher than AWS RDS using the same resources. Additionally, WeSQL provides significantly better latency compared to AWS RDS.

In real-world business scenarios, low latency is often critical. For instance, under a 32-thread load, WeSQL achieves a QPS of 7356.72, with a P95 latency of 110.66ms. In contrast, at the same latency level, AWS RDS achieves only 1232.69 QPS. This means that WeSQL has approximately 6 times the throughput of AWS RDS when comparing performance at equivalent latency thresholds.

Cost perspective

WeSQL also provides a significant storage cost advantage. In this test scenario, where the overall data volume is relatively small, our costs are still nearly half of AWS RDS Single-AZ. As data volume grows, this cost advantage becomes even more pronounced.

AWS RDS Single-AZ (db.m5d.xlarge): USD 0.419 per hour
AWS RDS Multi-AZ (db.m5d.xlarge): USD 0.838 per hour
AWS EC2 (m5d.xlarge): USD 0.226 per hour
Above prices are based on the us-east-1 availability zone.

Although we used a single-node deployment in our test, real-world environments typically require cross-AZ disaster recovery for resilience and fault tolerance. In WeSQL’s architecture, data durability is ensured by continuously uploading data to S3, which inherently provides cross-AZ disaster recovery capabilities. As a result, a single-data-node WeSQL deployment offers cross-AZ disaster recovery capabilities, while costing nearly 1/4 of AWS RDS Multi-AZ.

To ensure no data is lost during an AZ failure, including logs stored on EBS, WeSQL’s multi-AZ deployment adds two additional log nodes. In upcoming articles, we will provide a detailed analysis of the cost and performance differences between WeSQL and AWS RDS Multi-AZ.

Analysis

By separating the read and write QPS from the above tests for comparison, it is clear that WeSQL delivers superior performance over AWS RDS in both read and write operations.

images/qps_write.png

images/qps_read.png

Why WeSQL outperforms AWS RDS in write performance

SmartEngine's Write-Optimized LSM-Tree Data Structure

The storage engine used by WeSQL, SmartEngine, is built on an LSM-Tree architecture.This design minimizes write amplification compared to the B+ Tree structure used by InnoDB in AWS RDS,resulting in more efficient write operations and better overall write performance.
S3's High Write Bandwidth Beats EBS GP3

SmartEngine uses S3 as the persistent storage layer, taking advantage of its higher bandwidth to accelerate flush and compaction operations.In contrast, AWS RDS relies on gp3 volumes for persistent storage,where the limited bandwidth of gp3 can become a bottleneck during dirty page flushing.This leads to I/O constraints that hinder write performance in RDS.

Why WeSQL outperforms AWS RDS in read performance

Low-Latency Local NVMe SSDs Cache

SmartEngine makes use of local NVMe SSDs as a read cache, which provides several key advantages:
- Separation of Read and Write I/O: By isolating reads from writes, WeSQL reduces I/O contention, resulting in smoother and faster read operations.
- Higher Performance of NVMe SSDs: Local NVMe SSDs offer significantly better performance compared to the gp3 volumes used by AWS RDS, enabling faster data access and lower read latencies.
Optimizations for LSM-Tree’s Read Challenges

While the LSM-Tree architecture traditionally underperforms B+ Tree in read-heavy workloads, SmartEngine incorporates a series of optimizations to bridge this gap and achieve read performance comparable to InnoDB. These include:
- Multi-priority caching mechanisms to prioritize hot data.
- Bloom filters to minimize unnecessary disk reads.
- Asynchronous I/O for better concurrency and throughput.
- Latch-free metadata operations for lower contention and higher throughput.

Run Database in GitHub Actions, Persisting Data to S3, and Access it Publicly

December 12, 2024 · 7 min read

earayu

WeScale Contributor

When it comes to running a database for development, testing, demos, or short-lived workloads, cost and complexity can be big headaches. Traditional database hosting—like a VPS, a cloud VM, or a managed service—comes with ongoing costs, storage expenses, and configuration overhead. You often end up paying for resources even when you're not using them.

But what if you could spin up a database only when you need it, leverage cheap (or free) object storage for persistence, and tear it down afterward at nearly zero cost, without losing data? One possible approach is to use GitHub Actions as an ephemeral compute environment, combined with S3 (or an S3-compatible service) as persistent storage. With a secure tunnel, you can even access it publicly.

Important Note on Usage: This approach should be used only for short-term integration tests, temporary demos, or quick development tasks. Do not abuse GitHub Actions by running a database continuously or using it as a long-term service platform. GitHub Actions is designed primarily for CI/CD, not as a free computing resource for persistent services. If you need continuous or long-running database hosting, please consider other services or set up a Self-Hosted GitHub Runner in an environment you control, ensuring compliance with GitHub’s usage policies.

Key Idea

The core concept is:

Ephemeral Compute from GitHub Actions: Spin up a MySQL-compatible database only on-demand as part of a CI/CD or testing workflow.
S3-Compatible Storage for Persistence: Store all database data in object storage like AWS S3 or Cloudflare R2. When the ephemeral environment ends, you still have your data safely stored off-runner.
Tunneling for Public Access: Temporarily expose the database to the internet for tests or demos.
Short-Term Use Only: The database runs during the workflow execution window. Once done, the workflow ends and the ephemeral compute resource is freed. This is not intended as a permanent hosting solution.

Want it entirely free? Consider an S3-compatible service like Cloudflare R2 with a generous free tier.

Use Cases

Integration Testing in CI/CD: Spin up a real MySQL-compatible environment for test runs, then shut it down.
Temporary Demos: Need a quick, shareable DB instance for a one-off demo? Perfect.
Short-Term Development: Quickly test new code against a real DB environment without maintaining a full-time service.

Not Recommended For:

Long-term database hosting or production workloads.
Maintaining an always-on public database endpoint.
Circumventing GitHub Actions usage policies.

If you have longer-running needs, consider setting up a Self-Hosted Runner where you manage resources and adhere to the appropriate usage rules.

Example GitHub Actions Workflow

Below is an example GitHub Actions workflow that demonstrates this approach. The workflow briefly spins up a WeSQL database, uses object storage for persistence, and provides temporary tunnel access. If you want to use this workflow, you can follow the steps in the README.

name: Start WeSQL Cluster

on:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Configure AWS CLI
        run: |
          aws configure set aws_access_key_id ${{ secrets.WESQL_OBJECTSTORE_ACCESS_KEY }}
          aws configure set aws_secret_access_key ${{ secrets.WESQL_OBJECTSTORE_SECRET_KEY }}
          aws configure set default.region ${{ secrets.WESQL_OBJECTSTORE_REGION }}

      - name: Start WeSQL Server
        run: |
          export WESQL_OBJECTSTORE_BUCKET=${{ secrets.WESQL_OBJECTSTORE_BUCKET }}
          export WESQL_OBJECTSTORE_REGION=${{ secrets.WESQL_OBJECTSTORE_REGION }}
          export WESQL_OBJECTSTORE_ACCESS_KEY=${{ secrets.WESQL_OBJECTSTORE_ACCESS_KEY }}
          export WESQL_OBJECTSTORE_SECRET_KEY=${{ secrets.WESQL_OBJECTSTORE_SECRET_KEY }}

          docker run -itd --network host --name wesql-server \
            -p 3306:3306 \
            -e MYSQL_CUSTOM_CONFIG="[mysqld]\n\
            port=3306\n\
            log-bin=binlog\n\
            gtid_mode=ON\n\
            enforce_gtid_consistency=ON\n\
            log_slave_updates=ON\n\
            binlog_format=ROW\n\
            objectstore_provider='aws'\n\
            repo_objectstore_id='tutorial'\n\
            objectstore_bucket='${WESQL_OBJECTSTORE_BUCKET}'\n\
            objectstore_region='${WESQL_OBJECTSTORE_REGION}'\n\
            branch_objectstore_id='main'" \
            -v ~/wesql-local-dir:/data/mysql \
            -e WESQL_CLUSTER_MEMBER='127.0.0.1:13306' \
            -e MYSQL_ROOT_PASSWORD=${{ secrets.WESQL_ROOT_PASSWORD }} \
            -e WESQL_OBJECTSTORE_ACCESS_KEY=${WESQL_OBJECTSTORE_ACCESS_KEY} \
            -e WESQL_OBJECTSTORE_SECRET_KEY=${WESQL_OBJECTSTORE_SECRET_KEY} \
            apecloud/wesql-server:8.0.35-0.1.0_beta4.38

      - name: Wait for MySQL port
        run: |
          for i in {1..60}; do
            if nc -z localhost 3306; then
              echo "MySQL port 3306 is ready!"
              exit 0
            fi
            echo "Waiting for MySQL port 3306..."
            sleep 5
          done
          echo "Timeout waiting for MySQL port 3306"
          exit 1

      - name: Start and parse Serveo tunnel
        run: |
          # Just a neat trick: start a tunnel and parse out the assigned port
          nohup ssh -o StrictHostKeyChecking=no -R 0:localhost:3306 serveo.net > serveo.log 2>&1 &
          sleep 5

          TUNNEL_LINE=$(grep 'Forwarding TCP' serveo.log || true)
          if [ -z "$TUNNEL_LINE" ]; then
            echo "No forwarding line found"
            exit 1
          fi

          HOST="serveo.net"
          PORT=$(echo "$TUNNEL_LINE" | sed 's/.*Forwarding TCP connect from .*:\([0-9]*\)/\1/')

          echo "MySQL Public Access:"
          echo "Host: $HOST"
          echo "Port: $PORT"
          echo "Connect: mysql -h $HOST -P $PORT -u root -p${{ secrets.WESQL_ROOT_PASSWORD }}"

          echo "HOST=$HOST" >> $GITHUB_ENV
          echo "PORT=$PORT" >> $GITHUB_ENV

      - name: Write Connection Info to S3
        run: |
          # Just a convenience: store connection info in S3 so you can find it later
          cat << EOF > connection_info.txt
          host=$HOST
          port=$PORT
          username=root
          password=${{ secrets.WESQL_ROOT_PASSWORD }}
          mysql_cli="mysql -h $HOST -P $PORT -u root -p${{ secrets.WESQL_ROOT_PASSWORD }}"
          EOF

          aws s3 cp connection_info.txt s3://${{ secrets.WESQL_OBJECTSTORE_BUCKET }}/connection_info.txt
          echo "Connection info is now in s3://${{ secrets.WESQL_OBJECTSTORE_BUCKET }}/connection_info.txt"

      - name: Keep session running
        run: |
          # Keep the workflow alive so the database stays accessible.
          echo "Press Ctrl+C or cancel the workflow when done."
          tail -f /dev/null

Breakdown of the Workflow (Tips & Tricks)

Using S3 or R2: By default, this example shows AWS S3. But since WeSQL just needs an S3-compatible API, you could easily switch to Cloudflare R2’s free tier. That could make the whole setup cost zero, no matter how often you run it.
Tunneling: Serveo is a neat trick. By using SSH, we ask Serveo to forward a random high port on their server to our MySQL port on the ephemeral runner. Suddenly, anyone on the internet can connect to our ephemeral DB—cool, right? You can replace Serveo with other tunneling services such as ngrok as you like, but Serveo is simple and free.
Persistence: Because all data lives in S3 (or R2), the DB can vanish and reappear as needed with full persistence. Think of it as "serverless" MySQL.
Storing Connection Info in S3: Another trick. Instead of always reading logs, you can fetch the connection details from S3. This is handy if you want to programmatically retrieve connection info for some automation elsewhere.

Connecting to the Database

After this workflow runs, check the Actions log for the Host and Port. On your local machine, just do:

mysql -h serveo.net -P <PORT> -u root -p<YOUR_PASSWORD>

images/use-github-actions-as-a-database/actions_run_page.png

Data Persistence & Restarting

The big win here: When the runner dies, you lose the container, but not the data. Next time you run the workflow, WeSQL pulls data back from your chosen object storage provider (AWS S3, R2, or any S3-compatible service). This means you can treat your DB like a long-lived environment—even though it’s technically ephemeral compute.

You can see all your data in S3 bucket: images/use-github-actions-as-a-database/s3_page.png

A Note on Security

The tunnel is public. Use a strong password, maybe restrict access (if the service supports it), and rotate credentials when done.
Keep AWS and DB credentials in GitHub Secrets, not in your code.
If you handle sensitive data, consider TLS and additional security layers.

Conclusion

This approach challenges the traditional “rent a server and keep it running” model for databases. Using GitHub Actions as ephemeral compute plus S3 for storage, you get a “serverless-ish” database that runs only when you want it, free of charge, and is still accessible from anywhere on the net.

It’s a neat hack. Give it a try next time you need a throwaway or on-demand DB environment. You might never go back to your old ways of spinning up costly VMs or managed instances just to show someone a quick demo or run a batch of tests.

The Shift to S3 - Every Database Will Be Rearchitected for S3 in the Next Few Years

November 7, 2024 · 7 min read

Wei Cao

Founder at ApeCloud

In recent years, we’ve seen a profound shift in how databases handle storage, with more and more systems transitioning to S3 (or S3-compatible object storage) as their primary storage backend. This trend is not isolated to any single type of database but is happening across the spectrum— from OLAP data warehouses, to streaming systems, to OLTP databases, and even embedded databases. It’s becoming increasingly clear that object storage will be the future of database architecture, and in the next two to three years, we can expect nearly every database to be rearchitected to utilize S3 or similar services.

Early Adopters: OLAP Data Warehouses

The shift to S3 started in the OLAP (Online Analytical Processing) domain. Object storage, with its unlimited scalability, pay-as-you-go pricing, and high read&write bandwidth, is a perfect fit for the I/O-heavy, large-scale data operations typical of OLAP systems.

Take Snowflake, for example. Snowflake was one of the earliest products to adopt S3 as its backend storage, revolutionizing how data warehouses could scale in the cloud. One of the biggest advantages Snowflake gained from adopting S3 was the ability to separate compute from storage. This separation enabled Snowflake to offer elastic scaling of compute resources, allowing users to independently scale compute nodes as needed. This architecture also supports features like virtual data warehouses, where multiple teams or workloads can operate on the same underlying data without resource contention. By leveraging S3, Snowflake not only achieved cost savings and reliability but also unlocked powerful cloud-native capabilities.

Another player, Rockset also embraced S3 for its data storage, achieving compute-storage separation. It uses RocksDB as its storage engine, storing data on S3 to provide data durability, while leveraging a tiered storage architecture to offer better cost efficiency.

Beyond OLAP: Streaming Systems and Kafka Alternatives

The shift to S3 is not limited to OLAP databases. Even streaming systems are exploring object storage as a primary backend. Kafka was once challenged by WarpStream, a Kafka alternative that leveraged S3 for handling streams of data. WarpStream claimed a No Disk architecture, persisting all data directly to S3 to provide durability and scalability in ways that traditional Kafka architectures—relying on local disks or expensive block storage—could not match. Additionally, brokers across multiple Availability Zones (AZs) could share the same data via S3, avoiding the costly cross-AZ replication traffic typically required for synchronizing WAL logs. WarpStream has since been acquired by Confluent, consolidating its innovations into the broader Kafka ecosystem.

PostgreSQL and the Rise of S3 Storage Engines

One of the most exciting developments is Neon, an open-source alternative to Aurora PostgreSQL, that brings compute-storage separation to PostgreSQL by leveraging S3 as the storage layer. Neon's Page Server converts the database's random writes into log-structured sequential writes stored on S3, transforming PostgreSQL into a serverless database with compute-storage separation, where compute nodes can be restarted, migrated, and recovered from failures extremely quickly.

But Neon is not alone. Another project, OrioleDB(acquired by Supabase), is also developing an experimental S3 storage engine for PostgreSQL, for increasing data safety, and for scaling and changing the architecture of compute instances preserving all data.

Embedded Databases: SQLite and DuckDB Embrace S3

Even embedded databases are not immune to the growing influence of S3. Cloudflare, for instance, has completely replaced the persistent layer of its SQLite storage backend for DO (Durable Objects) with R2, Cloudflare’s S3-compatible object storage service. This move highlights that even lightweight, embedded databases can benefit by using a local disk as a cache on top of durable, cheap object storage, combining fast access with the reliability of S3-like storage.

Meanwhile, MotherDuck is using Differential Storage to store DuckDB data on S3. DuckDB is a high-performance embedded analytical database, often described as SQLite for OLAP workloads. By shifting the storage layer to S3, MotherDuck ensures that DuckDB can take on the role of a central data warehouse, scaling efficiently and handling large datasets without being constrained by local storage limits.

Enterprise Databases: DB2 Integrates with S3

The trend toward S3 is not limited to modern, open-source databases. Even traditional enterprise databases are being rearchitected to take advantage of object storage. A recent paper presented at VLDB 2024 revealed that IBM DB2 Warehouse is undergoing a significant transformation. IBM is replacing DB2’s traditional storage engine with RocksDB, an LSM-based storage engine, and moving data to S3. This change allows DB2 to handle the massive scale of contemporary data workloads, benefiting from object storage not only for cost efficiency but also for improved performance.

Why S3?

So, why are so many databases moving to S3 or S3-compatible storage systems?

Some of the advantages come from S3 itself, which the database can benefit from without requiring any modifications:

Scalability: S3 offers virtually unlimited storage capacity, allowing databases to grow without the headaches of managing and provisioning storage manually.
Cost Efficiency: S3’s pay-as-you-go pricing model makes it much more affordable than traditional block storage, especially for databases with large datasets or archival needs. It eliminates the need for expensive, pre-provisioned disk space.
Durability and Reliability: S3 guarantees 99.999999999% (11 nines) of durability, which means data is incredibly safe from loss. This level of reliability is difficult to achieve with traditional storage systems without significant overhead.
High Bandwidth: S3 provides high write bandwidth, ensuring that database I/O operations, such as flushing dirty pages, no longer become bottlenecks. Additionally, S3's high read bandwidth allows databases to load data quickly during startup and enables concurrent scanning of multiple files during queries, reducing latency.
Disaster Recovery and Multi-AZ Support: S3’s architecture inherently supports database replication across different Availability Zones (AZs), providing robust disaster recovery options out of the box.

Other reasons are that databases, through modifications, can gain functional and elasticity advantages:

Separation of Compute and Storage: By moving data to S3, databases can separate compute resources from storage resources. Compute resources can scale independently, and data (such as partitions or cache hotspots) can be rebalanced between nodes much faster without the need to physically move the data.
Shared Data Access: An S3 bucket can be accessed simultaneously by multiple virtual machines. This allows databases to support advanced functionalities such as shared access to data files and the creation of instant, low-cost clones.

The Future: Every Database Will Use S3

As we look ahead, it’s clear that the trend of using S3 for database storage is only accelerating. From OLAP systems like Snowflake and Rockset, to streaming platforms like WarpStream, to relational databases like PostgreSQL (with Neon and OrioleDB), and even embedded databases with DuckDB and SQLite, the entire database ecosystem is moving towards object storage.

The flexibility, scalability, and cost advantages of S3 make it an irresistible choice for modern databases. As more and more projects adopt S3 as their storage backend, it’s only a matter of time before every database—from enterprise systems like DB2 to lightweight embedded databases— will be rearchitected to use S3 or S3-compatible storage solutions.

In the next two to three years, this will no longer be a trend—it will be the norm.

What We Did

WeSQL, to the best of our knowledge, is the first open-source, S3-based database built within the MySQL ecosystem. We replaced InnoDB, the traditional B+ tree-based storage engine, with SmartEngine, an LSM-tree-based storage engine. To overcome S3's write latency, we implemented Raft replication, ensuring that transaction latency is decoupled from S3 write delays. Additionally, we addressed read latency by introducing a multi-tier caching system.

References

Hello, WeSQL

October 28, 2024 · 2 min read

Wei Cao

Founder at ApeCloud

MySQL is a popular open-source database with a large user base. Building on top of MySQL, cloud providers like AWS have developed cloud-native MySQL services, such as Aurora MySQL, which adopt a compute-storage separation architecture. These cloud-native services offer features like serverless operation, rapid scaling, and fast snapshots. However, they are proprietary and vendor-locked, meaning Aurora can only be used on AWS. This creates challenges for multi-cloud or on-premise users who might use Aurora in AWS environments but are forced to revert to RDS MySQL or open-source MySQL binaries in others.

We believe that an open-source MySQL distribution with a compute-storage separation architecture, capable of running on any cloud—including on-premise environments—is essential. This belief led to the development of WeSQL.

Similar to Neon, an open-source alternative to Aurora PostgreSQL, WeSQL also chose S3 as the storage backend due to its scalability, pay-as-you-go pricing model, exceptional reliability, and high bandwidth. For a detailed analysis of why S3 was chosen, you can refer to this article. In addition, WeSQL also supports MinIO, a S3-compatible object storage system often used for on-premise deployments. Looking ahead, WeSQL plans to expand its compatibility with more S3-compatible object storage systems.

WeSQL retains an almost unmodified MySQL Server layer, replacing InnoDB with SmartEngine, a Log-Structured Merge tree (LSM) storage engine derived from LevelDB and RocksDB. The choice of LSM is driven by its superior compatibility with S3-based storage, as it optimally handles sequential writes and bulk operations, which are well-suited for object storage systems like S3 that don’t support random writes.

At present, WeSQL is based on MySQL 8.0.35. However, due to the minimal nature of our patches, WeSQL can be easily adapted to newer MySQL versions, including MySQL 8.4, 9.0, and future releases.

WeSQL is still in its early stages and evolving rapidly. We welcome your support and feedback as we continue to build and improve it. Thank you for your interest, and we hope you enjoy exploring what WeSQL has to offer!

WeSQL Disaster Recovery Cluster Deployment Architecture​

Deploying the Disaster Recovery Cluster​

Create Source database in Region us-west-1​

Create Replica database in Region us-west-2​

Sysbench Test​

Verify Replication Status.​

Disaster Recovery​

Simulate primary database unavailability​

Replica Server Consistency Check​

Switchover​

Sysbench Test Recovery by Connecting to the New Primary Server​

Client-Side Connection Pool vs. Server-Side Connection Pool​

Connection Multiplexing: The Heart of Scalability​

Multiplexing Granularity: Balancing Efficiency and Isolation​

How WeScale Handles Connection Multiplexing​

Optimizing Connection's Database State: DatabaseName Rewrite​

Optimizing Connection's User State: Authentication and Authorization in Proxy​

Summary​

What We’ll Build​

Prerequisites​

Networking Requirements​

Create S3 Bucket​

Create IAM Roles​

Create Log Group​

Create ECS Cluster​

Create ECS Task Definition​

Create ECS Service​

Connect to the Database​

Pause and Resume the Database​

Other Considerations​

Automation with Python Scripts​

Test Explanations

Instance size​

Data volume and Random type​

Test Environment​

Test Method​

Results

throughput comparison​

95th Percentile Latency comparison​

Conclusions

Performance perspective​

Cost perspective​

Analysis

Why WeSQL outperforms AWS RDS in write performance​

Why WeSQL outperforms AWS RDS in read performance​

Key Idea​

Use Cases​

Example GitHub Actions Workflow​

Breakdown of the Workflow (Tips & Tricks)​

Connecting to the Database​

Data Persistence & Restarting​

A Note on Security​

Conclusion​

Early Adopters: OLAP Data Warehouses​

Beyond OLAP: Streaming Systems and Kafka Alternatives​

PostgreSQL and the Rise of S3 Storage Engines​

Embedded Databases: SQLite and DuckDB Embrace S3​

Enterprise Databases: DB2 Integrates with S3​

Why S3?​

The Future: Every Database Will Use S3​

What We Did​

References​

WeSQL Disaster Recovery Cluster Deployment Architecture

Deploying the Disaster Recovery Cluster

Create Source database in Region `us-west-1`

Create Replica database in Region `us-west-2`

Sysbench Test

Verify Replication Status.

Disaster Recovery

Simulate primary database unavailability

Replica Server Consistency Check

Switchover

Sysbench Test Recovery by Connecting to the New Primary Server

Client-Side Connection Pool vs. Server-Side Connection Pool

Connection Multiplexing: The Heart of Scalability

Multiplexing Granularity: Balancing Efficiency and Isolation

How WeScale Handles Connection Multiplexing

Optimizing Connection's Database State: DatabaseName Rewrite

Optimizing Connection's User State: Authentication and Authorization in Proxy

Summary

What We’ll Build

Prerequisites

Networking Requirements

Create S3 Bucket

Create IAM Roles

Create Log Group

Create ECS Cluster

Create ECS Task Definition

Create ECS Service

Connect to the Database

Pause and Resume the Database

Other Considerations

Automation with Python Scripts

Instance size

Data volume and Random type

Test Environment

Test Method

throughput comparison

95th Percentile Latency comparison

Performance perspective

Cost perspective

Why WeSQL outperforms AWS RDS in write performance

Why WeSQL outperforms AWS RDS in read performance

Key Idea

Use Cases

Example GitHub Actions Workflow

Breakdown of the Workflow (Tips & Tricks)

Connecting to the Database

Data Persistence & Restarting

A Note on Security

Conclusion

Early Adopters: OLAP Data Warehouses

Beyond OLAP: Streaming Systems and Kafka Alternatives

PostgreSQL and the Rise of S3 Storage Engines

Embedded Databases: SQLite and DuckDB Embrace S3

Enterprise Databases: DB2 Integrates with S3

Why S3?

The Future: Every Database Will Use S3

What We Did

References