
Host Database Serverlessly On AWS Fargate - Just Like Your Applications

· 4 min read
earayu
WeScale Contributor

Running a database in the cloud often means paying for idle capacity and dealing with persistent storage overhead. AWS ECS + Fargate changes that equation by letting you pay only for actual compute time while integrating seamlessly with S3 for storage, effectively separating compute from data. This guide shows you how to host a MySQL-compatible database (WeSQL) using Fargate's pay-as-you-go model while storing your data durably on S3, so you never pay for idle compute.

What We’ll Build

By following this guide, you’ll launch a MySQL-compatible database that:

  • Runs on AWS Fargate (no dedicated servers)
  • Uses S3 for permanent data storage
  • Starts/stops on demand, ensuring you only pay when it’s actually running

Here’s the architecture and resource dependencies:

images/ecs-dependencies.svg

Prerequisites

  1. An AWS account with appropriate permissions
  2. Basic understanding of AWS VPC, ECS, and IAM
  3. A VPC with public subnets and security groups configured

Networking Requirements

  • A VPC with a public subnet
  • An Internet Gateway attached
  • A security group allowing:
    • Inbound: TCP port 3306 (MySQL)
    • Outbound: All traffic
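
If you prefer the CLI to the console, the security group can be created along these lines. This is a minimal sketch: the VPC ID, group name, and the wide-open CIDR are placeholders you should adapt to your environment.

# Create a security group in your VPC (replace the VPC ID)
aws ec2 create-security-group \
  --group-name wesql-fargate-sg \
  --description "WeSQL on Fargate" \
  --vpc-id <your-vpc-id>

# Allow inbound MySQL traffic on port 3306 (narrow the CIDR in production);
# outbound "all traffic" is already the default for a new security group
aws ec2 authorize-security-group-ingress \
  --group-id <returned-sg-id> \
  --protocol tcp \
  --port 3306 \
  --cidr 0.0.0.0/0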

Create S3 Bucket

Create an S3 bucket for database files—this is where WeSQL will store all data.

images/create_s3_bucket.png
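
The same step can be done with a single CLI call; this sketch assumes the bucket name and region used later in the task definition (wesql-fargate-test, us-west-2).

# Create the bucket that will hold all WeSQL data (bucket names are globally unique)
aws s3api create-bucket \
  --bucket wesql-fargate-test \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2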

Create IAM Roles

Next, create an IAM role to grant ECS tasks permission to access the S3 bucket. Attach this role to your task definition.

images/create_iam_role.png
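
As a rough CLI equivalent, the task role needs a trust policy for ecs-tasks.amazonaws.com plus S3 permissions. The role name and the broad managed policy below are illustrative; scope the permissions down to your bucket in practice.

# Create a task role that ECS tasks are allowed to assume
aws iam create-role \
  --role-name wesql-task-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ecs-tasks.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Grant S3 access (prefer a bucket-scoped policy over full access in production)
aws iam attach-role-policy \
  --role-name wesql-task-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess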

Create Log Group

Set up a CloudWatch Log Group to capture container logs:

images/log_group.png
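
From the CLI, the log group name just has to match the awslogs-group referenced in the task definition below, for example:

# Create the CloudWatch Logs group the container will write to
aws logs create-log-group \
  --log-group-name /ecs/wesql-fargate-task-def \
  --region us-west-2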

Create ECS Cluster

Create an ECS cluster to organize Fargate tasks:

images/create_cluster.png
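
Or from the CLI (the cluster name is arbitrary; this sketch uses a hypothetical one):

# An ECS cluster is just a logical grouping for Fargate tasks
aws ecs create-cluster --cluster-name wesql-cluster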

Create ECS Task Definition

Define how your WeSQL container runs in the ECS Task Definition:

images/create_task_definition.png

Below is an example JSON for WeSQL:

{
  "family": "wesql-fargate-task-def",
  "taskRoleArn": "<your-task-role-with-s3-permission>",
  "executionRoleArn": "<ecs-default-ecsTaskExecutionRole>",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "wesql-server",
      "image": "apecloud/wesql-server:8.0.35-0.1.0_beta2.37",
      "cpu": 1024,
      "memory": 2048,
      "portMappings": [
        {
          "containerPort": 3306,
          "hostPort": 3306,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "MYSQL_CUSTOM_CONFIG",
          "value": "[mysqld]\nport=3306\nlog-bin=binlog\ngtid_mode=ON\nenforce_gtid_consistency=ON\nlog_slave_updates=ON\nbinlog_format=ROW\nobjectstore_provider=aws\nrepo_objectstore_id=tutorial\nbranch_objectstore_id=main\nsmartengine_persistent_cache_size=1G"
        },
        {
          "name": "WESQL_OBJECTSTORE_ACCESS_KEY",
          "value": "<REPLACE_ME>"
        },
        {
          "name": "WESQL_DATA_DIR",
          "value": "/data/mysql/data"
        },
        {
          "name": "WESQL_OBJECTSTORE_SECRET_KEY",
          "value": "<REPLACE_ME>"
        },
        {
          "name": "MYSQL_ROOT_PASSWORD",
          "value": "passwd"
        },
        {
          "name": "WESQL_CLUSTER_MEMBER",
          "value": "127.0.0.1:13306"
        },
        {
          "name": "WESQL_OBJECTSTORE_REGION",
          "value": "us-west-2"
        },
        {
          "name": "WESQL_LOG_DIR",
          "value": "/data/mysql/log"
        },
        {
          "name": "WESQL_OBJECTSTORE_BUCKET",
          "value": "wesql-fargate-test"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/wesql-fargate-task-def",
          "awslogs-region": "us-west-2",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
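
If you save the JSON above to a file, you can register it without the console; a minimal sketch, assuming the file is named wesql-task-def.json:

# Register the task definition from the JSON file above
aws ecs register-task-definition \
  --cli-input-json file://wesql-task-def.json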

Create ECS Service

Now create an ECS Service to keep the task running:

images/create_service2.png

Select the right subnets and security group:

images/create_service3.png
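
The same service can be created from the CLI; the subnet and security group IDs below are placeholders for the resources from the networking prerequisites, and the cluster/service names are illustrative.

# Run one copy of the task and give it a public IP so MySQL clients can reach it
aws ecs create-service \
  --cluster wesql-cluster \
  --service-name wesql-service \
  --task-definition wesql-fargate-task-def \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[<your-public-subnet-id>],securityGroups=[<your-sg-id>],assignPublicIp=ENABLED}"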

Check logs in CloudWatch to confirm the database is running:

images/create_service5.png

Connect to the Database

With the service active, you can connect to WeSQL using any MySQL client:

mysql -h <FARGATE_PUBLIC_IP> -P 3306 -uroot -ppasswd

images/connect_to_wesql.png
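
Note that a Fargate task gets a fresh public IP each time it starts. One way to look it up (a sketch using the placeholder cluster and service names from above) is:

# Find the running task, its network interface, and the attached public IP
TASK_ARN=$(aws ecs list-tasks --cluster wesql-cluster --service-name wesql-service \
  --query 'taskArns[0]' --output text)
ENI_ID=$(aws ecs describe-tasks --cluster wesql-cluster --tasks "$TASK_ARN" \
  --query "tasks[0].attachments[0].details[?name=='networkInterfaceId'].value" --output text)
aws ec2 describe-network-interfaces --network-interface-ids "$ENI_ID" \
  --query 'NetworkInterfaces[0].Association.PublicIp' --output text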

Pause and Resume the Database

WeSQL uses S3 for storage, so the database container in Fargate is stateless and can be stopped and restarted without losing data. This is ideal for saving costs—simply stop the database when not in use:

# Pause the database (set desired count to 0)
aws ecs update-service \
--cluster <your-cluster-name> \
--service <your-service-name> \
--desired-count 0

# Resume the database (set desired count to 1)
aws ecs update-service \
--cluster <your-cluster-name> \
--service <your-service-name> \
--desired-count 1

Or automate this with AWS Lambda:

import boto3

ecs = boto3.client('ecs')

def lambda_handler(event, context):
    # Expected event: {"action": "pause" | "resume", "cluster": "...", "service": "..."}
    action = event.get('action')
    cluster = event.get('cluster')
    service = event.get('service')
    desired_count = 0 if action == 'pause' else 1

    # Scale the ECS service to 0 (pause) or back to 1 (resume)
    ecs.update_service(
        cluster=cluster,
        service=service,
        desiredCount=desired_count
    )

Other Considerations

  • Service Discovery: Use AWS Service Discovery for a stable DNS endpoint.
  • Secrets Management: Store credentials in AWS Secrets Manager.
  • Scheduled Start/Stop: Use Amazon EventBridge to automate cost-saving start/stop schedules (see the sketch below).
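
As a sketch of the scheduled start/stop idea, an EventBridge rule can invoke the Lambda above with a pause or resume payload. The rule name, cron expression, and Lambda ARN below are placeholders.

# Pause the database every weekday evening by invoking the Lambda with an "action": "pause" payload
# (you also need "aws lambda add-permission" so EventBridge is allowed to invoke the function)
aws events put-rule \
  --name wesql-pause-nightly \
  --schedule-expression "cron(0 22 ? * MON-FRI *)"

aws events put-targets \
  --rule wesql-pause-nightly \
  --targets '[{"Id":"1","Arn":"<your-lambda-arn>","Input":"{\"action\":\"pause\",\"cluster\":\"<your-cluster-name>\",\"service\":\"<your-service-name>\"}"}]'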

Automation with Python Scripts

Check out this simple Python script using the AWS SDK for Python (boto3) to automate resource creation. You can also adapt it, or use Terraform, CloudFormation, etc.

By combining Fargate’s pay-as-you-go model with S3-based storage, you get a flexible “serverless” MySQL experience—no idle compute charges, no bulky EBS volumes, just on-demand database hosting that fits your actual usage.

WeSQL Outperforms AWS RDS MySQL Single AZ -- 4-6x the Speed, 1/2 the Cost

· 7 min read
Dongsheng Zhao
SmartEngine: The Next-Generation Cloud-Native Storage Engine

When people hear "S3" and "OLTP database" in the same sentence, skepticism is a common reaction. S3 is known for its low cost, but its performance and latency are often seen as unsuitable for the demands of OLTP workloads.

At WeSQL, we’ve previously explained our approach in articles like Why S3 and Persistent Cache. We use S3 for its low cost, reliability, and scalability as durable storage, while addressing performance and latency challenges with efficient write and read caching mechanisms.

Still, questions remain: Can an S3-based OLTP database perform well in practical use cases?

In this blog, we’ll show how WeSQL achieves significant storage cost savings while delivering several times the computational efficiency of AWS RDS. By combining low cost with strong performance, WeSQL challenges the traditional expectations of OLTP databases with S3-based storage.

Test Explanations

Sysbench is a widely used tool for testing OLTP performance, but its final metrics can vary greatly depending on factors like instance specifications, data volume, concurrency levels, and access patterns.

In the following Sysbench tests, we designed the test scenarios to closely resemble real-world business workloads.

Instance size

We chose the 4-core, 16GB specification for two reasons. First, it is widely used for small to medium-sized database instances in production, making the test results relevant to typical workloads. Second, it provides a well-balanced combination of CPU and memory, suitable for handling most OLTP workloads efficiently without over-provisioning resources.

Data volume and Random type

In typical online database environments, the total data scale usually ranges from 200GB to 300GB, but only a portion of this data is actively accessed. Following the "80/20 rule," the actively accessed data typically amounts to 40GB to 60GB. To simulate real-world business scenarios, we chose a test data volume of 48GB (100 tables with 2 million rows each), which falls within this active data range. The data is accessed using a uniform pattern to ensure all parts of the dataset are evenly accessed, reflecting common usage patterns. This setup creates a realistic test environment for accurate performance evaluation.

With 16GB of memory available, the 48GB data volume exceeds the memory capacity by a large margin. This forces the system to rely on disk-based operations, effectively testing the storage engine’s performance in areas such as I/O efficiency and caching strategies.

Test Environment

  • Compute Instances:

    • AWS RDS Single-AZ:
      • Instance type: db.m5d.xlarge (4vCPU, 16GB RAM)
      • Equipped with a 150GB local NVMe SSD for temporary storage. Persistent storage relies on EBS volumes.
    • WeSQL EC2:
      • Instance type: m5d.xlarge (4vCPU, 16GB RAM)
      • Also equipped with a 150GB local NVMe SSD, which WeSQL uses for caching to optimize read & update performance.
  • Storage Backend:

    • AWS RDS:
      • EBS gp3 volumes (200GB, 125MB/s, 3000 IOPS) for persistent storage.
    • WeSQL:
      • EBS gp3 volumes (100GB, 125MB/s, 3000 IOPS) for logs and metadata.
      • Primary data storage is offloaded to S3.

WeSQL is designed to minimize dependency on expensive EBS storage by leveraging S3 for data storage, so it uses a small EBS volume to store logs and metadata.

  • Client:

    • Sysbench 1.0.20
    • EC2: t3a.2xlarge (8vCPU, 32GB RAM)
  • Server:

    • Database Version:
      • AWS RDS: 8.0.35
      • WeSQL: Built on MySQL 8.0.35
    • Deployment:
      • Both systems were deployed as single-node instances for a direct performance comparison.
  • Network Configuration:

    • Availability Zone: All components—including AWS RDS, WeSQL EC2 instances, and the Sysbench client—were deployed within the same AWS availability zone to reduce network latency and ensure consistent test conditions.

Test Method

We used the Sysbench oltp_read_write workload to evaluate performance. The test configuration was as follows:

  • DataSet: Prepared 100 tables, each containing 2 million rows of data.
  • Concurrency: Tests were conducted with concurrency levels of 2, 4, 8, 16, 32, 64, 96, and 128.
  • Duration: Each concurrency level ran for 300 seconds.
  • Interval: A 60-second interval was applied before starting the next concurrency level.
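
For reference, a representative sysbench invocation for one concurrency level would look roughly like the sketch below; the host, credentials, and database name are placeholders, and the prepare step is run once before the test loop.

# Load 100 tables x 2,000,000 rows, then run oltp_read_write for 300s at 32 threads
sysbench oltp_read_write \
  --mysql-host=<db-endpoint> --mysql-user=<user> --mysql-password=<password> \
  --mysql-db=sbtest --tables=100 --table-size=2000000 \
  --rand-type=uniform prepare

sysbench oltp_read_write \
  --mysql-host=<db-endpoint> --mysql-user=<user> --mysql-password=<password> \
  --mysql-db=sbtest --tables=100 --table-size=2000000 \
  --rand-type=uniform --threads=32 --time=300 --report-interval=10 run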

Results

We tested both AWS RDS and WeSQL under the oltp_read_write workload using varying levels of concurrency.

throughput comparison

images/wesql_rds_throughput.png

95th Percentile Latency comparison

images/wesql_rds_rt.png

Conclusions

Performance perspective

Based on the test results, WeSQL demonstrates peak performance that is nearly 4 times higher than AWS RDS using the same resources. Additionally, WeSQL provides significantly better latency compared to AWS RDS.

In real-world business scenarios, low latency is often critical. For instance, under a 32-thread load, WeSQL achieves a QPS of 7356.72, with a P95 latency of 110.66ms. In contrast, at the same latency level, AWS RDS achieves only 1232.69 QPS. This means that WeSQL has approximately 6 times the throughput of AWS RDS when comparing performance at equivalent latency thresholds.

Cost perspective

WeSQL also provides a significant storage cost advantage. In this test scenario, where the overall data volume is relatively small, our costs are still nearly half of AWS RDS Single-AZ. As data volume grows, this cost advantage becomes even more pronounced.

  • AWS RDS Single-AZ (db.m5d.xlarge): USD 0.419 per hour
  • AWS RDS Multi-AZ (db.m5d.xlarge): USD 0.838 per hour
  • AWS EC2 (m5d.xlarge): USD 0.226 per hour
  • The prices above are based on the us-east-1 region.

Although we used a single-node deployment in our test, real-world environments typically require cross-AZ disaster recovery for resilience and fault tolerance. In WeSQL’s architecture, data durability is ensured by continuously uploading data to S3, which inherently provides cross-AZ disaster recovery capabilities. As a result, a single-data-node WeSQL deployment offers cross-AZ disaster recovery capabilities, while costing nearly 1/4 of AWS RDS Multi-AZ.

To ensure no data is lost during an AZ failure, including logs stored on EBS, WeSQL’s multi-AZ deployment adds two additional log nodes. In upcoming articles, we will provide a detailed analysis of the cost and performance differences between WeSQL and AWS RDS Multi-AZ.

Analysis

By separating the read and write QPS from the above tests for comparison, it is clear that WeSQL delivers superior performance over AWS RDS in both read and write operations.

images/qps_write.png

images/qps_read.png

Why WeSQL outperforms AWS RDS in write performance

  1. SmartEngine's Write-Optimized LSM-Tree Data Structure

    The storage engine used by WeSQL, SmartEngine, is built on an LSM-Tree architecture. This design minimizes write amplification compared to the B+ Tree structure used by InnoDB in AWS RDS, resulting in more efficient write operations and better overall write performance.

  2. S3's High Write Bandwidth Beats EBS GP3

    SmartEngine uses S3 as the persistent storage layer, taking advantage of its higher bandwidth to accelerate flush and compaction operations. In contrast, AWS RDS relies on gp3 volumes for persistent storage, where the limited bandwidth of gp3 can become a bottleneck during dirty page flushing. This leads to I/O constraints that hinder write performance in RDS.

Why WeSQL outperforms AWS RDS in read performance

  1. Low-Latency Local NVMe SSDs Cache

    SmartEngine makes use of local NVMe SSDs as a read cache, which provides several key advantages:

    • Separation of Read and Write I/O: By isolating reads from writes, WeSQL reduces I/O contention, resulting in smoother and faster read operations.
    • Higher Performance of NVMe SSDs: Local NVMe SSDs offer significantly better performance compared to the gp3 volumes used by AWS RDS, enabling faster data access and lower read latencies.
  2. Optimizations for LSM-Tree’s Read Challenges

    While the LSM-Tree architecture traditionally underperforms B+ Tree in read-heavy workloads, SmartEngine incorporates a series of optimizations to bridge this gap and achieve read performance comparable to InnoDB. These include:

    • Multi-priority caching mechanisms to prioritize hot data.
    • Bloom filters to minimize unnecessary disk reads.
    • Asynchronous I/O for better concurrency and throughput.
    • Latch-free metadata operations for lower contention and higher throughput.