Skip to main content

Data Archive

What's the problem

Enterprises often need to store vast amounts of historical data while ensuring it's accessible for compliance or analytics. This data may need to be retained for years or decades, leading to significant long-term storage costs (e.g., several TBs to tens of TBs).

A common solution is to periodically transfer MySQL data to a NoSQL database like HBase to reduce storage costs. However, this approach introduces several operational and usability challenges, such as:

  • SQL Compatibility Issues MySQL and NoSQL systems like HBase differ significantly in data models and query languages. While MySQL follows a well-established SQL standard, NoSQL lacks uniformity, making it harder for teams familiar with SQL to interact with the data. This often requires additional training and complicates data retrieval for analytics or reporting.

  • Increased Operational Complexity Managing both MySQL and NoSQL databases adds infrastructure overhead. Each system has unique requirements for monitoring, scaling, and security, increasing operational complexity and costs. Additionally, maintaining expertise for both systems strains DevOps and database administration teams.

  • Data Consistency Challenges Ensuring data consistency across MySQL and NoSQL systems can be difficult. Periodic data transfers increase the risk of inconsistencies, as real-time updates in MySQL may not reflect immediately in HBase, leading to potential data discrepancies—especially critical for financial or regulatory data.

Our solution

WeSQL tackles the challenges of long-term data storage by introducing a cost-effective data archiving solution directly within MySQL. This solution eliminates the need to transfer data to external NoSQL databases, allowing enterprises to retain and access large volumes of historical data directly in MySQL, while significantly reducing storage costs.

Key features of WeSQL’s SmartEngine include:

  • Persist Data on Object Storage SmartEngine leverages object storage (e.g., S3) for virtually unlimited capacity, high reliability (99.999999999%), and 7-10x lower costs compared to EBS for data archiving.

  • LSM-Tree for High Compression The LSM-Tree architecture delivers high compression and efficient write performance, ideal for archival storage.

  • Columnar Storage for Better Compression The columnar format groups similar data types, optimizing compression and reducing storage costs.

  • Optimized Data Format and Algorithm SmartEngine’s data format and filling algorithm maximize space efficiency, offering long-term cost savings and improved data density.

Evaluation

Using the percona TPC-C testing tool, we can generate a large dataset with the following command:

./tpcc.lua --mysql-socket=/work_space/test/install_release/run/mysqld.sock --mysql-user=root --mysql-db=tpcc 
--threads=512 --report-interval=10 --tables=10 --scale=10000 --use_fk=0 --db-driver=mysql prepare

For the same scale of data, InnoDB consumes 9.5TB of storage, while WeSQL SmartEngine reduces this to just 785GB. This significant reduction demonstrates SmartEngine’s superior storage efficiency and compression capabilities.

note

In this test, the tables use columnar storage format with 64KB block size, and the bloom filter is disabled.

Cost Savings with AWS

When calculating storage costs on AWS, SmartEngine’s advantages become even more apparent, achieving over 80x cost savings. This is because InnoDB relies on EBS, which is significantly more expensive than S3. Additionally, InnoDB typically requires two or more EBS replicas due to its lower reliability (99.8%-99%), which means out of every 1000 EBS volumes, one might experience data loss annually. This further increases storage costs.

FeatureInnoDBSmartEngine
Storage MediumEBS (gp3)S3 (Standard)
Data Storage Requirement9.5 TB785 GB
Total Storage Requirement19 TB (Primary + Standby)785 GB
Unit Storage Cost$0.08 USD/GB/Month$0.023 USD/GB/Month
Total Storage Cost$1,556.48$18.05