Data Archive
What's the problem
Enterprises often need to store vast amounts of historical data while ensuring it's accessible for compliance or analytics. This data may need to be retained for years or decades, leading to significant long-term storage costs (e.g., several TBs to tens of TBs).
A common solution is to periodically transfer MySQL data to a NoSQL database like HBase to reduce storage costs. However, this approach introduces several operational and usability challenges, such as:
-
SQL Compatibility Issues MySQL and NoSQL systems like HBase differ significantly in data models and query languages. While MySQL follows a well-established SQL standard, NoSQL lacks uniformity, making it harder for teams familiar with SQL to interact with the data. This often requires additional training and complicates data retrieval for analytics or reporting.
-
Increased Operational Complexity Managing both MySQL and NoSQL databases adds infrastructure overhead. Each system has unique requirements for monitoring, scaling, and security, increasing operational complexity and costs. Additionally, maintaining expertise for both systems strains DevOps and database administration teams.
-
Data Consistency Challenges Ensuring data consistency across MySQL and NoSQL systems can be difficult. Periodic data transfers increase the risk of inconsistencies, as real-time updates in MySQL may not reflect immediately in HBase, leading to potential data discrepancies—especially critical for financial or regulatory data.
Our solution
WeSQL tackles the challenges of long-term data storage by introducing a cost-effective data archiving solution directly within MySQL. This solution eliminates the need to transfer data to external NoSQL databases, allowing enterprises to retain and access large volumes of historical data directly in MySQL, while significantly reducing storage costs.
Key features of WeSQL’s SmartEngine include:
-
Persist Data on Object Storage SmartEngine leverages object storage (e.g., S3) for virtually unlimited capacity, high reliability (99.999999999%), and 7-10x lower costs compared to EBS for data archiving.
-
LSM-Tree for High Compression The LSM-Tree architecture delivers high compression and efficient write performance, ideal for archival storage.
-
Columnar Storage for Better Compression The columnar format groups similar data types, optimizing compression and reducing storage costs.
-
Optimized Data Format and Algorithm SmartEngine’s data format and filling algorithm maximize space efficiency, offering long-term cost savings and improved data density.
Evaluation
Using the percona TPC-C testing tool, we can generate a large dataset with the following command:
./tpcc.lua --mysql-socket=/work_space/test/install_release/run/mysqld.sock --mysql-user=root --mysql-db=tpcc
--threads=512 --report-interval=10 --tables=10 --scale=10000 --use_fk=0 --db-driver=mysql prepare
For the same scale of data, InnoDB consumes 9.5TB of storage, while WeSQL SmartEngine reduces this to just 785GB. This significant reduction demonstrates SmartEngine’s superior storage efficiency and compression capabilities.
In this test, the tables use columnar storage format with 64KB block size, and the bloom filter is disabled.
Cost Savings with AWS
When calculating storage costs on AWS, SmartEngine’s advantages become even more apparent, achieving over 80x cost savings. This is because InnoDB relies on EBS, which is significantly more expensive than S3. Additionally, InnoDB typically requires two or more EBS replicas due to its lower reliability (99.8%-99%), which means out of every 1000 EBS volumes, one might experience data loss annually. This further increases storage costs.
Feature | InnoDB | SmartEngine |
---|---|---|
Storage Medium | EBS (gp3) | S3 (Standard) |
Data Storage Requirement | 9.5 TB | 785 GB |
Total Storage Requirement | 19 TB (Primary + Standby) | 785 GB |
Unit Storage Cost | $0.08 USD/GB/Month | $0.023 USD/GB/Month |
Total Storage Cost | $1,556.48 | $18.05 |