Dev/Test
My Story as a Rookie Developer
When I first joined a company as a software development engineer, I was eager to take on my first task. I was assigned to implement a new feature, so, following the Git Flow methodology, I created a feature branch to start my work.
I wrote some code and needed to connect to a database to debug my feature branch code. We have an Integration Environment that corresponds to the develop
branch in Git Flow. This environment connects to a separate database populated with sanitized data synchronized from Production. It's crucial to keep the Integration Environment relatively stable because other teams rely on our services, and the testing team needs to use it as well.
However, since my feature involved changes to the database schema, my colleagues advised me to test thoroughly in my Local Development Environment before connecting to the Integration Environment's database.
This led to my first challenge: I spent a significant amount of time setting up the database service locally, but I had no data to work with. I initially tried to manually construct some data but quickly abandoned this approach because the data was complex and had intricate dependencies.
Next, I attempted to dump data from the Integration Environment. After going through approvals from higher management, I finally succeeded and imported the Integration Environment's data into my local database. This process took me about a week.
After completing the development on my feature branch and testing it in my Local Development Environment, I finally merged my code into the Integration Environment. A few days later, I prepared to release it to the Staging Environment.
That's when I encountered my second challenge: our Staging Environment, despite deploying code from the unreleased release
branch, was connected to the Production Environment's database! This meant that deploying my code to the Staging Environment would require me to modify the Production Environment's database schema. This was absurd—I needed to test in the Staging Environment before releasing to Production, but to test in the Staging Environment, I had to release to Production first, creating a paradox.
With great apprehension, I submitted an approval request for the database schema changes in the Production Environment. After a few days, the changes were approved and went live in Production. I then tested my functionality in the Staging Environment, and finally, my code was released to the Production Environment. Fortunately, there were no bugs, but the overall experience was quite nerve-wracking.
Problems of My Story
In my journey as a rookie developer, I faced two significant problems:
-
Difficulty in local testing: Setting up a local database with realistic data was a cumbersome process. Manually constructing data was impractical due to complex dependencies, and obtaining a data dump from the Integration Environment required high-level approvals and took a considerable amount of time.
-
Risky deployment process: The Staging Environment's reliance on the Production database schema forced me to apply schema changes directly to the Production Environment before testing. This approach was risky and counterintuitive, as it could introduce untested changes into Production, potentially impacting live users. Now, as an experienced application developer, I learned some insights and best practices on how to effectively use Git Flow, deploy code branches to corresponding environments, and connect to different databases:
-
Feature branches and Local Development Environment:
feature
branches are usually created from thedevelop
branch and correspond to the Local Development Environment. Often, multiple feature branches exist in parallel, and they may all modify the database schema. Therefore, it's best for each feature branch to have its own isolated database, including both schema and data. This isolation prevents conflicts and ensures that developers can work independently. However, this setup requires solutions for data synchronization and sanitization to provide meaningful test data without exposing sensitive information. -
Develop branch and Integration Environment: The
develop
branch integrates the latest code and needs to remain stable. It corresponds to the Integration Environment, which is accessible to developers for testing integrated features. The database connected to this environment should contain either mock data or sanitized production data. Mock data can be generated to mimic real-world scenarios, while sanitized data ensures that no sensitive information is exposed. This practice allows developers to test their code against realistic datasets without compromising security. -
Release branch and Staging Environment: The
develop
branch is merged into arelease
branch, which is then deployed to the Staging Environment. The data in this environment should be very close to the Production Environment to accurately reflect how the application will perform in real-world conditions. However, it should not connect directly to the Production Database. Instead, an isolated staging database that mirrors the production data should be used. This approach allows thorough testing of the release candidate without risking the integrity or security of the production data. -
Main branch and Production Environment: The
release
branch is ultimately merged into themain
branch. This merge should be a fast-forward merge to ensure that the code being released tomain
has been fully tested and vetted. Themain
branch is then deployed to the Production Environment, which connects to the Production Database. By ensuring that only tested code reaches themain
branch, we maintain the reliability and stability of the production application.
By following these best practices, we can align our code branches with their respective environments and databases effectively. This alignment minimizes risks, enhances collaboration among team members, and streamlines the development and deployment processes. It addresses the challenges of data consistency, security, and efficient testing that I experienced as a rookie developer.
However, knowing these best practices raised new questions: How can we effectively implement them?
-
How to quickly create isolated databases for feature branches with schema and mock data?
-
When multiple
feature
branches modify the database schema and merge intodevelop
, how do we merge the schema changes into the database that the Integration Environment connects to? What if conflicts occur? -
How can the Staging Environment's database be synchronized from the Production Environment? Do we need to build a custom CDC (Change Data Capture) pipeline? Can we ensure data sanitization?
-
When synchronizing database schemas across environments, how do we prepare DDL statements? Is it possible to achieve declarative schema changes?
-
When merging
release
intomain
, Git can enforce fast-forward merges, but how can we enforce this for the database schema?
Our Solution
To address these challenges, WeSQL introduced the Database Branch feature. This functionality allows developers to create a branch of the database, much like branching code in Git. Here's how it solves the issues:
-
Quickly create isolated databases for
feature
branches with schema and mock data- Solution: With Database Branching, developers can instantly create isolated database branches from the main database. These branches include the current schema and, optionally, the data, which can be sanitized or replaced with mock data. This allows each feature branch to have its own isolated database, ensuring that developers can work independently without interfering with each other. Moreover, during the CI/CD process of a feature branch, it can connect to its own isolated database, facilitating automated testing and integration. See How to create a branch.
-
Merging database schema changes from multiple
feature
branches intodevelop
- Solution: WeSQL's Database Branching automatically computes the schema differences between a feature branch and the
develop
branch. When merging, it generates the necessary DDL statements to apply the changes to thedevelop
database. If conflicts are detected—such as two feature branches adding columns with the same name but different types—WeSQL alerts the developers, allowing them to resolve the conflicts similarly to how code merge conflicts are handled in Git. This ensures that schema changes are merged smoothly and consistently without manual DDL management. See How to merge a branch back.
- Solution: WeSQL's Database Branching automatically computes the schema differences between a feature branch and the
-
Synchronizing the Staging Database from Production with data sanitization
- Solution: WeSQL provides an ETL (Extract, Transform, Load) feature that allows the Staging Database to be treated as a branch of the Production Database. Developers can initiate an ETL process where the Production Database serves as the source, and the Staging Database is the target branch. During this process, data can be transformed and sanitized according to predefined rules, ensuring that sensitive information is not exposed in the Staging Environment. This eliminates the need to build custom CDC pipelines and simplifies data synchronization between environments. See Data Streaming and Transformation.
-
Automating declarative schema changes and enforcing fast-forward merges for database schemas
-
Solution: WeSQL's Database Branching automatically computes the schema differences between branches and generates the necessary DDL statements to transition the database from one schema version to another. Developers do not need to manually prepare DDL statements. By using declarative schema definitions, WeSQL ensures that the target database's schema matches the desired state, simplifying schema migrations and maintaining consistency across environments.
Additionally, when merging the
release
branch intomain
, WeSQL can enforce a fast-forward-like merge for the database schema by using themerge_option=override
. This option generates DDL statements that, when executed on the Production Database, will update its schema to match exactly with the Staging Database's schema. This approach ensures that only tested and approved schema changes are applied to the Production Environment, mirroring the fast-forward merge policy in Git and maintaining a linear and predictable schema evolution. See View Schema Differences.
-
By aligning database branches with code branches, WeSQL allows the database to flow with your Git workflow. Developers can work on new features that require database schema changes in isolation, merge changes seamlessly, and ensure that database updates are applied consistently across all environments.
Database Should Flow with Your Git Branch
Integrating database branching into our development workflow ensures that the database evolves alongside the codebase. With WeSQL's Database Branch feature, we achieve:
-
Seamless Integration: Each code branch corresponds to a database branch, ensuring that changes in code and database schema are developed and tested together. This tight coupling eliminates discrepancies and synchronization issues.
-
Efficient Collaboration: Developers can work independently on their own feature branches with isolated databases, reducing conflicts and merge complications. This fosters a more collaborative and efficient development environment.
-
Automated Schema Management: WeSQL automates the detection of schema differences and the generation of necessary DDL statements, simplifying the process of synchronizing databases across environments. This declarative approach to schema changes reduces manual effort and errors.
-
Secure Data Handling: By using ETL processes with data sanitization for environments like staging, WeSQL ensures that testing is conducted on realistic datasets without compromising sensitive production data.
Allowing the database to flow with your Git branch creates a cohesive and agile development process. It aligns database changes with code changes, enhances collaboration, and ensures consistency and security across all stages of development and deployment.