Migrate from centralized version control to Git
By Matt Cooper
The switch from centralized version control to Git is more than just learning new commands. Git uses a fundamentally different model for storing previous versions of code. Instead of storing a linear series of changes to files, Git represents your code as a graph of snapshots called commits. Lightweight branching and local repositories isolate developers from changes made by the rest of the team.
A successful migration path recognizes these challenges and requires that you address:
- Revising tools and processes
- Deciding on a branching strategy
- Tracking history prior to the switch
- Removing binaries and executables
- Training your team
- Migrating the code
We’ll cover each of these areas, offering guidance and best practices learned from helping dozens of teams make the switch. Your migration to Git will be more successful if you understand Git’s strengths and trade-offs. Reshaping your codebase a little and training your team will ease the transition considerably.
Tools and processes
Changing version control systems will naturally disrupt your development workflows. Take this opportunity to re-evaluate other aspects of your development process.
- How will the build system access the code?
- When will tests run? Adopting continuous integration will provide a strong safety net for your project.
- Does your release management strategy need to change?
- How will code reviews happen? Git works really well with pull requests.
Branching strategy
First, decide on a Git branching strategy. Two common strategies are GitFlow and GitHub Flow, which you might look to for inspiration. Then decide which of your legacy branches, if any, you want to bring forward into Git. Document the mapping between legacy branches and the new ones in Git so that your team understands where they should commit new work.
History
You may be tempted to migrate your source code’s history to Git. At first glance, a commit maps relatively well to a changeset, checkin, or whatever your previous version control system used.
However, version control tracks more than files. Your history also carries metadata: who made each change, when and how it was made, and so on. To migrate history, you need to port over both content and metadata. Centralized systems often support change types (undelete, rollback, and rename, for example) that simply don’t exist in Git. The history you migrate won’t be faithful to the history represented in your older version control system.
Because migrated history is lossy, we strongly recommend against migrating full history. Instead, bring over only the tips of the branches you need.
Migrating history is a lot of effort, and you probably don’t need it. We have seen this repeatedly across a range of customers, including many projects at Microsoft. Resources are better spent on other areas of the migration that have a higher return on investment.
Moving years of history into Git increases the storage and bandwidth cost of the repo. Centralized VC servers have plenty of storage and bandwidth, and developers only download a fraction of the history. When you move to Git, every developer has a full local copy of the repo and its history. This multiplies the negative impact of migrating history while providing little benefit.
Keep your old history around
During and after a migration, engineers will need access to recent history. Keep your old version control online for a while for reference purposes while new work happens in Git. The need for immediate access to the old system will diminish over time. Once most people no longer access the old system in their daily work, archive or shut it down.
Very large teams or highly regulated projects (such as the Windows team at Microsoft) may wish to plant breadcrumbs in Git that point back at the previous version control system. A simple example is a text file at the root of the repo, added as the first Git commit, pointing to the URL of the previous system. Each migrated branch could also have an entry in a text file documenting which legacy branch and changeset it came from.
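A minimal sketch of such breadcrumbs in Python. The file names `LEGACY_VC.txt` and `MIGRATED_BRANCHES.txt`, the tab-separated layout, and the `MigratedBranch` type are hypothetical choices for illustration, not a format any tool expects.

```python
"""Sketch: write breadcrumb files that point back to a legacy VC system.

Hypothetical conventions: the file names and the tab-separated record
format below are illustrative only.
"""
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MigratedBranch:
    git_branch: str        # branch name in the new Git repo
    legacy_branch: str     # branch or server path in the legacy system
    legacy_changeset: str  # changeset/revision the branch tip came from


def write_breadcrumbs(repo_root: Path, legacy_url: str,
                      branches: list[MigratedBranch]) -> list[Path]:
    """Write breadcrumb files at the repo root and return their paths."""
    pointer = repo_root / "LEGACY_VC.txt"
    pointer.write_text(
        "This repository was migrated from a centralized version control system.\n"
        f"Pre-migration history is available at: {legacy_url}\n",
        encoding="utf-8",
    )
    mapping = repo_root / "MIGRATED_BRANCHES.txt"
    rows = [f"{b.git_branch}\t{b.legacy_branch}\t{b.legacy_changeset}"
            for b in branches]
    mapping.write_text(
        "git_branch\tlegacy_branch\tlegacy_changeset\n" + "\n".join(rows) + "\n",
        encoding="utf-8",
    )
    return [pointer, mapping]
```

You would run this before the first `git add`, so the breadcrumbs land in the initial commit.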
Binary files and tools
Due to Git’s design, binaries and other large files with contents that change entirely when updated are not a good fit for Git. Each developer has a complete copy of the repo, so everyone pays the cost to download every version of every file. Source code, which Git is optimized for, is compact and highly compressible. Binaries typically are neither.
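A quick way to see the difference, using Python's standard `zlib` module. Git compresses the objects it stores with zlib, so the ratio below gives a rough sense of how much space each kind of content takes. The sample data is synthetic: random bytes stand in for an already-compressed binary such as a ZIP archive or JPEG image.

```python
"""Sketch: compare how well source-like text and binary-like data compress.

The inputs are synthetic and only illustrate the general trend.
"""
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Source-like text: repeated identifiers and keywords compress very well.
source = ("def area(width, height):\n"
          "    return width * height\n\n") * 200

# Random bytes model an already-compressed binary; they barely compress.
binary = os.urandom(len(source))

if __name__ == "__main__":
    print(f"source ratio: {compression_ratio(source.encode()):.2f}")
    print(f"binary ratio: {compression_ratio(binary):.2f}")
```

Real source files are less repetitive than this sample, but they still compress far better than typical binaries, and every clone pays for the difference.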
Small, infrequently changed assets like icons are fine to include in Git. If an asset is large, changes frequently, or is generated from your code, you should not store it in Git. Take the migration to Git as an opportunity to separate binaries from the rest of your codebase. Once you commit a binary to a Git repo, it comes along with every future clone, even if you no longer use it.
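Before migrating, it can help to list candidates for removal. This sketch assumes two heuristics you should tune for your codebase: a NUL byte in the first 8 KiB marks a file as binary (a common heuristic, not a guarantee), and anything above a size threshold (1 MiB here, chosen arbitrarily) is flagged as large. The skip list, including the TFVC `$tf` folder, is also an assumption.

```python
"""Sketch: flag files that probably should not be committed to Git."""
from pathlib import Path

SKIP_DIRS = {".git", "$tf"}      # illustrative VC metadata folders to ignore
SIZE_LIMIT = 1 * 1024 * 1024     # 1 MiB; an arbitrary example threshold


def looks_binary(path: Path, probe: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain NUL."""
    with path.open("rb") as f:
        return b"\0" in f.read(probe)


def find_suspect_files(root: Path,
                       size_limit: int = SIZE_LIMIT) -> list[tuple[str, str]]:
    """Return (relative_path, reason) pairs for files worth reviewing."""
    suspects = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        if path.stat().st_size > size_limit:
            suspects.append((rel.as_posix(), "large"))
        elif looks_binary(path):
            suspects.append((rel.as_posix(), "binary"))
    return suspects
```

The output is a review list, not a delete list: small, stable assets such as icons may be flagged as binary but are fine to keep.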
Learn more about managing large files with Git.
Training
One of the biggest challenges in migrating to Git is helping developers understand how Git stores changes and how commits form a history of development. It’s not enough to just prepare a “cheat sheet” mapping commands in the old system to Git commands. Your developers have to stop thinking of history in linear terms and get comfortable with the commit graph.
Since people learn in different ways, plan on making several types of training material available. Live, lab-based training with an expert instructor works well for some people. The Pro Git book is available free online and is a great starting point. Microsoft also offers a Git walkthrough designed to rapidly get someone up to speed. Designate key members of the team as experts, and make sure they’re encouraged to help others.
Migrating the code
After you’ve analyzed your code and started training your team, it’s time to migrate the code. We recommend doing one or more test migrations into a test repo. Before you do the real migration, make sure that:
- All code files have migrated
- There are no stray binaries
- You can push the repo to Team Services
- All branches are available
- Tests are passing
- Builds are successful
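Part of this checklist can be automated. The sketch below assumes the legacy checkout and the migrated Git working tree are both plain folders on disk. It reports files missing from the Git tree (minus those you removed deliberately) and files whose contents differ. The function names and the exclusion set are hypothetical.

```python
"""Sketch: compare a legacy checkout with the migrated Git working tree."""
import filecmp
from pathlib import Path

EXCLUDED_DIRS = {".git", "$tf"}  # VC metadata folders; an illustrative list


def relative_files(root: Path) -> set[str]:
    """All files under root as POSIX-style relative paths, minus metadata."""
    files = set()
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and not EXCLUDED_DIRS.intersection(rel.parts):
            files.add(rel.as_posix())
    return files


def missing_files(legacy_root: Path, git_root: Path,
                  removed_on_purpose: set[str]) -> set[str]:
    """Files in the legacy checkout that are absent from the Git tree,
    ignoring ones (binaries, tools) you removed deliberately."""
    return (relative_files(legacy_root) - relative_files(git_root)
            - removed_on_purpose)


def changed_files(legacy_root: Path, git_root: Path) -> set[str]:
    """Files present in both trees whose bytes differ."""
    common = relative_files(legacy_root) & relative_files(git_root)
    return {rel for rel in common
            if not filecmp.cmp(legacy_root / rel, git_root / rel, shallow=False)}
```

If Git is configured to convert line endings (for example with `core.autocrlf`), `changed_files` may also report text files that differ only in their line endings.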
We recommend doing the big migration at a time when few people are working. Trying to keep multiple VC systems working in parallel saps resources and runs into the same fidelity issues covered earlier. Pick a date when development switches from the old system to Git. If your old VC system supports it, set the old system to read-only before that date. Otherwise, you may need a second migration wave to catch any changes made since you started.
The actual commands you’ll enter vary based on which system you’re coming from. We have a detailed article about TFVC, and the commands will be similar on other systems. These steps assume you’re only keeping the tips of branches, since we recommend against migrating history. You should turn this list of commands into a script that can be run repeatably.
For the mainline or first branch you wish to migrate
- Check out the latest revision or changeset in your legacy VC system.
- Remove binary assets, tools, etc.
- Port VC-specific directives that you need to retain in Git (e.g. convert .tfignore files to .gitignore).
- Delete files or data which bind your code to the legacy VC system (e.g. the $tf directory).
- Optional: Create “breadcrumbs” or pointers back to the old system.
- Initialize a new Git repo in your main branch’s folder and add VSTS as a remote named
- Add and commit your files.
- Push the repo to VSTS.
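A minimal sketch of the mainline steps as a repeatable Python script. It assumes `git` 2.28 or later (for `git init -b`) is on `PATH`, and that binaries have already been removed and ignore files ported. The remote name and URL are parameters you supply; the default branch name `master` matches the remaining-branch steps in this article, and the `$tf` cleanup applies only to TFVC. By default it only prints the commands (`dry_run=True`).

```python
"""Sketch: mainline migration steps as a repeatable script."""
import shlex
import shutil
import subprocess
from pathlib import Path


def mainline_commands(remote_name: str, remote_url: str, message: str,
                      branch: str = "master") -> list[list[str]]:
    """Return the Git commands for the mainline migration, in order."""
    return [
        ["git", "init", "-b", branch],                  # -b needs Git 2.28+
        ["git", "remote", "add", remote_name, remote_url],
        ["git", "add", "--all"],                        # stage every file
        ["git", "commit", "-m", message],
        ["git", "push", "-u", remote_name, branch],     # publish and track
    ]


def migrate_mainline(workdir: Path, remote_name: str, remote_url: str,
                     message: str = "Import mainline from legacy VC",
                     dry_run: bool = True) -> None:
    """Run (or, with dry_run, only print) the mainline migration."""
    tf_dir = workdir / "$tf"  # TFVC workspace metadata; other systems differ
    if tf_dir.is_dir():
        print("+ remove", tf_dir)
        if not dry_run:
            shutil.rmtree(tf_dir)
    for cmd in mainline_commands(remote_name, remote_url, message):
        print("+", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)
```

Running it once with `dry_run=True` against a test repo, then with `dry_run=False`, fits the test-run approach recommended above.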
For any remaining branches you wish to migrate
- In the Git repo, check out a new branch.
- Replace the working directory’s contents with the contents of your legacy branch.
- Add and commit all files. Only the files that differ between the branches show up as changes in the commit.
- Push your new branch to VSTS.
- Check out master again to prepare for the next branch.
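The per-branch loop could look like this sketch, under the same assumption that `git` is on `PATH`. It also assumes each legacy branch tip has already been exported, with binaries removed, into its own folder. `replace_worktree` and `migrate_branch` are hypothetical helpers, and the commit message is a placeholder.

```python
"""Sketch: migrate additional legacy branches as Git branches."""
import shutil
import subprocess
from pathlib import Path


def replace_worktree(repo: Path, source: Path) -> None:
    """Make repo's working tree match source, leaving .git untouched."""
    for entry in repo.iterdir():
        if entry.name == ".git":
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    # Copy the legacy branch contents, skipping VC metadata folders.
    shutil.copytree(source, repo, dirs_exist_ok=True,
                    ignore=shutil.ignore_patterns(".git", "$tf"))


def git(repo: Path, *args: str) -> None:
    subprocess.run(["git", *args], cwd=repo, check=True)


def migrate_branch(repo: Path, remote_name: str, branch: str,
                   legacy_folder: Path, mainline: str = "master") -> None:
    git(repo, "checkout", "-b", branch)        # new branch from mainline
    replace_worktree(repo, legacy_folder)      # swap in legacy contents
    git(repo, "add", "--all")                  # stages adds, edits, deletions
    git(repo, "commit", "-m", f"Import {branch} from legacy VC")
    git(repo, "push", "-u", remote_name, branch)
    git(repo, "checkout", mainline)            # ready for the next branch
```

`git add --all` stages deletions as well as additions and edits, so files that exist on the mainline but not in the legacy branch are removed on the new branch. If a legacy branch is identical to the mainline, `git commit` exits with an error and `check=True` raises `CalledProcessError`; pass `--allow-empty` to the commit if you want a commit regardless.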
Checklist
| Area | Task |
|---|---|
| Team workflows | Determine how builds will run |
| | Determine when tests will run |
| | Develop a release management process |
| | Move your code reviews to pull requests |
| Branching strategy | Pick a Git branching strategy |
| | Document the branching strategy, including why it was selected and how legacy branches map |
| History | Decide how long to keep legacy VC running |
| | Identify branches that need to migrate |
| | If needed, create “breadcrumbs” to help engineers navigate back to the legacy system |
| Binaries and tools | Identify which binaries and undiffable files to remove from the repo |
| | Decide on an approach for large files, such as Git LFS |
| | Decide on an approach for delivering tools and libraries, such as NuGet |
| Training | Identify training materials |
| | Plan training: events, written material, videos, etc. |
| | Identify members of the team to serve as local Git experts |
| Code migration | Do multiple test runs to ensure the migration will go smoothly |
| | Identify and communicate a time to make the cutover |
| | Create the new Git repo on VSTS |
| | Migrate the mainline branch first, followed by any additional branches needed |