Introduction
Version control, also known as revision control or source code management (SCM), is the management of multiple revisions of the same unit of information. It is most commonly used in engineering and software development to manage the ongoing development of digital documents such as application source code, art resources such as blueprints or electronic models, and other critical information that may be worked on by a team of people. Changes to these documents are identified by incrementing an associated number or letter code, termed the revision number, revision level, or simply revision, and are historically associated with the person making the change. In a simple form of revision control, for example, the initial issue of a drawing is assigned revision number 1. When the first change is made, the revision number is incremented to 2, and so on.
Revision control is any practice that tracks and provides control over changes to source code. Software developers sometimes use revision control software to maintain documentation and configuration files as well as source code. In theory, revision control can be applied to any type of information record. In practice, however, the more sophisticated revision control techniques and tools have rarely been used outside software development circles.
Why Version Control?
As software is developed and deployed, it is extremely common for multiple versions of the same software to be deployed in different sites, and for the software's developers to be working simultaneously on updates. Bugs and other issues with software are often only present in certain versions (because of the fixing of some problems and the introduction of others as the program develops). Therefore, for the purposes of locating and fixing bugs, it is vitally important to be able to retrieve and run different versions of the software to determine in which version(s) the problem occurs. It may also be necessary to develop two versions of the software concurrently (for instance, where one version has bugs fixed, but no new features, while the other version is where new features are worked on).
At the simplest level, developers could simply retain multiple copies of the different versions of the program, and number them appropriately. This simple approach has been used on many large software projects. While this method can work, it is inefficient as many near-identical copies of the program have to be maintained. This requires a lot of self-discipline on the part of developers, and often leads to mistakes. Consequently, systems to automate some or all of the revision control process have been developed.
Storage Models
File locking
The simplest method of preventing concurrent access problems is to lock files so that only one developer at a time has write access to the central repository copies of those files. Once one developer checks out a file, others can read that file, but no one else is allowed to change that file until that developer checks in the updated version (or cancels the checkout).
Version merging
Most version control systems, such as CVS and Subversion, allow multiple developers to be editing the same file at the same time. The first developer to check in changes to the central repository always succeeds. The system provides facilities to merge changes into the central repository, so the improvements from the first developer are preserved when the other programmers check in.
The concept of a reserved edit can provide an optional means to explicitly lock a file for exclusive write access, even though a merging capability exists.
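In Subversion (version 1.2 and later), such a reserved edit is available as `svn lock`. The sketch below is a minimal illustration, assuming the `svn` and `svnadmin` tools are on the PATH; the scratch directory, the two working copies `wc1` and `wc2`, and the file `drawing.txt` are hypothetical. Two working copies of a local `file://` repository stand in for two developers.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create repo
mkdir project
echo "layout v1" > project/drawing.txt
svn import -q ./project "file://$WORK/repo" -m 'initial import'   # r1

# Two working copies stand in for two developers.
svn checkout -q "file://$WORK/repo" wc1
svn checkout -q "file://$WORK/repo" wc2

# Reserve drawing.txt for exclusive editing from wc1.
svn lock -m 'editing the layout' wc1/drawing.txt
svn info wc1/drawing.txt | grep '^Lock'

# wc2 does not hold the lock token, so its commit is refused.
echo "layout v2 (competing)" > wc2/drawing.txt
svn commit -q -m 'competing edit' wc2 || echo "commit refused: drawing.txt is locked"

# The lock holder commits; by default the commit also releases the lock.
echo "layout v2" > wc1/drawing.txt
svn commit -q -m 'new layout' wc1
```

Merging remains available for every other file; the lock only serializes edits to `drawing.txt`.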
Distributed version control
Distributed systems inherently allow multiple simultaneous editing. In a distributed revision control model, there is no such thing as checking in or out. Instead, every programmer has a working copy that includes the complete repository. All changes are distributed by merging (pushing/pulling) between repositories. This mode of operation allows developers to work without a network connection, and it also gives developers full revision control capabilities without requiring permissions to be granted by a central authority. One of the leading proponents of distributed revision control is Linus Torvalds, the main developer of the Linux kernel. He created Git, the distributed version control system that Linux kernel developers now use.
Some Popular Version Control Systems
CVS
CVS is extremely popular, and it does the job. In fact, when CVS was released, CVS was a major new innovation in software configuration management. However, CVS is now showing its age through a number of awkward limitations: changes are tracked per-file instead of per-change, commits aren't atomic, renaming files and directories is awkward, and its branching limitations mean that you'd better faithfully tag things or there'll be trouble later. Some of the maintainers of the original CVS have declared that the CVS code has become too crusty to effectively maintain. These problems led the main CVS developers to start over and create Subversion.
SVN
Subversion (SVN) is a newer system, intended as a simple replacement for CVS. Subversion is basically a re-implementation of CVS with its warts fixed, and it still works the same basic way (around a centralized repository). Like CVS, Subversion by itself is intended to support a centralized repository for developers and doesn't handle decentralized development well; the svk project extends Subversion to support decentralized development.
GNU Arch
GNU Arch is a very interesting competitor, and works in a completely different way from CVS and Subversion. GNU Arch is released under the GNU GPL. It is fully decentralized, which makes it work very well for decentralized development (like the Linux kernel's development process). It has a very clever and remarkably simple approach to handling data, so it works easily with many other tools. The smarts are in the client tools, not the server, so a simple secure FTP site or shared directory can serve as the repository, an intriguing capability for such a powerful SCM system. It has simple dependencies, so it is easy to set up too.
Bazaar
Bazaar is a decentralized revision control system. Revision control involves keeping track of changes in software source code or similar information, and helping people work on it in teams. Bazaar is a free software project with a large community of contributors, sponsored by Canonical Ltd., the company behind Ubuntu and Launchpad. Bazaar is genuinely Free Software, released under the GNU GPL. It is written in Python and designed for correctness, performance, simplicity, and familiarity for developers migrating from CVS or Subversion.
Bazaar branches can be hosted on any web server, and uploaded over sftp, ftp, or rsync. For the fastest possible network performance, there is a smart server. Bazaar supports flexible work models: centralized like CVS or SVN, offline commits, enforced code review when desired, and automatic regression testing. Decentralized revision control systems let people collaborate more efficiently over the Internet using the bazaar development model. With Bazaar, you can commit to your own local branches of your favorite free software projects without needing special permission.
Common Terminologies in Version Control
Repository
The repository is where the current and historical file data is stored, often on a server. Sometimes also called a depot.
Working copy
The working copy is the local copy of files from a repository, at a specific time or revision. All work done to the files in a repository is initially done on a working copy, hence the name. Conceptually, it is a sandbox.
Check-out
A check-out (or checkout or co) creates a local working copy from the repository. Either a specific revision is specified, or the latest is obtained.
Commit
A commit (check-in, ci or, more rarely, install or submit) occurs when a copy of the changes made to the working copy is written or merged into the repository.
Change
A change (or diff, or delta) represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.
Change list
On many version control systems with atomic multi-change commits, a changelist, change set, or patch identifies the set of changes made in a single commit. This can also represent a sequential view of the source code, allowing source to be examined as of any particular changelist ID.
Update
An update (or sync) merges changes that have been made in the repository (e.g. by other people) into the local working copy.
Branch
A set of files under version control may be branched or forked at a point in time so that, from that time forward, two copies of those files may be developed at different speeds or in different ways independently of the other.
Merge
A merge or integration brings together two sets of changes to a file or set of files into a unified revision of that file or files.
- This may happen when one user, working on those files, updates their working copy with changes made, and checked into the repository, by other users. Conversely, this same process may happen in the repository when a user tries to check-in their changes.
- It may happen after a set of files has been branched, then a problem that existed before the branching is fixed in one branch and this fix needs merging into the other.
- It may happen after files have been branched, developed independently for a while and then are required to be merged back into a single unified trunk.
Dynamic stream
A stream (a data structure that implements a configuration of the elements in a particular repository) whose configuration changes over time, with new versions promoted from child workspaces and/or from other dynamic streams. It also inherits versions from its parent stream.
Revision
A revision or version is one version in a chain of changes.
Tag
A tag or release refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number.
Import
An import is the action of copying a local directory tree (that is not currently a working copy) into the repository for the first time.
Export
An export is similar to a check-out except that it creates a clean directory tree without the version control metadata used in a working copy. Often used prior to publishing the contents.
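In Subversion, these two actions correspond to `svn import` and `svn export`. The sketch below assumes `svn` and `svnadmin` are installed and uses a hypothetical scratch repository; it exports the same tree that it checks out, so the missing `.svn` metadata directory is easy to see.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create repo
mkdir project
echo "release notes" > project/NOTES.txt
svn import -q ./project "file://$WORK/repo" -m 'initial import'

svn checkout -q "file://$WORK/repo" working-copy   # includes a .svn metadata directory
svn export -q "file://$WORK/repo" clean-tree       # plain files only

ls -a working-copy
ls -a clean-tree
```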
Conflict
A conflict occurs when two changes are made by different parties to the same document or place within a document. When the software is not intelligent enough to decide which change is 'correct', a user is required to resolve such a conflict.
Resolve
The act of user intervention to address a conflict between different changes to the same document.
Baseline
An approved revision of a document or source file from which subsequent changes can be made.
References
1. http://en.wikipedia.org/wiki/Revision_control - Version Control
2. http://www.dwheeler.com/essays/scm.html - Introduction to CVS, SVN, GNU Arch
3. http://bazaar-vcs.org/ - Bazaar Version Control
Subversion - An Introduction
Subversion is a free/open-source version control system. That is, Subversion manages files and directories over time. A tree of files is placed into a central repository. The repository is much like an ordinary file server, except that it remembers every change ever made to your files and directories. This allows you to recover older versions of your data, or examine the history of how your data changed. In this regard, many people think of a version control system as a sort of "time machine".
Subversion can access its repository across networks, which allows it to be used by people on different computers. At some level, the ability for various people to modify and manage the same set of data from their respective locations fosters collaboration. Progress can occur more quickly without a single conduit through which all modifications must occur. And because the work is versioned, you need not fear that quality is the trade-off for losing that conduit: if some incorrect change is made to the data, just undo that change.
Some version control systems are also software configuration management (SCM) systems. These systems are specifically tailored to manage trees of source code, and have many features that are specific to software development, such as natively understanding programming languages or supplying tools for building software. Subversion, however, is not one of these systems. It is a general system that can be used to manage any collection of files. For you, those files might be source code; for others, anything from grocery shopping lists to digital video mixdowns and beyond.
What is a Repository?
Subversion is a centralized system for sharing information. At its core is a repository, which is a central store of data. The repository stores information in the form of a filesystem tree: a typical hierarchy of files and directories. Any number of clients connect to the repository, and then read or write to these files. By writing data, a client makes the information available to others; by reading data, the client receives information from others.
So why is this interesting? So far, this sounds like the definition of a typical file server. And indeed, the repository is a kind of file server, but it's not your usual breed. What makes the Subversion repository special is that it remembers every change ever written to it: every change to every file, and even changes to the directory tree itself, such as the addition, deletion, and rearrangement of files and directories.
When a client reads data from the repository, it normally sees only the latest version of the filesystem tree. But the client also has the ability to view previous states of the filesystem. For example, a client can ask historical questions like "What did this directory contain last Wednesday?" or "Who was the last person to change this file, and what changes did he make?" These are the sorts of questions that are at the heart of any version control system: systems that are designed to record and track changes to data over time.
Features of Subversion
Directory versioning
CVS only tracks the history of individual files, but Subversion implements a "virtual" versioned filesystem that tracks changes to whole directory trees over time. Both files and directories are versioned.
True version history
Since CVS is limited to file versioning, operations such as copies and renames (which might happen to files, but which are really changes to the contents of some containing directory) aren't supported in CVS. Additionally, in CVS you cannot replace a versioned file with some new thing of the same name without the new item inheriting the history of the old, perhaps completely unrelated, file. With Subversion, you can add, delete, copy, and rename both files and directories. And every newly added file begins with a fresh, clean history all its own.
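A minimal sketch of directory versioning and renames, assuming `svn` and `svnadmin` are installed; the repository, the `docs` directory and the file names are hypothetical. It creates a directory and renames a file in one commit, then shows that the renamed file's log still reaches back to the revision in which it was created under its old name.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create repo
mkdir project
echo "helpers" > project/utils.py
svn import -q ./project "file://$WORK/repo" -m 'initial import'   # r1
svn checkout -q "file://$WORK/repo" wc
cd wc

# Creating a directory and renaming a file are ordinary versioned changes.
svn mkdir -q docs
svn move -q utils.py helpers.py
svn commit -q -m 'add docs/, rename utils.py to helpers.py'       # r2

# The log follows the rename back to r1, when the file was utils.py.
svn log -q -v helpers.py
```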
Atomic commits
A collection of modifications either goes into the repository completely, or not at all. This allows developers to construct and commit changes as logical chunks, and prevents problems that can occur when only a portion of a set of changes is successfully sent to the repository.
Versioned metadata
Each file and directory has a set of properties (keys and their values) associated with it. You can create and store any arbitrary key/value pairs you wish. Properties are versioned over time, just like file contents.
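To see that properties are versioned like file contents, the sketch below sets a custom property twice in two commits and reads back each revision's value. It assumes `svn` and `svnadmin` are installed; the property name `reviewed-by` and its values are hypothetical.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create repo
mkdir project
echo "draft" > project/report.txt
svn import -q ./project "file://$WORK/repo" -m 'initial import'   # r1
svn checkout -q "file://$WORK/repo" wc
cd wc

# Attach an arbitrary key/value pair; it is committed like a content change.
svn propset reviewed-by "alice" report.txt
svn commit -q -m 'record reviewer'                                  # r2

svn propset reviewed-by "bob" report.txt
svn commit -q -m 'new reviewer'                                     # r3

# Each revision keeps its own value of the property.
svn propget reviewed-by -r 2 "file://$WORK/repo/report.txt"
svn propget reviewed-by -r 3 "file://$WORK/repo/report.txt"
```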
Choice of network layers
Subversion has an abstracted notion of repository access, making it easy for people to implement new network mechanisms. Subversion can plug into the Apache HTTP Server as an extension module. This gives Subversion a big advantage in stability and interoperability, and instant access to existing features provided by that server: authentication, authorization, wire compression, and so on. A more lightweight, standalone Subversion server process is also available. This server speaks a custom protocol which can be easily tunneled over SSH.
Consistent data handling
Subversion expresses file differences using a binary differencing algorithm, which works identically on both text (human-readable) and binary (human-unreadable) files. Both types of files are stored equally compressed in the repository, and differences are transmitted in both directions across the network.
Efficient branching and tagging
The cost of branching and tagging need not be proportional to the project size. Subversion creates branches and tags by simply copying the project, using a mechanism similar to a hard-link. Thus these operations take only a very small, constant amount of time.
Hackability
Subversion has no historical baggage; it is implemented as a collection of shared C libraries with well-defined APIs. This makes Subversion extremely maintainable and usable by other applications and languages.
Basic Operations with Subversion
Creating a Subversion Repository
A repository can be created with svnadmin, the administration tool that ships with Subversion. For example, to create a repository named mysvn in the /home/myuser/sourcecodes/ directory, open a terminal, change the current directory to /home/myuser/sourcecodes and issue svnadmin's create command, as follows:
$ cd /home/myuser/sourcecodes
$ svnadmin create mysvn
This creates the mysvn repository in the present working directory, /home/myuser/sourcecodes.
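To check the result without touching the /home/myuser paths, the sketch below creates a throwaway repository in a temporary directory (the temporary location is an assumption for illustration) and asks `svnlook youngest` for its newest revision. A freshly created repository is at revision 0.

```shell
set -e
# Scratch area instead of /home/myuser/sourcecodes (illustration only).
WORK=$(mktemp -d)
cd "$WORK"

# Create an empty repository named mysvn, as in the text.
svnadmin create mysvn

# svnlook reads the repository directly; a new repository is at revision 0.
echo "youngest revision: $(svnlook youngest mysvn)"
```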
Adding files to the created repository
Now that we have a repository, we can add files to it to place them under version control. To do this, create a skeleton directory structure as required and import it into the newly created Subversion repository.
Most Subversion repositories of software projects contain three basic directories at the top level, named 'branches', 'trunk' and 'tags'. These three are placed in a root directory that carries the name of the project. Taking 'myproject' as the project name, these are the steps to create a basic directory structure for the repository.
- Change the PWD to home directory,
$ cd /home/myuser
- Create a directory called myproject and change the PWD to it,
$ mkdir ./myproject
$ cd myproject
- Create the three basic directories as stated above,
$ mkdir ./trunk
$ mkdir ./branches
$ mkdir ./tags
- Check whether everything is ok,
$ ls
branches tags trunk
- Change back to the home directory
$ cd /home/myuser
- Import the newly created directory structure to the repository using the import command,
$ svn import ./myproject file:///home/myuser/sourcecodes/mysvn/myproject -m 'first import'
There are a few things to note about this command.
First, on the client side, every Subversion command starts with the word 'svn', followed by a subcommand such as 'import', and finally by options and arguments. The typical syntax of an svn command is
svn subcommand [options] [args]
Thus, the above statement runs the import subcommand. Its arguments are the source directory and the URL of the destination in the repository. In our case, the destination repository is in our local file system, hence the file:// URL. It can also be in a remote location or hosted on the Internet.
In the case of a hosted repository whose URL is known (e.g. https://svn.myhost.com/public/mysvn), the same statement takes the following form,
$ svn import ./myproject https://svn.myhost.com/public/mysvn/myproject -m 'first import'
The additional option in our case is '-m', which specifies a log message describing what is being added. This is helpful later, when the logs are checked to find out what was actually done to the repository.
Now the basic directory structure has been added to the repository. In addition to the bare directory structure, we can also have some files within it; these files are added to the corresponding directories when the import command runs.
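The sketch below repeats the import in a scratch location, with one file already placed in trunk, and then lists what the repository contains. It assumes `svn` and `svnadmin` are installed; the temporary directory and `README.txt` are hypothetical stand-ins for /home/myuser and real project files.

```shell
set -e
WORK=$(mktemp -d)          # stands in for /home/myuser
cd "$WORK"
svnadmin create mysvn

# Build the trunk/branches/tags layout, with one file already in trunk.
mkdir -p myproject/trunk myproject/branches myproject/tags
echo "Project notes" > myproject/trunk/README.txt

# Import the whole tree in a single commit (revision 1).
svn import ./myproject "file://$WORK/mysvn/myproject" -m 'first import'

# List what the repository now contains, recursively.
svn list -R "file://$WORK/mysvn/myproject"
```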
These are the basic steps of creating a repository and adding a basic file structure to it. In real-life situations, a repository will most probably already be available, or your web host will create one for you. Having seen the basic operations performed on the server end, let us look at the operations performed on the client side, which matter more to us.
Checking out a working copy from the repository
What we created is a Subversion repository, a central place where our files are stored. It acts as the source for our files. The most basic operation on an existing repository is getting a copy of the files it contains.
An important point to note at this juncture is that when we fetch a copy of the repository, we are fetching just a working copy, which is independent of the content in the repository. To be more precise, a checked-out copy is a local copy containing the files that were present in the latest revision (state) of the repository when we did the checkout. Having a working copy does not by itself give us permission to add, update or delete files in the repository. But, as we have our own copy of the files, we are free to experiment with it without worrying about affecting the repository.
Another useful thing hidden within the working copy is metadata recording which revision of the repository it corresponds to and which repository URL it came from. Thus, if after some time we want to bring our working copy up to the latest revision in the repository, we just need to issue an update command. Voila! All our files are updated to the latest revision; if we had made changes to some files in the meantime, those changes are either merged with the new updates or result in a conflict. Let's first see how to check out a working copy before worrying about resolving conflicts.
The project we imported is available at file:///home/myuser/sourcecodes/mysvn/myproject in our local file system. To get a working copy of it, we use the checkout command (alias co), with the name we want for the working copy as the second argument.
$ cd /home/myuser/sourcecodes/
$ svn checkout file:///home/myuser/sourcecodes/mysvn/myproject myprojectWC
A myprojectWC/branches
A myprojectWC/tags
A myprojectWC/trunk
Checked out revision 1.
$ ls
mysvn myprojectWC
Thus we have obtained a local working copy named 'myprojectWC' of the project 'myproject' in the repository 'mysvn'. The output lists the items added to the working copy and, finally, the revision that was checked out.
Always keep an eye on the messages printed by svn; they are very informative.
What if the repository is hosted on the web? This is how we can check out a working copy of a repository hosted elsewhere,
$ svn co https://svn.myhost.com/public/mysvn/myproject myprojectWC
Note: here we used 'co', the alias of the 'checkout' subcommand.
Adding files and Committing changes
Now that we have a local working copy, we can add files to it and send them to the repository. No change made in the working copy affects the repository until we do a 'commit'. A commit is like making a commitment: we are consciously modifying something in the repository and we take responsibility for it. Committing to a repository served over a network normally requires commit permission, usually granted in the form of a username and password. The svn client can cache these credentials in the user's ~/.subversion directory, so they typically need to be entered only once rather than for every commit. A local file:// repository like ours asks for no password at all.
Let us create a file named 'myfirstfile.txt' and add it to the repository. To do this, create myfirstfile.txt in the 'trunk' directory of the working copy. Then issue the add command to schedule it for the next commit, and then issue the commit command.
Note: the svn commands below are issued from the top-level directory of the working copy. In our case, the present working directory (PWD) needs to be myprojectWC when issuing them. Commands can be issued from inner directories as well, but they then act only on that directory and its contents.
$ cd myprojectWC/trunk
$ cat > myfirstfile.txt
This is my first file. I am going to add it to my repository.
<CTRL-D>
$ cd ..
$ echo $PWD
/home/myuser/sourcecodes/myprojectWC
$ svn add trunk/myfirstfile.txt
A trunk/myfirstfile.txt
$ svn commit -m 'my first file addition'
Adding trunk/myfirstfile.txt
Transmitting file data .
Committed revision 2.
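Although we have committed from the client end, the working copy's record of its own revision is only partly updated: the committed file is at the new revision while the rest of the working copy stays at the old one. Following every commit with an update brings the whole working copy to the latest revision.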
$ svn update
At revision 2.
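The sketch below replays these steps in a scratch directory and uses `svnversion` to show the revision range of the working copy before and after the update. It assumes `svn`, `svnadmin` and `svnversion` are installed; the temporary directory replaces /home/myuser/sourcecodes.

```shell
set -e
WORK=$(mktemp -d)          # stands in for /home/myuser/sourcecodes
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk myproject/branches myproject/tags
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'

# Check out revision 1 into the working copy myprojectWC.
svn checkout -q "file://$WORK/mysvn/myproject" myprojectWC
cd myprojectWC

# Create the file, schedule it for addition, and commit (revision 2).
echo "This is my first file." > trunk/myfirstfile.txt
svn add trunk/myfirstfile.txt
svn commit -m 'my first file addition'

# Only the committed file is at revision 2 now, so svnversion reports
# a mixed range such as 1:2.
svnversion

# Updating brings every item in the working copy to the same revision.
svn update
svnversion
```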
Some points to notice here: the first revision was numbered 1. When we added a file and committed, the revision changed to 2. Likewise, every commit to the repository, whether it adds files, modifies file contents, deletes files or merges changes, moves the revision number to the next value, however many files the commit touches.
What actually happens is that Subversion leaves revision 1 untouched. Conceptually, it creates a new snapshot of the whole tree, named revision 2, that contains our new file; internally, only the differences are stored. Thus revision 1 remains intact, and we can check it out from the repository if we want to. The main advantages of this are,
- If we want to work with earlier revisions of our repository, they are always available with a single command.
- If we ever make a mistake and commit a wrong file or broken code, we can safely 'roll back' to the earlier state. In Subversion this is done by committing a new revision that reverses the faulty change; the faulty revision stays in the history, but the latest revision holds all our files as they were before the mistake.
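A roll-back can be sketched as a reverse merge: `svn merge -c -N` applies revision N backwards to the working copy, and committing the result creates a new revision. This is a minimal illustration, assuming `svn` and `svnadmin` are installed; the scratch repository and the file `app.txt` are hypothetical.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'   # r1
svn checkout -q "file://$WORK/mysvn/myproject" wc
cd wc

echo "good code" > trunk/app.txt
svn add -q trunk/app.txt
svn commit -q -m 'add app.txt'                                               # r2

echo "broken code" > trunk/app.txt
svn commit -q -m 'oops, broken change'                                       # r3

# Merging refuses mixed-revision working copies, so update first.
svn update -q
# Apply revision 3 in reverse to the working copy, then commit the undo.
svn merge -c -3 "file://$WORK/mysvn/myproject" .
svn commit -m 'Roll back r3'                                                 # r4

cat trunk/app.txt
# r3 is still part of the history.
svn log -q "file://$WORK/mysvn/myproject"
```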
Handling Conflicts
Before we can commit a change to a file, our working copy of that file must be up to date. If someone else has committed a change to the same file since we last updated, Subversion refuses our commit with an "out of date" error, and we need to update our working copy first.
A typical example: we try to commit some changes while our working copy is at revision 89, but the latest revision in the repository is 90, in which someone changed the same file. Our working copy is out of date with respect to the repository, and we might have missed the changes made in revision 90. To fix this, we update our working copy to revision 90, make sure the incoming changes do not conflict with ours, and then proceed with the commit.
Usually the update merges the incoming changes with ours automatically, for example when they touch different files or different parts of the same file. A conflict arises only when the incoming changes and our changes overlap, that is, when they modify the same lines. In that situation Subversion marks the conflicting region in the file, and we have to look at the code, decide which version should remain, and edit the file by hand to remove the conflicting parts. We then issue the 'resolved' command on the file in conflict, after which we can continue with the commit. Hence, it is always advisable to update the working copy before starting work, so that only changes committed by others after that update can lead to conflicts that have to be resolved manually.
A typical resolve command looks like this,
$ svn resolved trunk/myfirstfile.txt
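The whole sequence can be sketched with two working copies of a scratch repository standing in for two developers (the names `alice`, `bob` and `notes.txt` are hypothetical, and `svn` and `svnadmin` are assumed to be installed). The sketch uses `svn resolve --accept working`, the form preferred since Subversion 1.5; the older `svn resolved` shown above does the same job for a hand-edited file.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk
echo "line one" > myproject/trunk/notes.txt
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'   # r1

svn checkout -q "file://$WORK/mysvn/myproject" alice
svn checkout -q "file://$WORK/mysvn/myproject" bob

# Alice changes the line and commits first (r2).
echo "line one, edited by alice" > alice/trunk/notes.txt
svn commit -q -m 'alice edit' alice

# Bob edits the same line; his commit is rejected as out of date.
echo "line one, edited by bob" > bob/trunk/notes.txt
svn commit -q -m 'bob edit' bob || echo "commit refused: working copy is out of date"

# Bob updates; postponing leaves conflict markers in the file.
svn update -q --accept postpone bob
# Shows C for notes.txt plus ? for helper copies such as notes.txt.mine.
svn status bob

# Bob edits the file to the text that should stay, marks it resolved, commits (r3).
echo "line one, edited by alice and bob" > bob/trunk/notes.txt
svn resolve --accept working bob/trunk/notes.txt
svn commit -q -m 'merge alice and bob edits' bob
```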
Removing Files
Just as files can be added to the repository, files can also be removed from it. Deleting a file from the working copy with an ordinary command such as rm does not remove it from the repository; instead, the file reappears on the next update. The only way to remove a file from the repository is to mark it for removal with svn and commit.
For example, to remove myfirstfile.txt from the repository, issue the following commands from the top level of the working copy.
$ svn remove trunk/myfirstfile.txt
$ svn commit -m 'file removed'
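The sketch below contrasts the two kinds of deletion in a scratch repository (assuming `svn` and `svnadmin` are installed; the temporary directory replaces the /home/myuser paths). A plain `rm` is undone by the next update, while `svn remove` plus a commit deletes the file from the latest revision; the file is still retrievable from revision 1 with a peg revision (`@1`).

```shell
set -e
WORK=$(mktemp -d)          # stands in for /home/myuser/sourcecodes
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk
echo "This is my first file." > myproject/trunk/myfirstfile.txt
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'
svn checkout -q "file://$WORK/mysvn/myproject" myprojectWC
cd myprojectWC

# A plain rm only touches the working copy: svn reports the file as
# missing ('!') and the next update restores it.
rm trunk/myfirstfile.txt
svn status
svn update -q

# Scheduling the removal with svn and committing deletes it from HEAD (r2).
svn remove trunk/myfirstfile.txt
svn commit -m 'file removed'

# The file is gone from the latest revision but still exists in revision 1.
svn cat "file://$WORK/mysvn/myproject/trunk/myfirstfile.txt@1"
```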
Status, Info and Logs
There are a few commands that report information about the working copy and the repository. Among them, status, info and log are the most important and are very useful in day-to-day work.
$ svn info
This gives information about the current revision of the working copy, and about who committed the last change and when. It also gives the URL of the repository, the repository root and the repository UUID.
$ svn log
This outputs the log of the commits made so far (those affecting the current directory, up to the working copy's revision), including the messages supplied with the -m option.
$ svn status
This command gives the current status of the working copy. The main difference from info is that info only describes the revision and repository details, while status shows the current state of the individual files in the working copy.
In the output of status, special symbols are used to mark files for corresponding actions to be taken on them. For example, 'A' marks files to be added to the repository, 'D' marks files to be deleted, 'M' marks modified files and a '?' marks unversioned files.
It is important to check the status of the working copy before committing, to make sure all files are marked properly and no files are left out unversioned. Thus, none of the files we intend to commit should carry a '?' mark.
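The sketch below produces each of the four status codes mentioned above in a scratch working copy. It assumes `svn` and `svnadmin` are installed; all file names are hypothetical.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk
echo "old" > myproject/trunk/edit-me.txt
echo "bye" > myproject/trunk/delete-me.txt
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'
svn checkout -q "file://$WORK/mysvn/myproject" wc
cd wc

echo "new" > trunk/edit-me.txt           # modified               -> M
svn remove -q trunk/delete-me.txt        # scheduled for deletion -> D
echo "hi" > trunk/add-me.txt
svn add -q trunk/add-me.txt              # scheduled for addition -> A
echo "tmp" > trunk/forgotten.txt         # never added            -> ?

svn status

# Before committing, list anything that is still unversioned.
svn status | grep '^?' || true
```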
Tag
Tags are simply snapshots of a project in time. They capture directories and subdirectories under a human-readable name, which is very useful during release periods, when a certain revision can be tagged with a release number.
For example, revision 4330 of the trunk can be tagged as release 2.10. A human-readable name like release 2.10 is easier to handle than revision 4330 of the trunk. We can also tag specific subdirectories, such as the i386 subdirectory of the trunk, and release it as version 2.10-i386, so that only the i386 code is released under version 2.10.
To tag the current revision in the trunk as release 1.0, issue the following command
$ svn copy https://svn.myhost.com/public/mysvn/myproject/trunk \
    https://svn.myhost.com/public/mysvn/myproject/tags/release-1.0 \
    -m 'Tagging release 1.0 of myproject'
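The same server-side copy can be tried against a local scratch repository, with `file://` URLs standing in for the hypothetical svn.myhost.com ones. This sketch assumes `svn` and `svnadmin` are installed; `main.txt` is a hypothetical file.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk myproject/branches myproject/tags
echo "v1 code" > myproject/trunk/main.txt
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'

ROOT="file://$WORK/mysvn/myproject"
# A server-side copy: no working copy needed, one new revision (r2).
svn copy "$ROOT/trunk" "$ROOT/tags/release-1.0" -m 'Tagging release 1.0 of myproject'

# The tag holds the same content the trunk had at the time of the copy.
svn list "$ROOT/tags"
svn cat "$ROOT/tags/release-1.0/main.txt"
```

To tag an older revision, such as revision 4330 in the example above, add `-r 4330` to the copy command.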
Branch
A branch in a Subversion repository is a line of development that diverges from the current code in the trunk. A branch is created when doing something different from what is being done in the trunk, while still building on the trunk's code.
For example, suppose we are writing software whose user interface is implemented using the GTK toolkit. Now we also want to support the Qt toolkit, so we create a branch from the project's trunk. In the branch, we implement a Qt-based user interface instead of the GTK one, while the other parts of the software stay the same.
Branches are generally created when an idea has to be implemented in the project without disturbing the development happening in the trunk, by doing a parallel implementation in the new branch. The advantage is that developments in the trunk can still be merged into the branch, so the branch doesn't lag too far behind the trunk. The reverse is also possible: merging code from the branch into the trunk.
Thus, a branch is an independent line of development that neither interferes with nor is interfered with by another line, yet shares a common history with it.
Merging a Branch into Trunk
Often we need the improvements made in the branch to be applied to the trunk as well. This is done by computing the difference between the relevant states of the branch and merging that difference into the trunk.
The naive approach, diffing the current trunk revision against the current branch revision and merging the result into the trunk, does not work. A merge applies the diff between the two states being compared, and in that case the diff would contain not only the additions made in the branch but also the reversal of every change committed to the trunk since the branch was created, which has nothing to do with the branch.
The solution is to find where the branch actually forked off from the trunk and use that as the starting point of the diff, while the ending point is the latest revision of the branch, known as the HEAD revision. The branch point can be found by inspecting the logs with the svn log command.
For example, suppose the branch was created at revision 340 and the current HEAD is 364: all the changes made in the branch between revisions 340 and 364 need to be merged into the trunk. Here are the commands to do it,
$ cd trunk
$ svn update
At revision 364.
$ svn merge -r 340:HEAD https://svn.myhost.com/public/mysvn/myproject/branches/june_branch
U db_pgsql.py
U db_utils.py
U urlfetch.py
U urlfetch.tmpl
U browseurl.tmpl
$ svn status
M db_pgsql.py
M db_utils.py
M urlfetch.py
M urlfetch.tmpl
M browseurl.tmpl
$ svn commit -m 'Merging june_branch changes r340:364 into the trunk'
Sending db_pgsql.py
Sending db_utils.py
Sending urlfetch.py
Sending urlfetch.tmpl
Sending browseurl.tmpl
Transmitting file data .....
Committed revision 365.
There are two important points to note. The first is to run svn update on the trunk working copy before merging anything into it, and the second is to state the revisions being merged in the commit message. The former matters because svn merge applies the diff to the working copy, and the result reaches the repository only when we commit, so the working copy must be up to date with the trunk. The latter matters because, if we continue development in the branch and later want to merge it again, we need not merge from the point where the branch was created; it is enough to merge from where the last merge ended, and the commit message records where that was. (Subversion 1.5 and later also record merged revisions automatically in the svn:mergeinfo property.)
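The sketch below runs the same procedure on a small scratch repository, with `file://` URLs in place of the hypothetical svn.myhost.com ones, and finds the branch point with `svn log --stop-on-copy` instead of reading it off by eye. It assumes `svn` and `svnadmin` are installed; the revision numbers (branch created in r2, HEAD at r3) follow from this particular setup.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk myproject/branches myproject/tags
echo "def fetch(): pass" > myproject/trunk/urlfetch.py
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'   # r1
ROOT="file://$WORK/mysvn/myproject"

# Create the branch (r2) and commit some work on it (r3).
svn copy -q "$ROOT/trunk" "$ROOT/branches/june_branch" -m 'create june_branch'
svn checkout -q "$ROOT/branches/june_branch" branchWC
echo "def fetch(url): return url" > branchWC/urlfetch.py
svn commit -q -m 'improve fetch' branchWC

# --stop-on-copy ends the log at the copy that created the branch,
# so the last revision listed is the branch point.
START=$(svn log -q --stop-on-copy "$ROOT/branches/june_branch" \
  | grep '^r' | tail -n 1 | cut -d' ' -f1 | tr -d 'r')
END=$(svnlook youngest mysvn)
echo "june_branch was created in r$START; HEAD is r$END"

# Merge START:END of the branch into an up-to-date trunk working copy.
svn checkout -q "$ROOT/trunk" trunkWC
cd trunkWC
svn update
svn merge -r "$START:$END" "$ROOT/branches/june_branch" .
svn status
svn commit -m "Merging june_branch changes r$START:$END into the trunk"
```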
The Basic Work Cycle
Although svn has a lot of commands and options, there is a general work cycle, a sequence of commands to follow when working with a working copy. Following it ensures that we make good use of Subversion's facilities and don't waste time solving problems that the cycle would have prevented. It is advisable to follow this cycle every time we work with our working copy.
- Update the working copy to the latest revision in the repository, so that we are aware of the changes made in the repository when we start our work.
- Make changes to the working copy; this is the actual work.
- Make use of the status, diff and revert commands to examine the changes made to the working copy.
- Resolve any conflicts caused by changes made by others. This can be done by editing the file by hand, by copying one of the temporary files Subversion creates onto the working file in conflict, or by reverting our own changes.
- Commit the changes made to the working copy, with proper commit messages
- Update the working copy to the commit which was just made, by issuing svn update.
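The cycle above can be sketched as a single script against a scratch repository. It assumes `svn`, `svnadmin` and `svnversion` are installed; the files `list.txt` and `config.txt` are hypothetical, and step 4 is a no-op because nobody else commits in this single-user sketch.

```shell
set -e
WORK=$(mktemp -d)          # scratch area for this sketch
cd "$WORK"
svnadmin create mysvn
mkdir -p myproject/trunk
printf 'alpha\nbeta\n' > myproject/trunk/list.txt
echo "debug=false" > myproject/trunk/config.txt
svn import -q ./myproject "file://$WORK/mysvn/myproject" -m 'first import'
svn checkout -q "file://$WORK/mysvn/myproject" wc
cd wc

# 1. Start from the latest revision.
svn update
# 2. Make changes.
printf 'alpha\nbeta\ngamma\n' > trunk/list.txt
echo "debug=true" > trunk/config.txt
# 3. Examine the changes, and revert the one we do not want to keep.
svn status
svn diff trunk/list.txt
svn revert trunk/config.txt
# 4. Resolve conflicts (none can occur in this single-user sketch).
# 5. Commit with a descriptive message.
svn commit -m 'Add gamma to list.txt'
# 6. Update so the whole working copy sits at the new revision.
svn update
svnversion
```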
[ Source : http://svnbook.red-bean.com/en/1.0/index.html ]
