What are Relational Databases and Why Use Them?

A basic introduction in to the world of relational databases.

A long time ago in a college far far away I remember asking a friend who was a few years further along in his studies than I what a relational database was. While he did a fine job of explaining what it was, he couldn't for the life of him tell me why it was better - in the end we gave up and left only partially enlightened.

After having used relational database for many years I now can't imagine life without them. I've recently found myself, however, trying to explain relational databases to clients - I scoured the web for simple explanations and couldn't find much so in the end relented and decided to write a quick and simple exlpanation myself, so here goes:

So - What is a relational database?

Assuming that you have used a spreadsheet, table ( on a computer or on paper ) or wall planner for storing information then you are familiar with the concept of storing information like this:

Month Hours Worked Employee Name Employee Social Security Employee Address Branch / Office Name Branch / Office Address
January 2008 24 Mr Joe Bloggs 4365124 5 Shaftsbury Lane, Albatrossville, FL, 32432 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
January 2008 32 Mr Roger Dodger 134324 17 Topcroft Avenue, Miami, FL, 23122 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
January 2008 7 Mr Sherlock Holmes 468721 15 Baker Street, Albatrossville, FL, 32432 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
February 2008 35 Mr Joe Bloggs 4365124 5 Shaftsbury Lane, Albatrossville, FL, 2432 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
February 2008 17 Mr Roger Dodger 134324 17 Topcroft Avenue, Miami, FL, 23122 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
February 2008 19 Mr Sherlock Holmes 468721 15 Baker Street, Albatrossville, FL, 32432 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
March 2008 78 Mr Joe Bloggs 4365124 5 Shaftsbury Lane, Albatrossville, FL, 2432 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
March 2008 56 Mr Roger Dodger 134324 17 Topcroft Avenue, Miami, FL, 23122 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
March 2008 43 Mr Sherlock Holmes 468721 15 Baker Street, Albatrossville, FL, 32432 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432

The above table is a simple example of a "flat" database - all the data is stored in rows. As you can see, a great deal of data is duplicated here - So what? - you may ask. Well therein lies one of the key advantages of relational databases.

Problem One: Updatability
If Sherlock Holmes points out to the personnel department that he was knighted two years ago and his title should now be "Sir", the accounts department needs to go back and change every entry of "Mr" to "Sir" - this means they need to find and replace 24 months of entries. In revenge they mis-handled Sherlock's tax return and now he's being punished by the government. This is bad.

At this point I'd like to point out that when I was first explained this, I pointed out to my friend that I would simply use the "Find and Replace" tool to make all the changes for me. This is all well and good but if your database has 150,000 rows then the "Find and Replace" tool is going to check every single one of them, taking a long time and probably bringing the database to its knees - not the way to do it.

Problem Two: Size / Performance
Databases are all about performance and once a database starts getting too big, it can start to move at a crawl - searches take a long time, pulling up customer data keeps "timing out", etc.

The example table above is miniscule compared to the databases used in real-world applications but we will use it to calculate the size/ performance benefit of switching it to a relational database. At present the database above is contains 1,203 characters, we will attempt to reduce the number of characters while storing the same data.

Relational Databases: Seeing is Believing

The key to relational databases ( theres a pun there you'll understand later ) is to break the information in to seperate tables to eliminate duplicate information - this process is referred to as normalisation ( normalization in the U.S. :-)

So lets begin by storing all of the employee information in a seperate table:

Employee Name Employee Social Security Employee Address
Mr Joe Bloggs 4365124 5 Shaftsbury Lane, Albatrossville, FL, 32432
Mr Roger Dodger 134324 17 Topcroft Avenue, Miami, FL, 23122
Mr Sherlock Holmes 468721 15 Baker Street, Albatrossville, FL, 32432

This leaves our original table looking like this:

Month Hours Worked Branch / Office Name Branch / Office Address
January 2008 24 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
January 2008 32 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
January 2008 7 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
February 2008 35 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
February 2008 17 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
February 2008 19 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432

Now this is all well and good but as it stands we have no way of knowing which employee worked which ours in which branch so we need to create a link between the two tables. Nothing is smaller to store in a database than a number so we use a number to link the tables - this number is called the key. We give every employee a unique ID (or key) and we enter that key in place of all their employee information in the larger table:

Employee ID Employee Name Employee Social Security Employee Address
1 Mr Joe Bloggs 4365124 5 Shaftsbury Lane, Albatrossville, FL, 32432
2 Mr Roger Dodger 134324 17 Topcroft Avenue, Miami, FL, 23122
3 Mr Sherlock Holmes 468721 15 Baker Street, Albatrossville, FL, 32432
Employee ID Month Hours Worked Branch / Office Name Branch / Office Address
1 January 2008 24 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
2 January 2008 32 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
3 January 2008 7 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
1 February 2008 35 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
2 February 2008 17 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
3 February 2008 19 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
1 March 2008 78 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
2 March 2008 56 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
3 March 2008 43 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432

Our database is now 837 characters - a big different when you consider our main database table has only 9 rows and in reality this would likely be thousands.

Despite the data now being broken in two two seperate tables, whatever program or system you use will join the data back together before showing it to the end user - in most cases the end user will never know the difference on the face of things ( except that there is a performance different and changing employee data in one place changes it globally )

So what next?

Well as you can see, our main table still has a great deal of duplicate information in it for branches so we break that off in to a seperate table as well, creating a key as before to maintain the link between the two tables - our database will now look like this:

Employee ID Employee Name Employee Social Security Employee Address
1 Mr Joe Bloggs 4365124 5 Shaftsbury Lane, Albatrossville, FL, 32432
2 Mr Roger Dodger 134324 17 Topcroft Avenue, Miami, FL, 23122
3 Mr Sherlock Holmes 468721 15 Baker Street, Albatrossville, FL, 32432
Branch ID Branch / Office Name Branch / Office Address
1 ATC Lighting Systems 243 Glory Road, Albatrossville, FL, 32432
2 ATC Illumination 11a Bagshot Row, Miami , FL, 32432
Employee ID Month Hours Worked Branch ID
1 January 2008 24 1
2 January 2008 32 2
3 January 2008 7 1
1 February 2008 35 1
2 February 2008 17 2
3 February 2008 19 1
1 March 2008 78 1
2 March 2008 56 2
3 March 2008 43 2

Our database now contains exactly the same information as it did previously but is only 443 characters in size - our original was 1,203 characters, thats 63% smaller! This figure changes exponentially the larger your "flat database" is so on our tiny example you can really see the benefit.