A basic introduction in to the world of relational databases.
A long time ago in a college far far away I remember asking a friend who was a few years further along in his studies than I what a relational database was. While he did a fine job of explaining what it was, he couldn't for the life of him tell me why it was better - in the end we gave up and left only partially enlightened.
After having used relational database for many years I now can't imagine life without them. I've recently found myself, however, trying to explain relational databases to clients - I scoured the web for simple explanations and couldn't find much so in the end relented and decided to write a quick and simple exlpanation myself, so here goes:
Assuming that you have used a spreadsheet, table ( on a computer or on paper ) or wall planner for storing information then you are familiar with the concept of storing information like this:
| Month | Hours Worked | Employee Name | Employee Social Security | Employee Address | Branch / Office Name | Branch / Office Address |
|---|---|---|---|---|---|---|
| January 2008 | 24 | Mr Joe Bloggs | 4365124 | 5 Shaftsbury Lane, Albatrossville, FL, 32432 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| January 2008 | 32 | Mr Roger Dodger | 134324 | 17 Topcroft Avenue, Miami, FL, 23122 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| January 2008 | 7 | Mr Sherlock Holmes | 468721 | 15 Baker Street, Albatrossville, FL, 32432 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| February 2008 | 35 | Mr Joe Bloggs | 4365124 | 5 Shaftsbury Lane, Albatrossville, FL, 2432 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| February 2008 | 17 | Mr Roger Dodger | 134324 | 17 Topcroft Avenue, Miami, FL, 23122 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| February 2008 | 19 | Mr Sherlock Holmes | 468721 | 15 Baker Street, Albatrossville, FL, 32432 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| March 2008 | 78 | Mr Joe Bloggs | 4365124 | 5 Shaftsbury Lane, Albatrossville, FL, 2432 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| March 2008 | 56 | Mr Roger Dodger | 134324 | 17 Topcroft Avenue, Miami, FL, 23122 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| March 2008 | 43 | Mr Sherlock Holmes | 468721 | 15 Baker Street, Albatrossville, FL, 32432 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
The above table is a simple example of a "flat" database - all the data is stored in rows. As you can see, a great deal of data is duplicated here - So what? - you may ask. Well therein lies one of the key advantages of relational databases.
Problem One: Updatability
If Sherlock Holmes points out to the personnel department that he was knighted two years ago and his title should now be "Sir", the accounts department needs to go back and change every entry of "Mr" to "Sir" - this means they need to find and replace 24 months of entries. In revenge they mis-handled Sherlock's tax return and now he's being punished by the government. This is bad.
At this point I'd like to point out that when I was first explained this, I pointed out to my friend that I would simply use the "Find and Replace" tool to make all the changes for me. This is all well and good but if your database has 150,000 rows then the "Find and Replace" tool is going to check every single one of them, taking a long time and probably bringing the database to its knees - not the way to do it.
Problem Two: Size / Performance
Databases are all about performance and once a database starts getting too big, it can start to move at a crawl - searches take a long time, pulling up customer data keeps "timing out", etc.
The example table above is miniscule compared to the databases used in real-world applications but we will use it to calculate the size/ performance benefit of switching it to a relational database. At present the database above is contains 1,203 characters, we will attempt to reduce the number of characters while storing the same data.
The key to relational databases ( theres a pun there you'll understand later ) is to break the information in to seperate tables to eliminate duplicate information - this process is referred to as normalisation ( normalization in the U.S. :-)
So lets begin by storing all of the employee information in a seperate table:
| Employee Name | Employee Social Security | Employee Address |
|---|---|---|
| Mr Joe Bloggs | 4365124 | 5 Shaftsbury Lane, Albatrossville, FL, 32432 |
| Mr Roger Dodger | 134324 | 17 Topcroft Avenue, Miami, FL, 23122 |
| Mr Sherlock Holmes | 468721 | 15 Baker Street, Albatrossville, FL, 32432 |
This leaves our original table looking like this:
| Month | Hours Worked | Branch / Office Name | Branch / Office Address |
|---|---|---|---|
| January 2008 | 24 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| January 2008 | 32 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| January 2008 | 7 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| February 2008 | 35 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| February 2008 | 17 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| February 2008 | 19 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
Now this is all well and good but as it stands we have no way of knowing which employee worked which ours in which branch so we need to create a link between the two tables. Nothing is smaller to store in a database than a number so we use a number to link the tables - this number is called the key. We give every employee a unique ID (or key) and we enter that key in place of all their employee information in the larger table:
| Employee ID | Employee Name | Employee Social Security | Employee Address |
|---|---|---|---|
| 1 | Mr Joe Bloggs | 4365124 | 5 Shaftsbury Lane, Albatrossville, FL, 32432 |
| 2 | Mr Roger Dodger | 134324 | 17 Topcroft Avenue, Miami, FL, 23122 |
| 3 | Mr Sherlock Holmes | 468721 | 15 Baker Street, Albatrossville, FL, 32432 |
| Employee ID | Month | Hours Worked | Branch / Office Name | Branch / Office Address |
|---|---|---|---|---|
| 1 | January 2008 | 24 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| 2 | January 2008 | 32 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| 3 | January 2008 | 7 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| 1 | February 2008 | 35 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| 2 | February 2008 | 17 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| 3 | February 2008 | 19 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| 1 | March 2008 | 78 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| 2 | March 2008 | 56 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| 3 | March 2008 | 43 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
Our database is now 837 characters - a big different when you consider our main database table has only 9 rows and in reality this would likely be thousands.
Despite the data now being broken in two two seperate tables, whatever program or system you use will join the data back together before showing it to the end user - in most cases the end user will never know the difference on the face of things ( except that there is a performance different and changing employee data in one place changes it globally )
Well as you can see, our main table still has a great deal of duplicate information in it for branches so we break that off in to a seperate table as well, creating a key as before to maintain the link between the two tables - our database will now look like this:
| Employee ID | Employee Name | Employee Social Security | Employee Address |
|---|---|---|---|
| 1 | Mr Joe Bloggs | 4365124 | 5 Shaftsbury Lane, Albatrossville, FL, 32432 |
| 2 | Mr Roger Dodger | 134324 | 17 Topcroft Avenue, Miami, FL, 23122 |
| 3 | Mr Sherlock Holmes | 468721 | 15 Baker Street, Albatrossville, FL, 32432 |
| Branch ID | Branch / Office Name | Branch / Office Address |
|---|---|---|
| 1 | ATC Lighting Systems | 243 Glory Road, Albatrossville, FL, 32432 |
| 2 | ATC Illumination | 11a Bagshot Row, Miami , FL, 32432 |
| Employee ID | Month | Hours Worked | Branch ID |
|---|---|---|---|
| 1 | January 2008 | 24 | 1 |
| 2 | January 2008 | 32 | 2 |
| 3 | January 2008 | 7 | 1 |
| 1 | February 2008 | 35 | 1 |
| 2 | February 2008 | 17 | 2 |
| 3 | February 2008 | 19 | 1 |
| 1 | March 2008 | 78 | 1 |
| 2 | March 2008 | 56 | 2 |
| 3 | March 2008 | 43 | 2 |
Our database now contains exactly the same information as it did previously but is only 443 characters in size - our original was 1,203 characters, thats 63% smaller! This figure changes exponentially the larger your "flat database" is so on our tiny example you can really see the benefit.