When implementing a new database, it’s easy to fall into the trap of trying to quickly get something up and running without dedicating adequate time and effort to the design. This carelessness frequently leads to costly redesigns and reimplementations down the track. Designing a database is similar to drafting the blueprints for a house; it’s silly to start building without detailed plans. Importantly, good design allows you to extend the original building without having to pull everything down and start from scratch.
How Not to Develop a Database
Database design is probably not the most exciting task in the world, but it’s still important. Before we describe how to go about the design process, let’s look at an example of database design on the run.
Imagine we want to create a database to store student grades for a university computer science department. We could create a Student_Grades table to store grades for each student and each course. The table would have columns for the given names and the surname of each student as well as for each course they have taken, the course name, and the percentage result (shown as Pctg). We’d have a different row for each student for each of his courses:
GivenNames
Surname
CourseName
Pctg
John Paul
Bloggs
Web Database Applications
72
Sarah
Doe
Programming 1
87
John Paul
Bloggs
Computing Mathematics
43
John Paul
Blogs
Computing Mathematics
65
Sarah
Doe
Web Database Applications
65
Susan
Smith
Computing Mathematics
75
Susan
Smith
Programming 1
55
Susan
Smith
Computing Mathematics
80
This is nice and compact, and we can easily access grades for any student or any course. However, we could have more than one student called Susan Smith; in the sample data, there are two entries for Susan Smith and the Computing Mathematics course. Which Susan Smith got an 80? A common way to differentiate duplicate data entries is to assign a unique number to each entry. Here, we can assign a unique Student ID number to each student:
StudentID
GivenNames
Surname
CourseName
Pctg
12345678
John Paul
Bloggs
Web Database Applications
72
12345121
Sarah
Doe
Prgramming 1
87
12345678
John Paul
Bloggs
Computing Mathematics
43
12345678
John Paul
Bloggs
Computing Mathematics
65
12345121
Sarah
Doe
Web Database Applications
65
12345876
Susan
Smith
Computing Mathematics
75
12345876
Susan
Smith
Programming 1
55
12345303
Susan
Smith
Computing Mathematics
80
So, the Susan Smith who got 80 is the one with the Student ID number 12345303.
There’s another problem. In our table, John Paul Bloggs has failed the Computing Mathematics course once with 45 percent, and passed it with 65 percent in his second attempt. In a relational database, the rows form a set, and there is no implicit ordering between them; you might guess that the pass happened after the fail, but you can’t actually be sure. There’s no guarantee that the newer grade will appear after the older one, so we need to add information about when each grade was awarded, say by adding a year and semester (Sem):
StudentID
GivenNames
Surname
CourseName
Year
Sem
Pctg
12345678
John Paul
Bloggs
Web Database Applications
2004
2
72
12345121
Sarah
Doe
Programming 1
2006
1
87
12345678
John Paul
Bloggs
Computing Mathematics
2005
2
43
12345678
John Paul
Bloggs
Computing Mathematics
2006
1
65
12345121
Sarah
Doe
Web Database Applications
2006
1
65
12345876
Susan
Smith
Computing Mathematics
2005
1
75
12345876
Susan
Smith
Programming 1
2005
2
55
12345303
Susan
Smith
Computing Mathematics
2006
1
80
Notice that the Student_Grades table has become a bit bloated: the student ID, given names, and surname are repeated for every grade. We could split up the information and create a Student_Details table:
StudentID
GivenNames
Surname
12345121
Sarah
Doe
12345303
Susan
Smith
123456578
John Paul
Bloggs
12345876
Susan
Smith
and keep less information in the Student_Grades table:
StudentID
CourseName
Year
Sem
Pctg
12345678
Web Database Applications
2004
2
72
12345121
Programming 1
2006
1
87
12345678
Computing Mathematics
2005
2
43
12345678
Computing Mathematics
2006
1
65
12345121
Web Database Applications
2006
1
65
12345876
Computing Mathematics
2005
1
75
12345876
Programming 1
2005
2
55
12345303
Computing Mathematics
2006
1
80
To look up a student’s grades, we’d need to first look up her Student ID from the Student_Details table and then read the grades for that Student ID from the Student_Grades table.
There are still issues we haven’t considered. For example, should we keep information on a student’s enrollment date, postal and email addresses, fees, or attendance? Should we store different types of postal address? How should we store addresses so that things don’t break when a student changes his address?
Implementing a database in this way is problematic; we keep running into things we hadn’t thought about and have to keep changing our database structure. Clearly, we can save a lot of reworking by carefully documenting the requirements and then working through them to develop a coherent design.