How To Go About Designing A Relational DatabasePublished: 20-Aug-2014
This is an article about designing a relational database. In this case, we will be designing a membership database based on the membership application forms from a couple of four-wheel drive clubs, and combining the needs articulated in these forms into a database that should work for any organisation. But we’re really doing this for fun.
The relational database application we will use to build our relational database with is Microsoft Access®. There’s no particular reasoning behind this (apart from the value of the built-in reporting). We could just as easily use Filemaker Pro or MySQL or SQlite, or DB2 or SQL Server. But we’re not. We’re using Access®. Enjoy the ride.
As mentioned previously, in this case, we’re looking at designing a database based on published data entry forms. In the cases I’ve chosen, the following patterns emerged:
- Address information is desirable
- Contact phone numbers and emails are desirable
- First and Last Names of individuals are required
As these are four-wheel drive clubs, vehicle information is also required, including registration and whether or not low-range is a feature of the vehicle.
We won’t get too pedantic about specific field requirements at this point. What we’re going to focus on at first is determining the tables required and the relationships between the tables. We’re going to build tables that include a minimum amount of repeating data. The trade off will be that the queries used to build the reports and forms will be more complicated, but we’re going to create some workarounds to that problem as we go.
We’re also going to create the minimum amount of relationships possible. Yes, we’ll have some relationships with referential integrity enforced, but we’re also going to see if we can get away with table relationships that have a relationship, but not mapped out and enforced by Access®’ relationship editor. We will instead be relying on forms, queries and scripts to manage certain relationships. These types of non-enforced relationships will be to repeating lookup values, such as vehicle makes and models. Whilst it will be important to correctly define a vehicle, it is not required to enforce referential integrity to make that happen.
Something else we’ll do is change our minds as we go. If an idea is no longer working, we’ll adjust the design. Just like we might in a real working situation where our database might strain under larger data requirements or changing customer needs. Or simply because we made a decision earlier on that is no longer ideal, and we need to change gears a little to get our application back on track.
Probably the trickiest part with database design is normalizing the data. Normalizing the data involves reducing the repeating information to a minimum. In a relational database, repeating data should be replaced with numbers. For humans, numbers can be pretty meaningless. But numbers are easier for computers to utilize. What we want is for the computer to do the heavy lifting, and for the human to reap the rewards of the computer doing the heavy lifting.
A simple example to illustrate what we mean is by discussing addresses. In a table of addresses, we have at the minimum the following required fields:
- Post Code
We could add more fields, but we’re not going to get so detailed for this exercise. In fact, for our database, we’re not even going to use the “Number” field, and instead we will simply combine “Number” and “Street” together. This database is for a club membership list, not a delivery enterprise.
Back to topic. From the above, to some people it will be apparent that some address information will eventually repeat. For example, if a club membership caters to members only from one country, then the country field might even be considered redundant. We’re going to include the country field anyway, because we’re planning for members traveling overseas or living overseas as part of this exercise. Other fields that will contain repeating information will be the City/Suburb/Town field, the State/Province field, and the Post Code field. In the USA, there are fifty states. In Australia, there are only six states and two territories. We could very easily repeat the names of the states and countries for every record we create in the database. The way we’re going to do it though is to use a number to represent a city (or suburb or town) and relate that number to a smaller table with only unique place names in it in the case of the City/Suburb/Town table. And we'll do the same for State/Province, Post Code and Country field data. This will keep the address table as small as possible. It will create extra work for the developer though in order to make sure the necessary lookup tables work correctly both for data entry and for reporting. But that’s what we’re paid for. A good developer will make these complications seamless to the end-user. The end-user should only notice smooth lines, not nuts and bolts and a side-affect should be a smaller overall database footprint and reasonable to excellent query and report performance.
We’re going to also enforce referential integrity on certain tables. We haven’t touched on the idea of referential integrity before, so we'll stop for a moment and discuss it.
In simple terms, referential integrity means that if a record is deleted in a parent table, it will cascade that delete down through the related records in each of the child tables. It also forces records to be created in a logical order within the overall structure. As a simple example, let’s think of an apple tree. We can’t have an apple without an apple tree. If we destroy the tree, we destroy all of the apples on the tree. If we destroy an apple, we only destroy one apple on the tree, leaving all the other apples and limbs intact. If we destroy a limb of the tree, we destroy a number of apples (those on that limb which we destroyed), but we leave the apples on all the other limbs in peace, and the tree remains otherwise intact as well.
Referential integrity also stipulates that we can’t have an apple without a limb for it to grow from, nor a tree to support the limb. So if we wanted to have an apple, we’d have to plant a tree first.
Database relationships with referential integrity enforced create benefits by describing what data needs to be added in what order, and what happens when data at a certain level within the database is deleted. Referential integrity helps prevent orphan records remaining when a parent record is deleted without regard to what should happen to the related child records.
Referential integrity is enforced by in-built logic within a relational database application such as Microsoft Access®, and once it’s assigned, it’s less work for the developer. If it creates more work for the developer, then it’s possible the developer has created a poor schema to suit the solution requirements.
What We'll Need To Start
By analyzing a few forms, plus associated information such as club constitutions and other published club documents, the main tables that have been determined as being required in order to collect the necessary membership data is as follows:
- Member IDs - some manner of schema to identify new and existing members.
- Addresses - some manner of locating members in case we need to mail them.
- Contacts - who are the members, what are their names, and direct methods for communication.
- Vehicles - record of vehicle data. Registration details, insurance details, make and model details.
- Payments - record of payments.
Hold On Right There - How Do You Know That This Is How To Do It?
That’s a good question. In this case, we’re going on a few assumptions. Sometimes, assumptions are all that you have. In other cases, your customer may provide guidance as to what data to collect and in what manner to collect it. Instructions can and will guide the design of any application. In cases such as this, we’re using intuition based on common themes found in a few ‘new member’ form documents found on the internet, plus some additional information determined from the About pages of these same Clubs. In a project, this same information could be communicated by stakeholders.
Unsurprisingly, there are some common data points that are requested amongst unrelated organizations that have a common interest. And, if a design needs to be tweaked, then it will be tweaked. It’s not uncommon to see an application evolve over time. Sometimes change will be due to determining a “better way” to collect or store data. Sometimes change will be due to a shift in requirements. In any case that requires change, the important thing is to be able to adapt.
What Will Be The Parent Table?
Another good question. The answer to this one is based on judgement. Get it right, and all is good. Get it wrong, and be prepared to throw your data model out and start again. The key is, get it close to right, and hopefully any new learning about the project will either slot easily into the existing design or will only require minor tweaking to adapt the existing model.
To answer the question though, let’s come back to our apple tree analogy. We need to know what will be the apple, what will be a limb, and what will be the trunk of the tree, where the limbs and apples hang.
What you may recognize from describing our needs in this way is we that have described some one-to-many relationships (one of the most common relational database concepts). Our apples form a many side to the limbs, and the limbs form a one side to the apples, but a many side to the trunk. The trunk is indirectly related to the apples, but forms a one sided relationship to the limbs. In other words, a limb can only be attached to one trunk, and a set of apples can only be attached to one limb. Sure, there can be two trunks in a tree, but a limb cannot be attached to more than one trunk. How many times have you seen an apple growing from two separate limbs?
It’s these concepts that we want to keep in mind when designing our table relationships. What pieces are common, what pieces are unique, and what pieces form direct relationships to other pieces. The unique pieces tend to deserve their own table, and the non-unique pieces tend to deserve to be described as a number. The bits that are numbers should relate to rows in other tables.
In the case of a membership database, we might be tempted to immediately assume the person is the member. And you know, that’s not a bad assumption. However, in this case, we’d probably be throwing out the data model at some point if we chose to design based on this assumption. And we may be making this assumption if we only referred to membership forms and no other information. It is almost always best to refer to more than one source of information in order to deduce the best possible solution to a problem.
In the case of memberships, it is the membership itself that is the most unique item. We know this from our reading the About pages and constitutions and other published documents of the Clubs that are the inspiration for this project. What keeps coming up in the documentation is that a member can be defined as an individual person or as a family unit, or as something in between. Since a family can be more than one person, but the family unit is described as one member in the related documentation, then we know that a person cannot be the unique feature of the membership. A person therefore must be an apple to the limb if we go back to our apple tree analogy. We’ll next need to figure out if a person is directly related to the member table or to another table.
To answer this, we need to ask what features of families versus addresses are common. Well, generally speaking, one or more people tend to live at an address. Sometimes, we have family units who reside in separate abodes (not common certainly, but possible). We can sometimes have a situation where one person resides at more than one address, but in the case of our the application we’re building, we’re going to force a single address for people who might have that kind of circumstance. So, our address table is going to have a mailing address and a home address. It seems we’re concluding that our people are related to addresses for the purpose of our table relationship designing. In that case, our apples are still people, and they are related to a limb, the limb being an address table. There may be more than one apple on each limb (that is, more than one person living at an address).
The same deductive analysis is applied to the rest of the data points required on the form. The question of the vehicle does deserve to be dealt with in some detail though. It is tempting at first glance to relate the vehicle to a person. However, in doing this, it is implying that the vehicle can only be driven by that one person. In the case of a car club, the vehicle is owned by the member, rather than an individual person, and we’ve already determined that a member can be a family unit (in other words a member can represent more than one person). We also want the flexibility to allow a member to have more than one vehicle, should the member choose to have more than one vehicle. We can of course allow this by relating the vehicle to an individual, but if the vehicle’s partner should be allowed to drive the vehicle, the membership database may not recognize that, which could lead to miscommunication on trips or outings if the membership list is referred to in a strict manner. The easy fix is to then relate the vehicle to addresses. Another option is to create a vehicle table with a many-to-many relationship to contacts.
We know that more than one person can live at an address, and we also know that more than one vehicle can be parked at an address, which implies that a vehicle table could be directly related to the address table. Another solution, the one that will be adopted here, is to relate the vehicle table directly to the membership table. This will provide a more direct relationship between vehicles and memberships, and will fulfill the requirement to relate more than one person to a single vehicle (albeit indirectly). This design choice will also limit the default set of drivers to those that are associated to the same membership identity as the vehicle or vehicles in the vehicle table.
As with vehicles, we’re going to adopt similar logic in order to determine how to relate our membership payments to memberships.
And that gives us our overall schema, as illustrated here:
One final word about the schema diagram. You may notice a "1" and an "∞" in the diagram. The "1" represents the one-side of the data set (where the most unique portion of data is located). The "∞" represents where the many-side of the data set is located (where related records are located). What the symbols imply is that for every membership record, there may be more than one address, there may be more than one payment and more than one vehicle. For every address, there may be more than one contact. The schema above also shows that a contact is indirectly related to a vehicle via the contact's related membership record.
When designing an application, keep in mind that there can be more than one possible solution to a problem and that there can be more than one valid and necessary source of information to consult in order to make the best design choices. By casting a wide net for information gathering, we have a greater chance of gathering all requirements, including some requirements that may not be obvious at first glance.
By reflecting on nature - on how we do things in real life - we again can make the best design decisions. Our application design decisions should be adaptations and reflections of behaviors and patterns we see in life. Why is a hammer so effective? It is not just that it fits comfortably in the hand, but that it also has a suitable head for driving nails. And if it has a claw behind that head, then wow, it can also extract nails. All these features combine to make a great tool that is both easy to use and efficient.
This is what we strive for in database and application design.