Novice database designers (who are often web developers) will quickly construct schemas based on the projection of data on their web applications, such as a contact form. Unfortunately, this thinking can lead to redundancy and quickly eat away precious space (and performance) on your system. Today we discuss an elementary approach to combating this problem.

 

An Example

One of the challenges with web development has always been solid database construction. Oftentimes developers will simply designate a table in relation to a page and offer up data to the server in the order it was processed.

For example, if a login page requires a user’s first name, last name, email address, password, and confirmation, the logical move would be to create a table called LOGIN with columns firstName, lastName, email, pwd, pwdConfirm. Next, if you create a billing page for users to enter their billing information upon purchasing a product, you might also create a separate table called BILLING with all of the necessary billing information, such as first name, last name, address, and so on.

The problem here, is that we’ve created redundancy – items such as firstName and lastName exist in two tables, and this degrades performance. This performance is vitally important to any website, particularly with thousands and millions of users. If you can guarantee your clients fast and efficient data access, then page loading will improve significantly, and data on the disk will be minimized.

Imagine having 1 million customers, and instead of having 1 million rows in both a LOGIN and a BILLING table, you had a single USER table with this information that could link to both LOGIN and BILLING, eliminating all of the duplicate columns?

Functional Dependencies to the Rescue

The layout for this optimized schema is called a functional dependency. Now I’m not going to explain the whole thing for you here (Wikipedia can, and will do a better job), but the gist is to create constraints between attributes (firstName, lastName) and relations (LOGIN, BILLING) so that no redundancies exist, and the transitions between these attributes are optimized.

Another great resource is the homepage of the slides for the book Database Systems Concepts. Check out Chapter 7, Relational Database Design, available in both PDF and PPT for your viewing pleasure. These two resources provide a fantastic explanation of how you can optimize your database schemas like we did for the previous example.

As someone who specializes in databases, I can tell you that optimizing data, particularly for larger companies, will save them thousands of gigabytes, dollars, hours…all of which you will pocket when you do this all for them in a relatively easy, methodical process. Now you have more than enough excuses for you to click on those links and brush up on your database knowledge, it’s more important than you might think!