Friday, March 9, 2012

Refactoring Database...How Much is Too Much?

I have a new database that I am designing. I come from a programming background, so as far as I am concerned nothing should ever be repeated, and if it is, refactor it.

I am getting to the point where I am not sure if I am refactoring too much. For instance I have a contact table which contains a first_name, and a last_name. This also shows up in a salesman table that I have. So I refactored it to a name table:

Name:
id
first_name
last_name

Contact:
id
name_id
...

Salesman:
id
name_id
...

I am going to far with my refactoring, or am I still on the right track? I figured I would ask this early before I get waist deep in refactoring something that needs to not be refactored.

Thanks in advance

I do not see the need to create the Name table. Contact table should be it. You can then reference the Contact.ContactID in the SaleMan table (if needed).

Here is a very good reference to schema design. You should be able to pick out one that best fits your need.

http://www.databaseanswers.org/data_models/index.htm

|||

I commonly think of tables as "like" information. For example, if you're going to be dealing with human beings, then perhpas a "People" table...this might be clients, or employees or relatives, suppliers...but they're all "people" with attributes (perhaps) like:

PeopleID bigint IDENTITY(1,1)

Prefix (Mr. Ms. Dr. ) (or this could be Prefix ID and you might have a separate PeoplePrefix table)

FirstName

MiddleName

LastName

GoesBy

Gender

Birthday

Phone

Email

Type

(for "Type" if any one person might have two roles in your organization, then maybe use Binary(16) or something where each of the 16 bits represents a different Role).

Then you PeopleID = 1 might be a Sales Person and People ID 15 might be a customer.

Abstracting the data is good, to a point. But too far can become quite cumbersome.

THis is purely opinion...actual mileage may vary.

|||

The word for this in SQL is "normalization." The process of normalization (more or less) seeks to equate one table with one "thing". A big way to notice commonality that might represent a common "thing" is just like you have, the same columns showing up in multiple tables. But, you would not want to normalize just to only have the same column names in a table together. The question is, do they represent a "thing" and is it the same thing?

You can find commonality in many places. For example, a car and a television set both have a color, and a model number, manufacturer, etc. But it is very unlikely they would ever show up in the same table, as there is little reason to inventory television sets with automobiles. An extreme example, perhaps, but the point is that you need to normalize to make things easier to maintain, not just to be doing it.

In your case, what you have really identified is that a Contact is a person with a name, and a Salesman is too. From that angle, you might just want to have a table called Person, with more than a first_name and last_name, more than likely you need some form of identification, like a social security number, or employee id....

Yet, this is where the paradigm starts to break down. A contact is a person, and a salesman is too. But is there any reason to model them as one table? Possibly not. The key comes down to whether or not you have any reason to do any common task with a contact and a salesman.

Like if you needed a list of persons in your system, you might have a person table, then a contactPerson and a salesPerson table for the non-common values. This would be especially true if you cared if a person was a contact AND a salesPerson.

On the other hand, if they are completely seperate concepts, so much so that their data didn't make sense being in the same table, then it is doesn't make logical sense to have a person table, at least not in this situation.

No comments:

Post a Comment