Sunday, 10 January 2016

Database denormalization techniques

We have seen about the normalization and different levels of normalization in previous post. A normalized database design will store the information in more number smaller and separate tables. When you need to retrieve particular information from a normalized database you need to perform join operation, when you have more number of relationships among tables for particular information, you need to perform more join operation; it may further reduce the retrieval speed of the database. Here comes the denormalization, we are violating the normalization selectively in order to reduce the data retrieval time of a database. But at the same time over denormalization also leads to redundancy and inconsistent data, so it is data modeler’s response that, after denormalization the database should not become inconsistent.


In this post let me explain you about denormalization technique with an example. Let’s consider the user model from the previous post, reproduced below for easy understanding.

Normalized user data model
USER MODEL IN NORMALIZED FORM

Technique 1: moving all super type data fields into each subtype.

In this technique we move all the data fields from the super type into each subtype, here relationships also moved from super-type to subtype. The following figure displays the new updated model.

Moving supertype data fields into subtype
MOVING SUPERTYPE DATA FIELDS INTO EACH SUBTYPE

In the above model, you can see that the supertype User entity removed and the data fields of the USER entity moved to each subtypes COMPANY and PERSON. This technique enforces only the business rules containing the subtypes since the supertype is removed from the model. When applying this technique of denormalization you should keep in mind that, this model does not allow you add new subtypes in future without altering the data model.


Technique 2: moving all the data fields of each subtype into supertype.

In this technique, all the data fields of the each subtype moved into supertype and the subtypes are removed from the model. After applying this technique to our User model, the new updated model will look as follows:

Moving subtype data fields into supertype
MOVING SUBTYPE DATA FIELDS INTO SUPERTYPE

Note that in the above mentioned model, we have created a new data field User Type in USER entity, since we merged data fields of both PERSON and COMPANY so we need something to distinguish each of these properties. The data field User Type contains the values such as “Person” and “Company”. In this model we can enforce the rules of the supertype USER but we cannot enforce for subtypes PERSON and COMPANY since we removed both of these from our model.

Technique 3: denormalization with directly accessing parent entity.


In this technique we apply normal database denormalization techniques but we are allowing access to the parent entity directly. For example, consider our USER model, if business people need report including following details:

  • What is the total number of users by City?
  • What is the total number of users by State?
  • What is the total number of users by Country?
  • What is the total number of users born in the year 2015?
  •  What is the total number of users registered in the month of January?

In a normalized database, we need to use join operation to answer the mentioned question from the child entity to parent entity. In order to solve this performance issue, we can model the one-to-many side of the relationship as follows. For understanding better, I applied the both the technique 2  and 3 in this model.

Denormalization with direct access to parent entity
DENORMALIZATION WITH DIRECT ACCESS TO PARENT ENTITY


After applying this technique we got the above displayed updated model, in which you can see that three entities such as CITY, STATE and COUNTRY grouped into an entity called GEOGRAPHIC LOCATION. In that entity we created a new data field called Geo Level Code, which is used to distinguish the level of geographic location, which contains the following values such as “city”, “state”, “country”. Similarly the data field calendar level contains the values “Date”, “Month” and “Year”.

Technique 4: Same data fields in the same entity:


Before the explaining about this technique, let us ask our self, what if user having only two Phone numbers? Do we really need to model phone number as a separate entity? In order to answer these questions, we should know our requirement very clearly. Since we have the requirements where the USER have only two phone numbers, but what if he needs to add many number of phone numbers in future? Also what if the business needs to track the history of phone numbers used by a User? In those cases having normalized separate entities are good options, since we can add requirements any time in future without altering the data model.

When a user have only two phone numbers and no additional phone number facilities allowed in future, we can add the phone number data field in the USER entity itself, instead of creating a separate entity. So after applying technique 2, 3 and 4 our new denormalized data model will look as follows:
Same data fields in the same entity
SAME DATA FIELDS IN THE SAME ENTITY


Technique 5: copying data field from an entity into one or more entity into one or more entity:


Recall the technique three where we grouped all the geographic location entities into an entity called GEO LOCATION and created a Geo Level Code and similarly created CALENDAR entity and Calendar level.  There the data model provides us option to select the level of geographic location and calendar and we removed relationship among the separate entity and created a new entity. Now let’s assume, your business require only Country level data most the time and needs state level and city level very rarely, in that case, in order to retrieve the data we need navigate to Geo location table very frequently. Technique five help us to make this process very less time consuming and increase the speed of retrieval by simply copying the most frequently needed Country data field to User entity, while remaining data fields and relationship untouched. Here the relationships are not removed only the data fields are copied. Similarly, for example, business needs only month level data, by applying this rule our updated denormalized data model may look as follows:

Copying data fields among entities
COPYING DATA FIELDS 


Note that, in our example, we have applied more than one technique to our model at a time, but you should remember that, these techniques are not step by step process; you can apply the required technique as per the requirements. It completely depends on your business requirement. I hope this post gives you a basic understanding of denormalization technique. If you have any doubts or if you needs to share your thoughts kindly write down at the comments section.
Post a Comment