Infolinks

Tuesday 17 July 2012

Normalization and De-normalization

=========================================================================
haii,

As opposed to denormalization, the purpose of the normalization process is to reduce redundancy (same information stored more than once), and secure data integrity (that the database contains valid information).

This is achieved by reducing large entities (large meaning a large number of attributes) into several other, lesser entities which together contains the same information, but without repeating it.

This was demonstrated in the lesson on 3NF, where we removed a denormalized entity which contained a calculated attribute, and only accessed the TRANSACTIONS table whenever we wanted to know the balance of an account.


===============

Normalization is the process removing multidatatable.

1.First Normal Form

2.Second Normal Form

3.Third Normal Form

4.Boyce Codd NormalFOrm

Denormalization is a process in which remerge the child table in specific 


=============

normalization includes reducing redudndancy, redudndancy means data repetition.By removing the repeated data from the table we can seperate the redudndant data and create another table ,that is we are seperating repeated data from detail table and creating a master table. like that we are sepearting the data. In this case we are not completely removing the redudndant data from detail table we are maintaing a common column in both the tables that is in master table & detail table. that is in mastertable common column is called primary key column , detail table common column is forign key.this foreign key refers to primary key.

this is one stage normalization. normalization has 3 steps 1normalform,2,3 normalfrom.in first normal form we seperate redundant data from detail table and cerate the mastertables.
2 normalfrom we maintain data consistency by creating primarykey foreign key relations . in 3 normal from we remove any calculation columns from detail table.

this is normalization . opposite of this above normal forms is denormalization .
that is we maintain redudndant data in deail table in that case we maintain all the transaction data in one table .that is denormalization.


===============

haiii,

Normalization is a process of removing anomalies from database table by subdiving table in multiple child tables.Narmalization comes in 1NF,2NF,3NF,Boyce Codd Normalization and 5NF. In general we goes to 3NF , to normalize any table.

Denormalization is a process where we can remerge child tables in order to completed some specific tasks.


Normalization is a design procedure for representing data in tabular format. The five normal forms are progressive rules to represent the data with minimal redundancy

Denormalization is introcution of controlled redundancy in
the database.

Use denormalization generally to optimize performace of
select query. to minimize joins used in the query

 

===============

In this hour, you learn the process of taking a raw database and breaking it into logical units called tables. This process is referred to as normalization. The normalization process is used by database developers to design databases in which it is easy to organize and manage data while ensuring the accuracy of data throughout the database.

The advantages and disadvantages of both normalization and denormalization of a database are discussed, as well as data integrity versus performance issues that pertain to normalization.

The highlights of this hour include

What normalization is

Benefits of normalization

Advantages of denormalization

Normalization techniques

Guidelines of normalization

The three normal forms

Database design

Normalizing a Database
Normalization is a process of reducing redundancies of data in a database. Normalization is a technique that is used when designing and redesigning a database. Normalization is a process or set of guidelines used to optimally design a database to reduce redundant data. The actual guidelines of normalization, called normal forms, will be discussed later in this hour. It was a difficult decision to decide whether to cover normalization in this book because of the complexity involved in understanding the rules of the normal forms this early on in your SQL journey. However, normalization is an important process that, if understood, will increase your understanding of SQL. We have attempted to simplify the process of normalization as much as possible in this hour. At this point, don't be overly concerned with all the specifics of normalization; it is most important to understand the basic concepts.

The Raw Database
A database that is not normalized may include data that is contained in one or more different tables for no apparent reason. This could be bad for security reasons, disk space usage, speed of queries, efficiency of database updates, and, maybe most importantly, data integrity. A database before normalization is one that has not been broken down logically into smaller, more manageable tables. Figure 4.1 illustrates the database used for this book before it was normalized.

Figure 4.1 The raw database.

Logical Database Design
Any database should be designed with the end user in mind. Logical database design, also referred to as the logical model, is the process of arranging data into logical, organized groups of objects that can easily be maintained. The logical design of a database should reduce data repetition or go so far as to completely eliminate it. After all, why store the same data twice? Naming conventions used in a database should also be standard and logical.

What Are the End User's Needs?
The needs of the end user should be one of the top considerations when designing a database. Remember that the end user is the person who ultimately uses the database. There should be ease of use through the user's front-end tool (a client program that allows a user access to a database), but this, along with optimal performance, cannot be achieved if the user's needs are not taken into consideration.

Some user-related design considerations include the following:

What data should be stored in the database?

How will the user access the database?

What privileges does the user require?

How should the data be grouped in the database?

What data is the most commonly accessed?

How is all data related in the database?

What measures should be taken to ensure accurate data?

Data Redundancy
Data should not be redundant, which means that the duplication of data should be kept to a minimum for several reasons. For example, it is unnecessary to store an employee's home address in more than one table. With duplicate data, unnecessary space is used. Confusion is always a threat when, for instance, an address for an employee in one table does not match the address of the same employee in another table. Which table is correct? Do you have documentation to verify the employee's current address? As if data management were not difficult enough, redundancy of data could prove to be a disaster.

The Normal Forms
The next sections discuss the normal forms, an integral concept involved in the process of database normalization.

Normal form is a way of measuring the levels, or depth, to which a database has been normalized. A database's level of normalization is determined by the normal form.

The following are the three most common normal forms in the normalization process:

The first normal form

The second normal form

The third normal form

Of the three normal forms, each subsequent normal form depends on normalization steps taken in the previous normal form. For example, to normalize a database using the second normal form, the database must first be in the first normal form.

The First Normal Form
The objective of the first normal form is to divide the base data into logical units called tables. When each table has been designed, a primary key is assigned to most or all tables. Examine Figure 4.2, which illustrates how the raw database shown in the previous figure has been redeveloped using the first normal form.

Figure 4.2 The first normal form.

You can see that to achieve the first normal form, data had to be broken into logical units of related information, each having a primary key and ensuring that there are no repeated groups in any of the tables. Instead of one large table, there are now smaller, more manageable tables: EMPLOYEE_TBL, CUSTOMER_TBL, and PRODUCTS_TBL. The primary keys are normally the first columns listed in a table, in this case: EMP_ID, CUST_ID, and PROD_ID .

The Second Normal Form
The objective of the second normal form is to take data that is only partly dependent on the primary key and enter that data into another table. Figure 4.3 illustrates the second normal form.

Figure 4.3 The second normal form.

According to the figure, the second normal form is derived from the first normal form by further breaking two tables down into more specific units.

EMPLOYEE_TBL split into two tables called EMPLOYEE_TBL and EMPLOYEE_PAY_TBL. Personal employee information is dependent on the primary key (EMP_ID), so that information remained in the EMPLOYEE_TBL (EMP_ID, LAST_NAME, FIRST_NAME, MIDDLE_NAME, ADDRESS, CITY, STATE, ZIP, PHONE, and PAGER). On the other hand, the information that is only partly dependent on the EMP_ID (each individual employee) is used to populate EMPLOYEE_PAY_TBL (EMP_ID, POSITION, POSITION_DESC, DATE_HIRE, PAY_RATE, and DATE_LAST_RAISE). Notice that both tables contain the column EMP_ID. This is the primary key of each table and is used to match corresponding data between the two tables.

CUSTOMER_TBL split into two tables called CUSTOMER_TBL and ORDERS_TBL. What took place is similar to what occurred in the EMPLOYEE_TBL. Columns that were partly dependent on the primary key were directed to another table. The order information for a customer is dependent on each CUST_ID, but does not directly depend on the general customer information in the original table.

The Third Normal Form
The third normal form's objective is to remove data in a table that is not dependent on the primary key. Figure 4.4 illustrates the third normal form.

Figure 4.4 The third normal form.

Another table was created to display the use of the third normal form. EMPLOYEE_PAY_TBL is split into two tables, one table containing the actual employee pay information and the other containing the position descriptions, which really do not need to reside in EMPLOYEE_PAY_TBL. The POSITION_DESC column is totally independent of the primary key, EMP_ID .

Naming Conventions
Naming conventions are one of the foremost considerations when you're normalizing a database. Names are how you will refer to objects in the database. You want to give your tables names that are descriptive of the type of information they contain so that the data you are looking for is easy to find. Descriptive table names are especially important for users querying the database that had no part in the database design. A company-wide naming convention should be set, providing guidance in the naming of not only tables within the database, but users, filenames, and other related objects. Designing and enforcing naming conventions is one of a company's first steps toward a successful database implementation.

Benefits of Normalization
Normalization provides numerous benefits to a database. Some of the major benefits include the following :

Greater overall database organization

Reduction of redundant data

Data consistency within the database

A much more flexible database design

A better handle on database security

Organization is brought about by the normalization process, making everyone's job easier, from the user who accesses tables to the database administrator (DBA) who is responsible for the overall management of every object in the database. Data redundancy is reduced, which simplifies data structures and conserves disk space. Because duplicate data is minimized, the possibility of inconsistent data is greatly reduced. For example, in one table an individual's name could read STEVE SMITH, whereas the name of the same individual reads STEPHEN R. SMITH in another table. Because the database has been normalized and broken into smaller tables, you are provided with more flexibility as far as modifying existing structures. It is much easier to modify a small table with little data than to modify one big table that holds all the vital data in the database. Lastly, security is also provided in the sense that the DBA can grant access to limited tables to certain users. Security is easier to control when normalization has occurred.

Data integrity is the assurance of consistent and accurate data within a database.

Referential Integrity
Referential integrity simply means that the values of one column in a table depend on the values of a column in another table. For instance, in order for a customer to have a record in the ORDERS_TBL table, there must first be a record for that customer in the CUSTOMER_TBL table. Integrity constraints can also control values by restricting a range of values for a column. The integrity constraint should be created at the table's creation. Referential integrity is typically controlled through the use of primary and foreign keys.

In a table, a foreign key, normally a single field, directly references a primary key in another table to enforce referential integrity. In the preceding paragraph, the CUST_ID in ORDERS_TBL is a foreign key that references CUST_ID in CUSTOMER_TBL.

Drawbacks of Normalization
Although most successful databases are normalized to some degree, there is one substantial drawback of a normalized database: reduced database performance. The acceptance of reduced performance requires the knowledge that when a query or transaction request is sent to the database, there are factors involved, such as CPU usage, memory usage, and input/output (I/O). To make a long story short, a normalized database requires much more CPU, memory, and I/O to process transactions and database queries than does a denormalized database. A normalized database must locate the requested tables and then join the data from the tables to either get the requested information or to process the desired data. A more in-depth discussion concerning database performance occurs in Hour 18, "Managing Database Users."

Denormalizing a Database
Denormalization is the process of taking a normalized database and modifying table structures to allow controlled redundancy for increased database performance. Attempting to improve performance is the only reason to ever denormalize a database. A denormalized database is not the same as a database that has not been normalized. Denormalizing a database is the process of taking the level of normalization within the database down a notch or two. Remember, normalization can actually slow performance with its frequently occurring table join operations. (Table joins are discussed during Hour 13, "Joining Tables in Queries.") Denormalization may involve recombining separate tables or creating duplicate data within tables to reduce the number of tables that need to be joined to retrieve the requested data, which results in less I/O and CPU time.

There are costs to denormalization, however. Data redundancy is increased in a denormalized database, which can improve performance but requires more extraneous efforts to keep track of related data. Application coding renders more complications, because the data has been spread across various tables and may be more difficult to locate. In addition, referential integrity is more of a chore; related data has been divided among a number of tables. There is a happy medium in both normalization and denormalization, but both require a thorough knowledge of the actual data and the specific business requirements of the pertinent company.


=====================

Normalization
 
Record design process that identifies and avoids data problems and redundancy
Specifies the fields and the primary key
Normalization analyzes record structure through four stages
Unnormalized records
First normal form (1NF) records
Second normal form (2NF) records
Third normal form (3NF) records
Normalization
 
First normal form
Unnormalized records contain a repeating group
A repeating group refers to a single record that has multiple values in a particular field
Example: multiple product numbers in a single order record
A 1NF record cannot have a repeating group
Normalization
 
First normal form
To convert an unnormalized record to 1NF, the repeating group must be removed
Expand the primary key to include the primary key of the repeating group
The new primary key is a combination of the original primary key and the key of the repeating group
Instead of a single record with a repeating group, the result is many records, one for each instance of the repeating group
Normalization
 
Second normal form (2NF)
To be in second normal form, a record must be in 1 NF, and all nonkey fields must be functionally dependent on the entire primary key - not just part of it
Functional dependency means that a value in one field determines a value in another field
If the primary key is a single field, then any record in 1 NF is automatically in 2 NF
In 2NF, all nonkey fields are functionally dependent on the entire primary key
Normalization
 
Second normal form (2NF)
To convert a 1NF record to 2NF
Create a new record design for each field (or combination of fields) in the primary key
Place remaining fields with the appropriate record
The result will be several records, each with a primary key field (or combination of fields) that determines the values of the other fields in that record
Normalization
 
Third normal form (3NF)
To be in 3NF, a record must be in 2NF and no nonkey field is functionally dependent on another nonkey field
In 3NF, all nonkey fields are functionally dependent on the primary key, the entire key, and nothing but the key
Normalization
 
Third normal form (3NF)
To convert a 2NF record to 3NF
Remove all nonkey fields that depend on another nonkey field and place them in a new record that has the determining field as a primary key
Normalization
 
A normalization example
Identify the entities
ADVISOR
STUDENT
COURSE
Identify the relationships
One advisor advises many students (1:M)
Students take one or more courses, and courses have one or more students (M:N)
Normalization
 
A normalization example
Identify the entities
ADVISOR
STUDENT
COURSE
Identify the relationships
One advisor advises many students (1:M)
Students take one or more courses, and courses have one or more students (M:N)
Document the unnormalized record
Note the repeating group of courses
Normalization
 
A normalization example
Convert the unnormalized record to 1 NF
Remove the repeating group
Create a primary key composed of the original primary key (student number) and the primary key of the repeating group (course number)
The result is one record for each instance of the combination primary key
Normalization
 
A normalization example
Convert the 1 NF record to 2NF
Create a separate record design for each field and combination of fields in the primary key
Place functionally dependent fields with an appropriate primary key
The result is three records instead of one, each with a unique primary key
Now all nonkey fields are dependent on the entire primary key, not just a portion of it
Normalization
 
A normalization example
Convert the 2NF record to 3NF
The STUDENT record contains a nonkey field (advisor name) that is dependent on another nonkey field (advisor number)
Create a new record with advisor number as the primary key
Remove the dependent nonkey field (advisor name) and include it in the new record
Normalization
 
A normalization example
Convert the 2NF record to 3NF
The STUDENT record contains a nonkey field (advisor name) that is dependent on another nonkey field (advisor number)
Create a new record with advisor number as the primary key
Remove the dependent nonkey field (advisor name) and include it in the new record
Now all nonkey fields are dependent on the entire primary key, and nothing but the key

==================

Normalization : It is the process of decomposing a relation with anomalies into a well structured relation.

Basic Terminology of Normalization:-
Well-structured relation : A relation with minimum redundancy and no anomalies

Anolamy : is an error during either of insert,update or delete operation


Primary key attribute: An attribute(column) on which Primary key constraint is applied - so as to avoid duplicate and null values - Primary key attribute assists to identify rows uniquely in a relation :

Non-Key attributes : All other columns apart from primary key attribute are called Non key attributes:

multi-valued attribute : An attribute that contains more than one value per instance

Functional- dependency: - IF a non-key attributes completely rely or depend on Primay key attribute it is termed as functional dependency

Ex:   EMP (eno,ename,job,sal) - here eno is Primary key attribute and all other are Non key attributes - 

if eno is known - corresponding ename,job,sal can be identified - Hence i say - ename,job,sal are the non key attribute which completely rely on primary key attribute called eno;

Partial - functional dependency:

                      if a non-key attribute partially depends on primary key-attribute - then tat kind of dependency is called Partial functioan dependency.

Transitive dependency:    if a non key atttribute depends on another non-key attribute then that kind of dependency is called transitive dependency

multivalued dependency : Consider a relation with A,B,C as columns - the term multivalued dependency states that for each value of A there exists a set of values for the column B and a set of values for the cokumn C - but B values and C values are independent of each other


Normalization is achieved through stages and each stage corresponds to normal form- WE have basic normal forms and higher or advanced normal forms

Under basic

1NF : A relation is said to be in 1 NF - if it does not contain multivalued attributes
2NF : A relations is said to be in 2 NF - if it does not contain partial functional dependencies
3NF- A relation is said to be in 3 NF - if it does not contain transitive dependecies

Advanced

BCNF : Every determinant is a candidate key
4NF: A relation is said to be in 4NF if it does not contain multivalued dependencies
5NF - Domain key normal f orm - if any of the remaining anomalies have been removed


======================

Normalization

  • It is the process of organizing data into related table.
  • To normalize database, we divide database into tables and establish relationships between the tables.
  • It reduces redundancy. It is done to improve performance of query.

Steps of normalization:

First Normal form
Entities of the table must have unique identifier or entity key.
Second Normal Form
All the attributes of the table must depend on the entity key for that entity.
Third Normal Form
All attributes that are not part of the key must not depend on any other non-key attributes.

De-normalization

The process of adding redundant data to get rid of complex join, in order to optimize database performance. This is done to speed up database access by moving from higher to lower form of normalization.

Define Normalization and De- Normalization.

Normalization is the process of reducing data redundancy and maintains data integrity. This is performed by creating relationships among tables through primary and foreign keys. Normalization procedure includes 1NF, 2NF, 3NF, BCNF, and then the data is normalized.
Denomalization on the contrary is the process of adding redundant data to speed up complex queries involving multiple table JOINS. One might just go to a lower form of Normalization to achieve Denormalization and better performance. Data is included in one table from another in order to eliminate the second table which reduces the number of JOINS in a query and thus achieves performance.
   

No comments:

Post a Comment