Normalization
Normalization is the process of splitting relations into well-structured relations that allow users to insert, delete, and update tuples without introducing database inconsistencies. The focus of normalization is to reduce redundant data to the minimum. Normalization is also called “Bottom-up approach”, because this technique requires very minute details like every participating attribute and how it is dependant on the key attributes, is crucial. If you add new attributes after normalization, it may change the normal form itself.
Through Normalization, the collection of data in a single table is distributed into multiple tables with specific relation. Without normalization many problems can occur when trying to load an integrated conceptual model into the DBMS. These problems arise from relations that are generated directly from user views are called anomalies. There are three types of anomalies:
RollNo | StudentName | CourseNo | CourseName | Instructor |
120 | SKL | CS-75 | DBMS | SB |
89 | KBC | CS-75 | DBMS | SB |
25 | ABC | CS-75 | DBMS | SB |
48 | XYZ | CS-30 | CA | MC |
57 | TKC | CS-80 | OS | JB |
120 | SKL | CS-80 | OS | JB |
Update anomaly: An update anomaly is a data inconsistency that results from data redundancy and a partial update. For example, if there is a change in the name of the instructor for CS-75, we need to make change for all the rows. If we forget to update a single row the database will show two instructor names for the Course CS-75.
Deletion anomaly: This means loss of useful information. In some cases it may occur that some useful data is lost. For example, consider the row for RollNo 48. XYZ is the only student who has opted for CS-30. If XYZ leaves the institute and the data is deleted for XYZ, the associated data for Course will also be deleted.
Insertion anomaly: An insertion anomaly is the inability to add data to the database due to absence of other data. For example, assume that the above table is defined so that null values are not allowed. If a course is added but not immediately students opt for the course, then this course could not be entered into the database (68). This results in database inconsistencies due to omission.
In database tables are normalized for the following reasons:
To allow data retrieval at an optimal speed
Data maintenance through update, insertion and deletion.
To reduce the need to restructure tables for new applications
A functional dependency is a constraint between two sets of attributes in a relation from a database. It occurs when an attribute in a relation uniquely determines another attribute.
In a given relation R, X and Y are attributes. Attribute Y is functionally dependent on attribute X if each value of X determines EXACTLY ONE value of Y, which is represented as X → Y (X can be composite in nature).
We say here “X determines Y” or “y is functionally dependent on x”.
X→Y does not imply Y→X.
If the value of an attribute “Marks” is known then the value of an attribute “Grade” is determined since Marks→ Grade
RollNo | StudentName | Marks | Grade |
59 | ABC | 90 | A |
96 | XYZ | 70 | B |
Types of functional dependencies:
1. Full Functional dependency: X and Y are attributes. X Functionally determines Y (Note: Subset of X should not functionally determine Y)
2. Partial Functional dependency: X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a sub-set of attribute X.
3. Transitive dependency: X Y and Z are three attributes. X -> Y, Y-> Z => X -> Z
Now consider the following Relation
REPORT (STUDENT#, COURSE#, CourseName, IName, Room#, Marks, Grade)
• STUDENT# - Student Number
• COURSE# - Course Number
• CourseName - Course Name
• IName - Name of the Instructor who delivered the course
• Room# - Room number which is assigned to respective Instructor
• Marks - Scored in Course COURSE# by Student STUDENT#
• Grade - obtained by Student STUDENT# in Course COURSE#
The Functional Dependencies are:
• STUDENT# COURSE# → Marks
• COURSE# → CourseName,
• COURSE# → IName (Assuming one course is taught by one and only one
Instructor)
• IName → Room# (Assuming each Instructor has his/her own and non-shared room)
• Marks → Grade
In above example Marks is fully functionally dependent on STUDENT# COURSE# and not on subset of STUDENT# COURSE#. This means Marks cannot be determined either by STUDENT# OR COURSE# alone. It can be determined only using STUDENT# AND COURSE# together. Hence Marks is fully functionally dependent on STUDENT# COURSE#. CourseName is not fully functionally dependent on STUDENT# COURSE# because subset of STUDENT# COURSE# i.e only COURSE# determines the CourseName and STUDENT# does not have any role in deciding CourseName. Hence CourseName is not fully functionally dependent on STUDENT# COURSE#.
Now CourseName, IName, Room# are partially dependent on composite attributes STUDENT# COURSE# because COURSE# alone defines the CourseName, IName, Room#.
Again, Room# depends on IName and in turn IName depends on COURSE#. Hence Room# transitively depends on COURSE#. Similarly Grade depends on Marks, in turn Marks depends on STUDENT# COURSE# hence Grade depends Fully transitively on STUDENT# COURSE#.
Now consider this table:
Student Data | Course Details | Result | |||||
STUDENT# | StudentName | COURSE# | CourseName | IName | Room# | Marks | Grade |
First normal form (1NF)
A relation schema is in 1NF if:
· If and only if all the attributes of the relation R are atomic in nature.
· Atomic: the smallest level to which data may be broken down and remain meaningful.
In relational database design it is not practically possible to have a table which is not in 1NF.
STUDENT# | StudentName | COURSE# | CourseName | IName | Room# | Marks | Grade |
Second normal form (2NF)
A Relation is said to be in Second Normal Form if and only if:
· It is in the First normal form, and
· No partial dependency exists between non-key attributes and key attributes.
An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a non-prime attribute To make a table 2NF compliant, we have to remove all the partial dependencies
Note: - All partial dependencies are eliminated
STUDENT# | StudentName |
COURSE# | CourseName | IName | Room# |
STUDENT# | COURSE# | Marks | Grade |
Third normal form (3 NF)
A relation R is said to be in the Third Normal Form (3NF) if and only if:
· It is in 2NF and
· No transitive dependency exists between non-key attributes andkey attributes.
• STUDENT# and COURSE# are the key attributes.
• All other attributes, except grade are nonpartially, non-transitively dependent on key attributes.
STUDENT# | StudentName |
COURSE# | CourseName | IName | Room# |
STUDENT# | COURSE# | Marks |
Marks (Lower Bound) | Marks (Upper Bound) | Grade |
Boyce-Codd Normal form (BCNF)
A relation is said to be in Boyce Codd Normal Form (BCNF)
· If and only if all the determinants are candidate keys.
BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.
Consider the table:
Candidate Keys for the relation are [C#, S#] and [C#, Email]. Since Course # is overlapping, it is referred as Overlapping Candidate Key. Valid Functional Dependencies are:
S# → EmailID
EmailID → S#
S#, C# → Marks
C#, EmailID →Marks
Student# | Course# | Marks |
ABC | CS-75 | 98 |
Comments
Post a Comment