Database Management System 10CS54 VTU unit 6
The Eduladder is a community of students, teachers, and programmers just interested to make you pass any exams. So we solve previous year question papers for you.
See Our team
Wondering how we keep quality?
Got unsolved questions?

Ask Questions
Hey! Want to work with us? Our youtube channel See our Jd Want to apply? Do it today

Use Me  ?

New searches

You are here:Open notes-->VTU-->Database-Management-System-10CS54-VTU-unit-6

Database Management System 10CS54 VTU unit-6

UNIT-6 Data Base design-1




6.1 Informal design guidelines for relation schemas
The four informal measures of quality for relation schema 
 Semantics of the attributes 
 Reducing the redundant values in tuples 
 Reducing the null values in tuples 
 Disallowing the possibility of generating spurious tuples 
6.1.1 Semantics of relations attributes
Specifies how to interpret the attributes values stored in a tuple of the relation. In other words, 
how the attribute value in a tuple relate to one another.
Guideline 1: Design a relation schema so that it is easy to explain its meaning. Do not combine 
attributes from multiple entity types and relationship types into a single relation. 
Reducing redundant values in tuples. Save storage space and avoid update anomalies. 
 Insertion anomalies. 
 Deletion anomalies. 
Insertion Anomalies
To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for 
that department that the employee works for, or nulls. 
It's difficult to insert a new department that has no employee as yet in the EMP_DEPT relation. 
The only way to do this is to place null values in the attributes for employee. This causes a 
problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent 
an employee entity - not a department entity. 
Deletion Anomalies
If we delete from EMP_DEPT an employee tuple that happens to represent the last employee 
working for a particular department, the information concerning that department is lost from the 
database. 
Modification Anomalies
In EMP_DEPT, if we change the value of one of the attributes of a particular department- say the 
manager of department 5- we must update the tuples of all employees who work in that 
department.
Guideline 2: Design the base relation schemas so that no insertion, deletion, or modification 
anomalies occur. Reducing the null values in tuples. e.g., if 10% of employees have offices, it is 
better to have a separate relation, EMP_OFFICE, rather than an attribute OFFICE_NUMBER in 
EMPLOYEE
Guideline 3: Avoid placing attributes in a base relation whose values are mostly null. 
Disallowing spurious tuples. 
Spurious tuples - tuples that are not in the original relation but generated by natural join of 
decomposed subrelations. 
Example: decompose EMP_PROJ into EMP_LOCS and EMP_PROJ1. 
Guideline 4: Design relation schemas so that they can be naturally JOINed on primary keys or foreign keys in a way that guarantees no spurious tuples are generated. 
6.2 A functional dependency (FD) is a constraint between two sets of attributes from the 
database. It is denoted by 
X Y 
We say that "Y is functionally dependent on X". Also, X is called the left-hand side of the FD. 
Y is called the right-hand side of the FD. 
A functional dependency is a property of the semantics or meaning of the attributes, i.e., a 
property of the relation schema. They must hold on all relation states (extensions) of R. Relation 
extensions r(R). A FD X Y is a full functional dependency if removal of any attribute from X 
means that the dependency does not hold any more; otherwise, it is a partial functional 
dependency.
Examples: 
1. SSN ENAME 
2. PNUMBER {PNAME, PLOCATION} 
3. {SSN, PNUMBER} HOURS 
FD is property of the relation schema R, not of a particular relation state/instance 
Let R be a relation schema, where X R and Y R 
t1, t2 r, t1[X] = t2[X] t1[Y] = t2[Y] 
The FD X Y holds on R if and only if for all possible relations r(R), whenever two tuples of r 
agree on the attributes of X, they also agree on the attributes of Y. 
 the single arrow denotes "functional dependency" 
 X Y can also be read as "X determines Y" 
 the double arrow denotes "logical implication" 
6.2.1 Inference Rules
IR1. Reflexivity e.g. X X
 a formal statement of trivial dependencies; useful for derivations 
IR2. Augmentation e.g. X Y XZ Y
 if a dependency holds, then we can freely expand its left hand side 
IR3. Transitivity e.g. X Y, Y Z X Z
 the "most powerful" inference rule; useful in multi-step derivations 
Armstrong inference rules are
 sound
meaning that given a set of functional dependencies F specified on a relation schema R, 
any dependency that we can infer from F by using IR1 through IR3 holds every relation 
state r of R that specifies the dependencies in F. In other words, rules can be used to 
derive precisely the closure or no additional FD can be derived. 
 complete
meaning that using IR1 through IR3 repeatedly to infer dependencies until no more 
dependencies can be inferred results in the complete set of all possible dependencies that 
can be inferred from F. In other words, given a set of FDs, all implied FDs can be derived 
using these 3 rules. 
Closure of a Set of Functional Dependencies
Given a set X of FDs in relation R, the set of all FDs that are implied by X is called the 
closure of X, and is denoted X+
Algorithms for determining X+
X
+
:= X;
repeat
oldX+ := X+
for each FD Y Z in F do
if Y  X
+
then X+
:= X+  Z;
until oldX+ = X+
;
Example:
A BC
E CF
B E
CD EF
Compute {A, B}+
of the set of attributes under this set of FDs.
Solution:
Step1: {A, B}+
:= {A, B}.
Go round the inner loop 4 time, once for each of the given FDs.
On the first iteration, for A BC 
 A {A, B}+
 {A, B}+
:= {A, B, C}. 
Step2: On the second iteration, for E CF,  {A, B, C} 
Step3 :On the third iteration, for B E 
 B {A, B,C}+
 {A, B}+
:= {A, B, C, E}. 
Step4: On the fourth iteration, for CD EF remains unchanged
Go round the inner loop 4 times again. On the first iteration result does not change; on the 
second it expands to {A,B,C,E,F}; On the third and forth it does not change. 
Now go round the inner loop 4 times. Closure does not change and so the whole process 
terminates, with
{A,B}+ = {A,B,C,E,F} 
Example. 
F = { SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN,PNUMBER} 
HOURS } 
{SSN}+ = {SSN, ENAME} 
{PNUMBER}+ = ? 
{SSN,PNUMBER}+ = ? 

6.3 Normalization
The purpose of normalization.
 The problems associated with redundant data. 
 The identification of various types of update anomalies such as insertion, deletion, and 
modification anomalies.
 How to recognize the appropriateness or quality of the design of relations. 
 The concept of functional dependency, the main tool for measuring the appropriateness 
of attribute groupings in relations. 
 How functional dependencies can be used to group attributes into relations that are in a 
known normal form. 
 How to define normal forms for relations.
 How to undertake the process of normalization. 
 How to identify the most commonly used normal forms, namely first (1NF), second 
(2NF), and third (3NF) normal forms, and Boyce-Codd normal form (BCNF). 
 How to identify fourth (4NF), and fifth (5NF) normal forms. 
Main objective in developing a logical data model for relational database systems is to create an 
accurate representation of the data, its relationships, and constraints. To achieve this objective, 
we must identify a suitable set of relations. A technique for producing a set of relations with 
desirable properties, given the data requirements of an enterprise

NORMAL FORMS

A relation is defined as a set of tuples. By definition, all elements of a set are distinct; hence, all 
tuples in a relation must also be distinct. This means that no two tuples can have the same 
combination of values for all their attributes.
Any set of attributes of a relation schema is called a superkey. Every relation has at least one 
superkey—the set of all its attributes. A key is a minimal superkey, i.e., a superkey from which 
we cannot remove any attribute and still have the uniqueness constraint hold.
In general, a relation schema may have more than one key. In this case, each of the keys is called 
a candidate key. It is common to designate one of the candidate keys as the primary key of the 
relation. A foreign key is a key in a relation R but it's not a key (just an attribute) in other 
relation R' of the same schema.
Integrity Constraints
The entity integrity constraint states that no primary key value can be null. This is because the 
primary key value is used to identify individual tuples in a relation; having null values for the 
primary key implies that we cannot identify some tuples. 
The referential integrity constraint is specified between two relations and is used to maintain 
the consistency among tuples of the two relations. Informally, the referential integrity constraint 
states that a tuple in one relation that refers to another relation must refer to an existing tuple in 
that relation.
An attribute of a relation schema R is called a prime attribute of the relation R if it is a member 
of any key of the relation R. An attribute is called nonprime if it is not a prime attribute—that is, 
if it is not a member of any candidate key.
The goal of normalization is to create a set of relational tables that are free of redundant data and 
that can be consistently and correctly modified. This means that all tables in a relational database 
should be in the in the third normal form (3 NF).
Normalization of data can be looked on as a process during which unsatisfactory relation 
schemas are decomposed by breaking up their attributes into smaller relation schemas that 
possess desirable properties. One objective of the original normalization process is to ensure that 
the update anomalies such as insertion, deletion, and modification anomalies do not occur.
The most commonly used normal forms
 First Normal Form (1NF)
 Second Normal Form (2NF)
 Third Normal Form (3NF)
 Boyce-Codd Normal Form
Other Normal Forms
 Fourth Normal Form
 Fifth Normal Form
Domain Key Normal Form
6.3.1 First Normal Form (1NF)
First normal form is now considered to be part of the formal definition of a relation; historically, it was defined to disallow multivalued attributes, composite attributes, and their combinations. It states that the domains of attributes must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Practical Rule: "Eliminate Repeating Groups," i.e., make a separate table for each set of related attributes, and give each table a primary key.
Formal Definition: A relation is in first normal form (1NF) if and only if all underlying simple 
domains contain atomic values only.

6.3.2 Second Normal Form (2NF)
Second normal form is based on the concept of fully functional dependency. A functional X Y 
is a fully functional dependency is removal of any attribute A from X means that the dependency 
does not hold any more. A relation schema is in 2NF if every nonprime attribute in relation is 
fully functionally dependent on the primary key of the relation. It also can be restated as: a 
relation schema is in 2NF if every nonprime attribute in relation is not partially dependent on any 
key of the relation.
Practical Rule: "Eliminate Redundant Data," i.e., if an attribute depends on only part of a 
multivalued key, remove it to a separate table.
Formal Definition: A relation is in second normal form (2NF) if and only if it is in 1NF and 
every nonkey attribute is fully dependent on the primary key.

6.3.3 Third Normal Form (3NF)

Third normal form is based on the concept of transitive dependency. A functional dependency 
X Y in a relation is a transitive dependency if there is a set of attributes Z that is not a subset 
of any key of the relation, and both X Z and Z Y hold. In other words, a relation is in 3NF 
if, whenever a functional dependency 
X A holds in the relation, either (a) X is a superkey of the relation, or (b) A is a prime 
attribute of the relation.
Practical Rule: "Eliminate Columns not Dependent on Key," i.e., if attributes do not contribute to 
a description of a key, remove them to a separate table.
Formal Definition: A relation is in third normal form (3NF) if and only if it is in 2NF and every 
nonkey attribute is nontransitively dependent on the primary key.

1NF: R is in 1NF iff all domain values are atomic. 
2NF: R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the key. 
3NF: R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the 
key.
6.4 Boyce-Codd Normal Form (BCNF)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a FD X -> A holds in 
R, then X is a superkey of R
 Each normal form is strictly stronger than the previous one:
 Every 2NF relation is in 1NF Every 3NF relation is in 2NF
 Every BCNF relation is in 3NF 
 There exist relations that are in 3NF but not in BCNF 
A relation is in BCNF, if and only if every determinant is a candidate key. 
Additional criteria may be needed to ensure the the set of relations in a relational database are satisfactory




Editors




Join eduladder!