See Our team

Wondering how we keep quality?

Got unsolved questions?

Ask Questions

Engineering
GATE
CBSE
NCERT
Psychology
English
Computer
Constitution
Astrology
Yoga
Economics
Physics
Biology
Electronics
Microprocessor
Career
Interview
Anatomy
Botany

You are here:Open notes-->VTU-->Database-Management-System-10CS54-VTU-unit-6

**Database Management System 10CS54 VTU unit-6**

# UNIT-6 Data Base design-1

**6.1 Informal design guidelines for relation schemas**

The four informal measures of quality for relation schema

Semantics of the attributes

Reducing the redundant values in tuples

Reducing the null values in tuples

Disallowing the possibility of generating spurious tuples

**6.1.1 Semantics of relations attributes**

Specifies how to interpret the attributes values stored in a tuple of the relation. In other words,

how the attribute value in a tuple relate to one another.

**Guideline 1:**Design a relation schema so that it is easy to explain its meaning. Do not combine

attributes from multiple entity types and relationship types into a single relation.

Reducing redundant values in tuples. Save storage space and avoid update anomalies.

Insertion anomalies.

Deletion anomalies.

**Insertion Anomalies**

To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for

that department that the employee works for, or nulls.

It's difficult to insert a new department that has no employee as yet in the EMP_DEPT relation.

The only way to do this is to place null values in the attributes for employee. This causes a

problem because SSN is the primary key of EMP_DEPT, and each tuple is supposed to represent

an employee entity - not a department entity.

**Deletion Anomalies**

If we delete from EMP_DEPT an employee tuple that happens to represent the last employee

working for a particular department, the information concerning that department is lost from the

database.

**Modification Anomalies**

In EMP_DEPT, if we change the value of one of the attributes of a particular department- say the

manager of department 5- we must update the tuples of all employees who work in that

department.

**Guideline 2:**Design the base relation schemas so that no insertion, deletion, or modification

anomalies occur. Reducing the null values in tuples. e.g., if 10% of employees have offices, it is

better to have a separate relation, EMP_OFFICE, rather than an attribute OFFICE_NUMBER in

EMPLOYEE

**Guideline 3:**Avoid placing attributes in a base relation whose values are mostly null.

Disallowing spurious tuples.

Spurious tuples - tuples that are not in the original relation but generated by natural join of

decomposed subrelations.

Example: decompose EMP_PROJ into EMP_LOCS and EMP_PROJ1. **Guideline 4:**Design relation schemas so that they can be naturally JOINed on primary keys or foreign keys in a way that guarantees no spurious tuples are generated.

6.2 A functional dependency (FD) is a constraint between two sets of attributes from the

database. It is denoted by

X Y

We say that "Y is functionally dependent on X". Also, X is called the left-hand side of the FD.

Y is called the right-hand side of the FD.

A functional dependency is a property of the semantics or meaning of the attributes, i.e., a

property of the relation schema. They must hold on all relation states (extensions) of R. Relation

extensions r(R). A FD X Y is a full functional dependency if removal of any attribute from X

means that the dependency does not hold any more; otherwise, it is a partial functional

dependency.

Examples:

1. SSN ENAME

2. PNUMBER {PNAME, PLOCATION}

3. {SSN, PNUMBER} HOURS

FD is property of the relation schema R, not of a particular relation state/instance

Let R be a relation schema, where X R and Y R

t1, t2 r, t1[X] = t2[X] t1[Y] = t2[Y]

The FD X Y holds on R if and only if for all possible relations r(R), whenever two tuples of r

agree on the attributes of X, they also agree on the attributes of Y.

the single arrow denotes "functional dependency"

X Y can also be read as "X determines Y"

the double arrow denotes "logical implication"

6.2.1 Inference Rules

IR1. Reflexivity e.g. X X

a formal statement of trivial dependencies; useful for derivations

IR2. Augmentation e.g. X Y XZ Y

if a dependency holds, then we can freely expand its left hand side

IR3. Transitivity e.g. X Y, Y Z X Z

the "most powerful" inference rule; useful in multi-step derivations

Armstrong inference rules are

sound

meaning that given a set of functional dependencies F specified on a relation schema R,

any dependency that we can infer from F by using IR1 through IR3 holds every relation

state r of R that specifies the dependencies in F. In other words, rules can be used to

derive precisely the closure or no additional FD can be derived.

complete

meaning that using IR1 through IR3 repeatedly to infer dependencies until no more

dependencies can be inferred results in the complete set of all possible dependencies that

can be inferred from F. In other words, given a set of FDs, all implied FDs can be derived

using these 3 rules.

Closure of a Set of Functional Dependencies

Given a set X of FDs in relation R, the set of all FDs that are implied by X is called the

closure of X, and is denoted X+

.

**Algorithms for determining X+**

X

+

:= X;

repeat

oldX+ := X+

for each FD Y Z in F do

if Y X

+

then X+

:= X+ Z;

until oldX+ = X+

;

Example:

A BC

E CF

B E

CD EF

Compute {A, B}+

of the set of attributes under this set of FDs.

Solution:

Step1: {A, B}+

:= {A, B}.

Go round the inner loop 4 time, once for each of the given FDs.

On the first iteration, for A BC

A {A, B}+

{A, B}+

:= {A, B, C}.

Step2: On the second iteration, for E CF, {A, B, C}

Step3 :On the third iteration, for B E

B {A, B,C}+

{A, B}+

:= {A, B, C, E}.

**Step4:**On the fourth iteration, for CD EF remains unchanged

Go round the inner loop 4 times again. On the first iteration result does not change; on the

second it expands to {A,B,C,E,F}; On the third and forth it does not change.

Now go round the inner loop 4 times. Closure does not change and so the whole process

terminates, with

{A,B}+ = {A,B,C,E,F}

Example.

F = { SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN,PNUMBER}

HOURS }

{SSN}+ = {SSN, ENAME}

{PNUMBER}+ = ?

{SSN,PNUMBER}+ = ?

**6.3 Normalization**

The purpose of normalization.

The problems associated with redundant data.

The identification of various types of update anomalies such as insertion, deletion, and

modification anomalies.

How to recognize the appropriateness or quality of the design of relations.

The concept of functional dependency, the main tool for measuring the appropriateness

of attribute groupings in relations.

How functional dependencies can be used to group attributes into relations that are in a

known normal form.

How to define normal forms for relations.

How to undertake the process of normalization.

How to identify the most commonly used normal forms, namely first (1NF), second

(2NF), and third (3NF) normal forms, and Boyce-Codd normal form (BCNF).

How to identify fourth (4NF), and fifth (5NF) normal forms.

Main objective in developing a logical data model for relational database systems is to create an

accurate representation of the data, its relationships, and constraints. To achieve this objective,

we must identify a suitable set of relations. A technique for producing a set of relations with

desirable properties, given the data requirements of an enterprise

**NORMAL FORMS**

A relation is defined as a set of tuples. By definition, all elements of a set are distinct; hence, all

tuples in a relation must also be distinct. This means that no two tuples can have the same

combination of values for all their attributes.

Any set of attributes of a relation schema is called a superkey. Every relation has at least one

superkey—the set of all its attributes. A key is a minimal superkey, i.e., a superkey from which

we cannot remove any attribute and still have the uniqueness constraint hold.

In general, a relation schema may have more than one key. In this case, each of the keys is called

a candidate key. It is common to designate one of the candidate keys as the primary key of the

relation. A foreign key is a key in a relation R but it's not a key (just an attribute) in other

relation R' of the same schema.

Integrity Constraints

The entity integrity constraint states that no primary key value can be null. This is because the

primary key value is used to identify individual tuples in a relation; having null values for the

primary key implies that we cannot identify some tuples.

The referential integrity constraint is specified between two relations and is used to maintain

the consistency among tuples of the two relations. Informally, the referential integrity constraint

states that a tuple in one relation that refers to another relation must refer to an existing tuple in

that relation.

An attribute of a relation schema R is called a prime attribute of the relation R if it is a member

of any key of the relation R. An attribute is called nonprime if it is not a prime attribute—that is,

if it is not a member of any candidate key.

The goal of normalization is to create a set of relational tables that are free of redundant data and

that can be consistently and correctly modified. This means that all tables in a relational database

should be in the in the third normal form (3 NF).

Normalization of data can be looked on as a process during which unsatisfactory relation

schemas are decomposed by breaking up their attributes into smaller relation schemas that

possess desirable properties. One objective of the original normalization process is to ensure that

the update anomalies such as insertion, deletion, and modification anomalies do not occur.

The most commonly used normal forms

First Normal Form (1NF)

Second Normal Form (2NF)

Third Normal Form (3NF)

Boyce-Codd Normal Form

Other Normal Forms

Fourth Normal Form

Fifth Normal Form

Domain Key Normal Form

**6.3.1 First Normal Form (1NF)**

First normal form is now considered to be part of the formal definition of a relation; historically, it was defined to disallow multivalued attributes, composite attributes, and their combinations. It states that the domains of attributes must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

Practical Rule: "Eliminate Repeating Groups," i.e., make a separate table for each set of related attributes, and give each table a primary key.

Formal Definition: A relation is in first normal form (1NF) if and only if all underlying simple

domains contain atomic values only.

**6.3.2 Second Normal Form (2NF)**

Second normal form is based on the concept of fully functional dependency. A functional X Y

is a fully functional dependency is removal of any attribute A from X means that the dependency

does not hold any more. A relation schema is in 2NF if every nonprime attribute in relation is

fully functionally dependent on the primary key of the relation. It also can be restated as: a

relation schema is in 2NF if every nonprime attribute in relation is not partially dependent on any

key of the relation.

Practical Rule: "Eliminate Redundant Data," i.e., if an attribute depends on only part of a

multivalued key, remove it to a separate table.

Formal Definition: A relation is in second normal form (2NF) if and only if it is in 1NF and

every nonkey attribute is fully dependent on the primary key.

**6.3.3 Third Normal Form (3NF)**

Third normal form is based on the concept of transitive dependency. A functional dependency

X Y in a relation is a transitive dependency if there is a set of attributes Z that is not a subset

of any key of the relation, and both X Z and Z Y hold. In other words, a relation is in 3NF

if, whenever a functional dependency

X A holds in the relation, either (a) X is a superkey of the relation, or (b) A is a prime

attribute of the relation.

Practical Rule: "Eliminate Columns not Dependent on Key," i.e., if attributes do not contribute to

a description of a key, remove them to a separate table.

Formal Definition: A relation is in third normal form (3NF) if and only if it is in 2NF and every

nonkey attribute is nontransitively dependent on the primary key.

1NF: R is in 1NF iff all domain values are atomic.

2NF: R is in 2 NF iff R is in 1NF and every nonkey attribute is fully dependent on the key.

3NF: R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the

key.

6.4 Boyce-Codd Normal Form (BCNF)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a FD X -> A holds in

R, then X is a superkey of R

Each normal form is strictly stronger than the previous one:

Every 2NF relation is in 1NF Every 3NF relation is in 2NF

Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

A relation is in BCNF, if and only if every determinant is a candidate key.

Additional criteria may be needed to ensure the the set of relations in a relational database are satisfactory

## Use Me ?