Database normalization and denormalization rules are discussed here.
Codd described the objectives of normalization as follows:
1) To free the collection of relations from undesirable insertion, update and deletion dependencies.
2) To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increases the lifespan of application programs.
3) To make the relational program more informative to users
4) To make the collection of results neutral to query statistics, where these statistics are liable to change as time goes by.
The undesired side effects of insert, update or delete may include the following:
- Multiple rows for the same information which are updated independently and get out of sync
- no rows such as when a new user is added but not assigned to anything
- inconsitent deletes such as where one table deletion implies and requires a deletion in a completely different table.
If the addition of new data, requires changes to existing structure, then such changes can cause regressions
Tables when normalized are immediately correspond to real world concepts and their relationships.
The normalized tables are suited for general querying across any set of tables.
Some common normalization terms:
Functional dependency : FD: X->Y where Y attribute has a functional dependency on a set of X attributes if and only if each X value is associated with one and only one Y value.
Trivial Functional dependency : is a FD of an attribute on a superset of itself.
Full Functional dependency : when an attribute is FD on X but not on any subset of X
Transitive dependency : X->Y and Y->Z implies X->Z
Multivalued dependency : presence of some rows implies presence of some others
Join dependency - table can be recreated with joins
Superkey : A superkey is a combination of attributes that can be used to identify a database record.
Candidate key is a minimal superkey such as the Social Security Number
Non-prime attribute is one that does not occur in any candidate key. A prime attribute is one which occurs in some candidate key
A candidate key may be designated as a primary key but is unsually not talked about in with respect to other candidate keys.
Normal forms include the following:
1) First normal form - table has one candidate key
2) Second - no non-prime attribute is FD on a proper subset of any candidate key
3) Third - every non-prime attribute is non-transitively dependent on every candidate key in the table ( no transistive dependency is allowed)
4) EKNF - Every non-trivial FD is either the dependency of an elementary key or a dependency of a superkey
5) BCNF - Every non-trivial FD in the table is a dependency on a super key
6) Fourth - Every non-trivial multivalued dependency is a dependency on a super key
7) Fifth - Every non-trivial join dependency in the table is implied by the super keys of the table.
8) Domain key - Every constraint is a logical consequence of the tables domain constraints or key constraints.
9) Sixth - no non-trivial join dependencies
Denormalization - OLAP is denormalized as compared to OLTP. The redundant data is carefully controlled during ETL. Normalized snowflake schema becomes denormalized star schema. Non-first normal form is formed by nesting 1NF and the reverse is unnesting.
Codd described the objectives of normalization as follows:
1) To free the collection of relations from undesirable insertion, update and deletion dependencies.
2) To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increases the lifespan of application programs.
3) To make the relational program more informative to users
4) To make the collection of results neutral to query statistics, where these statistics are liable to change as time goes by.
The undesired side effects of insert, update or delete may include the following:
- Multiple rows for the same information which are updated independently and get out of sync
- no rows such as when a new user is added but not assigned to anything
- inconsitent deletes such as where one table deletion implies and requires a deletion in a completely different table.
If the addition of new data, requires changes to existing structure, then such changes can cause regressions
Tables when normalized are immediately correspond to real world concepts and their relationships.
The normalized tables are suited for general querying across any set of tables.
Some common normalization terms:
Functional dependency : FD: X->Y where Y attribute has a functional dependency on a set of X attributes if and only if each X value is associated with one and only one Y value.
Trivial Functional dependency : is a FD of an attribute on a superset of itself.
Full Functional dependency : when an attribute is FD on X but not on any subset of X
Transitive dependency : X->Y and Y->Z implies X->Z
Multivalued dependency : presence of some rows implies presence of some others
Join dependency - table can be recreated with joins
Superkey : A superkey is a combination of attributes that can be used to identify a database record.
Candidate key is a minimal superkey such as the Social Security Number
Non-prime attribute is one that does not occur in any candidate key. A prime attribute is one which occurs in some candidate key
A candidate key may be designated as a primary key but is unsually not talked about in with respect to other candidate keys.
Normal forms include the following:
1) First normal form - table has one candidate key
2) Second - no non-prime attribute is FD on a proper subset of any candidate key
3) Third - every non-prime attribute is non-transitively dependent on every candidate key in the table ( no transistive dependency is allowed)
4) EKNF - Every non-trivial FD is either the dependency of an elementary key or a dependency of a superkey
5) BCNF - Every non-trivial FD in the table is a dependency on a super key
6) Fourth - Every non-trivial multivalued dependency is a dependency on a super key
7) Fifth - Every non-trivial join dependency in the table is implied by the super keys of the table.
8) Domain key - Every constraint is a logical consequence of the tables domain constraints or key constraints.
9) Sixth - no non-trivial join dependencies
Denormalization - OLAP is denormalized as compared to OLTP. The redundant data is carefully controlled during ETL. Normalized snowflake schema becomes denormalized star schema. Non-first normal form is formed by nesting 1NF and the reverse is unnesting.
No comments:
Post a Comment