Sunday, February 8, 2009

MCA 203

MCA 203 - Data Base Management System

Brief Summary

LESSON – 1

Database
A collection of related data.

More definitions of File, Record, Data and Information.

Characteristics of DBMS

Self describing nature
Insulation between programs and data and data abstraction
Support of multiple views of data.
Sharing of data and multi-user transaction processing.
etc.

Uses of DBMS
Reducing redundancy and inconsistency
Authorized Access
Multiple user interfaces
Enforcing integrity constraints
Provides backup and recovery etc.

Implications of database approach

Potential for enforcing standards
Reduced application development time
Flexibility
Availability of latest updates.
Economies of scale etc.















LESSON – 2

Architecture
Schema: Description of the database.
Instance: The data in the database stored at a particular moment.
Two-tier
Consists of Client and Server.
ODBC and JDBC used for interaction.

Three-tier
Internal schema
Conceptual schema
External schema

Data Independence

Logical data independence
Capacity to change conceptual schema without changing external schema.

Physical data independence
Capacity to change internal schema without changing conceptual schema.

Database Languages
Data Definition Language
Data Dictionary
Metadata

Data Manipulation Language
Enables user to access or manipulate data.
Procedural DML
Non- Procedural DML
Query Language
A statement requesting retrieval of information.







LESSON – 3

Database Administrator
Responsibilities:
§ Schema definition
§ Authorizing access to the database
Coordinating and monitoring its use.
Maintain database integrity.
Acquiring software & Hardware resources etc.
Database Designers
Responsibilities:
To identify data to be stored.
Choose appropriate structure to represent and store the data
Create a view that meets data & Processing requirements
End Users
Casual users
Naive users
Sophisticated end user
Stand-alone users
System Analysts and Application Programmer
Analysts determine requirements of end users
Programmers implement the specifications in the form of programs.
Data Models
The Entity- Relationship Model
Represents data as set of Entities, Attributes, relationships among them etc.
Relational
Represents data as collection of tables.
Network
Represents data as record types and set type.
Hierarchical
Represents data as tree structure.
Object Oriented
Defines data in terms of objects, their properties and their operations.
LESSON – 4 & 5

ER Model
Data models are tools used in analysis to describe the data requirements and assumptions in the system from a top-down perspective. They also set the stage for the design of databases later on in the SDLC.
There are three basic elements in ER models:
Entity: Basic object represented by the ER model.
§ Attributes are the data we collect about the entities.

§ Relationships provide the structure needed to draw information from multiple entities.

Attributes: Characteristics/ Properties of an entity.
Composite attributes
Single valued
Multivalued
Derived

Entity type
Value sets
Key attributes
Relationships
Cardinality
Degree of relation
Weak entity types

Notations
Rectangle – Entity type
Rhombus – Relationship
Oval – Attribute
etc.
Relationships
§ Binary and Tertiary relations
§ one to one
§ one to many
§ many to many
§ degree binary or ternary etc.



Fig: An example of an ER Diagram































Normalization

First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.
Fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.




Relational Model

§ A relational database stores all its data inside tables. All operations on data are done on the tables themselves or produces another tables as the result.
§ A table is a set of rows and columns.
§ Each row is a set of columns with only one value for each.
§ All rows from the same table have the same set of columns, although some columns may have NULL values, i.e. the values for that rows was not initialized.
§ A NULL value for a string column is different from an empty string.
§ The rows from a relational table is analogous to a record, and the columns to a field. Here's an example of a table and the SQL statement that creates the table: CREATE TABLE ADDR_BOOK ( NAME char(30), COMPANY char(20), E_MAIL char (25) ) +-------------------+---------------+-----------------------+ NAME COMPANY E_MAIL +===================+===============+=======================+ Nandita IBM nand@ibm.com +-------------------+---------------+-----------------------+ Bill Gates Microsoft bill@microsoft.com +-------------------+---------------+-----------------------+
Two basic operations you can perform on a relational table:
§ The first one is retrieving a subset of its columns.
§ The second is retrieving a subset of its rows.
Operations between two tables:
§ cartesian product of the tables
§ intersection between two tables
§ add one table to another and







Relational Algebra

Relational algebra is a procedural data manipulation language (DML) for the relational model.
§ Standard relational algebra consists of:
a set of operators that map relations to relations
rules for combining those operations into expressions
rules for evaluating such expressions
§ The basic components of relational algebra (RA) expressions are relations
The semantics of RA operations are determined using:
regular "conditional set" expressions e.g. { X condition }
various tuple notations: t[AB] (extract attributes) (x,y,z) (enumerated tuples)
quantifiers, set operations, boolean operators
Notational Conventions
The standard treatment of relational algebra uses Greek symbols. The following notation are used:


Operation

StandardNotation
Selection

expr(Rel)
Projection

A,B,C(Rel)
Join

Rel1 expr Rel2



Selection
The selection operation returns a subset of the tuples in a relation that satisfy a specified condition.
C(r) = Select[C](r) = { t t r C(t) }, where r(R)
The selection condition C is a boolean expression that involves attributes from the relation and constant values.

Operationally:
§ C is evaluated for each tuple in r, with the attribute "variables" in C set to the their values in that tuple
§ any tuple for which C evaluates to true is returned in the result
§ all other tuples are not included in the result
Projection
The projection operation returns a set of tuples containing a subset of the attributes in the original relation.
X(r) = Project[X](r) = { t[X] t r }, where r(R)
The X specifies a subset of the attributes of R.
Operationally:
§ take each tuple in the original relation
§ extract values of the specified subset of the attributes
§ form a new tuple from these values
§ place this new tuple in the result, if not already there
Union
The union operation combines two compatible relations into a single relation via set union of sets of tuples.
r1 r2 = { t t r1 t r2 }, where r1(R), r2(R)

Operationally:
§ make a copy of r1
§ examine each tuple in r2
§ if it's not already in the result, place it there
Intersection
The intersection operation combines two compatible relations into a single relation via set intersection of sets of tuples.
r1 r2 = { t t r1 t r2 }, where r1(R), r2(R)

Operationally:
§ examine each tuple in r1
§ check whether it also appears in r2
§ if it does, place it in the result, otherwise ignore it
Difference
The difference operation finds the set of tuples that exist in one relation but do not occur in a second compatible relation.
r1 - r2 = { t t r1 t r2 }, where r1(R), r2(R)

Operationally:
§ examine each tuple in r1
§ check whether it also appears in r2
§ if it does not, place it in the result, otherwise ignore it
Product
The product (Cartesian product) combines information from two relations pairwise on tuples.
r s = { (t1 : t2) t1 r t2 s }, where r(R), s(S)

Operationally:
§ examine each tuple in r
§ form new tuples by pairing it with each tuple in s
§ place all of these tuples in the result
Natural Join
The natural join operation is a specialised product:
containing only pairs that match on their common attributes
with one of each pair of common attributes eliminated
Consider two relation schemas R(ABC..JKLM) and S(KLMN..XYZ), the natural join of the relations r(R) and s(S) is defined as:
R S = R Join S = { (t1[ABC..J] : t2[K..XYZ]) t1 r t2 s match }
where match = t1[K] = t2[K] t1[L] = t2[L] t1[M] = t2[M]
Join can also be defined in terms of the other relational algebra operations:
r Join s = Project[R S] ( Select[match] ( r s) )


Division
Consider two relation schemas R and S where S R.
The division operation is defined on two instances r(R) and s(S) as:
r / s = r Divide s = { t t r[R-S] satisfy }
where satisfy = ts S ( tr R ( tr[S] = ts tr[R-S] = t ) )

Operationally:
§ consider each subset of tuples in R that match on t[R-S]
§ for this subset of tuples, take the t[S] values from each
§ if this covers all tuples in S, then include t[R-S] in the result

No comments: