Week8: Lecture
Contents
What is meant by….?
·
Query
·
Basic concepts related to
query processing
What is a query?
·
Query is a question/request
to database for some action.
o INSERT query for data entry
o SELECT query for data retrieval
o UPDATE query for data modification
o DELETE query for data removal
·
Query is always written in
some database language like SQL.
·
Query is a way to interact
with database.
·
A client submits an SQL
query while server generates response after executing the requested operation.
** ADDBMS consists of both client
and server components. These components may reside on samemachine or even on
two different machines separately.
Query processing
DB server is responsible for actual execution of SQL query
however SQL query can’t be executed as it is.The reason is that it tells only
high level descriptions about data, speaks nothing about its hidden low level physical
details like fragmentation, replication, distribution details etc. Therefore, an input SQL query undergoes certain
transformations before its actual execution takes place.
Query processing is a general term and involves basically
following steps
·
Syntax check
·
Semantics verification
·
Translation into relational
algebraic expression
·
Optimization
·
Execution
First 3 steps refer to query parsing. Query Parserchecks the
syntax and validates the semantics of the query (it means verifying the
relation names, the attribute names in the query are the names of relations and
attributes in the database). it constructs a parse tree and then translates
into relational algebra expression.
Why optimization?
A relational algebraic
expression may have many equivalent expressions e.g.
Πage
(σage<20 (Student))
=σage<20(Πage(Student))
It means that a query may
be executed in a number of ways. The goal of optimization is to find best
execution plan such that disk accesses, amount of memory, processing time and
communication cost could be minimized.
Some important
terminologies
Catalog
In a centralized database system the catalog is used to
primarily store the schema, which contains information regarding relations,
indices and views. The meta-data stored for relations includes the relation
name, attribute names and data types and integrity constraints. Various statistics
are also stored in the catalog such as the number of distinct keys in a given
attribute and the cardinality of a particular relation. These statistics aid
the query optimizer in estimating the size of intermediate results of execution
plans, which allows the optimizer to determine an estimated cost of plans.
In a distributed DBMS the catalog has to store additional
information including the location of relations and their replicas. The catalog
must also include system wide information such as the number of sites in the
system along with their identifiers.
An issue that arises in a
distributed setting is where to place the catalog in the system. There are two
alternatives:eitherstore the catalog at a single site or replicate it at all
sites. Storing the catalog at a single site introduces a single point of failure
in the system. If the site containing the catalog fails then the entire system
cannot access the catalog, which may cause the entire system to come to a halt.
Placing a copy of the catalog at a single site causes the site to become
burdened by many catalog queries from other sites, which may lead to
performance degradation. Creating replicas of the catalog at each site
eliminates the single point of failure problem and also reduces the network
communication required as all sites do not have to access the catalog remotely.
However, in a system where catalog updates are frequent, much time may be
invested in synchronizing the catalog replicas, which may degrade system
performance
Query tree:
It refers to a tree data structure that corresponds to some
relational algebra expression. It has 3 types of nodes: root (at top), internal
nodes (middle) and leaf nodes (at bottom). Its leaves represent input relations
in query;internal nodes represent intermediate relations while root represents
the result.
What is relational algebra?
·
The formal description of
how a relational database operates
·
An algebra whose operands
or relations and operators are designed to perform some basic operations on the
relations.
·
Operators may be either
unary, ( single operand) or binary ( two operands)
It
has following 5 basic operations, purely based on set theory.
I.
Selection
II.
Projection
III.
Union
IV.
Set difference
V.
Cross product
·
Selection : σ
o
Selects a subset of rows
satisfying certain criteria e.gσage<20 (Student)
- Unary operator
- represents where clause criteria
- Projection: Π
- Selects a subset of attributes of a relation e.g. Πid, sname(BSCS)
- Unary operator
- Union: U
- for union operation ( combination) of two relations
- binary operator
- Eliminates duplicate rows
- Input relations must have same schema
- Corresponding fields must have same data type
- BSCS U MCS combines all the students of BSCS and MCS together.
- Set different: -
- Returns those tuples which belong to one relation and not the other
- Binary operator
- Excludes common tuples
- BSCS - FEEPAID returns all BSCS students which are fee defaulters
- Cartesian product: x
- Combines the information of two different relations
- Pairs each tuple of a relation to each tuple of other relation
- Binary operator
Other important additional operator are join and rename
operator is
- Join: ⋈
- Binary operator
- Combines tuples of both relations where they have common attribute values.
- Rename: ρ
- Renames a relation
- Unary operator
- Just like assignment operator.
SQL
|
Relational Algebraic Expression
|
|
SELECT id, sname
FROM Student
|
Πid,sname(Student)
|
|
SELECT id, sname
FROM Student
WHERE id=10
|
Πid,sname(σid=10 (Student))
Or
σid=10(Πid,sname(Student))
|
A = σid=10 (Student)
R = Πid,sname(A)
or
A = Πid,sname (Student)
R = σid=10 (A)
|
SELECT id, sname, amount
FROM Student, Fee
|
Πid,sname(Student x Fee)
|
|
SELECT *
FROM Student, Fee
WHERE Student.id=Fee.StudentId
|
σid=studentId (Student x Fee)
or
Student ⋈idFee
|
A = Student x Fee
B = σid=studentId (A)
|
Post a Comment