Superior University Depalpur

 


Week8:           Lecture Contents

What is meant by….?
·         Query
·         Query processing
·         Basic concepts related to query processing

What is a query?
·         Query is a question/request to database for some action.
o   INSERT query for data entry
o   SELECT query for data retrieval
o   UPDATE query for data modification
o   DELETE query for data removal
·         Query is always written in some database language like SQL.
·         Query is a way to interact with database.
·         A client submits an SQL query while server generates response after executing the requested operation.
** ADDBMS consists of both client and server components. These components may reside on samemachine or even on two different machines separately.

Query processing
DB server is responsible for actual execution of SQL query however SQL query can’t be executed as it is.The reason is that it tells only high level descriptions about data, speaks nothing about its hidden low level physical details like fragmentation, replication, distribution details etc.  Therefore, an input SQL query undergoes certain transformations before its actual execution takes place.

Query processing is a general term and involves basically following steps
·         Syntax check
·         Semantics verification
·         Translation into relational algebraic expression
·         Optimization
·         Execution

First 3 steps refer to query parsing. Query Parserchecks the syntax and validates the semantics of the query (it means verifying the relation names, the attribute names in the query are the names of relations and attributes in the database). it constructs a parse tree and then translates into relational algebra expression.

Why optimization?
A relational algebraic expression may have many equivalent expressions e.g.
Π­age (σ­­age<20 (Student))
=σ­­age<20(Π­age(Student))

It means that a query may be executed in a number of ways. The goal of optimization is to find best execution plan such that disk accesses, amount of memory, processing time and communication cost could be minimized






Some important terminologies

Catalog

In a centralized database system the catalog is used to primarily store the schema, which contains information regarding relations, indices and views. The meta-data stored for relations includes the relation name, attribute names and data types and integrity constraints. Various statistics are also stored in the catalog such as the number of distinct keys in a given attribute and the cardinality of a particular relation. These statistics aid the query optimizer in estimating the size of intermediate results of execution plans, which allows the optimizer to determine an estimated cost of plans.

In a distributed DBMS the catalog has to store additional information including the location of relations and their replicas. The catalog must also include system wide information such as the number of sites in the system along with their identifiers.
An issue that arises in a distributed setting is where to place the catalog in the system. There are two alternatives:eitherstore the catalog at a single site or replicate it at all sites. Storing the catalog at a single site introduces a single point of failure in the system. If the site containing the catalog fails then the entire system cannot access the catalog, which may cause the entire system to come to a halt. Placing a copy of the catalog at a single site causes the site to become burdened by many catalog queries from other sites, which may lead to performance degradation. Creating replicas of the catalog at each site eliminates the single point of failure problem and also reduces the network communication required as all sites do not have to access the catalog remotely. However, in a system where catalog updates are frequent, much time may be invested in synchronizing the catalog replicas, which may degrade system performance

Query tree:
It refers to a tree data structure that corresponds to some relational algebra expression. It has 3 types of nodes: root (at top), internal nodes (middle) and leaf nodes (at bottom). Its leaves represent input relations in query;internal nodes represent intermediate relations while root represents the result.

What is relational algebra?
·         The formal description of how a relational database operates
·         An algebra whose operands or relations and operators are designed to perform some basic operations on the relations.
·         Operators may be either unary, ( single operand) or binary ( two operands)
It has following 5 basic operations, purely based on set theory.
        I.            Selection
      II.            Projection
    III.            Union
    IV.            Set difference
      V.            Cross product

·         Selection : σ   
o    Selects a subset of rows satisfying certain criteria e.gσ­­age<20 (Student)
    • ­­Unary operator
    • represents where clause criteria
  • Projection: Π­              
    • Selects a subset of attributes of a relation e.g. Π­id, sname(BSCS)
    • Unary operator
  • Union:  U                     
    • for union operation ( combination) of two relations
    • binary operator
    • Eliminates duplicate rows
    • Input relations must have same schema
    • Corresponding fields must have same data type
    • BSCS U MCS combines all the students of BSCS and MCS together.
  • Set different: -
    • Returns those tuples which belong to one relation and not the other
    • Binary operator
    • Excludes common tuples
    • BSCS - FEEPAID returns all BSCS students which are fee defaulters
  • Cartesian product: x
    • Combines the information of two different relations
    • Pairs each tuple of a relation to each tuple of other relation
    • Binary operator
Other important additional operator are join and rename operator is
  • Join:
    • Binary operator
    • Combines tuples of both relations where they have common attribute values.
  • Rename: ρ
    • Renames  a relation
    • Unary operator
    • Just like assignment operator.
SQL
Relational Algebraic Expression
SELECT id, sname
FROM Student
Π­id,sname(Student)

SELECT id, sname
FROM Student
WHERE id=10
Π­id,sname(σ­­id=10 (Student))

Or

σ­­id=10(Π­id,sname(Student))
A =  σ­­id=10 (Student)
R =  Π­id,sname(A)
or
A = Π­id,sname (Student)
R =  σ­­id=10 (A)
SELECT id, sname, amount
FROM Student, Fee
Π­id,sname(Student x Fee)

SELECT *
FROM Student, Fee
WHERE Student.id=Fee.StudentId
σ­­id=studentId (Student x Fee)
or

Student idFee

A = Student x Fee
B = σ­­id=studentId  (A)



Post a Comment

 
Top