Data Warehousing Midterm
Essay by pravin mukare • March 10, 2018 • Term Paper • 3,983 Words (16 Pages) • 1,047 Views
MIS636 Individual Exam – Due as shown on the Syllabus
Answer 5 of 20 Essay Questions
STUDENT NAME: Anil Thadani
Request: Although I have answered a few other questions as well, I request you to please evaluate me on the basis of the questions marked in yellow (Q1, 3, 5, 9, 14).
- What are the distinguishing characteristics of OLAP and data mining analyses? Compare and contrast OLTP and OLAP queries.
Ans.
OLAP (Online Analytical Processing): This type of processing involves low-level analysis of the historical data available in a database. In short, it is used to see what the data is all about and “what” is happening in it (what trends can be found in the data).
Data Mining (also known as Knowledge Discovery in Databases (KDD)): This type of processing involves high-level analysis of the historical data available in a database. In short, it is used to find “why” a particular trend is observed in the data, and sometimes even to predict the future based on such trends. OLAP and data mining complement each other.
Features that distinguish OLAP and Data Mining: -
- OLAP is used by organizations to improve current operational efficiency, while data mining is used for short- and long-term growth prospects.
- OLAP tools provide multidimensional data analysis to get a summary of the data, while data mining tools focus more on ratios and patterns in the data set.
- An example of OLAP is finding the trend in car sales over a decade, while an example of data mining is finding the factors that affect this sales trend and predicting how they can be changed to increase sales in the future.
- Most companies prefer OLAP to data mining because OLAP queries are faster to run and, in many cases, can also be used to predict future trends, whereas data mining requires more skilled users and takes more time and analysis. That said, it is important to note that data mining results in better predictions because it involves a deeper analysis than OLAP.
OLTP (Online Transaction Processing): This type of processing involves transactions that regularly update the current data in a database. These include inserting new data and updating or deleting existing data.
An example of an OLTP query is adding new car sales information to the database each day, while an example of an OLAP query is analyzing the trend in car sales over a decade. So OLTP involves frequent transactions that update the database, while OLAP involves less frequent queries (perhaps weekly, monthly or quarterly) that analyze past data.
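The car sales example can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch, not a real deployment: the `car_sales` table, its columns and the rows are assumptions made for illustration, and in practice the OLAP query would normally run against a separate warehouse rather than the OLTP database itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE car_sales (sale_date TEXT, model TEXT, units INTEGER)")

# OLTP-style: short, frequent writes that record each day's sales.
daily_rows = [
    ("2016-03-01", "sedan", 3),
    ("2016-03-02", "sedan", 2),
    ("2017-03-01", "sedan", 4),
    ("2017-03-02", "suv", 5),
]
with conn:  # commits the transaction on success
    conn.executemany("INSERT INTO car_sales VALUES (?, ?, ?)", daily_rows)

# OLAP-style: an occasional read-only aggregation over the history.
yearly_trend = conn.execute(
    "SELECT substr(sale_date, 1, 4) AS sale_year, SUM(units) "
    "FROM car_sales GROUP BY sale_year ORDER BY sale_year"
).fetchall()
```

The insert touches a few rows at a time, while the aggregation scans the whole history, which is why the two workloads are usually tuned differently.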
Following are the differences between OLTP and OLAP queries: -
Features | OLTP | OLAP |
Queries | Simple, standard queries that run very frequently | Relatively complex queries involving aggregations that run less frequently |
Processing speed | Very fast | Varies with the amount of data involved; generally much slower than OLTP |
Database Design | Highly normalized tables, for faster updates and reduced redundancy | De-normalized tables (for example, a star schema), as the focus is on faster data access |
Data Source | OLTP systems are the original source of the data | Data comes from OLTP databases and is available in read-only form |
The following flow chart, taken from Google, depicts the connection between OLTP, OLAP and data mining:-
[pic 1]
[pic 2]
- How does a DBMS ensure application-data independence? Give examples.
Ans.
A DBMS ensures application-data independence with the help of its three-schema architecture. This architecture provides three levels/layers of data abstraction:-
- External schema: This is the view seen by the end user while using an application that has access to the database. This schema allows the end user to view only the required data in the desired format (and not necessarily in the format in which it is physically stored). Moreover, this represents a virtual view of the requested data, which is held temporarily while the user needs it and discarded afterwards.
- Conceptual schema: This schema hides the physical storage structure of the data and shows only the description of entities, their formats, data types, relationships, constraints, etc.
- Internal schema: This schema describes the physical storage structure of the database, including how the stored data is linked together (by the use of access paths and location addresses).
A DBMS transforms the end user's request from the external to the conceptual to the internal schema, and transforms the results of that request back towards the external schema. With the help of these three schemas the DBMS provides two types of data independence:-
- Logical Independence: This is the ability to change the conceptual schema without having to change the external views or the application programs. This means that if we need to change any relationship or attribute, it should be possible without having to make changes in the external view or the application programs. Moreover, if the data stored in physical storage changes, it does not affect the format of the attributes. E.g. an employee table Emp(ID#, salary#, …) is not affected in terms of its attributes when any employee details are modified/updated, and similarly we can add or remove any attribute or relationship without having to change the external view or the application programs.
- Physical Independence: This is the ability to change the internal schema without having to change the conceptual or external schema. E.g. if we change the physical storage layout in order to optimize storage space, this does not affect the logical structure of the table or its attributes.
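Both kinds of independence can be sketched with Python's built-in `sqlite3` module. This is a small illustration under stated assumptions: the `emp` table, the `emp_directory` view, the index name and the sample rows are all hypothetical, a view stands in for the external schema, and an index stands in for a change to the internal schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical table definition (hypothetical schema).
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(1, "Asha", 5000.0), (2, "Ravi", 6000.0)],
)

# External level: a view exposing only what this user needs.
conn.execute("CREATE VIEW emp_directory AS SELECT id, name FROM emp")


def read_directory():
    """Stands in for an application program that only knows the view."""
    return conn.execute(
        "SELECT id, name FROM emp_directory ORDER BY id"
    ).fetchall()


before = read_directory()

# Physical change: add an index, i.e. a new access path in storage.
conn.execute("CREATE INDEX idx_emp_name ON emp (name)")
after_physical_change = read_directory()

# Logical change: add an attribute to the conceptual schema.
conn.execute("ALTER TABLE emp ADD COLUMN department TEXT")
after_logical_change = read_directory()
```

The application function is never edited, yet it returns the same rows before and after each change, which is the behaviour that physical and logical independence describe.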
The following diagrams show how a DBMS separates the user from the data, and the three-level architecture of a DBMS:-
[pic 3] DBMS separating user and data | [pic 4] 3 level architecture of DBMS |
[pic 5]
- What is relational (entity & referential) integrity? What are the possible concerns when updating or deleting foreign keys?
Ans.
Relational integrity is the property of a DBMS by virtue of which it ensures efficient management of data and relationships in the database. It ensures that consistency and accuracy are maintained among the different interconnected relational tables. E.g. if a table contains an author and a book written by that author, then it should always link the two the way they were defined when the data was entered. Relational integrity may be divided into the following types:-
...
...