Author: usuario · Created: 2011-06-23 15:28:03 UTC
Data Warehousing, OLAP, Data Mining
S. Costantini, Università degli Studi di L’Aquila
[email protected]@di.univaq.it
S. Costantini / Data Warehousing
Acknowledgments
• Part of this material is taken from: Database Systems: The Complete Book, by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom, published by Prentice Hall.
• URL: http://www-db.stanford.edu/~ullman/dscb.html
What is a data warehouse, in essence?
• It is a materialized view.
• It is updated at fixed intervals (depending on the application).
• It is a so-called “data integration system”, because it can contain data coming from several databases (called “sources”).
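A materialized view stores the result of a query instead of recomputing it on every access, so it is only as current as its last refresh. The following is a minimal sketch using Python's built-in sqlite3 module. SQLite has no native materialized views, so the refresh is done by hand. The table names `sales` and `sales_summary` are hypothetical.

```python
import sqlite3

# Hypothetical source table "sales"; "sales_summary" plays the materialized view.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("North", 10.0), ("South", 5.0), ("North", 7.5)])

def refresh_summary(conn):
    # Recompute the view query and store its result (full rebuild).
    conn.execute("DROP TABLE IF EXISTS sales_summary")
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()

refresh_summary(conn)

# A new sale is not visible in the stored summary until the next refresh.
conn.execute("INSERT INTO sales VALUES ('South', 2.5)")
stale = dict(conn.execute("SELECT region, total FROM sales_summary"))
refresh_summary(conn)
fresh = dict(conn.execute("SELECT region, total FROM sales_summary"))
print(stale)  # {'North': 17.5, 'South': 5.0}
print(fresh)  # {'North': 17.5, 'South': 7.5}
```

How often `refresh_summary` runs is exactly the “fixed interval” chosen for the application.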
Why data warehouses?
• Because queries that perform statistical analysis and examine the data to extract various kinds of information (called “OLAP” queries, see below) are heavy and degrade system performance too much. On the other hand, they do not need the most up-to-date version of the data.
Why data warehouses (continued)
• It is therefore convenient to separate ordinary queries from OLAP queries, creating a data warehouse for the latter.
• The relational model is not optimal for OLAP queries, so the data model is modified when a data warehouse is built.
Observation
• Traditional database systems are tuned to many small, simple queries.
• Some new applications use fewer, more time-consuming, complex queries.
• New architectures have been developed to handle complex “analytic” queries efficiently.
The Data Warehouse
• The most common form of data integration.
– Copy sources into a single DB (warehouse) and try to keep it up-to-date.
– Usual method: periodic reconstruction of the warehouse, perhaps overnight.
– Frequently essential for analytic queries.
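Periodic reconstruction means throwing away the old copy and reloading every source. Below is a minimal sketch with Python's sqlite3 module, assuming two hypothetical source databases (`store_a`, `store_b`) that each hold a `customers` table. The warehouse tags each row with the source it came from.

```python
import sqlite3

def make_source(rows):
    """Create a hypothetical source DB with a customers(name, city) table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (name TEXT, city TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    db.commit()
    return db

sources = {
    "store_a": make_source([("Anna", "Rome"), ("Bruno", "Milan")]),
    "store_b": make_source([("Carla", "Naples")]),
}

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (source TEXT, name TEXT, city TEXT)")

def rebuild_warehouse():
    """Discard the old copy, then reload every source into the single DB."""
    warehouse.execute("DELETE FROM customers")
    for source_name, db in sources.items():
        rows = db.execute("SELECT name, city FROM customers").fetchall()
        warehouse.executemany(
            "INSERT INTO customers VALUES (?, ?, ?)",
            [(source_name, name, city) for name, city in rows],
        )
    warehouse.commit()

rebuild_warehouse()  # in practice, e.g., scheduled to run overnight
count = warehouse.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 3
```

A full rebuild is simple and always consistent with the sources, at the cost of recopying everything each time.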
OLTP
• Most database operations involve On-Line Transaction Processing (OLTP).
– Short, simple, frequent queries and/or modifications, each involving a small number of tuples.
– Examples: Answering queries from a Web interface, sales at cash registers, selling airline tickets.
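To illustrate what “short, simple, involving a small number of tuples” looks like, here is a minimal sketch of selling an airline ticket with Python's sqlite3 module. The `flights` table and the flight code `AZ123` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (flight TEXT PRIMARY KEY, seats_left INTEGER)")
conn.execute("INSERT INTO flights VALUES ('AZ123', 2)")
conn.commit()

def sell_ticket(conn, flight):
    """One short OLTP transaction that touches a single tuple."""
    with conn:  # commits on success, rolls back on an exception
        # Check and decrement in one statement, so no other writer can
        # sneak in between reading and updating the seat count.
        cur = conn.execute(
            "UPDATE flights SET seats_left = seats_left - 1 "
            "WHERE flight = ? AND seats_left > 0",
            (flight,),
        )
    return cur.rowcount == 1  # True if a seat was actually sold

results = [sell_ticket(conn, "AZ123") for _ in range(3)]
print(results)  # [True, True, False]
```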
OLAP
• Of increasing importance are On-Line Analytical Processing (OLAP) queries.
– Few, but complex queries that may run for hours.
– Queries do not depend on having an absolutely up-to-date database.
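Because OLAP queries tolerate slightly stale data, they can run against a copy of the operational database rather than the live one. Below is a minimal sketch with Python's sqlite3 module. It uses `Connection.backup` (available since Python 3.7) to take a point-in-time snapshot. The `orders` table and its rows are hypothetical.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Anna", 20.0), ("Bruno", 35.0), ("Anna", 15.0)])
oltp.commit()

# Take a point-in-time copy of the operational database for analysis.
snapshot = sqlite3.connect(":memory:")
oltp.backup(snapshot)

# OLTP work continues on the live database...
oltp.execute("INSERT INTO orders VALUES ('Carla', 50.0)")
oltp.commit()

# ...while the aggregate (OLAP-style) query reads the snapshot,
# which does not include the newest order.
per_customer = snapshot.execute(
    "SELECT customer, SUM(amount), COUNT(*) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(per_customer)  # [('Anna', 35.0, 2), ('Bruno', 35.0, 1)]
```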
OLAP Examples
1. Amazon analyzes purchases by its customers to come up with an individual screen with products of likely interest to the customer.
2. Analysts at Wal-Mart look for items with increasing sales in some region.
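The second example can be phrased as a single aggregate query. Here is a minimal sketch with Python's sqlite3 module, assuming a hypothetical `sales(region, item, period, qty)` table with two periods to compare. It reports every (region, item) pair whose sales grew from period 1 to period 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, item TEXT, period INTEGER, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("West", "umbrella", 1, 10), ("West", "umbrella", 2, 25),
        ("West", "sunscreen", 1, 30), ("West", "sunscreen", 2, 12),
        ("East", "umbrella", 1, 8), ("East", "umbrella", 2, 8),
    ],
)

# Pivot the two periods into columns per (region, item), keep only growth.
growing = conn.execute(
    """
    SELECT region, item,
           SUM(CASE WHEN period = 1 THEN qty ELSE 0 END) AS qty_before,
           SUM(CASE WHEN period = 2 THEN qty ELSE 0 END) AS qty_after
    FROM sales
    GROUP BY region, item
    HAVING SUM(CASE WHEN period = 2 THEN qty ELSE 0 END)
         > SUM(CASE WHEN period = 1 THEN qty ELSE 0 END)
    ORDER BY region, item
    """
).fetchall()
print(growing)  # [('West', 'umbrella', 10, 25)]
```

On a real warehouse the same query scans the whole sales history, which is why it is run away from the OLTP system.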
Data Warehouses
• Doing OLTP and OLAP in the same database system is often impractical:
– Different performance requirements
– Analysis queries require data from many sources
• Solution: build a “data warehouse”
– Copy data from various OLTP systems
– Optimize data organization and system tuning for OLAP
– Transactions aren’t slowed by big analysis queries
– Periodically refresh the data in the warehouse
Common Architecture
• Relational databases handle OLTP.
• Local databases are copied to a central warehouse overnight.
• Analysts use the warehouse for OLAP.
Definition of data warehousing
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.
Loading the Data Warehouse
Source Systems (OLTP) → Data Staging Area → Data Warehouse
• Data is periodically extracted from the source systems.
• Data is cleansed and transformed in the staging area.
• Users query the data warehouse.
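The extract, cleanse/transform, load steps can be sketched in a few lines of Python. The raw records and the cleansing rules are hypothetical: trim and normalize names, parse amounts, and drop records that cannot be repaired. The load target is an in-memory SQLite table standing in for the warehouse.

```python
import sqlite3

# "Extract": hypothetical raw rows pulled from a source system, with
# inconsistent casing, stray whitespace, amounts stored as strings.
extracted = [
    {"customer": " anna ", "city": "ROME", "amount": "20.50"},
    {"customer": "Bruno", "city": "milan", "amount": "35"},
    {"customer": "", "city": "Naples", "amount": "12"},        # missing key
    {"customer": "Carla", "city": "naples", "amount": "n/a"},  # bad amount
]

def transform(row):
    """Cleanse one record in the staging area; None if it cannot be repaired."""
    name = row["customer"].strip().title()
    if not name:
        return None
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return (name, row["city"].strip().title(), amount)

staged = [t for t in (transform(r) for r in extracted) if t is not None]

# "Load": write the cleansed rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, city TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", staged)
warehouse.commit()
print(staged)  # [('Anna', 'Rome', 20.5), ('Bruno', 'Milan', 35.0)]
```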
Data Mining
• Data mining is a popular term for queries that summarize big data sets in useful ways.
• Examples:
1. Clustering all Web pages by topic.
2. Finding characteristics of fraudulent credit-card use.
Data Warehouse
Enterprise “Database” (Customers, Vendors, Orders, Transactions, etc.) → copied, organized, summarized → Data Warehouse → Data Mining
Data Miners:
• “Farmers”: they know
• “Explorers”: unpredictable
Market-Basket Data
• An important form of mining from relational data involves market baskets = sets of “items” that are purchased together as a customer leaves a store.
• Basket data is summarized by frequent itemsets = sets of items that often appear together in baskets.
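Finding frequent itemsets amounts to counting how many baskets contain each set and keeping those above a support threshold. The minimal Python sketch below finds frequent pairs. It uses a few hypothetical baskets and a hypothetical threshold of 3 baskets. It follows the idea behind the A-Priori algorithm: first count single items, then count only pairs built from items that are themselves frequent.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
support_threshold = 3  # hypothetical: minimum number of baskets

# Pass 1: count single items and keep the frequent ones.
item_counts = Counter(item for b in baskets for item in b)
frequent_items = {i for i, c in item_counts.items() if c >= support_threshold}

# Pass 2: count only pairs whose two items are both frequent
# (a pair cannot be frequent unless each of its items is).
pair_counts = Counter(
    pair
    for b in baskets
    for pair in combinations(sorted(b & frequent_items), 2)
)
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= support_threshold}
print(sorted(frequent_pairs))
# [('beer', 'diapers'), ('bread', 'diapers'), ('bread', 'milk'), ('diapers', 'milk')]
```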
Data Mining Flavors
• Directed – Attempts to explain or categorize some particular target field such as income or response.
• Undirected – Attempts to find patterns or similarities among groups of records without the use of a particular target field or collection of predefined classes.
Data Mining Examples in Enterprises
• Government
– Track down criminals (police also)
– Treasury Dept.: suspicious international funds transfers
• Phone companies
• Supermarkets & superstores
• Mail-order, on-line order
• Financial institutions
• Insurance companies
• Web sites
• Many others…