【Oracle】数据仓库_oracle做实时数仓
个人主页:Guiat
归属专栏:Oracle
文章目录
- 1. 数据仓库概述
-
- 1.1 为什么需要数据仓库
- 1.2 Oracle数据仓库架构
- 1.3 Oracle数据仓库关键技术
- 2. 数据仓库建模
-
- 2.1 维度建模基础
- 2.2 星形模式设计
- 2.3 雪花模式设计
- 2.4 缓慢变化维度(SCD)处理
- 3. ETL数据集成
-
- 3.1 ETL流程设计
- 3.2 数据抽取(Extract)
- 3.3 数据转换(Transform)
- 3.4 数据加载(Load)
- 4. 物化视图优化
-
- 4.1 物化视图基础
- 4.2 聚合物化视图
- 4.3 复杂聚合物化视图
- 4.4 物化视图刷新策略
- 5. 分区策略
-
- 5.1 分区类型选择
- 5.2 复合分区策略
- 5.3 分区维护操作
- 6. 索引策略
-
- 6.1 数据仓库索引设计
- 6.2 位图索引应用
- 6.3 函数索引优化
- 7. 查询优化
-
- 7.1 星形转换优化
- 7.2 并行查询优化
- 7.3 分区消除优化
- 8. 数据压缩
-
- 8.1 表压缩策略
- 8.2 索引压缩
- 8.3 压缩效果监控
- 9. 实时数据仓库
-
- 9.1 实时ETL架构
- 9.2 Oracle GoldenGate集成
- 9.3 Lambda架构实现
- 10. 数据仓库治理
-
- 10.1 数据质量管理
- 10.2 元数据管理
- 10.3 数据血缘追踪
正文
1. 数据仓库概述
数据仓库就像是企业的\"数据金库\",把散落在各个系统中的数据统一收集起来,经过清洗、整理后,为决策分析提供统一的数据视图。Oracle数据仓库不仅仅是存储数据,更是一个完整的分析平台。
1.1 为什么需要数据仓库
- 统一数据源,避免\"数据孤岛\"问题
- 历史数据保存,支持趋势分析
- 优化查询性能,专门为分析而设计
- 提供一致的数据定义和业务规则
- 支持复杂的分析和报表需求
1.2 Oracle数据仓库架构
Oracle数据仓库采用分层架构,就像盖房子一样,一层一层往上建:
#mermaid-svg-l8DYi08zZGaeoqf0 {font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 .error-icon{fill:#552222;}#mermaid-svg-l8DYi08zZGaeoqf0 .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-l8DYi08zZGaeoqf0 .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-l8DYi08zZGaeoqf0 .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-l8DYi08zZGaeoqf0 .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-l8DYi08zZGaeoqf0 .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-l8DYi08zZGaeoqf0 .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-l8DYi08zZGaeoqf0 .marker{fill:#333333;stroke:#333333;}#mermaid-svg-l8DYi08zZGaeoqf0 .marker.cross{stroke:#333333;}#mermaid-svg-l8DYi08zZGaeoqf0 svg{font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-l8DYi08zZGaeoqf0 .label{font-family:\"trebuchet ms\",verdana,arial,sans-serif;color:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 .cluster-label text{fill:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 .cluster-label span{color:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 .label text,#mermaid-svg-l8DYi08zZGaeoqf0 span{fill:#333;color:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 .node rect,#mermaid-svg-l8DYi08zZGaeoqf0 .node circle,#mermaid-svg-l8DYi08zZGaeoqf0 .node ellipse,#mermaid-svg-l8DYi08zZGaeoqf0 .node polygon,#mermaid-svg-l8DYi08zZGaeoqf0 .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-l8DYi08zZGaeoqf0 .node .label{text-align:center;}#mermaid-svg-l8DYi08zZGaeoqf0 .node.clickable{cursor:pointer;}#mermaid-svg-l8DYi08zZGaeoqf0 .arrowheadPath{fill:#333333;}#mermaid-svg-l8DYi08zZGaeoqf0 .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-l8DYi08zZGaeoqf0 .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-l8DYi08zZGaeoqf0 .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-l8DYi08zZGaeoqf0 .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-l8DYi08zZGaeoqf0 .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-l8DYi08zZGaeoqf0 .cluster text{fill:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 .cluster span{color:#333;}#mermaid-svg-l8DYi08zZGaeoqf0 div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-l8DYi08zZGaeoqf0 :root{--mermaid-font-family:\"trebuchet ms\",verdana,arial,sans-serif;} 数据源层 数据集成层 数据存储层 数据服务层 数据应用层 OLTP系统 外部数据 文件系统 第三方API ODI/GoldenGate ETL工具 数据清洗 数据转换 ODS操作数据存储 数据仓库核心 数据集市 元数据库 OLAP服务 报表服务 数据挖掘 API接口 BI工具 报表平台 自助分析 移动应用
1.3 Oracle数据仓库关键技术
2. 数据仓库建模
2.1 维度建模基础
维度建模就像是搭积木,把复杂的业务用简单的\"事实\"和\"维度\"来表达:
#mermaid-svg-egv8CHWmlRuIdDvm {font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-egv8CHWmlRuIdDvm .error-icon{fill:#552222;}#mermaid-svg-egv8CHWmlRuIdDvm .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-egv8CHWmlRuIdDvm .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-egv8CHWmlRuIdDvm .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-egv8CHWmlRuIdDvm .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-egv8CHWmlRuIdDvm .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-egv8CHWmlRuIdDvm .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-egv8CHWmlRuIdDvm .marker{fill:#333333;stroke:#333333;}#mermaid-svg-egv8CHWmlRuIdDvm .marker.cross{stroke:#333333;}#mermaid-svg-egv8CHWmlRuIdDvm svg{font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-egv8CHWmlRuIdDvm .label{font-family:\"trebuchet ms\",verdana,arial,sans-serif;color:#333;}#mermaid-svg-egv8CHWmlRuIdDvm .cluster-label text{fill:#333;}#mermaid-svg-egv8CHWmlRuIdDvm .cluster-label span{color:#333;}#mermaid-svg-egv8CHWmlRuIdDvm .label text,#mermaid-svg-egv8CHWmlRuIdDvm span{fill:#333;color:#333;}#mermaid-svg-egv8CHWmlRuIdDvm .node rect,#mermaid-svg-egv8CHWmlRuIdDvm .node circle,#mermaid-svg-egv8CHWmlRuIdDvm .node ellipse,#mermaid-svg-egv8CHWmlRuIdDvm .node polygon,#mermaid-svg-egv8CHWmlRuIdDvm .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-egv8CHWmlRuIdDvm .node .label{text-align:center;}#mermaid-svg-egv8CHWmlRuIdDvm .node.clickable{cursor:pointer;}#mermaid-svg-egv8CHWmlRuIdDvm .arrowheadPath{fill:#333333;}#mermaid-svg-egv8CHWmlRuIdDvm .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-egv8CHWmlRuIdDvm .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-egv8CHWmlRuIdDvm .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-egv8CHWmlRuIdDvm .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-egv8CHWmlRuIdDvm .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-egv8CHWmlRuIdDvm .cluster text{fill:#333;}#mermaid-svg-egv8CHWmlRuIdDvm .cluster span{color:#333;}#mermaid-svg-egv8CHWmlRuIdDvm div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-egv8CHWmlRuIdDvm :root{--mermaid-font-family:\"trebuchet ms\",verdana,arial,sans-serif;} 维度建模 事实表 维度表 度量值 外键 粒度 属性 层次结构 缓慢变化维度 销售额 数量 利润 产品名称 类别 品牌
2.2 星形模式设计
星形模式是数据仓库中最常用的建模方式:
-- 时间维度表CREATE TABLE dim_time ( time_key NUMBER PRIMARY KEY, date_value DATE NOT NULL, year_number NUMBER(4) NOT NULL, quarter_number NUMBER(1) NOT NULL, month_number NUMBER(2) NOT NULL, month_name VARCHAR2(20) NOT NULL, day_number NUMBER(2) NOT NULL, day_name VARCHAR2(20) NOT NULL, week_number NUMBER(2) NOT NULL, is_weekend CHAR(1) DEFAULT \'N\', is_holiday CHAR(1) DEFAULT \'N\', fiscal_year NUMBER(4), fiscal_quarter NUMBER(1), fiscal_month NUMBER(2)) COMPRESS FOR OLTP;-- 产品维度表CREATE TABLE dim_product ( product_key NUMBER PRIMARY KEY, product_id VARCHAR2(20) NOT NULL, product_name VARCHAR2(200) NOT NULL, brand VARCHAR2(100), category VARCHAR2(100) NOT NULL, subcategory VARCHAR2(100), unit_cost NUMBER(10,2), unit_price NUMBER(10,2), product_status VARCHAR2(20) DEFAULT \'ACTIVE\', created_date DATE DEFAULT SYSDATE, updated_date DATE DEFAULT SYSDATE) COMPRESS FOR OLTP;-- 客户维度表CREATE TABLE dim_customer ( customer_key NUMBER PRIMARY KEY, customer_id VARCHAR2(20) NOT NULL, customer_name VARCHAR2(200) NOT NULL, customer_type VARCHAR2(50), segment VARCHAR2(50), region VARCHAR2(100), country VARCHAR2(100), city VARCHAR2(100), postal_code VARCHAR2(20), registration_date DATE, credit_limit NUMBER(12,2), customer_status VARCHAR2(20) DEFAULT \'ACTIVE\') COMPRESS FOR OLTP;-- 门店维度表CREATE TABLE dim_store ( store_key NUMBER PRIMARY KEY, store_id VARCHAR2(20) NOT NULL, store_name VARCHAR2(200) NOT NULL, store_type VARCHAR2(50), region VARCHAR2(100), district VARCHAR2(100), city VARCHAR2(100), address VARCHAR2(500), manager_name VARCHAR2(100), opening_date DATE, store_size NUMBER(10,2), store_status VARCHAR2(20) DEFAULT \'OPEN\') COMPRESS FOR OLTP;-- 销售事实表(分区表)CREATE TABLE fact_sales ( time_key NUMBER NOT NULL, product_key NUMBER NOT NULL, customer_key NUMBER NOT NULL, store_key NUMBER NOT NULL, transaction_id VARCHAR2(50), quantity NUMBER(10,2) NOT NULL, unit_price NUMBER(10,2) NOT NULL, total_amount NUMBER(12,2) NOT NULL, discount_amount NUMBER(12,2) DEFAULT 0, tax_amount NUMBER(12,2) DEFAULT 0, profit_amount NUMBER(12,2), cost_amount NUMBER(12,2), CONSTRAINT fk_sales_time FOREIGN KEY (time_key) REFERENCES dim_time(time_key), CONSTRAINT fk_sales_product FOREIGN KEY (product_key) REFERENCES dim_product(product_key), CONSTRAINT fk_sales_customer FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key), CONSTRAINT fk_sales_store FOREIGN KEY (store_key) REFERENCES dim_store(store_key))COMPRESS FOR QUERY LOWPARTITION BY RANGE (time_key) ( PARTITION p2023q1 VALUES LESS THAN (20230401), PARTITION p2023q2 VALUES LESS THAN (20230701), PARTITION p2023q3 VALUES LESS THAN (20231001), PARTITION p2023q4 VALUES LESS THAN (20240101), PARTITION p2024q1 VALUES LESS THAN (20240401));
2.3 雪花模式设计
雪花模式是星形模式的规范化版本,减少数据冗余:
-- 产品类别表(雪花模式的一部分)CREATE TABLE dim_product_category ( category_key NUMBER PRIMARY KEY, category_id VARCHAR2(20) NOT NULL, category_name VARCHAR2(100) NOT NULL, parent_category_key NUMBER, category_level NUMBER(2), description VARCHAR2(500));-- 产品品牌表CREATE TABLE dim_product_brand ( brand_key NUMBER PRIMARY KEY, brand_id VARCHAR2(20) NOT NULL, brand_name VARCHAR2(100) NOT NULL, country_of_origin VARCHAR2(100), brand_website VARCHAR2(200));-- 修改后的产品维度表(引用类别和品牌表)CREATE TABLE dim_product_snowflake ( product_key NUMBER PRIMARY KEY, product_id VARCHAR2(20) NOT NULL, product_name VARCHAR2(200) NOT NULL, brand_key NUMBER, category_key NUMBER, unit_cost NUMBER(10,2), unit_price NUMBER(10,2), CONSTRAINT fk_product_brand FOREIGN KEY (brand_key) REFERENCES dim_product_brand(brand_key), CONSTRAINT fk_product_category FOREIGN KEY (category_key) REFERENCES dim_product_category(category_key));
2.4 缓慢变化维度(SCD)处理
缓慢变化维度是数据仓库中的经典问题,客户信息、产品信息会随时间变化:
-- SCD Type 1: 直接覆盖(不保留历史)UPDATE dim_customer SET customer_name = \'新公司名称\', updated_date = SYSDATEWHERE customer_id = \'CUST001\';-- SCD Type 2: 保留历史版本CREATE TABLE dim_customer_scd2 ( customer_key NUMBER PRIMARY KEY, customer_id VARCHAR2(20) NOT NULL, customer_name VARCHAR2(200) NOT NULL, customer_type VARCHAR2(50), region VARCHAR2(100), effective_date DATE NOT NULL, expiry_date DATE DEFAULT DATE \'9999-12-31\', is_current CHAR(1) DEFAULT \'Y\', version_number NUMBER DEFAULT 1);-- 处理SCD Type 2的存储过程CREATE OR REPLACE PROCEDURE update_customer_scd2( p_customer_id VARCHAR2, p_new_name VARCHAR2, p_new_type VARCHAR2, p_new_region VARCHAR2) AS v_customer_key NUMBER;BEGIN -- 关闭当前记录 UPDATE dim_customer_scd2 SET expiry_date = SYSDATE - 1, is_current = \'N\' WHERE customer_id = p_customer_id AND is_current = \'Y\'; -- 生成新的键值 SELECT customer_seq.NEXTVAL INTO v_customer_key FROM DUAL; -- 插入新记录 INSERT INTO dim_customer_scd2 ( customer_key, customer_id, customer_name, customer_type, region, effective_date ) VALUES ( v_customer_key, p_customer_id, p_new_name, p_new_type, p_new_region, SYSDATE ); COMMIT;END;/-- SCD Type 3: 保留有限历史(增加字段)CREATE TABLE dim_customer_scd3 ( customer_key NUMBER PRIMARY KEY, customer_id VARCHAR2(20) NOT NULL, current_name VARCHAR2(200), previous_name VARCHAR2(200), current_region VARCHAR2(100), previous_region VARCHAR2(100), last_updated DATE);
3. ETL数据集成
3.1 ETL流程设计
ETL就像是数据的\"流水线\",把原始数据加工成高质量的分析数据:
#mermaid-svg-V1sixqUqKjWP0YvL {font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-V1sixqUqKjWP0YvL .error-icon{fill:#552222;}#mermaid-svg-V1sixqUqKjWP0YvL .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-V1sixqUqKjWP0YvL .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-V1sixqUqKjWP0YvL .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-V1sixqUqKjWP0YvL .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-V1sixqUqKjWP0YvL .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-V1sixqUqKjWP0YvL .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-V1sixqUqKjWP0YvL .marker{fill:#333333;stroke:#333333;}#mermaid-svg-V1sixqUqKjWP0YvL .marker.cross{stroke:#333333;}#mermaid-svg-V1sixqUqKjWP0YvL svg{font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-V1sixqUqKjWP0YvL .label{font-family:\"trebuchet ms\",verdana,arial,sans-serif;color:#333;}#mermaid-svg-V1sixqUqKjWP0YvL .cluster-label text{fill:#333;}#mermaid-svg-V1sixqUqKjWP0YvL .cluster-label span{color:#333;}#mermaid-svg-V1sixqUqKjWP0YvL .label text,#mermaid-svg-V1sixqUqKjWP0YvL span{fill:#333;color:#333;}#mermaid-svg-V1sixqUqKjWP0YvL .node rect,#mermaid-svg-V1sixqUqKjWP0YvL .node circle,#mermaid-svg-V1sixqUqKjWP0YvL .node ellipse,#mermaid-svg-V1sixqUqKjWP0YvL .node polygon,#mermaid-svg-V1sixqUqKjWP0YvL .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-V1sixqUqKjWP0YvL .node .label{text-align:center;}#mermaid-svg-V1sixqUqKjWP0YvL .node.clickable{cursor:pointer;}#mermaid-svg-V1sixqUqKjWP0YvL .arrowheadPath{fill:#333333;}#mermaid-svg-V1sixqUqKjWP0YvL .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-V1sixqUqKjWP0YvL .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-V1sixqUqKjWP0YvL .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-V1sixqUqKjWP0YvL .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-V1sixqUqKjWP0YvL .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-V1sixqUqKjWP0YvL .cluster text{fill:#333;}#mermaid-svg-V1sixqUqKjWP0YvL .cluster span{color:#333;}#mermaid-svg-V1sixqUqKjWP0YvL div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-V1sixqUqKjWP0YvL :root{--mermaid-font-family:\"trebuchet ms\",verdana,arial,sans-serif;} Extract 抽取 Transform 转换 Load 加载 源系统连接 增量抽取 全量抽取 数据清洗 格式转换 业务规则 数据验证 维度加载 事实加载 索引重建 统计信息
3.2 数据抽取(Extract)
-- 创建数据库链接到源系统CREATE DATABASE LINK source_dbCONNECT TO source_user IDENTIFIED BY source_passwordUSING \'source_db_tns\';-- 增量抽取策略CREATE TABLE etl_control ( table_name VARCHAR2(100) PRIMARY KEY, last_extract_time TIMESTAMP, last_extract_key NUMBER, extract_status VARCHAR2(20), updated_date DATE DEFAULT SYSDATE);-- 基于时间戳的增量抽取CREATE OR REPLACE PROCEDURE extract_orders_incremental AS v_last_time TIMESTAMP; v_current_time TIMESTAMP := SYSTIMESTAMP;BEGIN -- 获取上次抽取时间 SELECT last_extract_time INTO v_last_time FROM etl_control WHERE table_name = \'ORDERS\'; -- 抽取增量数据 INSERT INTO staging_orders SELECT * FROM orders@source_db WHERE last_modified > v_last_time; -- 更新控制表 UPDATE etl_control SET last_extract_time = v_current_time, extract_status = \'SUCCESS\', updated_date = SYSDATE WHERE table_name = \'ORDERS\'; COMMIT;EXCEPTION WHEN OTHERS THEN UPDATE etl_control SET extract_status = \'FAILED\' WHERE table_name = \'ORDERS\'; COMMIT; RAISE;END;/-- 基于CDC(Change Data Capture)的抽取CREATE TABLE cdc_orders ( order_id NUMBER, customer_id NUMBER, order_date DATE, total_amount NUMBER, operation_type VARCHAR2(10), -- INSERT, UPDATE, DELETE change_timestamp TIMESTAMP);-- 使用Oracle Streams或GoldenGate进行实时数据捕获-- 这里展示手工创建触发器的方式CREATE OR REPLACE TRIGGER orders_cdc_trigger AFTER INSERT OR UPDATE OR DELETE ON orders FOR EACH ROWBEGIN IF INSERTING THEN INSERT INTO cdc_orders VALUES ( :NEW.order_id, :NEW.customer_id, :NEW.order_date, :NEW.total_amount, \'INSERT\', SYSTIMESTAMP ); ELSIF UPDATING THEN INSERT INTO cdc_orders VALUES ( :NEW.order_id, :NEW.customer_id, :NEW.order_date, :NEW.total_amount, \'UPDATE\', SYSTIMESTAMP ); ELSIF DELETING THEN INSERT INTO cdc_orders VALUES ( :OLD.order_id, :OLD.customer_id, :OLD.order_date, :OLD.total_amount, \'DELETE\', SYSTIMESTAMP ); END IF;END;/
3.3 数据转换(Transform)
-- 创建数据清洗和转换的存储过程CREATE OR REPLACE PROCEDURE transform_customer_data ASBEGIN -- 数据清洗:去除空格、统一大小写 UPDATE staging_customers SET customer_name = TRIM(UPPER(customer_name)), email = TRIM(LOWER(email)), phone = REGEXP_REPLACE(phone, \'[^0-9]\', \'\'); -- 只保留数字 -- 数据验证:标记无效记录 UPDATE staging_customers SET data_quality_flag = \'INVALID\' WHERE customer_name IS NULL OR LENGTH(customer_name) < 2 OR email NOT LIKE \'%@%.%\' OR LENGTH(phone) NOT BETWEEN 10 AND 15; -- 业务规则转换:客户分级 UPDATE staging_customers SET customer_segment = CASE WHEN total_purchases > 100000 THEN \'VIP\' WHEN total_purchases > 50000 THEN \'PREMIUM\' WHEN total_purchases > 10000 THEN \'STANDARD\' ELSE \'BASIC\' END; -- 地址标准化 UPDATE staging_customers SET region = CASE WHEN UPPER(city) IN (\'BEIJING\', \'SHANGHAI\', \'SHENZHEN\', \'GUANGZHOU\') THEN \'TIER1\' WHEN UPPER(city) IN (\'HANGZHOU\', \'NANJING\', \'SUZHOU\', \'WUHAN\') THEN \'TIER2\' ELSE \'OTHER\' END; COMMIT;END;/-- 复杂的数据转换:销售数据聚合CREATE OR REPLACE PROCEDURE transform_sales_summary ASBEGIN -- 删除当天的汇总数据(重新计算) DELETE FROM fact_sales_daily WHERE date_key = TO_NUMBER(TO_CHAR(SYSDATE, \'YYYYMMDD\')); -- 重新计算当天汇总 INSERT INTO fact_sales_daily ( date_key, product_key, store_key, total_quantity, total_amount, total_profit, avg_unit_price, transaction_count ) SELECT TO_NUMBER(TO_CHAR(s.sale_date, \'YYYYMMDD\')) as date_key, s.product_key, s.store_key, SUM(s.quantity) as total_quantity, SUM(s.total_amount) as total_amount, SUM(s.profit_amount) as total_profit, AVG(s.unit_price) as avg_unit_price, COUNT(*) as transaction_count FROM staging_sales s WHERE TRUNC(s.sale_date) = TRUNC(SYSDATE) GROUP BY TO_NUMBER(TO_CHAR(s.sale_date, \'YYYYMMDD\')), s.product_key, s.store_key; COMMIT;END;/
3.4 数据加载(Load)
-- 维度表加载策略CREATE OR REPLACE PROCEDURE load_dim_customer AS v_count NUMBER;BEGIN -- 处理新增客户 INSERT INTO dim_customer ( customer_key, customer_id, customer_name, customer_type, segment, region, country, customer_status ) SELECT customer_seq.NEXTVAL, sc.customer_id, sc.customer_name, sc.customer_type, sc.customer_segment, sc.region, sc.country, \'ACTIVE\' FROM staging_customers sc WHERE sc.data_quality_flag != \'INVALID\' AND NOT EXISTS ( SELECT 1 FROM dim_customer dc WHERE dc.customer_id = sc.customer_id ); v_count := SQL%ROWCOUNT; DBMS_OUTPUT.PUT_LINE(\'插入新客户: \' || v_count || \' 条记录\'); -- 处理客户信息更新(SCD Type 1) UPDATE dim_customer dc SET (customer_name, customer_type, segment, region, updated_date) = ( SELECT sc.customer_name, sc.customer_type, sc.customer_segment, sc.region, SYSDATE FROM staging_customers sc WHERE sc.customer_id = dc.customer_id AND sc.data_quality_flag != \'INVALID\' ) WHERE EXISTS ( SELECT 1 FROM staging_customers sc WHERE sc.customer_id = dc.customer_id AND sc.data_quality_flag != \'INVALID\' ); v_count := SQL%ROWCOUNT; DBMS_OUTPUT.PUT_LINE(\'更新客户: \' || v_count || \' 条记录\'); COMMIT;END;/-- 事实表批量加载CREATE OR REPLACE PROCEDURE load_fact_sales ASBEGIN -- 禁用索引以提高加载性能 EXECUTE IMMEDIATE \'ALTER INDEX idx_fact_sales_time UNUSABLE\'; EXECUTE IMMEDIATE \'ALTER INDEX idx_fact_sales_product UNUSABLE\'; -- 批量插入事实数据 INSERT /*+ APPEND PARALLEL(fact_sales, 4) */ INTO fact_sales ( time_key, product_key, customer_key, store_key, transaction_id, quantity, unit_price, total_amount, discount_amount, tax_amount, profit_amount ) SELECT /*+ PARALLEL(ss, 4) */ TO_NUMBER(TO_CHAR(ss.sale_date, \'YYYYMMDD\')) as time_key, dp.product_key, dc.customer_key, ds.store_key, ss.transaction_id, ss.quantity, ss.unit_price, ss.total_amount, ss.discount_amount, ss.tax_amount, ss.total_amount - ss.cost_amount as profit_amount FROM staging_sales ss JOIN dim_product dp ON ss.product_id = dp.product_id JOIN dim_customer dc ON ss.customer_id = dc.customer_id JOIN dim_store ds ON ss.store_id = ds.store_id WHERE ss.data_quality_flag != \'INVALID\'; COMMIT; -- 重建索引 EXECUTE IMMEDIATE \'ALTER INDEX idx_fact_sales_time REBUILD PARALLEL 4\'; EXECUTE IMMEDIATE \'ALTER INDEX idx_fact_sales_product REBUILD PARALLEL 4\'; -- 收集统计信息 DBMS_STATS.GATHER_TABLE_STATS( ownname => USER, tabname => \'FACT_SALES\', estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE, cascade => TRUE, degree => 4 ); END;/
4. 物化视图优化
4.1 物化视图基础
物化视图就像是\"预制菜\",把常用的复杂查询结果提前计算好存储起来:
#mermaid-svg-uJMJoilISu3bkIWg {font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-uJMJoilISu3bkIWg .error-icon{fill:#552222;}#mermaid-svg-uJMJoilISu3bkIWg .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-uJMJoilISu3bkIWg .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-uJMJoilISu3bkIWg .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-uJMJoilISu3bkIWg .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-uJMJoilISu3bkIWg .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-uJMJoilISu3bkIWg .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-uJMJoilISu3bkIWg .marker{fill:#333333;stroke:#333333;}#mermaid-svg-uJMJoilISu3bkIWg .marker.cross{stroke:#333333;}#mermaid-svg-uJMJoilISu3bkIWg svg{font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-uJMJoilISu3bkIWg .label{font-family:\"trebuchet ms\",verdana,arial,sans-serif;color:#333;}#mermaid-svg-uJMJoilISu3bkIWg .cluster-label text{fill:#333;}#mermaid-svg-uJMJoilISu3bkIWg .cluster-label span{color:#333;}#mermaid-svg-uJMJoilISu3bkIWg .label text,#mermaid-svg-uJMJoilISu3bkIWg span{fill:#333;color:#333;}#mermaid-svg-uJMJoilISu3bkIWg .node rect,#mermaid-svg-uJMJoilISu3bkIWg .node circle,#mermaid-svg-uJMJoilISu3bkIWg .node ellipse,#mermaid-svg-uJMJoilISu3bkIWg .node polygon,#mermaid-svg-uJMJoilISu3bkIWg .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-uJMJoilISu3bkIWg .node .label{text-align:center;}#mermaid-svg-uJMJoilISu3bkIWg .node.clickable{cursor:pointer;}#mermaid-svg-uJMJoilISu3bkIWg .arrowheadPath{fill:#333333;}#mermaid-svg-uJMJoilISu3bkIWg .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-uJMJoilISu3bkIWg .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-uJMJoilISu3bkIWg .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-uJMJoilISu3bkIWg .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-uJMJoilISu3bkIWg .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-uJMJoilISu3bkIWg .cluster text{fill:#333;}#mermaid-svg-uJMJoilISu3bkIWg .cluster span{color:#333;}#mermaid-svg-uJMJoilISu3bkIWg div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-uJMJoilISu3bkIWg :root{--mermaid-font-family:\"trebuchet ms\",verdana,arial,sans-serif;} 物化视图类型 简单聚合视图 连接聚合视图 嵌套聚合视图 UNION ALL视图 SUM/COUNT/AVG 单表GROUP BY 星形连接 多表聚合 ROLLUP CUBE GROUPING SETS 分区视图 历史数据合并
4.2 聚合物化视图
-- 月度销售汇总物化视图CREATE MATERIALIZED VIEW mv_monthly_salesBUILD IMMEDIATEREFRESH FAST ON COMMITENABLE QUERY REWRITEASSELECT t.year_number, t.month_number, p.category, s.region, SUM(fs.quantity) as total_quantity, SUM(fs.total_amount) as total_sales, SUM(fs.profit_amount) as total_profit, COUNT(*) as transaction_count, COUNT(DISTINCT fs.customer_key) as unique_customersFROM fact_sales fsJOIN dim_time t ON fs.time_key = t.time_keyJOIN dim_product p ON fs.product_key = p.product_keyJOIN dim_store s ON fs.store_key = s.store_keyGROUP BY t.year_number, t.month_number, p.category, s.region;-- 创建物化视图日志(支持快速刷新)CREATE MATERIALIZED VIEW LOG ON fact_salesWITH ROWID, SEQUENCE (time_key, product_key, store_key, customer_key, quantity, total_amount, profit_amount)INCLUDING NEW VALUES;CREATE MATERIALIZED VIEW LOG ON dim_timeWITH ROWID, SEQUENCE (time_key, year_number, month_number)INCLUDING NEW VALUES;CREATE MATERIALIZED VIEW LOG ON dim_productWITH ROWID, SEQUENCE (product_key, category)INCLUDING NEW VALUES;CREATE MATERIALIZED VIEW LOG ON dim_storeWITH ROWID, SEQUENCE (store_key, region)INCLUDING NEW VALUES;
4.3 复杂聚合物化视图
-- 多维度分析物化视图(支持ROLLUP和CUBE)CREATE MATERIALIZED VIEW mv_sales_cubeBUILD IMMEDIATEREFRESH COMPLETE ON DEMANDENABLE QUERY REWRITEASSELECT t.year_number, t.quarter_number, t.month_number, p.category, p.brand, c.segment, s.region, SUM(fs.quantity) as total_quantity, SUM(fs.total_amount) as total_sales, SUM(fs.profit_amount) as total_profit, AVG(fs.unit_price) as avg_unit_price, COUNT(*) as transaction_countFROM fact_sales fsJOIN dim_time t ON fs.time_key = t.time_keyJOIN dim_product p ON fs.product_key = p.product_keyJOIN dim_customer c ON fs.customer_key = c.customer_keyJOIN dim_store s ON fs.store_key = s.store_keyGROUP BY CUBE( (t.year_number, t.quarter_number, t.month_number), (p.category, p.brand), c.segment, s.region);-- 分区物化视图(提高管理效率)CREATE MATERIALIZED VIEW mv_sales_by_quarterPARTITION BY RANGE (year_number, quarter_number) ( PARTITION p2023q1 VALUES LESS THAN (2023, 2), PARTITION p2023q2 VALUES LESS THAN (2023, 3), PARTITION p2023q3 VALUES LESS THAN (2023, 4), PARTITION p2023q4 VALUES LESS THAN (2024, 1), PARTITION p2024q1 VALUES LESS THAN (2024, 2))BUILD IMMEDIATEREFRESH FAST ON DEMANDENABLE QUERY REWRITEASSELECT t.year_number, t.quarter_number, p.category, SUM(fs.total_amount) as total_sales, COUNT(*) as transaction_countFROM fact_sales fsJOIN dim_time t ON fs.time_key = t.time_keyJOIN dim_product p ON fs.product_key = p.product_keyGROUP BY t.year_number, t.quarter_number, p.category;
4.4 物化视图刷新策略
-- 设置自动刷新作业BEGIN DBMS_REFRESH.MAKE( name => \'SALES_MV_GROUP\', list => \'MV_MONTHLY_SALES,MV_SALES_CUBE\', next_date => SYSDATE + 1/24, -- 1小时后开始 interval => \'SYSDATE + 1/24\' -- 每小时刷新一次 );END;/-- 手动刷新物化视图EXEC DBMS_MVIEW.REFRESH(\'MV_MONTHLY_SALES\', \'F\'); -- 快速刷新EXEC DBMS_MVIEW.REFRESH(\'MV_SALES_CUBE\', \'C\'); -- 完全刷新-- 监控物化视图刷新状态SELECT mview_name, last_refresh_date, refresh_method, build_mode, fast_refreshable, compile_stateFROM user_mviews;-- 查看物化视图日志统计SELECT log_table, log_owner, rowids, primary_key, object_id, filter_columnsFROM user_mview_logs;
5. 分区策略
5.1 分区类型选择
分区就像是给数据分类整理,让查询更快,管理更方便:
-- 范围分区(最常用于时间维度)CREATE TABLE fact_sales_range_partition ( sale_id NUMBER, sale_date DATE, product_id NUMBER, amount NUMBER)PARTITION BY RANGE (sale_date) ( PARTITION p2023_01 VALUES LESS THAN (DATE \'2023-02-01\'), PARTITION p2023_02 VALUES LESS THAN (DATE \'2023-03-01\'), PARTITION p2023_03 VALUES LESS THAN (DATE \'2023-04-01\'), PARTITION p2023_04 VALUES LESS THAN (DATE \'2023-05-01\'), PARTITION p_max VALUES LESS THAN (MAXVALUE));-- 列表分区(适合离散值)CREATE TABLE fact_sales_list_partition ( sale_id NUMBER, region VARCHAR2(50), product_id NUMBER, amount NUMBER)PARTITION BY LIST (region) ( PARTITION p_north VALUES (\'NORTH\', \'NORTHEAST\'), PARTITION p_south VALUES (\'SOUTH\', \'SOUTHEAST\'), PARTITION p_east VALUES (\'EAST\'), PARTITION p_west VALUES (\'WEST\', \'SOUTHWEST\'), PARTITION p_default VALUES (DEFAULT));-- 哈希分区(数据均匀分布)CREATE TABLE fact_sales_hash_partition ( sale_id NUMBER, customer_id NUMBER, product_id NUMBER, amount NUMBER)PARTITION BY HASH (customer_id) PARTITIONS 8;-- 间隔分区(自动创建新分区)CREATE TABLE fact_sales_interval_partition ( sale_id NUMBER, sale_date DATE, product_id NUMBER, amount NUMBER)PARTITION BY RANGE (sale_date)INTERVAL (NUMTOYMINTERVAL(1, \'MONTH\')) ( PARTITION p_start VALUES LESS THAN (DATE \'2023-01-01\'));
5.2 复合分区策略
-- 范围-哈希复合分区CREATE TABLE fact_sales_range_hash ( sale_id NUMBER, sale_date DATE, customer_id NUMBER, product_id NUMBER, amount NUMBER)PARTITION BY RANGE (sale_date)SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 4 ( PARTITION p2023q1 VALUES LESS THAN (DATE \'2023-04-01\'), PARTITION p2023q2 VALUES LESS THAN (DATE \'2023-07-01\'), PARTITION p2023q3 VALUES LESS THAN (DATE \'2023-10-01\'), PARTITION p2023q4 VALUES LESS THAN (DATE \'2024-01-01\'));-- 范围-列表复合分区CREATE TABLE fact_sales_range_list ( sale_id NUMBER, sale_date DATE, region VARCHAR2(50), product_id NUMBER, amount NUMBER)PARTITION BY RANGE (sale_date)SUBPARTITION BY LIST (region) ( PARTITION p2023 VALUES LESS THAN (DATE \'2024-01-01\') ( SUBPARTITION p2023_north VALUES (\'NORTH\'), SUBPARTITION p2023_south VALUES (\'SOUTH\'), SUBPARTITION p2023_east VALUES (\'EAST\'), SUBPARTITION p2023_west VALUES (\'WEST\') ), PARTITION p2024 VALUES LESS THAN (DATE \'2025-01-01\') ( SUBPARTITION p2024_north VALUES (\'NORTH\'), SUBPARTITION p2024_south VALUES (\'SOUTH\'), SUBPARTITION p2024_east VALUES (\'EAST\'), SUBPARTITION p2024_west VALUES (\'WEST\') ));
5.3 分区维护操作
-- 添加新分区ALTER TABLE fact_sales_range_partitionADD PARTITION p2023_05 VALUES LESS THAN (DATE \'2023-06-01\');-- 删除旧分区(数据归档)ALTER TABLE fact_sales_range_partitionDROP PARTITION p2023_01;-- 截断分区(清空数据但保留结构)ALTER TABLE fact_sales_range_partitionTRUNCATE PARTITION p2023_02;-- 分区交换(快速数据迁移)-- 创建临时表CREATE TABLE temp_partition_data ASSELECT * FROM fact_sales_range_partition WHERE 1=0;-- 加载数据到临时表INSERT INTO temp_partition_data SELECT * FROM external_data_source;-- 交换分区ALTER TABLE fact_sales_range_partitionEXCHANGE PARTITION p2023_03 WITH TABLE temp_partition_data;-- 分区合并ALTER TABLE fact_sales_range_partitionMERGE PARTITIONS p2023_01, p2023_02 INTO PARTITION p2023_q1;-- 分区拆分ALTER TABLE fact_sales_range_partitionSPLIT PARTITION p_max AT (DATE \'2024-01-01\')INTO (PARTITION p2023_12, PARTITION p_max);
6. 索引策略
6.1 数据仓库索引设计
数据仓库的索引策略与OLTP系统有很大不同:
#mermaid-svg-w80pGbIrXVByJ2My {font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-w80pGbIrXVByJ2My .error-icon{fill:#552222;}#mermaid-svg-w80pGbIrXVByJ2My .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-w80pGbIrXVByJ2My .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-w80pGbIrXVByJ2My .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-w80pGbIrXVByJ2My .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-w80pGbIrXVByJ2My .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-w80pGbIrXVByJ2My .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-w80pGbIrXVByJ2My .marker{fill:#333333;stroke:#333333;}#mermaid-svg-w80pGbIrXVByJ2My .marker.cross{stroke:#333333;}#mermaid-svg-w80pGbIrXVByJ2My svg{font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-w80pGbIrXVByJ2My .label{font-family:\"trebuchet ms\",verdana,arial,sans-serif;color:#333;}#mermaid-svg-w80pGbIrXVByJ2My .cluster-label text{fill:#333;}#mermaid-svg-w80pGbIrXVByJ2My .cluster-label span{color:#333;}#mermaid-svg-w80pGbIrXVByJ2My .label text,#mermaid-svg-w80pGbIrXVByJ2My span{fill:#333;color:#333;}#mermaid-svg-w80pGbIrXVByJ2My .node rect,#mermaid-svg-w80pGbIrXVByJ2My .node circle,#mermaid-svg-w80pGbIrXVByJ2My .node ellipse,#mermaid-svg-w80pGbIrXVByJ2My .node polygon,#mermaid-svg-w80pGbIrXVByJ2My .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-w80pGbIrXVByJ2My .node .label{text-align:center;}#mermaid-svg-w80pGbIrXVByJ2My .node.clickable{cursor:pointer;}#mermaid-svg-w80pGbIrXVByJ2My .arrowheadPath{fill:#333333;}#mermaid-svg-w80pGbIrXVByJ2My .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-w80pGbIrXVByJ2My .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-w80pGbIrXVByJ2My .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-w80pGbIrXVByJ2My .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-w80pGbIrXVByJ2My .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-w80pGbIrXVByJ2My .cluster text{fill:#333;}#mermaid-svg-w80pGbIrXVByJ2My .cluster span{color:#333;}#mermaid-svg-w80pGbIrXVByJ2My div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-w80pGbIrXVByJ2My :root{--mermaid-font-family:\"trebuchet ms\",verdana,arial,sans-serif;} 数据仓库索引策略 位图索引 B树索引 函数索引 分区索引 低基数列 AND/OR查询优化 计数查询优化 高基数列 范围查询 排序操作 计算列索引 表达式索引 本地分区索引 全局分区索引
6.2 位图索引应用
-- 为低基数列创建位图索引CREATE BITMAP INDEX bmp_idx_product_category ON fact_sales (product_key) LOCAL; -- 分区表使用本地索引CREATE BITMAP INDEX bmp_idx_customer_segment ON fact_sales (customer_key)LOCAL;CREATE BITMAP INDEX bmp_idx_time_quarter ON fact_sales (time_key)LOCAL;-- 位图连接索引(直接索引维度属性)CREATE BITMAP INDEX bmp_idx_product_category_joinON fact_sales (p.category)FROM fact_sales fs, dim_product pWHERE fs.product_key = p.product_keyLOCAL;-- 查看位图索引使用情况SELECT fs.product_key, COUNT(*) as sales_count, SUM(fs.total_amount) as total_salesFROM fact_sales fsWHERE fs.time_key BETWEEN 20230101 AND 20230331 -- 使用位图索引 AND fs.product_key IN ( SELECT product_key FROM dim_product WHERE category = \'ELECTRONICS\' -- 位图连接索引 )GROUP BY fs.product_key;
6.3 函数索引优化
-- 为常用的日期计算创建函数索引CREATE INDEX idx_sales_year_month ON fact_sales (EXTRACT(YEAR FROM TO_DATE(time_key, \'YYYYMMDD\')), EXTRACT(MONTH FROM TO_DATE(time_key, \'YYYYMMDD\')));-- 为金额范围查询创建函数索引CREATE INDEX idx_sales_amount_range ON fact_sales ( CASE WHEN total_amount < 100 THEN \'LOW\' WHEN total_amount < 1000 THEN \'MEDIUM\' ELSE \'HIGH\' END);-- 为复杂计算创建函数索引CREATE INDEX idx_profit_margin ON fact_sales ((profit_amount / total_amount * 100));-- 这些索引支持相应的查询SELECT product_key, SUM(total_amount)FROM fact_salesWHERE EXTRACT(YEAR FROM TO_DATE(time_key, \'YYYYMMDD\')) = 2023 AND EXTRACT(MONTH FROM TO_DATE(time_key, \'YYYYMMDD\')) = 6GROUP BY product_key;SELECT COUNT(*)FROM fact_salesWHERE (profit_amount / total_amount * 100) > 20;
7. 查询优化
7.1 星形转换优化
星形转换是Oracle专门为数据仓库设计的优化技术:
-- 启用星形转换ALTER SESSION SET star_transformation_enabled = TRUE;-- 典型的星形查询SELECT /*+ STAR_TRANSFORMATION */ p.category, t.year_number, s.region, SUM(fs.total_amount) as total_sales, COUNT(*) as transaction_countFROM fact_sales fsJOIN dim_product p ON fs.product_key = p.product_keyJOIN dim_time t ON fs.time_key = t.time_keyJOIN dim_store s ON fs.store_key = s.store_keyWHERE p.category IN (\'ELECTRONICS\', \'CLOTHING\') AND t.year_number = 2023 AND s.region = \'NORTH\'GROUP BY p.category, t.year_number, s.region;-- 查看执行计划EXPLAIN PLAN FORSELECT p.category, SUM(fs.total_amount)FROM fact_sales fsJOIN dim_product p ON fs.product_key = p.product_keyWHERE p.category = \'ELECTRONICS\';SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
7.2 并行查询优化
-- 设置并行度ALTER TABLE fact_sales PARALLEL 8;ALTER TABLE dim_product PARALLEL 4;-- 使用并行查询SELECT /*+ PARALLEL(fs, 8) PARALLEL(p, 4) */ p.category, SUM(fs.total_amount) as total_sales, AVG(fs.unit_price) as avg_priceFROM fact_sales fsJOIN dim_product p ON fs.product_key = p.product_keyWHERE fs.time_key BETWEEN 20230101 AND 20231231GROUP BY p.category;-- 并行DML操作ALTER SESSION ENABLE PARALLEL DML;INSERT /*+ PARALLEL(fact_sales_summary, 4) */ INTO fact_sales_summarySELECT /*+ PARALLEL(fs, 8) */ time_key, product_key, SUM(quantity) as total_quantity, SUM(total_amount) as total_amountFROM fact_sales fsWHERE time_key BETWEEN 20230101 AND 20230131GROUP BY time_key, product_key;-- 监控并行执行SELECT px_servers_requested, px_servers_allocated, px_qcsid, px_server_group, px_server_setFROM v$px_sessionWHERE px_qcsid = (SELECT sid FROM v$session WHERE audsid = USERENV(\'SESSIONID\'));
7.3 分区消除优化
-- 分区消除示例SELECT product_key, SUM(total_amount) as monthly_salesFROM fact_salesWHERE time_key BETWEEN 20230601 AND 20230630 -- 只访问6月分区GROUP BY product_key;-- 动态分区消除SELECT p.category, SUM(fs.total_amount) as total_salesFROM fact_sales fsJOIN dim_product p ON fs.product_key = p.product_keyJOIN dim_time t ON fs.time_key = t.time_keyWHERE t.year_number = :year_param -- 根据参数动态消除分区 AND t.quarter_number = :quarter_paramGROUP BY p.category;-- 查看分区消除效果SELECT sql_id, plan_hash_value, executions, px_servers_executions, elapsed_time, cpu_timeFROM v$sqlWHERE sql_text LIKE \'%fact_sales%\'ORDER BY elapsed_time DESC;
8. 数据压缩
8.1 表压缩策略
Oracle提供多种压缩技术来节省存储空间:
-- 基本表压缩CREATE TABLE fact_sales_compressed ( time_key NUMBER, product_key NUMBER, customer_key NUMBER, total_amount NUMBER) COMPRESS FOR OLTP;-- 高级压缩(需要Advanced Compression选项)CREATE TABLE fact_sales_advanced_compressed ( time_key NUMBER, product_key NUMBER, customer_key NUMBER, total_amount NUMBER) COMPRESS FOR QUERY LOW;-- 混合列压缩(Exadata环境)CREATE TABLE fact_sales_hybrid_compressed ( time_key NUMBER, product_key NUMBER, customer_key NUMBER, total_amount NUMBER) COMPRESS FOR QUERY HIGH;-- 为现有表启用压缩ALTER TABLE fact_sales COMPRESS FOR QUERY LOW;-- 压缩特定分区ALTER TABLE fact_sales MODIFY PARTITION p2023q1 COMPRESS FOR QUERY HIGH;
8.2 索引压缩
-- B树索引压缩CREATE INDEX idx_fact_sales_compressed ON fact_sales (time_key, product_key) COMPRESS 2;-- 重建现有索引并压缩ALTER INDEX idx_fact_sales_time REBUILD COMPRESS;-- 查看压缩效果SELECT table_name, compression, compress_for, num_rows, blocks, avg_row_lenFROM user_tablesWHERE table_name LIKE \'FACT_SALES%\';
8.3 压缩效果监控
-- 创建压缩监控脚本CREATE OR REPLACE PROCEDURE monitor_compression AS v_uncompressed_size NUMBER; v_compressed_size NUMBER; v_compression_ratio NUMBER;BEGIN -- 获取压缩前大小 SELECT SUM(bytes)/1024/1024 INTO v_uncompressed_size FROM user_segments WHERE segment_name = \'FACT_SALES_UNCOMPRESSED\'; -- 获取压缩后大小 SELECT SUM(bytes)/1024/1024 INTO v_compressed_size FROM user_segments WHERE segment_name = \'FACT_SALES_COMPRESSED\'; -- 计算压缩比 v_compression_ratio := v_uncompressed_size / v_compressed_size; -- 记录结果 INSERT INTO compression_log VALUES ( SYSDATE, \'FACT_SALES\', v_uncompressed_size, v_compressed_size, v_compression_ratio ); COMMIT;END;/
9. 实时数据仓库
9.1 实时ETL架构
现代数据仓库需要支持实时或近实时的数据处理:
#mermaid-svg-saDLZ7kzMlIhhGEY {font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;fill:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY .error-icon{fill:#552222;}#mermaid-svg-saDLZ7kzMlIhhGEY .error-text{fill:#552222;stroke:#552222;}#mermaid-svg-saDLZ7kzMlIhhGEY .edge-thickness-normal{stroke-width:2px;}#mermaid-svg-saDLZ7kzMlIhhGEY .edge-thickness-thick{stroke-width:3.5px;}#mermaid-svg-saDLZ7kzMlIhhGEY .edge-pattern-solid{stroke-dasharray:0;}#mermaid-svg-saDLZ7kzMlIhhGEY .edge-pattern-dashed{stroke-dasharray:3;}#mermaid-svg-saDLZ7kzMlIhhGEY .edge-pattern-dotted{stroke-dasharray:2;}#mermaid-svg-saDLZ7kzMlIhhGEY .marker{fill:#333333;stroke:#333333;}#mermaid-svg-saDLZ7kzMlIhhGEY .marker.cross{stroke:#333333;}#mermaid-svg-saDLZ7kzMlIhhGEY svg{font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:16px;}#mermaid-svg-saDLZ7kzMlIhhGEY .label{font-family:\"trebuchet ms\",verdana,arial,sans-serif;color:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY .cluster-label text{fill:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY .cluster-label span{color:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY .label text,#mermaid-svg-saDLZ7kzMlIhhGEY span{fill:#333;color:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY .node rect,#mermaid-svg-saDLZ7kzMlIhhGEY .node circle,#mermaid-svg-saDLZ7kzMlIhhGEY .node ellipse,#mermaid-svg-saDLZ7kzMlIhhGEY .node polygon,#mermaid-svg-saDLZ7kzMlIhhGEY .node path{fill:#ECECFF;stroke:#9370DB;stroke-width:1px;}#mermaid-svg-saDLZ7kzMlIhhGEY .node .label{text-align:center;}#mermaid-svg-saDLZ7kzMlIhhGEY .node.clickable{cursor:pointer;}#mermaid-svg-saDLZ7kzMlIhhGEY .arrowheadPath{fill:#333333;}#mermaid-svg-saDLZ7kzMlIhhGEY .edgePath .path{stroke:#333333;stroke-width:2.0px;}#mermaid-svg-saDLZ7kzMlIhhGEY .flowchart-link{stroke:#333333;fill:none;}#mermaid-svg-saDLZ7kzMlIhhGEY .edgeLabel{background-color:#e8e8e8;text-align:center;}#mermaid-svg-saDLZ7kzMlIhhGEY .edgeLabel rect{opacity:0.5;background-color:#e8e8e8;fill:#e8e8e8;}#mermaid-svg-saDLZ7kzMlIhhGEY .cluster rect{fill:#ffffde;stroke:#aaaa33;stroke-width:1px;}#mermaid-svg-saDLZ7kzMlIhhGEY .cluster text{fill:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY .cluster span{color:#333;}#mermaid-svg-saDLZ7kzMlIhhGEY div.mermaidTooltip{position:absolute;text-align:center;max-width:200px;padding:2px;font-family:\"trebuchet ms\",verdana,arial,sans-serif;font-size:12px;background:hsl(80, 100%, 96.2745098039%);border:1px solid #aaaa33;border-radius:2px;pointer-events:none;z-index:100;}#mermaid-svg-saDLZ7kzMlIhhGEY :root{--mermaid-font-family:\"trebuchet ms\",verdana,arial,sans-serif;} 实时数据源 消息队列 流处理引擎 实时数据仓库 实时分析 在线交易系统 物联网设备 日志文件 API接口 Oracle Streams Kafka OGG Oracle GoldenGate Stream Analytics Spark Streaming 实时事实表 实时汇总表 缓存层 实时仪表板 告警系统 实时报表
9.2 Oracle GoldenGate集成
-- 配置GoldenGate复制-- 在源端创建补充日志ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;ALTER TABLE orders ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;-- 创建GoldenGate用户CREATE USER ggs_user IDENTIFIED BY ggs_password;GRANT DBA TO ggs_user;GRANT EXECUTE ON DBMS_FLASHBACK TO ggs_user;-- 在目标端创建实时加载表CREATE TABLE real_time_sales ( sale_id NUMBER, customer_id NUMBER, product_id NUMBER, sale_amount NUMBER, sale_timestamp TIMESTAMP, operation_type VARCHAR2(10), processed_flag CHAR(1) DEFAULT \'N\');-- 创建实时处理存储过程CREATE OR REPLACE PROCEDURE process_real_time_sales AS CURSOR c_new_sales IS SELECT * FROM real_time_sales WHERE processed_flag = \'N\' ORDER BY sale_timestamp;BEGIN FOR sale_rec IN c_new_sales LOOP -- 更新实时汇总表 MERGE INTO real_time_sales_summary rss USING (SELECT TO_CHAR(sale_rec.sale_timestamp, \'YYYYMMDDHH24\') as hour_key, sale_rec.product_id, sale_rec.sale_amount FROM DUAL) src ON (rss.hour_key = src.hour_key AND rss.product_id = src.product_id) WHEN MATCHED THEN UPDATE SET total_amount = total_amount + src.sale_amount, transaction_count = transaction_count + 1, last_updated = SYSTIMESTAMP WHEN NOT MATCHED THEN INSERT (hour_key, product_id, total_amount, transaction_count, last_updated) VALUES (src.hour_key, src.product_id, src.sale_amount, 1, SYSTIMESTAMP); -- 标记为已处理 UPDATE real_time_sales SET processed_flag = \'Y\' WHERE sale_id = sale_rec.sale_id; -- 每1000条提交一次 IF MOD(c_new_sales%ROWCOUNT, 1000) = 0 THEN COMMIT; END IF; END LOOP; COMMIT;END;/
9.3 Lambda架构实现
-- 批处理层(传统数据仓库)CREATE TABLE batch_sales_summary ( date_key NUMBER, product_key NUMBER, total_amount NUMBER, transaction_count NUMBER, batch_timestamp TIMESTAMP);-- 实时处理层(速度层)CREATE TABLE speed_sales_summary ( hour_key NUMBER, product_key NUMBER, total_amount NUMBER, transaction_count NUMBER, real_time_timestamp TIMESTAMP);-- 服务层(合并批处理和实时结果)CREATE OR REPLACE VIEW unified_sales_summary ASSELECT TRUNC(hour_key/100) as date_key, product_key, SUM(total_amount) as total_amount, SUM(transaction_count) as transaction_count, \'REAL_TIME\' as data_sourceFROM speed_sales_summaryWHERE hour_key > (SELECT MAX(date_key) * 100 FROM batch_sales_summary)GROUP BY TRUNC(hour_key/100), product_keyUNION ALLSELECT date_key, product_key, total_amount, transaction_count, \'BATCH\' as data_sourceFROM batch_sales_summary;
10. 数据仓库治理
10.1 数据质量管理
数据质量是数据仓库成功的关键因素:
-- 创建数据质量规则表CREATE TABLE data_quality_rules ( rule_id NUMBER PRIMARY KEY, table_name VARCHAR2(100), column_name VARCHAR2(100), rule_type VARCHAR2(50), -- NOT_NULL, RANGE, FORMAT, REFERENCE rule_expression VARCHAR2(1000), rule_description VARCHAR2(500), is_active CHAR(1) DEFAULT \'Y\', created_date DATE DEFAULT SYSDATE);-- 插入数据质量规则INSERT INTO data_quality_rules VALUES (1, \'FACT_SALES\', \'TOTAL_AMOUNT\', \'RANGE\', \'total_amount > 0\', \'销售金额必须大于0\', \'Y\', SYSDATE);INSERT INTO data_quality_rules VALUES (2, \'DIM_CUSTOMER\', \'EMAIL\', \'FORMAT\', \'email LIKE \'\'%@%.%\'\'\', \'邮箱格式验证\', \'Y\', SYSDATE);INSERT INTO data_quality_rules VALUES (3, \'FACT_SALES\', \'CUSTOMER_KEY\', \'REFERENCE\', \'EXISTS (SELECT 1 FROM dim_customer WHERE customer_key = ?)\', \'客户外键验证\', \'Y\', SYSDATE);-- 数据质量检查存储过程CREATE OR REPLACE PROCEDURE check_data_quality(p_table_name VARCHAR2) AS v_sql VARCHAR2(4000); v_error_count NUMBER; v_total_count NUMBER;BEGIN -- 获取表的总记录数 v_sql := \'SELECT COUNT(*) FROM \' || p_table_name; EXECUTE IMMEDIATE v_sql INTO v_total_count; -- 检查每个质量规则 FOR rule_rec IN ( SELECT * FROM data_quality_rules WHERE table_name = p_table_name AND is_active = \'Y\' ) LOOP IF rule_rec.rule_type = \'NOT_NULL\' THEN v_sql := \'SELECT COUNT(*) FROM \' || p_table_name ||\' WHERE \' || rule_rec.column_name || \' IS NULL\'; ELSIF rule_rec.rule_type = \'RANGE\' THEN v_sql := \'SELECT COUNT(*) FROM \' || p_table_name ||\' WHERE NOT (\' || rule_rec.rule_expression || \')\'; ELSIF rule_rec.rule_type = \'FORMAT\' THEN v_sql := \'SELECT COUNT(*) FROM \' || p_table_name ||\' WHERE NOT (\' || rule_rec.rule_expression || \')\'; END IF; EXECUTE IMMEDIATE v_sql INTO v_error_count; -- 记录检查结果 INSERT INTO data_quality_log VALUES ( quality_log_seq.NEXTVAL, SYSDATE, p_table_name, rule_rec.rule_id, v_total_count, v_error_count, ROUND((v_total_count - v_error_count) / v_total_count * 100, 2) ); END LOOP; COMMIT;END;/
10.2 元数据管理
-- 创建元数据管理表CREATE TABLE metadata_tables ( table_id NUMBER PRIMARY KEY, table_name VARCHAR2(100) NOT NULL, table_type VARCHAR2(20), -- FACT, DIMENSION, STAGING business_description VARCHAR2(1000), data_source VARCHAR2(200), refresh_frequency VARCHAR2(100), data_owner VARCHAR2(100), created_date DATE DEFAULT SYSDATE);CREATE TABLE metadata_columns ( column_id NUMBER PRIMARY KEY, table_id NUMBER, column_name VARCHAR2(100), data_type VARCHAR2(50), is_nullable CHAR(1), business_description VARCHAR2(500), data_format VARCHAR2(100), FOREIGN KEY (table_id) REFERENCES metadata_tables(table_id));-- 插入元数据信息INSERT INTO metadata_tables VALUES ( 1, \'FACT_SALES\', \'FACT\', \'销售事实表,记录每笔销售交易的详细信息\', \'CRM系统、POS系统\', \'每日凌晨2点\', \'销售部门\', SYSDATE);INSERT INTO metadata_columns VALUES ( 1, 1, \'TIME_KEY\', \'NUMBER\', \'N\', \'时间维度键,格式YYYYMMDD\', \'YYYYMMDD\');-- 自动收集表统计元数据CREATE OR REPLACE PROCEDURE collect_table_metadata ASBEGIN -- 收集表级元数据 INSERT INTO table_statistics_log ( log_date, table_name, num_rows, num_blocks, avg_row_len ) SELECT SYSDATE, table_name, num_rows, blocks, avg_row_len FROM user_tables WHERE table_name LIKE \'FACT_%\' OR table_name LIKE \'DIM_%\'; -- 收集列级统计 INSERT INTO column_statistics_log ( log_date, table_name, column_name, num_distinct, density ) SELECT SYSDATE, table_name, column_name, num_distinct, density FROM user_tab_columns WHERE table_name LIKE \'FACT_%\' OR table_name LIKE \'DIM_%\'; COMMIT;END;/
10.3 数据血缘追踪
-- 创建数据血缘跟踪表CREATE TABLE data_lineage ( lineage_id NUMBER PRIMARY KEY, source_table VARCHAR2(100), source_column VARCHAR2(100), target_table VARCHAR2(100), target_column VARCHAR2(100), transformation_rule VARCHAR2(1000), etl_job_name VARCHAR2(200), created_date DATE DEFAULT SYSDATE);-- 记录数据血缘关系INSERT INTO data_lineage VALUES ( lineage_seq.NEXTVAL, \'ORDERS\', \'ORDER_AMOUNT\', \'FACT_SALES\', \'TOTAL_AMOUNT\', \'SUM(order_amount) GROUP BY date, product\', \'DAILY_SALES_ETL\', SYSDATE);-- 查询数据血缘的视图CREATE OR REPLACE VIEW v_data_lineage ASSELECT dl.source_table || \'.\' || dl.source_column as source, dl.target_table || \'.\' || dl.target_column as target, dl.transformation_rule, dl.etl_job_name, mt_source.business_description as source_description, mt_target.business_description as target_descriptionFROM data_lineage dlLEFT JOIN metadata_tables mt_source ON dl.source_table = mt_source.table_nameLEFT JOIN metadata_tables mt_target ON dl.target_table = mt_target.table_name;-- 影响分析:查找某个表的所有下游依赖CREATE OR REPLACE FUNCTION get_downstream_tables(p_table_name VARCHAR2)RETURN SYS_REFCURSOR AS v_cursor SYS_REFCURSOR;BEGIN OPEN v_cursor FOR WITH downstream_tables AS ( SELECT target_table, 1 as level_num FROM data_lineage WHERE source_table = p_table_name UNION ALL SELECT dl.target_table, dt.level_num + 1 FROM data_lineage dl JOIN downstream_tables dt ON dl.source_table = dt.target_table WHERE dt.level_num < 10 -- 防止无限递归 ) SELECT DISTINCT target_table, MIN(level_num) as min_level FROM downstream_tables GROUP BY target_table ORDER BY min_level, target_table; RETURN v_cursor;END;/
这份Oracle数据仓库指南涵盖了从基础概念到高级应用的全方位内容。希望这个完整的指南能帮助你更好地理解和实施Oracle数据仓库项目!
结语
感谢您的阅读!期待您的一键三连!欢迎指正!