MySQL 分组取数高效之道:时间、最值与相邻日期处理实战方案
引言
本文整合了MySQL分组查询的三大核心场景:分组取时间最近数据、分组取组内某字段最大值、相邻日期处理,结合权威资料与实战案例,提供一站式解决方案。
一、分组取时间最近数据
1. 使用 NOT EXISTS
方法
SELECT am1.Id, am1.AccountId, am1.Mark, am1.CreateTimeFROM AccountMark am1WHERE NOT EXISTS ( SELECT 1 FROM AccountMark am2 WHERE am2.AccountId = am1.AccountId AND am2.CreateTime > am1.CreateTime);
原理:通过子查询排除当前分组中时间更晚的记录,确保返回每组最新数据。
2. 使用 COUNT
方法
SELECT am.Id, am.AccountId, am.Mark, am.CreateTimeFROM AccountMark amWHERE ( SELECT COUNT(*) FROM AccountMark am2 WHERE am2.AccountId = am.AccountId AND am2.CreateTime >= am.CreateTime) < 2;
优化点:通过统计当前分组中时间大于等于当前记录的数量,确保返回最新记录。
二、分组取组内某字段最大值
1. GROUP BY + MAX函数
SELECT customer_id, MAX(order_amount) AS max_amountFROM ordersGROUP BY customer_id;
扩展应用:若需关联其他字段,可通过子查询关联原表:
SELECT o.customer_id, o.order_amount, o.order_dateFROM orders oJOIN ( SELECT customer_id, MAX(order_amount) AS max_amount FROM orders GROUP BY customer_id) m ON o.customer_id = m.customer_id AND o.order_amount = m.max_amount;
2. 子查询关联法
WITH class_teacher_count AS ( SELECT class_id, teacher_id, COUNT(*) AS student_count FROM students GROUP BY class_id, teacher_id),max_student_count_per_class AS ( SELECT class_id, MAX(student_count) AS max_student_count FROM class_teacher_count GROUP BY class_id)SELECT ctc.class_id, ctc.teacher_id, ctc.student_count, t.teacher_nameFROM class_teacher_count ctcJOIN max_student_count_per_class mscc ON ctc.class_id = mscc.class_id AND ctc.student_count = mscc.max_student_countJOIN teachers t ON ctc.teacher_id = t.teacher_id;
三、相邻日期处理
1. 获取前一天数据(LAG函数)
SELECT date, sales, LAG(sales, 1) OVER (ORDER BY date) AS previous_day_salesFROM sales_data;
适用版本:MySQL 8.0+
2. 获取连续日期范围
SELECT MIN(date) AS start_date, MAX(date) AS end_date, TIMESTAMPDIFF(DAY, MIN(date), MAX(date)) + 1 AS total_daysFROM sales_dataGROUP BY user_id;
3. 查找缺失日期
WITH RECURSIVE dates AS ( SELECT \'2023-01-01\' AS date UNION ALL SELECT DATE_ADD(date, INTERVAL 1 DAY) FROM dates WHERE date < \'2023-01-31\')SELECT d.dateFROM dates dLEFT JOIN sales_data s ON d.date = s.dateWHERE s.date IS NULL;
四、综合案例:分组取最大值并关联相邻日期
WITH grouped_data AS ( SELECT user_id, MAX(sales) AS max_sales, DATE(MAX(order_date)) AS max_date FROM sales_data GROUP BY user_id)SELECT g.user_id, g.max_sales, g.max_date, LAG(g.max_sales, 1) OVER (ORDER BY g.user_id) AS previous_maxFROM grouped_data g;
五、性能优化指南
- 索引策略:
CREATE INDEX idx_account_time ON AccountMark(AccountId, CreateTime);CREATE INDEX idx_sales_date ON sales_data(date);
- 覆盖索引:
CREATE INDEX idx_covering ON sales_data(user_id, date, sales);
- 避免全表扫描:
-- 优化前SELECT * FROM sales_data WHERE date BETWEEN \'2023-01-01\' AND \'2023-01-31\';-- 优化后SELECT date, sales FROM sales_data WHERE date BETWEEN \'2023-01-01\' AND \'2023-01-31\';
权威资料参考
- MySQL窗口函数官方文档
- MySQL递归CTE详解
总结
本文完整覆盖了MySQL分组查询的三大核心场景,提供从基础到高级的解决方案。通过合理运用NOT EXISTS
、LAG
、WITH RECURSIVE
等语法,结合索引优化策略,可高效解决业务中复杂的数据处理需求。建议根据实际场景选择最适合的方案,并定期进行性能监控与优化。