Problem Description

(Accepted: 11,578 | Submissions: 28,075, acceptance rate 41.24%)

Traffic table:
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| activity      | enum    |
| activity_date | date    |
+---------------+---------+
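This table has no primary key and may contain duplicate rows.
The activity column is an ENUM type and takes one of the values ('login', 'logout', 'jobs', 'groups', 'homepage').

Write an SQL query that reports, for every date within at most 90 days from today, the number of users who logged in for the first time on that date. Assume today is 2019-06-30.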

The query result format is shown in the following example:
Traffic table:
+---------+----------+---------------+
| user_id | activity | activity_date |
+---------+----------+---------------+
| 1       | login    | 2019-05-01    |
| 1       | homepage | 2019-05-01    |
| 1       | logout   | 2019-05-01    |
| 2       | login    | 2019-06-21    |
| 2       | logout   | 2019-06-21    |
| 3       | login    | 2019-01-01    |
| 3       | jobs     | 2019-01-01    |
| 3       | logout   | 2019-01-01    |
| 4       | login    | 2019-06-21    |
| 4       | groups   | 2019-06-21    |
| 4       | logout   | 2019-06-21    |
| 5       | login    | 2019-03-01    |
| 5       | logout   | 2019-03-01    |
| 5       | login    | 2019-06-21    |
| 5       | logout   | 2019-06-21    |
+---------+----------+---------------+

Result table:
+------------+-------------+
| login_date | user_count  |
+------------+-------------+
| 2019-05-01 | 1           |
| 2019-06-21 | 2           |
+------------+-------------+
Note that we only care about dates with a non-zero user count.
The user with ID 5 first logged in on 2019-03-01, so they are not counted for 2019-06-21.

Source: LeetCode (力扣)
Link: https://leetcode.cn/problems/new-users-daily-count
-- Test data
Create table If Not Exists Traffic (user_id int, activity ENUM('login', 'logout', 'jobs', 'groups', 'homepage'), activity_date date);

insert into Traffic (user_id, activity, activity_date) values ('1', 'login', '2019-05-01');
insert into Traffic (user_id, activity, activity_date) values ('1', 'homepage', '2019-05-01');
insert into Traffic (user_id, activity, activity_date) values ('1', 'logout', '2019-05-01');
insert into Traffic (user_id, activity, activity_date) values ('2', 'login', '2019-06-21');
insert into Traffic (user_id, activity, activity_date) values ('2', 'logout', '2019-06-21');
insert into Traffic (user_id, activity, activity_date) values ('3', 'login', '2019-01-01');
insert into Traffic (user_id, activity, activity_date) values ('3', 'jobs', '2019-01-01');
insert into Traffic (user_id, activity, activity_date) values ('3', 'logout', '2019-01-01');
insert into Traffic (user_id, activity, activity_date) values ('4', 'login', '2019-06-21');
insert into Traffic (user_id, activity, activity_date) values ('4', 'groups', '2019-06-21');
insert into Traffic (user_id, activity, activity_date) values ('4', 'logout', '2019-06-21');
insert into Traffic (user_id, activity, activity_date) values ('5', 'login', '2019-03-01');
insert into Traffic (user_id, activity, activity_date) values ('5', 'logout', '2019-03-01');
insert into Traffic (user_id, activity, activity_date) values ('5', 'login', '2019-06-21');
insert into Traffic (user_id, activity, activity_date) values ('5', 'logout', '2019-06-21');

Approach

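The problem statement is not very clear and is easy to misread.
The Traffic table records every user's daily activity (logging in, logging out, pages visited, and so on). A user can perform the same activity more than once on the same day.
The task: among all users, find those whose first login falls within the last 90 days, and count them per day.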
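One way to see where the wording can mislead: if the 90-day filter is applied before taking each user's earliest login, user 5 looks like a new user on 2019-06-21. The sketch below shows that reading. It is an assumption about which ambiguity is meant, since the text does not spell it out, and the alias names are illustrative.

```sql
-- Misreading (for contrast only): filter to the 90-day window first,
-- then take the earliest login inside that window.
select
    w.first_login_in_window login_date,
    count(*) user_count
from (
    select
        t.user_id,
        min(t.activity_date) first_login_in_window
    from Traffic t
    where t.activity = 'login'
      and t.activity_date >= date_sub('2019-06-30', interval 90 day)
      and t.activity_date <= '2019-06-30'
    group by t.user_id
) w
group by w.first_login_in_window
order by w.first_login_in_window;
```

On the sample data this reports 3 users for 2019-06-21 instead of the expected 2, because user 5's earlier login on 2019-03-01 has already been filtered out.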
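Following that requirement, first compute each user's first login date.
This is a group-wise minimum, or equivalently a windowed ranking. It can be done with GROUP BY + MIN or with a window function.
Next, keep only the users whose first login falls within the last 90 days.
A plain WHERE condition is enough.
Finally, count the first-time logins per day.
Use GROUP BY + COUNT.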
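The first two steps can be checked on the sample data before aggregating. A minimal sketch for MySQL 8.0; the aliases first_login and in_window are illustrative names, not part of the problem.

```sql
-- One row per user: earliest login date, plus a 1/0 flag showing whether
-- that date is on or after the start of the 90-day window.
select
    a.user_id,
    min(a.activity_date) as first_login,
    min(a.activity_date) >= date_sub('2019-06-30', interval 90 day) as in_window
from Traffic a
where a.activity = 'login'
group by a.user_id
order by a.user_id;
```

date_sub('2019-06-30', interval 90 day) evaluates to 2019-04-01, so on the sample data users 1, 2 and 4 get in_window = 1, while users 3 (2019-01-01) and 5 (2019-03-01) get 0.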

Reference SQL

Unless stated otherwise, the reference SQL targets MySQL 8.0.
select
    b.min_activity_date login_date,
    count(1) user_count
from (
    -- Step 1: each user's first login date
    select
        a.user_id,
        min(a.activity_date) min_activity_date
    from Traffic a
    where a.activity = 'login'
    group by a.user_id
) b
-- Step 2: keep first logins within the last 90 days
where b.min_activity_date >= date_sub('2019-06-30', interval 90 day)
  and b.min_activity_date <= '2019-06-30'
-- Step 3: count new users per day
group by b.min_activity_date
order by b.min_activity_date;
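The window-function alternative mentioned in the approach could look roughly like this. It is a sketch for MySQL 8.0 rather than the reference answer, and the aliases f and rn are illustrative.

```sql
select
    f.activity_date login_date,
    count(*) user_count
from (
    select
        t.user_id,
        t.activity_date,
        -- row_number() rather than rank(): the table may hold duplicate
        -- rows, and rank() would give every tied login row the value 1
        row_number() over (
            partition by t.user_id
            order by t.activity_date
        ) as rn
    from Traffic t
    where t.activity = 'login'
) f
where f.rn = 1  -- the user's earliest login row
  and f.activity_date >= date_sub('2019-06-30', interval 90 day)
  and f.activity_date <= '2019-06-30'
group by f.activity_date
order by f.activity_date;
```

On the sample data this gives the same two rows as the GROUP BY + MIN version: 2019-05-01 with 1 user and 2019-06-21 with 2 users.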