已知有用户账户表,包含年份,用户ID和值,请按照年份分组,取出值前两小和前两大对应的用户ID。需要保持值最小和最大的用户ID排首位。

样例数据

+-------+----------+--------+
| year  | user_id  | value  |
+-------+----------+--------+
| 2022  | A        | 30     |
| 2022  | B        | 10     |
| 2022  | C        | 20     |
| 2023  | A        | 40     |
| 2023  | B        | 50     |
| 2023  | C        | 20     |
| 2023  | D        | 30     |
+-------+----------+--------+

期望结果

+-------+-----------------+-----------------+
| year  | min_user_list   | max_user_list   |
+-------+-----------------+-----------------+
| 2022  | B,C             | A,C             |
| 2023  | C,D             | B,A             |
+-------+-----------------+-----------------+

分析

属于取最大最小记录的升级版,最大难点在于拼接用户要保证有序。

关于 CONCAT_WS 函数的详细用法,详见 MaxCompute 官方文档

SQL

使用 ROW_NUMBER() 函数根据年份分组,VALUE 字段正排和倒排得到两个序列

使用 row_number() 函数根据年份分组,根据 value 字段正序得到 asc_rn 用于取出 value 最小的两行记录;根据 value 倒序得到 desc_rn 用于取出最大的两行记录。

  • 执行 SQL
select 
    year
    ,user_id
    ,value
    ,row_number() over(partition by year order by value asc)    as asc_rn
    ,row_number() over(partition by year order by value desc)   as desc_rn
from user_account;
  • 返回结果

image

取出 VALUE 值最大的 USER_ID、第二大 USER_ID、最小 USER_ID、第二小 USER_ID

使用 if 函数对 asc_rn 和 desc_rn 字段进行判断,对符合条件的数据取出 user_id,不满足则取 null。

  • 执行 SQL
select 
    year
    ,user_id 
    ,value 
    ,if(asc_rn  = 1, user_id, null) as min_user_id_1
    ,if(asc_rn  = 2, user_id, null) as min_user_id_2
    ,if(desc_rn = 1, user_id, null) as max_user_id_1
    ,if(desc_rn = 2, user_id, null) as max_user_id_2
from (
    select 
        year
        ,user_id
        ,value
        ,row_number() over(partition by year order by value asc)    as asc_rn
        ,row_number() over(partition by year order by value desc)   as desc_rn
    from user_account
);
  • 返回结果

image

根据 YEAR 字段分组聚合

根据 year 字段分组聚合,将多行记录合并为单行记录。

  • 执行 SQL
select 
    year 
    ,max(if(asc_rn  = 1, user_id, null)) as min_user_id_1
    ,max(if(asc_rn  = 2, user_id, null)) as min_user_id_2
    ,max(if(desc_rn = 1, user_id, null)) as max_user_id_1
    ,max(if(desc_rn = 2, user_id, null)) as max_user_id_2
from (
    select 
        year
        ,user_id
        ,value
        ,row_number() over(partition by year order by value asc)    as asc_rn
        ,row_number() over(partition by year order by value desc)   as desc_rn
    from user_account
)
group by year;
  • 返回结果

image

拼接字符串,得到最终结果

拼接 min1_user_id、min2_user_id 字段为 min_user_list;拼接 max1_user_id、max2_user_id 字段为 max_user_list。

  • 执行 SQL
select 
    year 
    ,concat_ws(',', min_user_id_1, min_user_id_2) as min_user_list
    ,concat_ws(',', max_user_id_1, max_user_id_2) as max_user_list
from (
    select 
        year 
        ,max(if(asc_rn  = 1, user_id, null)) as min_user_id_1
        ,max(if(asc_rn  = 2, user_id, null)) as min_user_id_2
        ,max(if(desc_rn = 1, user_id, null)) as max_user_id_1
        ,max(if(desc_rn = 2, user_id, null)) as max_user_id_2
    from (
        select 
            year
            ,user_id
            ,value
            ,row_number() over(partition by year order by value asc)    as asc_rn
            ,row_number() over(partition by year order by value desc)   as desc_rn
        from user_account
    )
    group by year
);
  • 返回结果

image

建表语句

-- drop table user_account;
create table if not exists user_account (
    year    string,
    user_id string,
    value   bigint
)
comment '用户账户表';

示例数据

-- 样例数据
insert into user_account(year, user_id, value) values 
    ('2022', 'A', 30),
    ('2022', 'B', 10),
    ('2022', 'C', 20),
    ('2023', 'A', 40),
    ('2023', 'B', 50),
    ('2023', 'C', 20),
    ('2023', 'D', 30);

参考资料

常见大数据面试SQL-查询前2大和前2小用户并有序拼接

原创文章,转载请注明出处:http://www.opcoder.cn/article/76/