现有一张股票价格表 stock_data 有三个字段分别是股票代码(stock_code)、日期(trade_date)、收盘价格(closing_price) ,请找出满足连续5天以上(含)每天上涨超过5%的股票,并给出连续满足天数及开始和结束日期。 备注:不考虑停牌或其他情况,仅仅关注每天连续5天上涨超过5%的股票。

样例数据

image

分析

首先应该计算每天相对上一天的涨幅,这个使用 lag() 函数; 然后判断是否符合涨幅超过5%; 然后处理连续问题,使用 row_number() 函数; 最后计算天数、开始日期、结束日期,返回结果。

SQL

使用 lag() 函数计算每天的涨幅

  • 执行 SQL
select 
    stock_code
    ,trade_date
    ,closing_price
    ,closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1  as daily_return
from stock_data 
order by stock_code, trade_date;
  • 返回结果

image

将涨幅换算为是否符合涨幅>=5%

  • 执行 SQL
select 
    stock_code
    ,trade_date
    ,closing_price
    ,case 
        when closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1 >= 0.05 
            then 'T'
        else 
            'F'
    end as flag
from stock_data 
order by stock_code, trade_date;
  • 返回结果

image

按照 stock_code 分组,使用 row_number() 函数,获取日期编号

  • 执行 SQL
with 
a as (
    select 
        stock_code
        ,trade_date
        ,closing_price
        ,case 
            when closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1 >= 0.05 
                then 'T'
            else 
                'F'
        end as flag
    from stock_data 
    order by stock_code, trade_date
)
select 
    stock_code
    ,trade_date 
    ,closing_price
    ,flag
    ,row_number() over(partition by stock_code order by trade_date) as rn
from a
order by stock_code, trade_date;
  • 返回结果

image

按照 stock_code、flag 分组,使用 row_number() 函数,获取日期编号

  • 执行 SQL
with 
a as (
    select 
        stock_code
        ,trade_date
        ,closing_price
        ,case 
            when closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1 >= 0.05 
                then 'T'
            else 
                'F'
        end as flag
    from stock_data 
    order by stock_code, trade_date
)
select 
    stock_code
    ,trade_date 
    ,closing_price
    ,flag
    ,row_number() over(partition by stock_code order by trade_date) as rn
from a where flag = 'T'
order by stock_code, trade_date;
  • 返回结果

image

将上述两个结果集 left join,获取日期间隔

  • 执行 SQL
with 
a as (
    select 
        stock_code
        ,trade_date
        ,closing_price
        ,case 
            when closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1 >= 0.05 
                then 'T'
            else 
                'F'
        end as flag
    from stock_data 
    order by stock_code, trade_date
),
b as (
    select 
        stock_code
        ,trade_date 
        ,closing_price
        ,flag
        ,row_number() over(partition by stock_code order by trade_date) as rn
    from a
    order by stock_code, trade_date
)
,c as (
    select 
        stock_code
        ,trade_date 
        ,closing_price
        ,flag
        ,row_number() over(partition by stock_code order by trade_date) as rn
    from a where flag = 'T'
    order by stock_code, trade_date    
)
select  
    b.stock_code
    ,b.trade_date 
    ,b.closing_price
    ,b.flag
    ,b.rn           as rnb
    ,c.rn           as rnc
    ,b.rn - c.rn    as rn_diff
from b left outer join c 
    on  b.stock_code = c.stock_code 
    and b.trade_date = c.trade_date
order by b.stock_code, b.trade_date;
  • 返回结果

image

从上述返回的结果集可知,当 rn_diff=1 时,表示当日涨幅超过5%,且与上一条满足条件的日期为连续日期。

合并脚本

with 
a as (
    select 
        stock_code
        ,trade_date
        ,closing_price
        ,case 
            when closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1 >= 0.05 
                then 'T'
            else 
                'F'
        end as flag
    from stock_data 
    order by stock_code, trade_date
),
b as (
    select 
        stock_code
        ,trade_date 
        ,closing_price
        ,flag
        ,row_number() over(partition by stock_code order by trade_date) as rn
    from a
    order by stock_code, trade_date
)
,c as (
    select 
        stock_code
        ,trade_date 
        ,closing_price
        ,flag
        ,row_number() over(partition by stock_code order by trade_date) as rn
    from a where flag = 'T'
    order by stock_code, trade_date    
)
,d as (
    select  
        b.stock_code
        ,b.trade_date 
        ,b.closing_price
        ,b.flag
        ,b.rn           as rnb
        ,c.rn           as rnc
        ,b.rn - c.rn    as rn_diff
    from b left outer join c 
        on  b.stock_code = c.stock_code 
        and b.trade_date = c.trade_date
    order by b.stock_code, b.trade_date
)
select  
    stock_code
    ,min(trade_date)
    ,max(trade_date)
    ,count(*)
from d 
where rn_diff = 1
group by stock_code
having count(*) >= 5;

image

简化脚本,将 row_number() 函数放在一起

with 
a as (
    select 
        stock_code
        ,trade_date
        ,closing_price
        ,case 
            when closing_price / lag(closing_price) over (partition by stock_code order by trade_date) - 1 >= 0.05 
                then 'T'
            else 
                'F'
        end as flag
    from stock_data 
    order by stock_code, trade_date
),
b as (
    select 
        stock_code
        ,trade_date 
        ,closing_price
        ,flag
        ,row_number() over(partition by stock_code order by trade_date) as rn1
        ,row_number() over(partition by stock_code, flag order by trade_date) as rn2
        ,row_number() over(partition by stock_code order by trade_date) - 
            row_number() over(partition by stock_code, flag order by trade_date) as date_diff
    from a
    order by stock_code, trade_date
)
select 
    stock_code
    ,min(trade_date)    as start_date 
    ,max(trade_date)    as end_date
    ,count(*)           as cnt
from b
Where flag = 'T'
group by stock_code, date_diff
having count(*) >= 5;

image

建表语句

-- drop table stock_data;
set odps.sql.decimal.odps2=true;
create table if not exists stock_data (
    stock_code      string,
    trade_date      date,
    closing_price   decimal(10, 2)
)
comment '股票价格表';

示例数据

-- 样例数据
set odps.sql.decimal.odps2=true;
insert into table stock_data values  
    ('AAPL', date'2023-02-26', 100.00),
    ('AAPL', date'2023-02-27', 105.00),
    ('AAPL', date'2023-02-28', 110.25),
    ('AAPL', date'2023-03-01', 115.78),
    ('AAPL', date'2023-03-02', 121.59),
    ('AAPL', date'2023-03-03', 128.73),
    ('AAPL', date'2023-03-04', 137.00),
    ('AAPL', date'2023-03-05', 144.67),
    ('AAPL', date'2023-03-06', 147.64),
    ('GOOG', date'2023-02-26', 2000.00),
    ('GOOG', date'2023-02-27', 2100.00),
    ('GOOG', date'2023-02-28', 2205.00),
    ('GOOG', date'2023-03-01', 2313.25),
    ('GOOG', date'2023-03-02', 2431.01),
    ('GOOG', date'2023-03-03', 2547.56),
    ('GOOG', date'2023-03-04', 2680.19),
    ('GOOG', date'2023-03-05', 2814.20),
    ('GOOG', date'2023-03-06', 2955.91);

参考资料

腾讯大数据面试SQL-连续5天涨幅超过5%的股票

原创文章,转载请注明出处:http://www.opcoder.cn/article/72/