分析函数用来计算一组行的聚合值,与聚集函数不同的是分析函数为每组行返回多条记录。由分析函数语法定义的行集就叫做窗口,窗口决定了对于当前行而言进行计算的行的范围,窗口的范围则可能基于物理的行数或者逻辑间隔。

函数语法

analytic_function([ arguments ]) OVER (analytic_clause)
  • OVER() 是开窗函数,这是开启分析函数的起点
  • PARTITION BY 是窗口
  • ORDER BY 是窗口规则(排序)
  • ROW|RANGE BETWEEN 是窗口范围

analytic_function

analytic_function 指定分析函数的名称,如 RANK 、LAG 等 ...

analytic_clause

analytic_clause 用来确定分析函数的操作规则,包括 query_partition_clause 、order_by_clause 和 windowing_clause 三个子句。

# analytic_clause 语法:
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
  • query_partition_clause 记录集分组子句,类似于 GROUP BY 分组 query_partition_clause

  • order_by_clause 记录集排序子句, 类似于 ORDER BY 排序 order_by_clause

  • windowing_clause 记录集窗口范围子句 windowing_clause


函数目录

序号 函数分类 函数名称 函数简介
1 常规统计函数 MAX 在一个分组的数据窗口中查找表达式的最大值
2 MIN 在一个分组的数据窗口中查找表达式的最小值
3 AVG 计算分组中表达式的平均值
4 SUM 计算分组中表达式的累积和
5 COUNT 对一组内发生的事情进行累计计数
6 数据排序函数 RANK 根据 order by 排序结果,计算组间相对位置,有跳号
7 DENSE_RANK 根据 order by 排序结果,计算组间相对位置,无跳号
8 ROW_NUMBER 返回有序组中一行的偏移量,按特定的顺序分配行号
9 FIRST 从 DENSE_RANK 结果中返回最前面的一行
10 FIRST_VALUE 返回分组中数据窗口的第一个值
11 LAST 从 DENSE_RANK 结果中返回最后面的一行
12 LAST_VALUE 返回分组中数据窗口的最后一个值
13 LAG 访问结果集中的其他行,即当前行之前的行
14 LEAD 访问结果集中的其他行,即当前行之后的行
15 数据分布函数 NTILE 将一个分组分为"表达式"的散列表示
16 CUME_DIST 计算一行在分组中的相对位置
17 PERCENT_RANK 与 CUME_DIST 类似,计算规则稍有差别
18 PERCENTILE_CONT 返回一个与输入的百分比值相对应的数据值
19 PERCENTILE_DISC 返回一个与输入的百分比值相对应的数据值
20 RATIO_TO_REPORT 计算当前值对应于总数据的百分比
21 REGR_SLOPE 线性回归函数(适合最小二乘法回归线)
22 REGR_INTERCEPT
23 REGR_COUNT
24 REGR_R2
25 REGR_AVGX
26 REGR_AVGY
27 REGR_SXX
28 REGR_SYY
29 REGR_SXY
30 统计分析函数 CORR 返回一堆表达式的相对系数
31 COVAR_POP 返回一堆表达式的总体协方差
32 COVAR_SAMP 返回一堆表达式的样本协方差
33 STDDEV 计算当前行关于分组的标准偏离
34 STDDEV_POP 计算总体标准偏离,并返回总体变量的平方根
35 STDDEV_SAMP 计算累计样本标准偏离,并返回总体变量的平方根
36 VAR_POP 返回非空集合的总体变量
37 VAR_SAMP 返回非空集合的样本变量
38 VARIANCE 返回表达式的变量
39 其它函数 LISTAGG 字符串拼接
40 MEDIAN 返回一组数据的中间值(中位数)
41 NTH_VALUE 返回排序后结果集中任意一行


窗口范围

windowing_clause 用来指定分组中当前行的计算范围,无论是 rows 还是 range 窗口,窗口总是在分组中从上至下滑动的,窗口范围由 betwen ... and ... 限定,而不用的话都表示窗口到当前行结束。
windowing_clause

  • rows 关键字,指定窗口由 物理行 构成,即物理座位排数;
  • range 关键字,指定窗口由 逻辑偏移量 构成,即符合指定的逻辑条件的范围;
  • between ... and ... 关键字,用来指定窗口的起始点和终结点;
  • unbonded preceding 关键字,指明窗口开始于分组的第一行;
  • current row 如果作为起始点,用来指明窗口开始于当前行或当前值;如果作为终结点,用来指明窗口结束于当前行或当前值;
  • unbonded following 关键字,指明窗口结束于分组的最后一行;
  • value_expr 为物理或逻辑偏移量表达式

默认窗口范围

select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date) sum_sal1,  --分组第一行和当前行
  sum(t.salary) over(partition by t.dept_id) sum_sal2,  --分组第一行和分组最后一行
  sum(t.salary) over(order by t.hire_date) sum_sal3  --未分组,结果集第一行和最后一行  
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY   SUM_SAL1   SUM_SAL2   SUM_SAL3
---------- --------------- ---------- ---------- ---------- ---------- ---------- ----------
       101 Kochhar Neena           90 1989-09-21      17000      17000      34000      17000
       102 De Haan Lex             90 1993-01-13      17000      34000      34000      34000
       122 Kaufling Payam          50 1995-05-01       7900       7900      35400      41900
       120 Weiss Matthew           50 1996-07-18       8000      15900      35400      49900
       121 Fripp Adam              50 1997-04-10       8200      24100      35400      58100
       123 Vollman Shanta          50 1997-10-10       6500      30600      35400      64600
       124 Mourgos Kevin           50 1999-11-16       4800      35400      35400      69400

rows

rows 窗口,是由分组排序后分组中若干连续的行构成的窗口
rows 窗口目录树:

序号 窗口范围
1 该分组的第一行和该分组的最后一行
2 该分组的第一行和当前行
3 该分组的第一行和当前行的前 value_expr 行
4 该分组的第一行和当前行的后 value_expr 行
5 当前行和该分组的最后一行
6 仅当前行
7 当前行和当前行的后 value_expr 行
8 当前行的前 value_expr 行和分组最后一行
9 当前行的前 value_expr 行和当前行
10 当前行的前 value_expr1 行和当前行的前 value_expr2 行
11 当前行的前 value_expr1 行和当前行的后 value_expr2 行
12 当前行的后 value_expr 行和分组最后一行
13 当前行的后 value_expr1 行和当前行的后 value_expr2 行
  • rows between unbounded preceding and unbounded following
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between unbounded preceding and unbounded following) rows01 
from EMPLOYEE t 
where t.dept_id in (50, 90);
    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS01
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      35400
       120 Weiss Matthew           50 1996-07-18       8000      35400
       121 Fripp Adam              50 1997-04-10       8200      35400
       123 Vollman Shanta          50 1997-10-10       6500      35400
       124 Mourgos Kevin           50 1999-11-16       4800      35400
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • rows between unbounded preceding and current row / rows unbounded preceding
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between unbounded preceding and current row) rows02,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows unbounded preceding) rows02 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS02     ROWS02
---------- --------------- ---------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000      15900      15900
       121 Fripp Adam              50 1997-04-10       8200      24100      24100
       123 Vollman Shanta          50 1997-10-10       6500      30600      30600
       124 Mourgos Kevin           50 1999-11-16       4800      35400      35400
       101 Kochhar Neena           90 1989-09-21      17000      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      34000      34000
  • rows between unbounded preceding and value_expr preceding
# 该分组的第一行和当前行的前 1 行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between unbounded preceding and 1 preceding) rows03 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS03
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900
       120 Weiss Matthew           50 1996-07-18       8000       7900
       121 Fripp Adam              50 1997-04-10       8200      15900
       123 Vollman Shanta          50 1997-10-10       6500      24100
       124 Mourgos Kevin           50 1999-11-16       4800      30600
       101 Kochhar Neena           90 1989-09-21      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • rows between current row and value_expr following
# 该分组的第一行和当前行的后 1 行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and 1 following) rows04 from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS04
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      15900
       120 Weiss Matthew           50 1996-07-18       8000      16200
       121 Fripp Adam              50 1997-04-10       8200      14700
       123 Vollman Shanta          50 1997-10-10       6500      11300
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • rows between current row and unbounded following
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and unbounded following) rows05 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS05
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      35400
       120 Weiss Matthew           50 1996-07-18       8000      27500
       121 Fripp Adam              50 1997-04-10       8200      19500
       123 Vollman Shanta          50 1997-10-10       6500      11300
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • rows between current row and current row / rows current row
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and current row) rows06,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows current row) rows06 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS06     ROWS06
---------- --------------- ---------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000       8000       8000
       121 Fripp Adam              50 1997-04-10       8200       8200       8200
       123 Vollman Shanta          50 1997-10-10       6500       6500       6500
       124 Mourgos Kevin           50 1999-11-16       4800       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      17000      17000
  • rows between current row and value_expr following
# 当前行和当前行的后 1 行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and 1 following) rows07 from EMPLOYEE t  
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS07
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      15900
       120 Weiss Matthew           50 1996-07-18       8000      16200
       121 Fripp Adam              50 1997-04-10       8200      14700
       123 Vollman Shanta          50 1997-10-10       6500      11300
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • rows between value_expr preceding and unbounded following
# 当前行的前 1 行和分组最后一行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 preceding and unbounded following) rows08 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS08
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      35400
       120 Weiss Matthew           50 1996-07-18       8000      35400
       121 Fripp Adam              50 1997-04-10       8200      27500
       123 Vollman Shanta          50 1997-10-10       6500      19500
       124 Mourgos Kevin           50 1999-11-16       4800      11300
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • rows between value_expr preceding and current row / rows value_expr preceding
# 当前行的前 1 行和当前行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 preceding and current row) rows09 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS09
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000      15900
       121 Fripp Adam              50 1997-04-10       8200      16200
       123 Vollman Shanta          50 1997-10-10       6500      14700
       124 Mourgos Kevin           50 1999-11-16       4800      11300
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • rows between value_expr1 preceding and value_expr2 preceding
# 当前行的前 2 行和当前行的前 1 行(value_expr1 >= value_expr2)
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 2 preceding and 1 preceding) rows10 
from EMPLOYEE t  
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS10
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900
       120 Weiss Matthew           50 1996-07-18       8000       7900
       121 Fripp Adam              50 1997-04-10       8200      15900
       123 Vollman Shanta          50 1997-10-10       6500      16200
       124 Mourgos Kevin           50 1999-11-16       4800      14700
       101 Kochhar Neena           90 1989-09-21      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • rows between value_expr1 preceding and value_expr2 following
# 当前行的前 1 行和当前行的后 1 行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 preceding and 1 following) rows11 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS11
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      15900
       120 Weiss Matthew           50 1996-07-18       8000      24100
       121 Fripp Adam              50 1997-04-10       8200      22700
       123 Vollman Shanta          50 1997-10-10       6500      19500
       124 Mourgos Kevin           50 1999-11-16       4800      11300
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • rows between value_expr following and unbounded following
# 当前行的后 1 行和分组最后一行
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 following and unbounded following) rows12
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS12
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      27500
       120 Weiss Matthew           50 1996-07-18       8000      19500
       121 Fripp Adam              50 1997-04-10       8200      11300
       123 Vollman Shanta          50 1997-10-10       6500       4800
       124 Mourgos Kevin           50 1999-11-16       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000
  • rows between value_expr1 following and value_expr2 following
# 当前行的后 1 行和当前行的后 2 行(value_expr1 <= value_expr2)
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 following and 2 following) rows13 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY     ROWS13
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      16200
       120 Weiss Matthew           50 1996-07-18       8000      14700
       121 Fripp Adam              50 1997-04-10       8200      11300
       123 Vollman Shanta          50 1997-10-10       6500       4800
       124 Mourgos Kevin           50 1999-11-16       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000

range

range 窗口,相当于给 order_by_clause 中的 expr 加一个 where 限定条件,分组中满足条件:

  • 当 order by expr asc 时,where expr between a and b
  • 当 order by expr desc 时, where expr between b and a

满足以上条件的所有行构成一个逻辑窗口,其中 a 由分组中第 ra 行的值计算而来,b 由分组中第 rb 行的值计算而来,且 ra <= rb
range 窗口目录树(默认升序排序):

序号 窗口范围
1 该分组的第一行和该分组的最后一行
2 该分组的第一行和当前行
3 该分组的第一行和当前行的值-value_expr
4 该分组的第一个值和当前行的值+value_expr
5 该分组的当前行的值和最后一个值
6 该分组的当前行
7 该分组的当前行的值和当前行的值+value_expr
8 该分组的当前行的值-value_expr和最后一个值
9 该分组的当前行的值-value_expr和当前行的值
10 该分组的当前行的值-value_expr1和当前行的值-value-expr2
11 该分组的当前行的值-value-expr1和当前行的值+value+expr2
12 该分组的当前行的值+value_expr和最后一个值
13 该分组的当前行的值+value_expr1和当前行的值+value_expr2
  • range between unbounded preceding and unbounded following
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and unbounded following) range01 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE01
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      35400
       120 Weiss Matthew           50 1996-07-18       8000      35400
       121 Fripp Adam              50 1997-04-10       8200      35400
       123 Vollman Shanta          50 1997-10-10       6500      35400
       124 Mourgos Kevin           50 1999-11-16       4800      35400
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • range between unbounded preceding and current row / range unbounded preceding
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and current row) range02 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE02
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000      15900
       121 Fripp Adam              50 1997-04-10       8200      24100
       123 Vollman Shanta          50 1997-10-10       6500      30600
       124 Mourgos Kevin           50 1999-11-16       4800      35400
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • range between unbounded preceding and value_expr preceding
# 该分组的第一行和当前行hire_date-365天
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and 365 preceding) range03 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE03
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900
       120 Weiss Matthew           50 1996-07-18       8000       7900
       121 Fripp Adam              50 1997-04-10       8200       7900
       123 Vollman Shanta          50 1997-10-10       6500      15900
       124 Mourgos Kevin           50 1999-11-16       4800      30600
       101 Kochhar Neena           90 1989-09-21      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • range between unbounded preceding and value_expr following
# 该分组的第一行和当前行hire_date+365天
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and 365 following) range04
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE04
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000      24100
       121 Fripp Adam              50 1997-04-10       8200      30600
       123 Vollman Shanta          50 1997-10-10       6500      30600
       124 Mourgos Kevin           50 1999-11-16       4800      35400
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      34000
  • range between current row and unbounded following
5
  • range between current row and current row / range current row
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between current row and current row) range06 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE06
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000       8000
       121 Fripp Adam              50 1997-04-10       8200       8200
       123 Vollman Shanta          50 1997-10-10       6500       6500
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • range between current row and value_expr following
# 该分组当前行和当前行hire_date+365天
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between current row and 365 following) range07 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE07
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000      16200
       121 Fripp Adam              50 1997-04-10       8200      14700
       123 Vollman Shanta          50 1997-10-10       6500       6500
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • range between value_expr preceding and unbounded following
# 该分组当前行的值-365天和最后一个值
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and unbounded following) range08 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE08
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      35400
       120 Weiss Matthew           50 1996-07-18       8000      27500
       121 Fripp Adam              50 1997-04-10       8200      27500
       123 Vollman Shanta          50 1997-10-10       6500      19500
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      34000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • range between value_expr preceding and current row / range value_expr preceding
# 该分组当前行的值-365天和当前行的值
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and current row) range09 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE09
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000       8000
       121 Fripp Adam              50 1997-04-10       8200      16200
       123 Vollman Shanta          50 1997-10-10       6500      14700
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • range between value_expr1 preceding and value_expr2 preceding
# 该分组当前行的值-365天和当前行的值-30天(value_expr1 >= value_expr2)

select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date 
  range between 365 preceding and 30 preceding) range10
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE10
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900
       120 Weiss Matthew           50 1996-07-18       8000
       121 Fripp Adam              50 1997-04-10       8200       8000
       123 Vollman Shanta          50 1997-10-10       6500       8200
       124 Mourgos Kevin           50 1999-11-16       4800
       101 Kochhar Neena           90 1989-09-21      17000
       102 De Haan Lex             90 1993-01-13      17000
  • range between value_expr1 preceding and value_expr2 following
# 该分组当前行的值hire_date-365天和当前行的值hire_date+365天
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and 365 following) range11 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE11
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900       7900
       120 Weiss Matthew           50 1996-07-18       8000      16200
       121 Fripp Adam              50 1997-04-10       8200      22700
       123 Vollman Shanta          50 1997-10-10       6500      14700
       124 Mourgos Kevin           50 1999-11-16       4800       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000      17000
  • range between value_expr following and unbounded following
# 该分组当前行的值hire_date+365天和最后一个值
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 following and unbounded following) range12 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE12
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900      27500
       120 Weiss Matthew           50 1996-07-18       8000      11300
       121 Fripp Adam              50 1997-04-10       8200       4800
       123 Vollman Shanta          50 1997-10-10       6500       4800
       124 Mourgos Kevin           50 1999-11-16       4800
       101 Kochhar Neena           90 1989-09-21      17000      17000
       102 De Haan Lex             90 1993-01-13      17000
  • range between value_expr1 following and value_expr2 following
# 该分组当前行的值hire_date+30天和当前行的值hire_date+365天
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary,
  sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 30 following and 365 following) range13 
from EMPLOYEE t 
where t.dept_id in (50, 90);

    EMP_ID EMP_NAME           DEPT_ID HIRE_DATE      SALARY    RANGE13
---------- --------------- ---------- ---------- ---------- ----------
       122 Kaufling Payam          50 1995-05-01       7900
       120 Weiss Matthew           50 1996-07-18       8000       8200
       121 Fripp Adam              50 1997-04-10       8200       6500
       123 Vollman Shanta          50 1997-10-10       6500
       124 Mourgos Kevin           50 1999-11-16       4800
       101 Kochhar Neena           90 1989-09-21      17000
       102 De Haan Lex             90 1993-01-13      17000


数据准备

# employee.sql

DROP TABLE EMPLOYEE PURGE;
CREATE TABLE employee 
(
  EMP_ID NUMBER CONSTRAINT PK_EMPLOYEE PRIMARY KEY,
  EMP_NAME VARCHAR2(20) NOT NULL,
  DEPT_ID NUMBER NOT NULL,
  HIRE_DATE DATE,
  SALARY NUMBER
);
ALTER SESSION SET nls_date_format='YYYY-MM-DD';
INSERT INTO employee VALUES (100,'Wang John',10,'1990-01-01',20000);
INSERT INTO employee VALUES (101,'Kochhar Neena',90,'1989-09-21',17000);
INSERT INTO employee VALUES (102,'De Haan Lex',90,'1993-01-13',17000);
INSERT INTO employee VALUES (103,'Hunold Alexander',60,'1990-01-03',9000);
INSERT INTO employee VALUES (104,'Ernst Bruce',60,'1991-05-21',6000);
INSERT INTO employee VALUES (105,'Austin David',60,'1997-06-25',4800);
INSERT INTO employee VALUES (106,'Pataballa',60,'1998-02-05',4800);
INSERT INTO employee VALUES (107,'Lorentz Diana',60,'1999-02-07',4200);
INSERT INTO employee VALUES (108,'Greenberg Nancy',100,'1994-08-17',12000);
INSERT INTO employee VALUES (109,'Faviet Daniel',100,'1994-08-16',9000);
INSERT INTO employee VALUES (110,'Chen John',100,'1997-09-28',8200);
INSERT INTO employee VALUES (111,'Sciarra Ismael',100,'1997-09-30',7700);
INSERT INTO employee VALUES (112,'Urman Jose Manuel',100,'1998-03-07',7800);
INSERT INTO employee VALUES (113,'Popp Luis',100,'1999-12-07',6900);
INSERT INTO employee VALUES (114,'Raphaely Den',30,'1994-12-07',11000);
INSERT INTO employee VALUES (120,'Weiss Matthew',50,'1996-07-18',8000);
INSERT INTO employee VALUES (121,'Fripp Adam',50,'1997-04-10',8200);
INSERT INTO employee VALUES (122,'Kaufling Payam',50,'1995-05-01',7900);
INSERT INTO employee VALUES (123,'Vollman Shanta',50,'1997-10-10',6500);
INSERT INTO employee VALUES (124,'Mourgos Kevin',50,'1999-11-16',4800);
INSERT INTO employee VALUES (145,'Russell John',80,'1996-10-01',14000);
INSERT INTO employee VALUES (146,'Russell John',80,'1997-01-05',13500);
INSERT INTO employee VALUES (147,'Russell John',80,'1997-03-10',12000);
INSERT INTO employee VALUES (148,'Russell John',80,'1999-10-15',11000);
INSERT INTO employee VALUES (149,'Russell John',80,'2000-01-19',10500);
COMMIT;
# sales.sql

DROP TABLE sales PURGE;
CREATE TABLE sales
(
   country varchar2(5),
   sales_month date,
   sales_number integer,
   sales_value number(8,0)
);
ALTER SESSION SET nls_date_format='YYYY-MM-DD';
insert into sales values ('USA','2008-1-1',1200,'500000.00');
insert into sales values ('USA','2008-2-1',1150,'450000.00');
insert into sales values ('USA','2008-3-1',1300,'520000.00');
insert into sales values ('USA','2008-4-1',1280,'510000.00');
insert into sales values ('USA','2008-5-1',1350,'530000.00');
insert into sales values ('USA','2008-6-1',1400,'535000.00');
insert into sales values ('USA','2008-7-1',1300,'510000.00');
insert into sales values ('USA','2008-8-1',1250,'460000.00');
insert into sales values ('USA','2008-9-1',1400,'530000.00');
insert into sales values ('USA','2008-10-1',1380,'520000.00');
insert into sales values ('USA','2008-11-1',1450,'540000.00');
insert into sales values ('USA','2008-12-1',1500,'545000.00');
insert into sales values ('USA','2009-1-1',1600,'550000.00');
insert into sales values ('USA','2009-2-1',1390,'532000.00');
insert into sales values ('USA','2009-3-1',1730,'570000.00');
insert into sales values ('USA','2009-4-1',1900,'600000.00');
insert into sales values ('USA','2009-5-1',1850,'585000.00');
insert into sales values ('USA','2009-6-1',3800,'780000.00');
insert into sales values ('USA','2009-7-1',1700,'560000.00');
insert into sales values ('USA','2009-8-1',1490,'542000.00');
insert into sales values ('USA','2009-9-1',1830,'580000.00');
insert into sales values ('USA','2009-10-1',2000,'610000.00');
insert into sales values ('USA','2009-11-1',1950,'595000.00');
insert into sales values ('USA','2009-12-1',1900,'590000.00');
commit;


函数用法

常规统计函数

  • MAX
    MAX
# 获取每个部门中工资最高的员工
select * from (
  select 
    t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
    max(t.salary) over(partition by t.dept_id) max_sal
  from employee t
) where salary = max_sal order by dept_id, emp_id;

   EMP_ID EMP_NAME                DEPT_ID HIRE_DATE      SALARY    MAX_SAL
--------- -------------------- ---------- ---------- ---------- ----------
      100 Wang John                    10 1990-01-01      20000      20000
      114 Raphaely Den                 30 1994-12-07      11000      11000
      121 Fripp Adam                   50 1997-04-10       8200       8200
      103 Hunold Alexander             60 1990-01-03       9000       9000
      145 Russell John                 80 1996-10-01      14000      14000
      101 Kochhar Neena                90 1989-09-21      17000      17000
      102 De Haan Lex                  90 1993-01-13      17000      17000
      108 Greenberg Nancy             100 1994-08-17      12000      12000
  • MIN
    MIN
# 分析函数中使用 ORDER BY 子句 
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  min(t.salary) over(partition by t.dept_id order by t.hire_date) min_sal
from employee t order by t.dept_id, t.hire_date;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY    MIN_SAL
---------- -------------------- ---------- ---------- ---------- ----------
       100 Wang John            10 1990-01-01      20000      20000
       114 Raphaely Den         30 1994-12-07      11000      11000
       122 Kaufling Payam       50 1995-05-01       7900       7900
       120 Weiss Matthew        50 1996-07-18       8000       7900
       121 Fripp Adam           50 1997-04-10       8200       7900
       123 Vollman Shanta       50 1997-10-10       6500       6500
       124 Mourgos Kevin        50 1999-11-16       4800       4800
       103 Hunold Alexander     60 1990-01-03       9000       9000
       104 Ernst Bruce          60 1991-05-21       6000       6000
       105 Austin David         60 1997-06-25       4800       4800
       106 Pataballa            60 1998-02-05       4800       4800
       107 Lorentz Diana        60 1999-02-07       4200       4200
# 分析函数中不使用 ORDER BY 子句
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  min(t.salary) over(partition by t.dept_id) min_sal
from employee t order by t.dept_id, t.hire_date;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY    MIN_SAL
---------- -------------------- ---------- ---------- ---------- ----------
       100 Wang John            10 1990-01-01      20000      20000
       114 Raphaely Den         30 1994-12-07      11000      11000
       122 Kaufling Payam       50 1995-05-01       7900       4800
       120 Weiss Matthew        50 1996-07-18       8000       4800
       121 Fripp Adam           50 1997-04-10       8200       4800
       123 Vollman Shanta       50 1997-10-10       6500       4800
       124 Mourgos Kevin        50 1999-11-16       4800       4800
       103 Hunold Alexander     60 1990-01-03       9000       4200
       104 Ernst Bruce          60 1991-05-21       6000       4200
       105 Austin David         60 1997-06-25       4800       4200
       106 Pataballa            60 1998-02-05       4800       4200
       107 Lorentz Diana        60 1999-02-07       4200       4200
差异原因:
1. 不使用 ORDER BY 子句,默认窗口为该分组的第一行和该分组的最后一行,所以 MIN(salary) 会等于分组中的最小值;
2. 使用 ORDER BY 子句,会根据 hire_date 排序,然后获取小于等于当前行的 hire_date 的所有 salary,然后再取出其中的 min(salary)。
  • AVG
    AVG
# 获取部门平均薪水
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  avg(t.salary) over(partition by t.dept_id) avg_sal
from employee t order by t.dept_id;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY    AVG_SAL
---------- -------------------- ---------- ---------- ---------- ----------
       100 Wang John            10 1990-01-01      20000      20000
       114 Raphaely Den         30 1994-12-07      11000      11000
       121 Fripp Adam           50 1997-04-10       8200       7080
       122 Kaufling Payam       50 1995-05-01       7900       7080
       124 Mourgos Kevin        50 1999-11-16       4800       7080
       120 Weiss Matthew        50 1996-07-18       8000       7080
       123 Vollman Shanta       50 1997-10-10       6500       7080
       107 Lorentz Diana        60 1999-02-07       4200       5760
       106 Pataballa            60 1998-02-05       4800       5760
       103 Hunold Alexander     60 1990-01-03       9000       5760
       104 Ernst Bruce          60 1991-05-21       6000       5760
       105 Austin David         60 1997-06-25       4800       5760
  • SUM
    SUM
# 获取分组内的薪水和(窗口范围为该分区的第一行和当前行)
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  sum(t.salary) over(partition by t.dept_id order by salary 
  range between unbounded preceding and current row) sum_sal
from employee t order by t.dept_id, t.salary;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY    SUM_SAL
---------- -------------------- ---------- ---------- ---------- ----------
       100 Wang John            10 1990-01-01      20000      20000
       114 Raphaely Den         30 1994-12-07      11000      11000
       124 Mourgos Kevin        50 1999-11-16       4800       4800
       123 Vollman Shanta       50 1997-10-10       6500      11300
       122 Kaufling Payam       50 1995-05-01       7900      19200
       120 Weiss Matthew        50 1996-07-18       8000      27200
       121 Fripp Adam           50 1997-04-10       8200      35400
       107 Lorentz Diana        60 1999-02-07       4200       4200
       106 Pataballa            60 1998-02-05       4800      13800
       105 Austin David         60 1997-06-25       4800      13800
       104 Ernst Bruce          60 1991-05-21       6000      19800
       103 Hunold Alexander     60 1990-01-03       9000      28800
  • COUNT
    COUNT
# 窗口范围为该分区当前行的 hire_date-365*2 和 当前行的 hire_date+30*2 之间的累计计数
即: 统计当前行2年前至当前行3个月后这段时间内的累计计数。 
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  count(*) over(partition by t.dept_id order by t.hire_date 
  range between 365*2 preceding and 30*3 following) sum_cnt
from employee t order by t.dept_id, t.hire_date;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY    SUM_CNT
---------- -------------------- ---------- ---------- ---------- ----------
       100 Wang John            10 1990-01-01      20000      1
       114 Raphaely Den         30 1994-12-07      11000      1
       122 Kaufling Payam       50 1995-05-01       7900      1
       120 Weiss Matthew        50 1996-07-18       8000      2
       121 Fripp Adam           50 1997-04-10       8200      3
       123 Vollman Shanta       50 1997-10-10       6500      3
       124 Mourgos Kevin        50 1999-11-16       4800      1
       103 Hunold Alexander     60 1990-01-03       9000      1
       104 Ernst Bruce          60 1991-05-21       6000      2
       105 Austin David         60 1997-06-25       4800      1
       106 Pataballa            60 1998-02-05       4800      2
       107 Lorentz Diana        60 1999-02-07       4200      3

说明: 
1. preceding("-")
2. following("+")
如 "107 Lorentz Diana" 统计的就是 "1997-02-07 ~ 1999-05-07" 之间的累计计数。

数据排序函数

  • RANK
    RANK
# 根据 ORDER BY 子句中表达式的值,从查询返回的每一行,计算它们与其它行的相对位置(排名跳号不连续)
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  rank() over(partition by t.dept_id order by t.salary) rank
from employee t where t.dept_id = 60 order by rank;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY       RANK
---------- -------------------- ---------- ---------- ---------- ----------
       107 Lorentz Diana        60 1999-02-07       4200      1
       105 Austin David         60 1997-06-25       4800      2
       106 Pataballa            60 1998-02-05       4800      2
       104 Ernst Bruce          60 1991-05-21       6000      4
       103 Hunold Alexander     60 1990-01-03       9000      5
  • DENSE_RANK
    DENSE_RANK
# 根据 ORDER BY 子句中表达式的值,从查询返回的每一行,计算它们与其它行的相对位置(排名不跳号连续)

select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  dense_rank() over(partition by t.dept_id order by t.salary) dense_rank
from employee t where t.dept_id = 60 order by dense_rank;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY DENSE_RANK
---------- -------------------- ---------- ---------- ---------- ----------
       107 Lorentz Diana        60 1999-02-07       4200      1
       105 Austin David         60 1997-06-25       4800      2
       106 Pataballa            60 1998-02-05       4800      2
       104 Ernst Bruce          60 1991-05-21       6000      3
       103 Hunold Alexander     60 1990-01-03       9000      4
  • ROW_NUMBER
    ROW_NUMBER
# 返回有序分组中一行的偏移量
select 
  t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, 
  row_number() over(partition by t.dept_id order by t.salary) row_number
from employee t where t.dept_id = 60 order by row_number;

    EMP_ID EMP_NAME        DEPT_ID HIRE_DATE      SALARY ROW_NUMBER
---------- -------------------- ---------- ---------- ---------- ----------
       107 Lorentz Diana        60 1999-02-07       4200      1
       105 Austin David         60 1997-06-25       4800      2
       106 Pataballa            60 1998-02-05       4800      3
       104 Ernst Bruce          60 1991-05-21       6000      4
       103 Hunold Alexander     60 1990-01-03       9000      5
  • FIRST
    FIRST
# 从 DENSE_RANK 返回的结果集中取出排在最前面的一个值的行(可能多行,因为值可能相等)
select 
  t.dept_id,
  min(t.salary) keep(dense_rank first order by t.salary) first,
  max(t.salary) keep(dense_rank last order by t.salary) last
from employee t group by t.dept_id;

   DEPT_ID  FIRST       LAST
---------- ---------- ----------
    10  20000      20000
    30  11000      11000
    50   4800       8200
    60   4200       9000
    80  10500      14000
    90  17000      17000
  • FIRST_VALUE
    FITST_VALUE
# 返回分组中数据窗口的第一个值

select 
  t.emp_id, t.emp_name, t.dept_id, t.salary,
  first_value(t.salary) over(partition by t.dept_id order by t.salary) first_val,
  first_value(t.salary) over(partition by t.dept_id order by t.salary desc) last_val
from employee t;

    EMP_ID EMP_NAME        DEPT_ID     SALARY  FIRST_VAL LAST_VAL
---------- -------------------- ---------- ---------- ---------- ----------
       100 Wang John            10  20000      20000      20000
       114 Raphaely Den         30  11000      11000      11000
       124 Mourgos Kevin        50   4800       4800       8200
       123 Vollman Shanta       50   6500       4800       8200
       122 Kaufling Payam       50   7900       4800       8200
       120 Weiss Matthew        50   8000       4800       8200
       121 Fripp Adam           50   8200       4800       8200
       107 Lorentz Diana        60   4200       4200       9000
       105 Austin David         60   4800       4200       9000
       106 Pataballa            60   4800       4200       9000
       104 Ernst Bruce          60   6000       4200       9000
       103 Hunold Alexander     60   9000       4200       9000
  • LAST
    LAST
# 从 DENSE_RANK 返回的结果集中取出排在最后面的一个值的行(可能多行,因为值可能相等)
  • LAST_VALUE
    LAST_VALUE
# 返回分组中数据窗口的最后个值
  • LAG
    LAG
# 在给定的分组中可以访问当前行之前的行,默认偏移量为1
select 
  t.emp_id, t.salary, t.hire_date,
  lag(salary, 1, 0) over(order by t.hire_date) prev_lag,
  lag(salary, 2, -1) over(order by t.hire_date) prev_lag2
from employee t where t.dept_id = 60
order by t.hire_date;

    EMP_ID     SALARY HIRE_DATE    PREV_LAG  PREV_LAG2
---------- ---------- ---------- ---------- ----------
       103   9000 1990-01-03       0        -1
       104   6000 1991-05-21       9000     -1
       105   4800 1997-06-25       6000   9000
       106   4800 1998-02-05       4800   6000
       107   4200 1999-02-07       4800   4800
  • LEAD
    LEAD
# 在给定的分组中可以访问当前行之后的行,默认偏移量为1
select 
  t.emp_id, t.salary, t.hire_date,
  lead(salary, 1, 0) over(order by t.hire_date) next_lead
from employee t where t.dept_id = 60
order by t.hire_date;

    EMP_ID     SALARY HIRE_DATE   NEXT_LEAD
---------- ---------- ---------- ----------
       103   9000 1990-01-03       6000
       104   6000 1991-05-21       4800
       105   4800 1997-06-25       4800
       106   4800 1998-02-05       4200
       107   4200 1999-02-07       0

数据分布函数

  • NTILE
    NTILE
# 将一个分组分为 "表达式" 的散列显示。例如表达式为=4,则给分组中的每一行分配一个数(1~4),如果分组中有20行,则给前5行分配1,给下5行分配2等
select 
  t.emp_id, t.emp_name, t.dept_id, t.salary,
  ntile(4) over(partition by t.dept_id order by t.salary) quartile
from employee t where t.dept_id in (60, 100)
order by t.dept_id, t.salary, quartile; 

    EMP_ID EMP_NAME        DEPT_ID     SALARY   QUARTILE
---------- -------------------- ---------- ---------- ----------
       107 Lorentz Diana        60   4200          1
       105 Austin David         60   4800          1
       106 Pataballa            60   4800          2
       104 Ernst Bruce          60   6000          3
       103 Hunold Alexander     60   9000          4
       113 Popp Luis            100  6900          1
       111 Sciarra Ismael       100  7700          1
       112 Urman Jose Manuel    100  7800          2
       110 Chen John            100  8200          2
       109 Faviet Daniel        100  9000          3
       108 Greenberg Nancy      100 12000          4

说明:
1. 将数据分为部门60和部门100两个分组,部门60有5条数据,因此将另外一个值(5/4余数)分配给 "buckets 1";
部门100有6条数据,因此将另外两个值(6/4余数)分配给 "buckets 1" 和 "buckets 2"。
  • CUME_DIST
    CUME_DIST
# 计算一行在分组中的相对位置,CUME_DIST 返回的值范围是>0到<=1
select 
  t.emp_id, t.emp_name, t.dept_id, t.salary,
  cume_dist() over(partition by t.dept_id order by t.salary) cume_dist
from employee t where t.dept_id = 50
order by t.salary;

    EMP_ID EMP_NAME        DEPT_ID     SALARY  CUME_DIST
---------- -------------------- ---------- ---------- ----------
       124 Mourgos Kevin        50   4800         .2
       123 Vollman Shanta       50   6500         .4
       122 Kaufling Payam       50   7900         .6
       120 Weiss Matthew        50   8000         .8
       121 Fripp Adam           50   8200          1

select t.emp_id, t.emp_name, t.dept_id, t.salary 
from employee t where t.dept_id = 50 order by t.salary;

    EMP_ID EMP_NAME        DEPT_ID     SALARY
---------- -------------------- ---------- ----------
       124 Mourgos Kevin        50   4800
       123 Vollman Shanta       50   6500
       122 Kaufling Payam       50   7900
       120 Weiss Matthew        50   8000
       121 Fripp Adam           50   8200

说明:
以 "123 Vollman Shanta" 为例,部门50的员工,有40%的工资低于或等于 Vollman Shanta(含自己)的工资。
  • PERCENT_RANK
    PERCENT_RANK
# 对于一个给定的分组来说,在计算那行的序号时,先减1,然后除以n-1(n为分组中所有的行数),该函数总是返回0~1(包括1)之间的数
select 
  t.emp_id, t.emp_name, t.dept_id, t.salary,
  percent_rank() over(partition by t.dept_id order by t.salary) pr
from employee t where t.dept_id in (10, 50)
order by t.dept_id, pr, t.salary;

    EMP_ID EMP_NAME        DEPT_ID     SALARY         PR
---------- -------------------- ---------- ---------- ----------
       100 Wang John            10  20000          0
       124 Mourgos Kevin        50   4800          0
       123 Vollman Shanta       50   6500        .25
       122 Kaufling Payam       50   7900         .5
       120 Weiss Matthew        50   8000        .75
       121 Fripp Adam           50   8200          1

说明:
计算每个员工在部门内的工资等级百分比。
  • PERCENTILE_CONT
    PERCENTILE_CONT
# 返回一个与输入的分布百分比相对应的数据值,根据 PERCENT_RANK 可以获的分布百分比
select 
  t.emp_id, t.emp_name, t.dept_id, t.salary,
  percent_rank() over(partition by t.dept_id order by t.salary) pr,
  percentile_cont(0.5) within group (order by t.salary) over(partition by t.dept_id) pd
from employee t where t.dept_id = 50
order by t.dept_id, pr, t.salary;

    EMP_ID EMP_NAME        DEPT_ID     SALARY         PR     PD
---------- -------------------- ---------- ---------- ---------- ----------
       124 Mourgos Kevin        50      4800           0       7900
       123 Vollman Shanta       50      6500         .25       7900
       122 Kaufling Payam       50      7900          .5       7900
       120 Weiss Matthew        50      8000         .75       7900
       121 Fripp Adam           50      8200           1       7900
  • PERCENTILE_DISC
    PERCENTILE_DISC
# 返回一个与输入的分布百分比值相对应的数据值,根据 CUME_DIST 可以获取分布百分比,如果找不到就取大于该分布值的下一个值
select 
  t.emp_id, t.dept_id, t.salary,
  cume_dist() over(partition by t.dept_id order by t.salary) cd,
  percentile_disc(0.6) within group (order by t.salary) over(partition by t.dept_id) pd,
  percentile_disc(0.1) within group (order by t.salary) over(partition by t.dept_id) pd2
from employee t where t.dept_id = 50
order by t.dept_id, pd, t.salary;

    EMP_ID    DEPT_ID     SALARY     CD     PD        PD2
---------- ---------- ---------- ---------- ---------- ----------
       124     50       4800        .2      7900         4800
       123     50       6500        .4      7900         4800
       122     50       7900        .6      7900         4800
       120     50       8000        .8      7900         4800
       121     50       8200         1      7900         4800

说明:
1. 找到0.6这个分布值,则取值
2. 找不到0.1这个分布在,则取大于分布值的下一个值(0.2)
  • RATIO_TO_REPORT
    RATIO_TO_REPORT
# 该函数计算 expression/(sum(expression)) 的值,它给出相对于总数的百分比
select * from (
  select 
    t.emp_id, t.dept_id, t.salary,
    sum(salary) over() sum_sal,
    sum(salary) over(partition by t.dept_id) sum_d_sal,
    round(ratio_to_report(salary) over(), 4) rtr,
    round(ratio_to_report(salary) over(partition by t.dept_id), 4) rtr2
  from employee t
) where dept_id = 60 order by salary;

    EMP_ID    DEPT_ID     SALARY    SUM_SAL  SUM_D_SAL        RTR   RTR2
---------- ---------- ---------- ---------- ---------- ---------- ----------
       107     60       4200     241800  28800      .0174      .1458
       106     60       4800     241800  28800      .0199      .1667
       105     60       4800     241800  28800      .0199      .1667
       104     60       6000     241800  28800      .0248      .2083
       103     60       9000     241800  28800      .0372      .3125

说明:
1. 计算部门60下每个员工的工资占所有员工工资总额的百分比;
2. 计算部门60下每个员工的工资占部门员工工资总额的百分比。
  • REGR_ (Linear Regression) Functions
    REGR_ (Linear Regression) Functions
REGR_ (Linear Regression) Functions 这些线性回归函数适合最小二乘法回归线,有9个不同的函数可使用

统计分析函数

  • CORR
    CORR
# 返回一对表达式的相关系数,通过返回-1~1之间的一个数,相关系数给出了关联的强度,0表示不相关
SELECT 
  t.emp_id, t.dept_id, 
  TO_CHAR((SYSDATE - hire_date) YEAR TO MONTH ) "Yrs-Mns",     
  t.salary, 
  CORR(SYSDATE-hire_date, salary) OVER(PARTITION BY t.dept_id) Correlation
FROM employee t
WHERE t.dept_id in (50, 80)
ORDER BY t.dept_id, t.emp_id;

    EMP_ID    DEPT_ID Yrs-Mns     SALARY CORRELATION
---------- ---------- ------- ---------- -----------
       120     50 +22-01        8000  .876156086
       121     50 +21-04        8200  .876156086
       122     50 +23-03        7900  .876156086
       123     50 +20-10        6500  .876156086
       124     50 +18-09        4800  .876156086
       145     80 +21-10       14000  .912198993
       146     80 +21-07       13500  .912198993
       147     80 +21-05       12000  .912198993
       148     80 +18-10       11000  .912198993
       149     80 +18-07       10500  .912198993

说明:
结果集展示了员工在公司工作的时间长短和员工所属部门的薪水之间的相关关系。
  • COVAR_POP
    COVAR_POP
# 返回一对表达式的总体协方差,用于表示的是两个变量的总体的误差
select 
  p.product_id, p.supplier_id,
  round(covar_pop(p.list_price, p.min_price) over(order by p.product_id, p.supplier_id), 4) covar_pop
from oe.product_information p where p.category_id = 29
order by p.product_id, p.supplier_id;

PRODUCT_ID SUPPLIER_ID  COVAR_POP
---------- ----------- ----------
      1774  103088      0
      1775  103087    1473.25
      1794  103096  1702.7778
      1825  103093    1926.25
      2004  103086     1591.4
      2005  103086     1512.5
      2416  103088  1475.9796
      2417  103088  1478.7031
      2449  103088  1326.8642
      3101  103086     1195.2
      3170  103089  1590.0744
      3171  103089    1718.25
  • COVAR_SAMP
    COVAR_SAMP
# 返回一对表达式的样本协方差,用于表示的是两个变量的总体的误差
select 
  p.product_id, p.supplier_id,
  round(covar_samp(p.list_price, p.min_price) over(order by p.product_id, p.supplier_id), 4) covar_samp
from oe.product_information p where p.category_id = 29
order by p.product_id, p.supplier_id;

PRODUCT_ID SUPPLIER_ID COVAR_SAMP
---------- ----------- ----------
      1774  103088
      1775  103087     2946.5
      1794  103096  2554.1667
      1825  103093  2568.3333
      2004  103086    1989.25
      2005  103086       1815
      2416  103088  1721.9762
      2417  103088  1689.9464
      2449  103088  1492.7222
      3101  103086       1328
      3170  103089  1749.0818
      3171  103089  1874.4545
  • STDDEV
    STDDEV
# 计算当前行关于分组的标准偏离(标准差)
select 
  t.last_name, t.salary, 
  round(stddev(t.salary) over (order by t.hire_date), 4) StdDev
from hr.employees t where t.department_id = 30
order by t.last_name, t.salary, StdDev; 

LAST_NAME    SALARY     STDDEV
------------ ---------- ----------
Baida          2900     4035.2612
Colmenares     2500     3362.5883
Himuro         2600     3649.2465
Khoo           3100     5586.1436
Raphaely      11000          0
Tobias         2800     4650.0896
  • STDDEV_POP
    STDDEV_POP
# 该函数计算累积总体标准偏离,并返回总体变量的平方根,其返回值与VAR_POP函数的平方根相同
select 
  t.department_id, t.last_name, t.salary, 
  stddev_pop(t.salary) over (partition by t.department_id) as pop_std
from hr.employees t
order by t.department_id, t.last_name, t.salary, pop_std;

DEPARTMENT_ID LAST_NAME        SALARY    POP_STD
------------- ------------ ---------- ----------
       10       Whalen           4400          0
       20       Fay              6000       3500
       20       Hartstein       13000       3500
       30       Baida            2900  3069.6091
       30       Colmenares       2500  3069.6091
       30       Himuro           2600  3069.6091
       30       Khoo             3100  3069.6091
       30       Raphaely        11000  3069.6091
       30       Tobias           2800  3069.6091
       40       Mavris           6500          0

说明:
返回样本中员工工资的总体标准差。
  • STDDEV_SAMP
    STDDEV_SAMP
# 该函数计算累积样本标准偏离,并返回总体变量的平方根,其返回值与VAR_POP函数的平方根相同
select 
  department_id, last_name, hire_date, salary, 
  stddev_samp(salary) over (partition by department_id order by hire_date 
  rows between unbounded preceding and current row) as cum_sdev
from hr.employees
order by department_id, last_name, hire_date, salary, cum_sdev;

DEPARTMENT_ID LAST_NAME    HIRE_DATE      SALARY   CUM_SDEV
------------- ------------ ---------- ---------- ----------
       10       Whalen     2003-09-17       4400
       20       Fay        2005-08-17       6000 4949.74747
       20       Hartstein  2004-02-17      13000
       30       Baida      2005-12-24       2900 4035.26125
       30       Colmenares 2007-08-10       2500 3362.58829
       30       Himuro     2006-11-15       2600  3649.2465
       30       Khoo       2003-05-18       3100 5586.14357
       30       Raphaely   2002-12-07      11000
       30       Tobias     2005-07-24       2800  4650.0896
       40       Mavris     2002-06-07       6500

说明:
返回员工表中工资的样本标准差。
  • VAR_POP
    VAR_POP
# 该函数返回非空集合的总体方差(忽略null)
SUM((expr - (SUM(expr) / COUNT(expr)))2) / COUNT(expr)

select 
  t.calendar_month_desc,
  var_pop(sum(s.amount_sold)) over (order by t.calendar_month_desc) var_pop
from sh.sales s, sh.times t
where s.time_id = t.time_id and t.calendar_year = 1998
group by t.calendar_month_desc
order by t.calendar_month_desc, var_pop;

CALENDAR    VAR_POP
-------- ----------
1998-01       0
1998-02  2269111326
1998-03  5.5849E+10
1998-04  4.8252E+10
1998-05  6.0020E+10
1998-06  5.4091E+10
1998-07  4.7150E+10

说明:
计算1998年月销售额的累积总体方差。
  • VAR_SAMP
    VAR_SAMP
# 该函数返回非空集合的样本方差(忽略null)
(SUM(expr - (SUM(expr) / COUNT(expr)))2) / (COUNT(expr) - 1)

select 
  t.calendar_month_desc,
  var_samp(sum(s.amount_sold)) over (order by t.calendar_month_desc) var_samp
from sh.sales s, sh.times t
where s.time_id = t.time_id and t.calendar_year = 1998
group by t.calendar_month_desc
order by t.calendar_month_desc, var_samp;

CALENDAR   VAR_SAMP
-------- ----------
1998-01
1998-02  4538222653
1998-03  8.3774E+10
1998-04  6.4336E+10
1998-05  7.5025E+10
1998-06  6.4909E+10
1998-07  5.5009E+10

说明: 
计算1998年月销售额的样本方差。
  • VARIANCE
    VARIANCE
# 该函数返回表达式的变量,如果表达式中行数为1,则返回0,如果表达式中行数大于1,则返回 VAR_SAMP
select 
  last_name, salary, 
  round(variance(salary) over (order by hire_date), 4) variance
from hr.employees 
where department_id = 30
order by last_name, salary, variance; 

LAST_NAME    SALARY   VARIANCE
------------ ---------- ----------
Baida          2900 16283333.3
Colmenares     2500   11307000
Himuro         2600   13317000
Khoo           3100   31205000
Raphaely      11000          0
Tobias         2800   21623333.3

说明:
返回按雇佣日期排序的部门30的员工工资的累积方差。

其它函数

  • LISTAGG
    LISTAGG
# 在每个分组内,LISTAGG 根据 order by 子句对列值进行排序,将排序后的结果拼接起来(多行合并一行)
select 
  department_id dept_id, hire_date, last_name,
  listagg(last_name, '; ') within group (order by hire_date, last_name) 
    over (partition by department_id) Emp_list
from hr.employees
where hire_date < '2003-09-01'
order by 1, 2, 3;

   DEPT_ID HIRE_DATE  LAST_NAME       EMP_LIST
---------- ---------- --------------- --------------------
        30 2002-12-07 Raphaely        Raphaely; Khoo
        30 2003-05-18 Khoo            Raphaely; Khoo
        40 2002-06-07 Mavris          Mavris
        50 2003-05-01 Kaufling        Kaufling; Ladwig
        50 2003-07-14 Ladwig          Kaufling; Ladwig
        70 2002-06-07 Baer            Baer
        90 2001-01-13 De Haan         De Haan; King
        90 2003-06-17 King            De Haan; King
       100 2002-08-16 Faviet          Faviet; Greenberg
       100 2002-08-17 Greenberg       Faviet; Greenberg
       110 2002-06-07 Gietz           Gietz; Higgins
       110 2002-06-07 Higgins         Gietz; Higgins

说明:
按部门分组, 找出 hire_date < '2003-09-01' 的所有员工,并将其 last_name 拼接起来。
  • MEDIAN
    MEDIAN
# 获取一个数值或日期时间值,并返回中间值或一个内插值,该值在值排序后将成为中间值(忽略null)
select 
  t.emp_id, t.dept_id, salary,
  median(t.salary) over (partition by t.dept_id) "Median by Dept"
from employee t 
where t.dept_id > 60 
order by t.dept_id, t.salary;

    EMP_ID    DEPT_ID     SALARY Median by Dept
---------- ---------- ---------- --------------
       149         80      10500          12000
       148         80      11000          12000
       147         80      12000          12000
       146         80      13500          12000
       145         80      14000          12000
       101         90      17000          17000
       102         90      17000          17000
       113        100       6900           8000
       111        100       7700           8000
       112        100       7800           8000
       110        100       8200           8000
       109        100       9000           8000
       108        100      12000           8000

说明:
查询结果返回部门中的每个员工的工资中值。
1. 部门80有奇数个员工,所以根据salary排序后,中值=(n/2+1)的数值
2. 部门100有偶数个员工,所以根据salary排序后,中值=((n/2的数值)+(n/2+1的数值))/2
  • NTH_VALUE
    NTH_VALUE
# 获取排序后结果集中任意一行,根据偏移量获取,而不会仅仅第一行或最后一行
select
  t.emp_id, t.dept_id, t.salary,
  nth_value(t.salary, 1) over() "first",
  nth_value(t.salary, 5) from first over() "fifth",
  nth_value(t.salary, 2) from last over() "second_desc"
from employee t where t.dept_id in (50, 60) 
order by t.salary;

    EMP_ID    DEPT_ID     SALARY      first      fifth second_desc
---------- ---------- ---------- ---------- ---------- -----------
       107         60       4200       4200       6000        8200
       105         60       4800       4200       6000        8200
       106         60       4800       4200       6000        8200
       124         50       4800       4200       6000        8200
       104         60       6000       4200       6000        8200
       123         50       6500       4200       6000        8200
       122         50       7900       4200       6000        8200
       120         50       8000       4200       6000        8200
       121         50       8200       4200       6000        8200
       103         60       9000       4200       6000        8200

说明(未分组):
1. NTH_VALUE 中的第二个参数是指这个函数取排名第几的记录
2. from first 表示从排名第一往后取(如第五)
3. from last 表示从排名最末往前取(如倒数第二)
# 分组
select
  t.emp_id, t.dept_id, t.salary,
  nth_value(t.salary, 1) over(partition by t.dept_id) "first",
  nth_value(t.salary, 5) from first over(partition by t.dept_id) "fifth",
  nth_value(t.salary, 2) from last over(partition by t.dept_id) "second_desc"
from employee t where t.dept_id in (50, 60) 
order by t.dept_id, t.salary;

    EMP_ID    DEPT_ID     SALARY      first      fifth second_desc
---------- ---------- ---------- ---------- ---------- -----------
       124         50       4800       4800       8200        8000
       123         50       6500       4800       8200        8000
       122         50       7900       4800       8200        8000
       120         50       8000       4800       8200        8000
       121         50       8200       4800       8200        8000
       107         60       4200       4200       9000        6000
       105         60       4800       4200       9000        6000
       106         60       4800       4200       9000        6000
       104         60       6000       4200       9000        6000
       103         60       9000       4200       9000        6000


分析函数应用

目录

序号 应用分类 明细分类
1 现状分析 众数分析
2 80/20 分析
3 比重分析
5 离散趋势分析
6 TOPN 分析
7 强度分析
8 发展分析 环比分析
9 同比分析
10 基比分析

现状分析

  • 众数分析

众数是一组数据中出现次数最多的数值,在统计分布上具有明显集中趋势点的数值,代表数据的一般水平

select 
  salary
from ( 
  select 
    salary, 
    rank() over(order by repeat_num desc) rk_repeat_num
  from (
    select 
      t.salary, count(*) repeat_num
    from employee t group by t.salary
  )
) where rk_repeat_num = 1;

    SALARY
----------
      4800
  • 80/20 分析

80/20集合分析源自于帕累托二八定律,也就是80/20法则

select 
  ename, sal, deptno, sal_ratio, commu_ratio,
  case when commu_ratio >= 0.8 then 0 else 1 end as "80/20分析"
from (
  select 
    ename, sal, deptno, sal_ratio,
    sum(sal_ratio) over(order by sal_ratio desc) commu_ratio
  from (
    select 
      t.ename, t.sal, t.deptno,
      ratio_to_report(t.sal) over() sal_ratio
    from scott.emp t
  )
);

ENAME             SAL     DEPTNO  SAL_RATIO COMMU_RATIO  80/20分析
---------- ---------- ---------- ---------- ----------- ----------
KING             5000         10 .172265289  .172265289          1
FORD             3000         20 .103359173  .378983635          1
SCOTT            3000         20 .103359173  .378983635          1
JONES            2975         20 .102497847  .481481481          1
BLAKE            2850         30 .098191214  .579672696          1
CLARK            2450         10 .084409991  .664082687          1
ALLEN            1600         30 .055124892   .71920758          1
TURNER           1500         30 .051679587  .770887166          1
MILLER           1300         10 .044788975  .815676141          0
WARD             1250         30 .043066322  .901808786          0
MARTIN           1250         30 .043066322  .901808786          0
ADAMS            1100         20 .037898363  .939707149          0
JAMES             950         30 .032730405  .972437554          0
SMITH             800         20 .027562446           1          0

说明:
结果集中 "80/20分析=1" 字段反映了,80%的薪水是由他们构成的。
  • 比重分析

比重分析主要用于分析业务整体中各要素在整体中所占的比重,并对这些比重进行排序,`便于决策者了解业务的整体结构

select 
  t.emp_id, t.dept_id, t.hire_date, t.salary,
  ratio_to_report(t.salary) over() sal_ratio,
  row_number() over(order by t.salary desc) row_num
from employee t;

    EMP_ID    DEPT_ID HIRE_DATE      SALARY  SAL_RATIO    ROW_NUM
---------- ---------- ---------- ---------- ---------- ----------
       100         10 1990-01-01      20000 .082712986          1
       101         90 1989-09-21      17000 .070306038          2
       102         90 1993-01-13      17000 .070306038          3
       145         80 1996-10-01      14000  .05789909          4
       146         80 1997-01-05      13500 .055831266          5
       147         80 1997-03-10      12000 .049627792          6
       108        100 1994-08-17      12000 .049627792          7
       148         80 1999-10-15      11000 .045492142          8
       114         30 1994-12-07      11000 .045492142          9
       149         80 2000-01-19      10500 .043424318         10
  • 离散趋势分析

离散趋势分析主要是进行标准差系数(变异系数)的分析,离散分析的主要作用在于分析样本指标的离散程度
在统计学中,我们通常是用标准差系数来体现指标样本的波动情况,标准差系数越大,表示平均值数的代表性越小,反之,则代表性越大

select 
  round(stddev_samp, 4) AS "标准差",
  round(stddev_samp/avg_sales_number, 4) AS "标准差系数",
  max_sales_number AS "最大值",
  min_sales_number AS "最小值",
  max_sales_number-min_sales_number AS "全距",
  total_sales_number AS "样本记录数"
from (
  select 
    distinct
    stddev_samp(s.sales_number) over() stddev_samp, --累积样本标准偏离
    avg(s.sales_number) over() avg_sales_number,
    max(s.sales_number) over() max_sales_number,
    min(s.sales_number) over() min_sales_number,
    count(s.sales_number) over() total_sales_number
  from sales s
);

   标准差 标准差系数     最大值     最小值       全距 样本记录数
--------- ---------- ---------- ---------- ---------- ----------
  530.438      .3256       3800       1150       2650         24
  • TOPN 分析

TOPN 分析能快速抽取指标中前N和后N名的数据,满足用户只想看报表中某个指标的最大和最小的几个值的需求

select * from (
  select 
    s.country, s.sales_month, s.sales_value,
    row_number() over(order by s.sales_value) row_num
  from sales s
) where row_num <= 3 or row_num >= (select count(*) from sales) - 2;

COUNT SALES_MONT SALES_VALUE    ROW_NUM
----- ---------- ----------- ----------
USA   2008-02-01      450000          1
USA   2008-08-01      460000          2
USA   2008-01-01      500000          3
USA   2009-04-01      600000         22
USA   2009-10-01      610000         23
USA   2009-06-01      780000         24
  • 强度分析

用一行指标数据作为基数,计算其它各个指标与基数的比值,用这个值来反映它们之间的差距
强度分析可以用于在不同时间或地区同类数字之间进行比较,产生比较相对指标,用以揭示不同单位之间的差异

select
  emp_id||'--'||first_emp_id,
  salary, first_salary,
  round(salary/first_salary, 2) salary_diff 
from (
  select 
    t.emp_id, t.emp_name, t.dept_id, t.salary,
    first_value(t.emp_id) over(order by t.emp_id) first_emp_id, --基数数据
    first_value(t.salary) over(order by t.emp_id) first_salary  --基数数据
  from employee t
);

EMP_ID||'--'||FIRST_EMP_ID     SALARY FIRST_SALARY SALARY_DIFF
-------------------------- ---------- ------------ -----------
100--100                        20000        20000           1
101--100                        17000        20000         .85
102--100                        17000        20000         .85
103--100                         9000        20000         .45
104--100                         6000        20000          .3
105--100                         4800        20000         .24
106--100                         4800        20000         .24
107--100                         4200        20000         .21
108--100                        12000        20000          .6
109--100                         9000        20000         .45
110--100                         8200        20000         .41

说明:
以 "EMP_ID=100" 作为基数,每一个 "EMP_ID"的 "salary" 和它进行对比,得出比值。

发展分析

  • 环比分析

环比就是现在的统计周期和上一个统计周期比较,例如,2018年7月份与2018年6月份相比较

select 
  country,
  to_char(sales_month, 'YYYYMM') ||' ~ '|| to_char(add_months(sales_month, -1), 'YYYYMM') "环比周期",
  sales_number "当期销售量",
  prev_sales_number "环比销售量",
  round(sales_number/prev_sales_number, 4) as "环比分析"
from (
  select 
    country, sales_month, sales_number, sales_value,
    --获取第一个月日期、第一个月的销售量
    first_value(sales_month) over(order by sales_month) first_sales_month,
    first_value(sales_number) over(order by sales_month) first_sales_number,
    prev_sales_number,
    prev12_sales_number
  from (
    select 
      s.country, s.sales_month, s.sales_number, s.sales_value,
      --获取上个月、上年销售量
      lag(s.sales_number, 1, 0) over(order by s.sales_month) prev_sales_number,
      lag(s.sales_number, 12, 0) over(order by s.sales_month) prev12_sales_number
    from sales s
  )
  where sales_month >= to_date('2009-01-01', 'YYYY-MM-DD')
); 

COUNTRY    环比周期          当期销售量 环比销售量   环比分析
---------- ----------------- ---------- ---------- ----------
USA        200901 ~ 200812         1600       1500     1.0667
USA        200902 ~ 200901         1390       1600      .8688
USA        200903 ~ 200902         1730       1390     1.2446
USA        200904 ~ 200903         1900       1730     1.0983
USA        200905 ~ 200904         1850       1900      .9737
USA        200906 ~ 200905         3800       1850     2.0541
USA        200907 ~ 200906         1700       3800      .4474
USA        200908 ~ 200907         1490       1700      .8765
USA        200909 ~ 200908         1830       1490     1.2282
USA        200910 ~ 200909         2000       1830     1.0929
USA        200911 ~ 200910         1950       2000       .975
USA        200912 ~ 200911         1900       1950      .9744

说明:
以 "200912 ~ 200911" 为例, "200912" 为当前统计周期, "200911" 为上一个统计周期(环比周期)。
  • 同比分析

同比就是与历史同期相比,例如,2018年8月份与2017年8月份相比较

select 
  country,
  to_char(sales_month, 'YYYYMM') ||' ~ '|| to_char(add_months(sales_month, -12), 'YYYYMM') "同比周期",
  sales_number "当期销售量",
  prev12_sales_number "同比销售量",
  round(sales_number/prev12_sales_number, 4) as "同比分析"
from (
  select 
    country, sales_month, sales_number, sales_value,
    --获取第一个月日期、第一个月的销售量
    first_value(sales_month) over(order by sales_month) first_sales_month,
    first_value(sales_number) over(order by sales_month) first_sales_number,
    prev_sales_number,
    prev12_sales_number
  from (
    select 
      s.country, s.sales_month, s.sales_number, s.sales_value,
      --获取上个月、上年销售量
      lag(s.sales_number, 1, 0) over(order by s.sales_month) prev_sales_number,
      lag(s.sales_number, 12, 0) over(order by s.sales_month) prev12_sales_number
    from sales s
  )
  where sales_month >= to_date('2009-01-01', 'YYYY-MM-DD')
); 

COUNTRY    同比周期        当期销售量 同比销售量   同比分析
---------- --------------- ---------- ---------- ----------
USA        200901 ~ 200801       1600       1200     1.3333
USA        200902 ~ 200802       1390       1150     1.2087
USA        200903 ~ 200803       1730       1300     1.3308
USA        200904 ~ 200804       1900       1280     1.4844
USA        200905 ~ 200805       1850       1350     1.3704
USA        200906 ~ 200806       3800       1400     2.7143
USA        200907 ~ 200807       1700       1300     1.3077
USA        200908 ~ 200808       1490       1250      1.192
USA        200909 ~ 200809       1830       1400     1.3071
USA        200910 ~ 200810       2000       1380     1.4493
USA        200911 ~ 200811       1950       1450     1.3448
USA        200912 ~ 200812       1900       1500     1.2667
  • 基比分析(定基比分析)

基比发展速度也叫总速度,是报告期水平与某一个固定时期水平之比,表明这种现象在较长时期内的发展速度

select 
  country,
  to_char(sales_month, 'YYYYMM') ||' ~ '|| to_char(first_sales_month, 'YYYYMM') "基比周期",
  sales_number "当期销售量",
  first_sales_number "基比初始值",
  round(sales_number/first_sales_number, 4) as "基比分析"
from (
  select 
    country, sales_month, sales_number, sales_value,
    --获取第一个月日期、第一个月的销售量
    first_value(sales_month) over(order by sales_month) first_sales_month,
    first_value(sales_number) over(order by sales_month) first_sales_number,
    prev_sales_number,
    prev12_sales_number
  from (
    select 
      s.country, s.sales_month, s.sales_number, s.sales_value,
      --获取上个月、上年销售量
      lag(s.sales_number, 1, 0) over(order by s.sales_month) prev_sales_number,
      lag(s.sales_number, 12, 0) over(order by s.sales_month) prev12_sales_number
    from sales s
  )
  where sales_month >= to_date('2009-01-01', 'YYYY-MM-DD')
); 

COUNTRY    基比周期        当期销售量 基比初始值   基比分析
---------- --------------- ---------- ---------- ----------
USA        200901 ~ 200901       1600       1600          1
USA        200902 ~ 200901       1390       1600      .8688
USA        200903 ~ 200901       1730       1600     1.0813
USA        200904 ~ 200901       1900       1600     1.1875
USA        200905 ~ 200901       1850       1600     1.1563
USA        200906 ~ 200901       3800       1600      2.375
USA        200907 ~ 200901       1700       1600     1.0625
USA        200908 ~ 200901       1490       1600      .9313
USA        200909 ~ 200901       1830       1600     1.1438
USA        200910 ~ 200901       2000       1600       1.25
USA        200911 ~ 200901       1950       1600     1.2188
USA        200912 ~ 200901       1900       1600     1.1875


自定义聚集函数

用户可以通过实现 Oracle 的 Extensibility Framework 中的 ODCIAggregate interface 来创建自定义聚集函数

接口函数

  • static function ODCIAggregateInitialize(sctx IN OUT str_conn_type) return number

自定义聚集函数初始化操作,从这儿开始一个聚集函数

  • member function ODCIAggregateIterate(self IN OUT str_conn_type, value IN varchar2) return number

自定义聚集函数的最主要步骤,这个函数定义我们聚集函数做什么操作(如取最大值,最小值,平均值),self 为当前聚集函数的指针,用来与前面的计算结果进行关联

  • member function ODCIAggregateMerge(self IN OUT str_conn_type, ctx2 str_conn_type) return number

用来合并两个聚集函数的两个不同指针对应的结果

  • member function ODCIAggregateTerminate(self IN str_conn_type, returnValue OUT varchar2, flags IN number) return number

终止聚集函数的处理,返回聚集函数处理的结果

创建自定义聚集函数

  • 创建 OBJECT TYPE
create or replace type secmax_context as object(
  firmax number,   -- 保存最大值
  secmax number,   -- 保存第二大值

  static function odciaggregateinitialize(sctx in out secmax_context) return number,   --初始化函数
  member function odciaggregateiterate(self  in out secmax_context,value in number) return number,   --迭代运算函数
  member function odciaggregatemerge(self in out secmax_context, ctx2 in secmax_context)return number,   --合并两个上下文到一个上下文中
  member function odciaggregateterminate(self in secmax_context, returnvalue out number, flags in number) return number   --对结果进行处理并返回处理结果
);
  • 实现该 OBJECT TYPE
create or replace type body secmax_context is

  static function ODCIAggregateInitialize(sctx IN OUT secmax_context) return number is
  begin
    sctx := secmax_context(0, 0);
    return ODCIConst.Success;
  end;

  member function ODCIAggregateIterate(self IN OUT secmax_context, value IN number) return number is
  begin
    if value > self.firmax then
      self.secmax := self.firmax;
      self.firmax := value;
    elsif value > self.secmax then
      self.secmax := value;
    end if;
    return ODCIConst.Success;
  end;

  member function ODCIAggregateTerminate(self IN secmax_context, returnValue OUT number, flags IN number) return number is
  begin
    returnValue := self.secmax;
    return ODCIConst.Success;
  end;

  member function ODCIAggregateMerge(self IN OUT secmax_context, ctx2 IN secmax_context) return number is
  begin
    if ctx2.firmax > self.firmax then
      if ctx2.secmax > self.firmax then
        self.secmax := ctx2.secmax;
            else
        self.secmax := self.firmax;
      end if;
      self.firmax := ctx2.firmax;
    elsif ctx2.firmax > self.secmax then
      self.secmax := ctx2.firmax;
    end if;
    return ODCIConst.Success;
  end;

end;
  • 创建聚合函数
create or replace function f_secmax (input number) return number parallel_enable aggregate using secmax_context;  
  • 数据测试
select f_secmax(sal) from scott.emp;

F_SECMAX(SAL)
-------------
         3000


select * from (
  select 
    t.emp_id, t.dept_id, t.hire_date, t.salary, 
    f_secmax(t.salary) over(partition by t.dept_id) max_sal
  from employee t
) where salary = max_sal order by dept_id, emp_id;

    EMP_ID    DEPT_ID HIRE_DATE      SALARY    MAX_SAL
---------- ---------- ---------- ---------- ----------
       120         50 1996-07-18       8000       8000
       104         60 1991-05-21       6000       6000
       146         80 1997-01-05      13500      13500
       101         90 1989-09-21      17000      17000
       102         90 1993-01-13      17000      17000
       109        100 1994-08-16       9000       9000

原创文章,转载请注明出处:http://www.opcoder.cn/article/3/