SQL-经典的T+N指标计算问题
关于SQL的经典取数问题,我们已经梳理了 连续登录、TOP N、环比同比 等问题的取数逻辑和代码实现,本文继续总结数据分析中必用的T+N指标计算问题。常见的T+N指标有 T+N留存率 、T+N转化率、T+N逾期率、T+N复借率 等,接下来,我们就以常见的注册及登录为例,计算注册客户的 T+N留存率 。
假设我们已有注册表register_detail和登录表login_detail,我们要按注册月份统计T+1注册登录率、T+3注册登录率、T+7注册登录率。
示例代码:
--01:注册 drop table if exists tmp1; create table tmp1 as select cust_id, register_time, date(register_time) as register_date, to_char(register_time,'yyyymm') as register_month from register_detail; --02:登录 drop table if exists tmp2; create table tmp2 as select cust_id, date(login_time) as login_date from login_detail group by 1,2; --03:匹配打标签 drop table if exists tmp3; create table tmp3 as select t1.cust_id, max(case when t2.login_date = t1.register_date + interval '1' day then 1 else 0 end) as is_login_t1, max(case when t2.login_date = t1.register_date + interval '3' day then 1 else 0 end) as is_login_t3, max(case when t2.login_date = t1.register_date + interval '7' day then 1 else 0 end) as is_login_t7 from tmp1 as t1 left join tmp2 as t2 on t1.cust_id = t2.cust_id and t2.login_date > t1.register_date and t2.login_date <= t1.register_date + interval '7' day group by 1; --04:反匹并按需处理 drop table if exists tmp4; create table tmp4 as select t1.cust_id, t1.register_time, t1.register_date, t1.register_month, coalesce(t2.is_login_t1, 0) as is_login_t1, coalesce(t2.is_login_t3, 0) as is_login_t3, coalesce(t2.is_login_t7, 0) as is_login_t7 from tmp1 as t1 left join tmp3 as t2 on t1.cust_id = t2.cust_id; --05:按注册月份汇总统计 drop table if exists tmp5; create table tmp5 as select register_month, register_cnt_t1, register_login_cnt_t1, register_cnt_t3, register_login_cnt_t3, register_cnt_t7, register_login_cnt_t7, register_login_cnt_t1*1.0000/register_cnt_t1 as register_login_rate_t1, register_login_cnt_t3*1.0000/register_cnt_t3 as register_login_rate_t3, register_login_cnt_t7*1.0000/register_cnt_t7 as register_login_rate_t7 from ( select register_month, sum(case when register_date < current_date - interval '1' day then 1 else 0 end) as register_cnt_t1, --注意分母要限时间窗口 sum(case when register_date < current_date - interval '1' day then is_login_t1 end) as register_login_cnt_t1, --注意分子要限时间窗口 sum(case when register_date < current_date - interval '3' day then 1 else 0 end) as register_cnt_t3, sum(case when register_date < current_date - interval '3' day then is_login_t3 end) as register_login_cnt_t3, sum(case when register_date < current_date - interval '7' day then 1 else 0 end) as register_cnt_t7, sum(case when register_date < current_date - interval '7' day then is_login_t7 end) as register_login_cnt_t7 from tmp4 group by 1 ) as t;
代码结果:
以上就是T+N注册登录率的分步代码,使用Presto SQL执行,其他SQL在日期时间的处理上可能会有所差异。需要强调的是代码中的第5步,T+N指标的计算需要注意隐含的时间窗口条件,分子要满足T+N的条件,分母也要满足N的窗口,就是说我们要计算T+3注册登录率,分母应该是已经注册达到3天的客户,刚注册1天的客户,其反应周期还不足3天,不应该作为分母。
其他诸如 T+N转化率、T+N逾期率、T+N复借率 其底层处理逻辑都差不多,参考上述代码按需调整即可。
原创文章,欢迎转载,转载请注明出处并留下原文链接。