关于SQL的经典取数问题,我们已经梳理了 连续登录TOP N环比同比 等问题的取数逻辑和代码实现,本文继续总结数据分析中必用的T+N指标计算问题。常见的T+N指标有 T+N留存率 、T+N转化率、T+N逾期率、T+N复借率 等,接下来,我们就以常见的注册及登录为例,计算注册客户的 T+N留存率 。

假设我们已有注册表register_detail登录表login_detail,我们要按注册月份统计T+1注册登录率、T+3注册登录率、T+7注册登录率。

示例代码:

--01:注册
drop table if exists tmp1;
create table tmp1 as
select
    cust_id,
    register_time,
    date(register_time) as register_date,
    to_char(register_time,'yyyymm') as register_month
from register_detail;


--02:登录
drop table if exists tmp2;
create table tmp2 as
select
    cust_id,
    date(login_time) as login_date
from login_detail
group by 1,2;


--03:匹配打标签
drop table if exists tmp3;
create table tmp3 as
select
    t1.cust_id,
    max(case when t2.login_date = t1.register_date + interval '1' day then 1 else 0 end) as is_login_t1,
    max(case when t2.login_date = t1.register_date + interval '3' day then 1 else 0 end) as is_login_t3,
    max(case when t2.login_date = t1.register_date + interval '7' day then 1 else 0 end) as is_login_t7
from tmp1 as t1
left join tmp2 as t2
on t1.cust_id = t2.cust_id and t2.login_date > t1.register_date and t2.login_date <= t1.register_date + interval '7' day
group by 1;


--04:反匹并按需处理
drop table if exists tmp4;
create table tmp4 as
select
    t1.cust_id,
    t1.register_time,
    t1.register_date,
    t1.register_month,
    coalesce(t2.is_login_t1, 0) as is_login_t1,
    coalesce(t2.is_login_t3, 0) as is_login_t3,
    coalesce(t2.is_login_t7, 0) as is_login_t7
from tmp1 as t1
left join tmp3 as t2
on t1.cust_id = t2.cust_id;


--05:按注册月份汇总统计
drop table if exists tmp5;
create table tmp5 as
select
    register_month,
    register_cnt_t1,
    register_login_cnt_t1,
    register_cnt_t3,
    register_login_cnt_t3,
    register_cnt_t7,
    register_login_cnt_t7,
    register_login_cnt_t1*1.0000/register_cnt_t1 as register_login_rate_t1,
    register_login_cnt_t3*1.0000/register_cnt_t3 as register_login_rate_t3,
    register_login_cnt_t7*1.0000/register_cnt_t7 as register_login_rate_t7
from (
    select
        register_month,
        
        sum(case when register_date < current_date - interval '1' day then 1 else 0 end) as register_cnt_t1,  --注意分母要限时间窗口
        sum(case when register_date < current_date - interval '1' day then is_login_t1 end) as register_login_cnt_t1,  --注意分子要限时间窗口
        
        sum(case when register_date < current_date - interval '3' day then 1 else 0 end) as register_cnt_t3,
        sum(case when register_date < current_date - interval '3' day then is_login_t3 end) as register_login_cnt_t3,
        
        sum(case when register_date < current_date - interval '7' day then 1 else 0 end) as register_cnt_t7,
        sum(case when register_date < current_date - interval '7' day then is_login_t7 end) as register_login_cnt_t7
    from tmp4
    group by 1
) as t;

代码结果:

以上就是T+N注册登录率的分步代码,使用Presto SQL执行,其他SQL在日期时间的处理上可能会有所差异。需要强调的是代码中的第5步,T+N指标的计算需要注意隐含的时间窗口条件,分子要满足T+N的条件,分母也要满足N的窗口,就是说我们要计算T+3注册登录率,分母应该是已经注册达到3天的客户,刚注册1天的客户,其反应周期还不足3天,不应该作为分母。

其他诸如 T+N转化率、T+N逾期率、T+N复借率 其底层处理逻辑都差不多,参考上述代码按需调整即可。


原创文章,欢迎转载,转载请注明出处并留下原文链接。

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注