0
点赞
收藏
分享

微信扫一扫

代码工艺:写代码的好习惯

爱动漫建模 2024-11-04 阅读 12
  • 题目:

  • sql建表:

  • Create table If Not Exists Product (product_id int, product_name varchar(30))
    Create table If Not Exists Sales (product_id int, period_start date, period_end date, average_daily_sales int)
    Truncate table Product
    insert into Product (product_id, product_name) values ('1', 'LC Phone ')
    insert into Product (product_id, product_name) values ('2', 'LC T-Shirt')
    insert into Product (product_id, product_name) values ('3', 'LC Keychain')
    Truncate table Sales
    insert into Sales (product_id, period_start, period_end, average_daily_sales) values ('1', '2019-01-25', '2019-02-28', '100')
    insert into Sales (product_id, period_start, period_end, average_daily_sales) values ('2', '2018-12-01', '2020-01-01', '10')
    insert into Sales (product_id, period_start, period_end, average_daily_sales) values ('3', '2019-12-01', '2020-01-31', '1')
  • 分析:首先,我们看到题目后,肯定会感觉这个题很难,我看到的时候也是,这道题的难点是怎么把每个商品的每一年分开,我们这里可以使用递归,递归结束条件为period_end的年份,开始年份是period_start的年份,但是,我们这里要加一些字段,例如sales表中的所有字段,然后再根据特定的条件来计算每一年的天数,然后在乘以日销,在关联商品表获取商品名称,大概思路就是这个样子,这里文字不太好理解,直接看图解(这一篇写的很详细):

  • sql实现:

  • with RECURSIVE
        year as (select product_id,
                        year(period_start) as year,
                        year(period_end)      a2,
                        period_start,
                        period_end,
                        average_daily_sales
                 from Sales
                 union
                 select product_id, year + 1, a2, period_start, period_end, average_daily_sales
                 from year
                 where year < a2),        -- 递归输出每个商品的每一年,加上sales的字段
        t2 as (select product_id,
                      year,
                      if(year = a2 and year = year(period_start),
                         (datediff(period_end, period_start) + 1)*average_daily_sales,          -- 如果满足year = a2 and year = year(period_start)条件,然后计算period_end和period_start的差值
                         
                         if(year = a2, (datediff(period_end, date_format(concat(year, '-01-01'), '%Y-%m-%d'))+1)*average_daily_sales,           -- 如果满足year = a2 and year != year(period_start)条件,计算period_end和year+1的第一天的差值
                         
                            if(year != a2 and year = year(period_start),
                               (datediff(date_format(concat(year + 1, '-01-01'), '%Y-%m-%d'), period_start))*average_daily_sales,365*average_daily_sales))) as a -- 如果满足year != a2 and year = year(period_start) 返回year+的第一天与period_start的差值,否则返回365
                               
               from year)
    select t2.product_id,product_name,date_format(concat(year,'-01-01'),'%Y') report_year ,a total_amount  
    from t2,Product p1 where t2.product_id=p1.product_id
    order by t2.product_id; -- 连接商品表获取商品姓名,然后返回需要字段,并转换类型
  • pandas例子:

  • data = [[1, 'LC Phone '], [2, 'LC T-Shirt'], [3, 'LC Keychain']]
    product = pd.DataFrame(data, columns=['product_id', 'product_name']).astype({'product_id':'Int64', 'product_name':'object'})
    data = [[1, '2019-01-25', '2019-02-28', 100], [2, '2018-12-01', '2020-01-01', 10], [3, '2019-12-01', '2020-01-31', 1]]
    sales = pd.DataFrame(data, columns=['product_id', 'period_start', 'period_end', 'average_daily_sales']).astype({'product_id':'Int64', 'period_start':'datetime64[ns]', 'period_end':'datetime64[ns]', 'average_daily_sales':'Int64'})
  • pandas分析和sql思路一样,只不过是把递归改成了for循环

  • pandas实现:

  • import pandas as pd
    
    def total_sales(product: pd.DataFrame, sales: pd.DataFrame) -> pd.DataFrame:
        sales['year']=sales['period_start'].dt.year.astype('Int64')
        sales['end'] = sales['period_end'].dt.year.astype('Int64')  -- 新增两列year和end,把开始时间和结束时间的年取出转换成int方便后面计算
    
        b = [[a,sales['end'].iloc[i],sales['product_id'].iloc[i],sales['period_start'].iloc[i],sales['period_end'].iloc[i],sales['average_daily_sales'].iloc[i]] for i in range(0, len(sales)) for a in range(sales['year'].iloc[i],sales['end'].iloc[i]+1)]  -- for循环输出每个商品的每年
    
        c=pd.DataFrame(b, columns=['year','end','product_id', 'period_start', 'period_end', 'average_daily_sales'])          -- 转换成dataframe对象并设置列名
    
    -- 自定义函数
        def get_func(x):
        -- 如果满足year = a2 and year = year(period_start)条件,然后计算period_end和period_start的差值
            if x['year']==x['end'] and x['year']==int(str(x['period_start']).split('-')[0]):
                a=x['period_end']-x['period_start']
                a=str(a).split(' ')[0]
                # print(a)
                return (int(a)+1)*x['average_daily_sales']
       --  如果满足year = a2 and year != year(period_start)条件,计算period_end和year+1的第一天的差值
            elif x['year']==x['end'] and x['year']!=int(str(x['period_start']).split('-')[0]):
                a=x['period_end']-pd.to_datetime(str(x['year']))
                a=str(a).split(' ')[0]
                # print(a)
                return (int(a)+1)*x['average_daily_sales']
                
          --  如果满足year != a2 and year = year(period_start) 返回year+的第一天与period_start的差值,否则返回365
            elif x['year']!=x['end'] and x['year']==int(str(x['period_start']).split('-')[0]):
                a=pd.to_datetime(str(x['year']+1))-x['period_start']
                a=str(a).split(' ')[0]
                return (int(a))*x['average_daily_sales']
            else:
                return 365*x['average_daily_sales']
        c['total_amount']=c.apply(get_func,axis=1)  -- 调用自定义函数
    
        d=c.merge(product, on='product_id', how='inner')   --连接商品表获取表名
    
        d=d[['product_id','product_name','year','total_amount']].sort_values(by=['product_id','year'])   -- 取出需要的列,然后排序
    
        d.columns=['product_id','product_name','report_year','total_amount']  -- 修改列名
        d['report_year']=d['report_year'].astype('str')   -- 把year改成字符串类型
        return d

举报

相关推荐

0 条评论