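This article implements methods from the book Empirical Asset Pricing by Bali et al. The portfolio analysis module is EAP.portfolio_analysis. The package is published on GitHub:
GitHub: whyecofiliter/EAP (empirical asset pricing)
The relationship between profitability and future returns has been widely discussed in the literature (Fama and French, 2015; Hou et al., 2015). Following that work, a profitability factor was constructed and included in two major asset pricing models: the Fama-French five-factor model and the HXZ model. Profitability has many proxy variables. They are highly correlated, and the most common is return on equity (ROE). Besides ROE, return on assets (ROA) is supported by some studies (Novy-Marx, 2013; Ball et al., 2015). Another alternative is return on tangible capital (ROTC), proposed by Greenblatt (2006, 2010). Empirical results show that profitability is positively related to future returns in the US stock market, while the relationship in the Chinese stock market needs further investigation.
In this demo, ROE (TTM) is used as the proxy for profitability. The data are collected from the CSMAR database, and the test sample starts in January 2004. Warning: do not use the dataset in this demo for any commercial purpose.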
# %% import package
import pandas as pd
import sys, os
sys.path.append(os.path.abspath(".."))
# %% import data
# Monthly stock returns in the Chinese stock market
# Forward slashes avoid accidental backslash escapes in the path strings
month_return = pd.read_hdf('./data/month_return.h5', key='month_return')
company_data = pd.read_hdf('./data/last_filter_pe.h5', key='data')
The data need some preprocessing.
# %% preprocessing data
# Shift returns within each stock so that every row carries the NEXT month's return
# emrwd: next-month return including dividends
month_return['emrwd'] = month_return.groupby(['Stkcd'])['Mretwd'].shift(-1)
# emrnd: next-month return excluding dividends
month_return['emrnd'] = month_return.groupby(['Stkcd'])['Mretnd'].shift(-1)
# keep A-share stocks only
month_return = month_return[month_return['Markettype'].isin([1, 4, 16])]
# %% flag stocks at or above the 30th size percentile in each month,
# i.e. stocks that are NOT among the smallest 30%
def percentile(stocks):
    return stocks >= stocks.quantile(q=.3)
# group_keys=False keeps the original row index so the result aligns on assignment
month_return['cap'] = month_return.groupby(['Trdmnt'], group_keys=False)['Msmvttl'].apply(percentile)
# %% prepare merge data
from pandas.tseries.offsets import MonthBegin, MonthEnd
month_return['Stkcd_merge'] = month_return['Stkcd'].astype(dtype='string')
month_return['Date_merge'] = pd.to_datetime(month_return['Trdmnt'])
month_return['Yearmonth'] = month_return['Date_merge'].map(lambda x : 1000*x.year + x.month)
#month_return['Date_merge'] += MonthEnd()
# ROE(TTM) is the profitability proxy in this demo
# ROE(TTM) = PBV1B / PE(TTM): (price/book) / (price/earnings) = earnings/book
company_data['ROE(TTM)'] = company_data['PBV1B']/company_data['PE1TTM']
company_data['Stkcd_merge'] = company_data['Symbol'].dropna().astype(dtype='int').astype(dtype='string')
company_data['Date_merge'] = pd.to_datetime(company_data['TradingDate'])
company_data['Yearmonth'] = company_data['Date_merge'].map(lambda x : 1000*x.year + x.month)
company_data['Date_merge'] += MonthBegin()
# %% dataset starts from '2000-01'
company_data = company_data[company_data['Date_merge'] >= '2000-01']
month_return = month_return[month_return['Date_merge'] >= '2000-01']
return_company = pd.merge(company_data, month_return, on=['Stkcd_merge', 'Date_merge'])
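The two key preprocessing steps, the forward return and the monthly size flag, are easy to get wrong, so a toy illustration may help. This is a minimal sketch on made-up numbers; the column names mirror the ones above, but the values and the `threshold` helper variable are hypothetical.

```python
import pandas as pd

# Toy panel: two stocks, three months (all values are hypothetical)
toy = pd.DataFrame({
    'Stkcd':   [1, 1, 1, 2, 2, 2],
    'Trdmnt':  ['2004-01', '2004-02', '2004-03'] * 2,
    'Mretwd':  [0.01, 0.02, 0.03, -0.01, 0.00, 0.05],
    'Msmvttl': [100.0, 110.0, 120.0, 10.0, 9.0, 8.0],
})

# Forward return: each row receives the NEXT month's return of the same stock.
# The last month of every stock has no successor and becomes NaN.
toy['emrwd'] = toy.groupby('Stkcd')['Mretwd'].shift(-1)

# Size flag: compare each stock's market value with the 30th percentile
# of its month; True means the stock is not in the smallest 30%.
threshold = toy.groupby('Trdmnt')['Msmvttl'].transform(lambda s: s.quantile(0.3))
toy['cap'] = toy['Msmvttl'] >= threshold

print(toy)
```

Sorting on a characteristic known at month t and measuring the return in month t+1 is what makes the later portfolio tests predictive rather than contemporaneous.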
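Two datasets are constructed. One includes the tail stocks (the smallest 30% by market value), and the other excludes them. A univariate analysis and a bivariate analysis are run on each.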
# %% construct test_data for univariate and bivariate analysis
# dataset 1 : tail stocks excluded & ROE Bivariate
from portfolio_analysis import Bivariate, Univariate
import numpy as np
# keep stocks that are not among the smallest 30% by size in each month
# and that traded on at least 10 days in the month
test_data_1 = return_company[(return_company['cap']==True) & (return_company['Ndaytrd']>=10)]
test_data_1 = test_data_1[['emrwd', 'Msmvttl', 'ROE(TTM)', 'Date_merge']].dropna()
test_data_1 = test_data_1[(test_data_1['Date_merge'] >= '2004-01-01') & (test_data_1['Date_merge'] <= '2019-12-01')]
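# Univariate analysis
# number=9 yields the 10 groups shown in the printed summary
# (assumption: number is the count of breakpoints, not of groups)
uni_1 = Univariate(np.array(test_data_1[['emrwd', 'ROE(TTM)', 'Date_merge']]), number=9)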
uni_1.summary_and_test()
uni_1.print_summary_by_time()
uni_1.print_summary()
====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.012 | 0.011 | 0.013 | 0.013 | 0.013 | 0.015 | 0.016 | 0.015 | 0.017 | 0.017 | 0.005 |
| T-Test | 1.465 | 1.372 | 1.648 | 1.826 | 1.884 | 2.048 | 2.354 | 2.3 | 2.548 | 2.494 | 1.473 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================
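For readers new to portfolio sorts, the following is a minimal sketch of what a univariate analysis of this kind typically computes: monthly quantile groups, equal-weighted group returns, and a t-test on the high-minus-low spread. It is not the EAP implementation. The function `univariate_sort`, the simulated data, and the plain t-statistic (no Newey-West adjustment) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def univariate_sort(df, ret_col, sort_col, date_col, n_groups=10):
    """Equal-weighted sort: average return per group, spread, and spread t-stat."""
    df = df.copy()
    # Assign each stock to a quantile group within its month (0 = lowest).
    # pd.qcut raises on heavily tied values; the toy data below has no ties.
    df['group'] = df.groupby(date_col)[sort_col].transform(
        lambda s: pd.qcut(s, n_groups, labels=False)
    ).astype(int)
    # Mean return per month and group, reshaped to a (months x groups) table
    by_time = df.groupby([date_col, 'group'])[ret_col].mean().unstack('group')
    # Monthly high-minus-low spread
    diff = by_time[n_groups - 1] - by_time[0]
    # Simple t-statistic of the time-series mean of the spread
    t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.count()))
    return by_time.mean(), diff.mean(), t_stat

# Simulated panel (hypothetical): returns load positively on the signal
rng = np.random.default_rng(0)
n_months, n_stocks = 24, 50
toy = pd.DataFrame({
    'date': np.repeat(np.arange(n_months), n_stocks),
    'signal': rng.normal(size=n_months * n_stocks),
})
toy['ret'] = 0.05 * toy['signal'] + 0.01 * rng.normal(size=len(toy))

avg, spread, t_stat = univariate_sort(toy, 'ret', 'signal', 'date', n_groups=5)
print(avg, spread, t_stat)
```

The "Average" row of the printed summary corresponds to the per-group time-series means, and the "Diff" column to the spread between the highest and lowest groups.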
# Bivariate analysis
bi_1 = Bivariate(np.array(test_data_1), number=4)
bi_1.average_by_time()
bi_1.summary_and_test()
bi_1.print_summary_by_time()
bi_1.print_summary()
===============================================================
+-------+--------+--------+--------+--------+--------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | Diff |
+-------+--------+--------+--------+--------+--------+-------+
| 1 | 0.014 | 0.017 | 0.021 | 0.023 | 0.02 | 0.006 |
| | 1.683 | 2.197 | 2.712 | 3.026 | 2.519 | 2.308 |
| 2 | 0.014 | 0.015 | 0.017 | 0.018 | 0.019 | 0.005 |
| | 1.702 | 1.895 | 2.228 | 2.604 | 2.567 | 1.699 |
| 3 | 0.008 | 0.011 | 0.012 | 0.016 | 0.018 | 0.01 |
| | 1.091 | 1.373 | 1.712 | 2.246 | 2.624 | 3.775 |
| 4 | 0.008 | 0.011 | 0.01 | 0.014 | 0.016 | 0.008 |
| | 1.002 | 1.442 | 1.468 | 2.109 | 2.305 | 2.41 |
| 5 | 0.004 | 0.009 | 0.008 | 0.01 | 0.016 | 0.012 |
| | 0.565 | 1.271 | 1.257 | 1.613 | 2.417 | 3.234 |
| Diff | -0.009 | -0.008 | -0.012 | -0.013 | -0.004 | 0.005 |
| | -2.446 | -2.151 | -3.025 | -3.231 | -0.917 | 1.502 |
+-------+--------+--------+--------+--------+--------+-------+
===============================================================
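The results for dataset #1 are notable. In the univariate analysis the return spread is not significant: the Diff t-value is 1.473, below 2.3. In the bivariate analysis the spreads in the Diff column are mostly significant, with t-values of 2.308, 1.699, 3.775, 2.41, and 3.234, so four of the five groups exceed 2.3. Given the input column order (return, size, ROE), the Diff column appears to report the ROE spread within each size group. This suggests that the profitability factor earns excess returns.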
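A double sort of the kind reported in the bivariate table can be sketched as follows. This is a hedged illustration of an independent double sort on simulated data, not the EAP code: the function `bivariate_sort`, the column names `size` and `roe`, and the choice of an independent (rather than dependent) sort are assumptions, and the package may handle breakpoints differently.

```python
import numpy as np
import pandas as pd

def bivariate_sort(df, ret_col, row_col, col_col, date_col, n_groups=5):
    """Independent double sort: time-averaged mean return per (row, col) cell."""
    df = df.copy()
    # Rank each variable separately within every month (0 = lowest group)
    for name, col in (('g_row', row_col), ('g_col', col_col)):
        df[name] = df.groupby(date_col)[col].transform(
            lambda s: pd.qcut(s, n_groups, labels=False)
        ).astype(int)
    # Equal-weighted mean return per month and cell
    cell = df.groupby([date_col, 'g_row', 'g_col'])[ret_col].mean()
    # Average over months, then lay the cells out as a table
    table = cell.groupby(level=['g_row', 'g_col']).mean().unstack('g_col')
    # High-minus-low along the column variable within each row group
    table['Diff'] = table[n_groups - 1] - table[0]
    return table

# Simulated panel (hypothetical): returns rise with roe and fall with size
rng = np.random.default_rng(1)
n_months, n_stocks = 24, 90
toy = pd.DataFrame({
    'date': np.repeat(np.arange(n_months), n_stocks),
    'size': rng.normal(size=n_months * n_stocks),
    'roe': rng.normal(size=n_months * n_stocks),
})
toy['ret'] = 0.02 * toy['roe'] - 0.03 * toy['size'] + 0.01 * rng.normal(size=len(toy))

table = bivariate_sort(toy, 'ret', 'size', 'roe', 'date', n_groups=3)
print(table)
```

Controlling for size in the rows is what lets the spread across columns be read as a profitability effect that is not simply a size effect in disguise.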
# %% construct test_data for univariate and bivariate analysis
# dataset 2 : tail stocks included & ROE Bivariate
from portfolio_analysis import Bivariate, Univariate
import numpy as np
# keep all stocks regardless of size; only require at least
# 10 trading days in the month
test_data_2 = return_company[return_company['Ndaytrd']>=10]
test_data_2 = test_data_2[['emrwd', 'Msmvttl', 'ROE(TTM)', 'Date_merge']].dropna()
test_data_2 = test_data_2[(test_data_2['Date_merge'] >= '2004-01-01') & (test_data_2['Date_merge'] <= '2019-12-01')]
# Univariate analysis
uni_2 = Univariate(np.array(test_data_2[['emrwd', 'ROE(TTM)', 'Date_merge']]), number=9)
uni_2.summary_and_test()
uni_2.print_summary_by_time()
uni_2.print_summary()
====================================================================================================
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Diff |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
| Average | 0.017 | 0.016 | 0.016 | 0.017 | 0.017 | 0.016 | 0.017 | 0.017 | 0.017 | 0.017 | 0.0 |
| T-Test | 2.132 | 2.032 | 2.037 | 2.2 | 2.298 | 2.293 | 2.392 | 2.549 | 2.586 | 2.572 | 0.019 |
+---------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+
====================================================================================================
# Bivariate analysis
bi_2 = Bivariate(np.array(test_data_2), number=4)
bi_2.average_by_time()
bi_2.summary_and_test()
bi_2.print_summary_by_time()
bi_2.print_summary()
================================================================
+-------+--------+--------+--------+--------+--------+--------+
| Group | 1 | 2 | 3 | 4 | 5 | Diff |
+-------+--------+--------+--------+--------+--------+--------+
| 1 | 0.025 | 0.026 | 0.026 | 0.028 | 0.024 | -0.001 |
| | 3.043 | 3.195 | 3.306 | 3.489 | 2.874 | -0.434 |
| 2 | 0.015 | 0.018 | 0.019 | 0.023 | 0.021 | 0.006 |
| | 1.833 | 2.238 | 2.488 | 3.105 | 2.701 | 2.495 |
| 3 | 0.013 | 0.011 | 0.016 | 0.017 | 0.019 | 0.006 |
| | 1.626 | 1.466 | 2.135 | 2.38 | 2.65 | 2.073 |
| 4 | 0.009 | 0.011 | 0.012 | 0.015 | 0.016 | 0.008 |
| | 1.086 | 1.359 | 1.656 | 2.101 | 2.394 | 2.546 |
| 5 | 0.006 | 0.008 | 0.009 | 0.01 | 0.015 | 0.01 |
| | 0.748 | 1.091 | 1.321 | 1.638 | 2.377 | 2.831 |
| Diff | -0.019 | -0.018 | -0.017 | -0.018 | -0.009 | 0.011 |
| | -4.676 | -4.202 | -3.963 | -3.927 | -1.68 | 2.951 |
+-------+--------+--------+--------+--------+--------+--------+
================================================================
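For dataset #2, which keeps the tail stocks, the univariate spread is again not significant: the Diff t-value is 0.019, far below 2.3. In the bivariate analysis the Diff column t-values are -0.434, 2.495, 2.073, 2.546, and 2.831, so three of the five groups exceed 2.3, while the first group shows no spread. The evidence is somewhat weaker than in dataset #1 but still suggests that the profitability factor earns excess returns.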
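The significance claims for both datasets can be checked against the printed tables. The short sketch below copies the t-values from the Diff columns of the two bivariate tables and counts how many exceed the 2.3 cutoff used in the text; the variable names are illustrative only.

```python
# t-values from the Diff column of each bivariate table (rows 1 to 5)
THRESHOLD = 2.3
diff_t = {
    'dataset 1': [2.308, 1.699, 3.775, 2.41, 3.234],
    'dataset 2': [-0.434, 2.495, 2.073, 2.546, 2.831],
}

# Count the groups whose spread is significant at the chosen cutoff
counts = {name: sum(t > THRESHOLD for t in values) for name, values in diff_t.items()}
for name, n in counts.items():
    print(f'{name}: {n} of {len(diff_t[name])} groups exceed {THRESHOLD}')
```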