-
Notifications
You must be signed in to change notification settings - Fork 8
Expand file tree
/
Copy pathFamaFrench_pandas.py
More file actions
182 lines (153 loc) · 7.34 KB
/
FamaFrench_pandas.py
File metadata and controls
182 lines (153 loc) · 7.34 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
"""
Fama/French Data
@author: wmadavis
"""
## Preliminaries
import pandas as pd
import pandas.io.data as web
## IMPORTING DATA
# Research Data Factors
df_research = web.DataReader('F-F_Research_Data_factors', 'famafrench')[0]
#%%
# Portfolios Formed on Size
url_folios = 'https://www.quandl.com/api/v1/datasets/KFRENCH/P_ME_AFS_M.csv'
df_folios = pd.read_csv(url_folios, index_col = 0).iloc[::-1]
index = df_folios.index
df_folios.index = df_research.index
# Momentum Factor
df_momentum = web.DataReader('F-F_Momentum_Factor', 'famafrench')[1]
# 5-Industry Portfolios
'''
This dataset is organized strangely. Using the DataReader produces a dict
object with keys 4, 6, 8, 10, 12, 14, 16 corresponding to the following:
4: Average Value Weighted Returns (Monthly)
6: Average Equal Weighted Returns (Monthly)
8: Average Value Weighted Returns (Annual)
10: Average Equal Weighted Returns (Annual)
12: Number of Firms in Portfolios (Monthly)
14: Average Firm Size (Monthly)
16: Sum of BE/Sum of ME (Annual)
18: Value-Weighted Average of BE/ME (Annual)
We will import the monthly data separately. We need to relabel the columns
since they are identical between these eight datasets.
'''
ind = web.DataReader('5_Industry_Portfolios', 'famafrench')
# Average Value Weighted Returns
df_ind_val = ind[4]
df_ind_val.columns = ['Cons_val', 'Manuf_val', 'HiTec_val', 'Hlth_val', 'Other_val']
# Average Equal Weighted Returns
df_ind_eq = ind[6]
df_ind_eq.columns = ['Cons_eq', 'Manuf_eq', 'HiTec_eq', 'Hlth_eq', 'Other_eq']
# Number of Firms in Portfolios
df_ind_firms = ind[12]
df_ind_firms.columns = ['Cons_firms', 'Manuf_firms', 'HiTec_firms', 'Hlth_firms', 'Other_firms']
# Average Firm Size
df_ind_size = ind[14]
df_ind_size.columns = ['Cons_size', 'Manuf_size', 'HiTec_size', 'Hlth_size', 'Other_size']
# Merging all ind data
df_ind = df_ind_val.join(df_ind_eq).join(df_ind_firms).join(df_ind_size)
'''
Now we will merge the four DataFrames we have: df_research, df_momentum,
df_folios, and df_ind. These three are all indexed YYYYMM, but we will then
re-index them by the dates provided by the more specific YYYY-MM-DD dates
from the Quandl data.
'''
# MERGING ALL DATA
df_ff = df_research.join(df_folios).join(df_momentum).join(df_ind)
df_ff.index = index # Re-indexing
df_ff = df_ff.replace('-99.99', 'NaN') # Replace missing data with 'NaN'
# Looking at df_ff.columns, we find we can still clean up some column titles
# for ease of navigation.
df_ff = df_ff.rename(columns={"1 b'Mkt-RF'":'Mkt_RF', "2 b'SMB'":'SMB', "3 b'HML'":'HML', "4 b'RF'":'RF', "<= 0":'Negative', "1 b'Mom'":'Momentum'})
'''
Now we have a single DataFrame 'df_ff' of dimensions 1064x44, with the
following as columns corresponding to the datasets
Column # Dataset Column title
-------- ------- ------------
0 FF RESEARCH DATA FACTORS Mkt-RF
1 SMB
2 HML
3 RF
4 PORTFOLIOS FORMED ON SIZE Negative
5 Lo 30
6 Med 40
7 Hi 30
8 Lo 20
9 Qnt 2
10 Qnt 3
11 Qnt 4
12 Hi 20
13 Lo 10
14 Dec 2
15 Dec 3
16 Dec 4
17 Dec 5
18 Dec 6
19 Dec 7
20 Dec 8
21 Dec 9
22 Hi 10
23 MOMENTUM FACTORS Momentum
24 FIVE-INDUSTRY PORTFOLIOS Cons_val
25 Manuf_val
26 HiTec_val
27 Hlth_val
28 Other_val
29 Cons_eq
30 Manuf_eq
31 HiTec_eq
32 Hlth_eq
33 Other_eq
34 Cons_firms
35 Manuf_firms
36 HiTec_firms
37 Hlth_firms
38 Other_firms
39 Cons_size
40 Manuf_size
41 HiTec_size
42 Hlth_size
43 Other_size
For the Portfolios Formed On Size,
Qnt refers to quintiles
Dec refers to deciles
For the Five-Industry Portfolios,
Cons refers to Consumer
Manuf refers to Manufacturing
HiTec refers to High Technology
Hlth refers to Healthcare
Other refers to all other industries
_val refers to Average Value Weighted Returns
_eq refers to Average Equal Weighted Returns
_firms refers to the Number of Firms in Portfolios
_size refers to the Average Firm Size
For example, we may call
df_ff['Hi 10'] or df_ff.HiTec_firms
Additional Notes provided by the authors:
On the Portfolios Formed on Size dataset:
"It contains value- and equal-weighted returns for size portfolios.
Each record contains returns for:
Negative (not used)
30%
40%
30%
5 Quintiles
10 Deciles
The portfolios are constructed at the end of Jun.
On the Momentum Factor dataset:
"It contains a momentum factor, constructed from six value-weight
portfolios formed using independent sorts on size and prior return of
NYSE, AMEX, and NASDAQ stocks.
Momentum is the average of the returns on two (big and small) high
prior return portfolios minus the average of the returns on two low
prior return portfolios.
The portfolios are constructed monthly. Big means a firm is above the
median market cap on the NYSE at the end of the previous month; small
firms are below the median NYSE market cap. Prior return is measured
from month -12 to - 2. Firms in the low prior return portfolio are
below the 30th NYSE percentile. Those in the high portfolio are above
the 70th NYSE percentile.
On the Five-Industry Portfolio dataset:
It contains value- and equal-weighted returns for 5 industry
portfolios. The portfolios are constructed at the end of June.
'''