Equity Analysis

Equity analysis was a take-home case I was given as part of an interview process for a fin-tech company I interviewed for.
Excel file is a time-series data for public companies with a few relevant attributes. Also, some not relevant attributes were added to test a candidate’s ability to recognize essential and functional data for the task from irrelevant data.

Detailed commentary and steps are in iPython notebook comments, please see file below.

For download:
Raw Excel file
iPython notebook on GitHub
Tableau visualization workbook

Data exploration using iPython notebook

In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('equity_data.csv')
In [3]:
df.head()
Out[3]:
Company Name Date Ticker Returns Capitalization Mkt Cap FCF Yield Momentum Sales Growth 1Y Sales Growth 5Y EPS Revision
0 AAC Holdings, Inc. 12/31/2014 AAC -17.46 1 651.82 NaN NaN 75.27 75.27 1.15
1 AAC Holdings, Inc. 1/30/2015 AAC 41.03 1 537.99 NaN NaN 75.27 75.27 1.01
2 AAC Holdings, Inc. 2/27/2015 AAC -15.03 1 758.71 -1.18 NaN 75.27 75.27 1.09
3 AAC Holdings, Inc. 3/31/2015 AAC 13.34 1 666.06 -1.18 NaN 14.88 41.72 1.07
4 AAC Holdings, Inc. 4/30/2015 AAC 12.12 1 756.97 -1.18 NaN 14.88 41.72 1.07
In [4]:
df.describe()
Out[4]:
Returns Capitalization
count 377684.000000 379119.0
mean 0.771014 1.0
std 13.905732 0.0
min -98.980000 1.0
25% -5.450000 1.0
50% 0.670000 1.0
75% 6.630000 1.0
max 890.390000 1.0
In [5]:
# Interestingly enough, only Returns and Capitalization came back. Let's check types of other columns
In [6]:
df.dtypes
Out[6]:
Company Name        object
Date                object
Ticker              object
Returns            float64
Capitalization       int64
Mkt Cap             object
FCF Yield           object
Momentum            object
Sales Growth 1Y     object
Sales Growth 5Y     object
EPS Revision        object
dtype: object
In [7]:
# As suspected, columns like Mkt Cap, Free Cash Flow etc are objects not numbers, let'a change that.
In [8]:
# First, let's rename column names with SPACE "_"
df.rename(columns = {'Company Name'  : 'Company_Name',
                    'Mkt Cap'        : 'Mkt_Cap',
                    'FCF Yield'      : 'FCF_Yield',
                    'Sales Growth 1Y': 'Sales_Growth_1Y',
                    'Sales Growth 5Y': 'Sales_Growth_5Y',
                    'EPS Revision'   : 'EPS_Revision'},
           inplace = True) 
In [9]:
# Quality check
df.dtypes
Out[9]:
Company_Name        object
Date                object
Ticker              object
Returns            float64
Capitalization       int64
Mkt_Cap             object
FCF_Yield           object
Momentum            object
Sales_Growth_1Y     object
Sales_Growth_5Y     object
EPS_Revision        object
dtype: object
In [10]:
# We can't convert NaN to number, so let's change NaN to zero because that will not affect out analysis.
df.isna().sum()
Out[10]:
Company_Name           0
Date                   0
Ticker                 0
Returns             1435
Capitalization         0
Mkt_Cap               57
FCF_Yield          15973
Momentum           12740
Sales_Growth_1Y     7960
Sales_Growth_5Y     5851
EPS_Revision       21802
dtype: int64
In [11]:
# change NaN to zero values
df.loc[df['Returns'].isnull(), 'Returns'] = '0'
df.loc[df['Mkt_Cap'].isnull(), 'Mkt_Cap'] = '0'
df.loc[df['FCF_Yield'].isnull(), 'FCF_Yield'] = '0'
df.loc[df['Momentum'].isnull(), 'Momentum'] = '0'
df.loc[df['Sales_Growth_1Y'].isnull(), 'Sales_Growth_1Y'] = '0'
df.loc[df['Sales_Growth_5Y'].isnull(), 'Sales_Growth_5Y'] = '0'
df.loc[df['EPS_Revision'].isnull(), 'EPS_Revision'] = '0'
In [12]:
# Quality check
df.isna().sum()
Out[12]:
Company_Name       0
Date               0
Ticker             0
Returns            0
Capitalization     0
Mkt_Cap            0
FCF_Yield          0
Momentum           0
Sales_Growth_1Y    0
Sales_Growth_5Y    0
EPS_Revision       0
dtype: int64
In [13]:
df.dtypes
Out[13]:
Company_Name       object
Date               object
Ticker             object
Returns            object
Capitalization      int64
Mkt_Cap            object
FCF_Yield          object
Momentum           object
Sales_Growth_1Y    object
Sales_Growth_5Y    object
EPS_Revision       object
dtype: object
In [14]:
# Converting object to number/float
df['Returns'] = df.Returns.astype(float)
In [15]:
# Returns column is now float64
df.dtypes
Out[15]:
Company_Name        object
Date                object
Ticker              object
Returns            float64
Capitalization       int64
Mkt_Cap             object
FCF_Yield           object
Momentum            object
Sales_Growth_1Y     object
Sales_Growth_5Y     object
EPS_Revision        object
dtype: object
In [16]:
# Next let's do Mkt_Cap
df['Mkt_Cap'] = df.Mkt_Cap.astype(float)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-249e607076f8> in <module>
      1 # Next let's do Mkt_Cap
----> 2 df['Mkt_Cap'] = df.Mkt_Cap.astype(float)

/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in astype(self, dtype, copy, errors, **kwargs)
   5689             # else, only a single dtype is given
   5690             new_data = self._data.astype(dtype=dtype, copy=copy, errors=errors,
-> 5691                                          **kwargs)
   5692             return self._constructor(new_data).__finalize__(self)
   5693 

/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in astype(self, dtype, **kwargs)
    529 
    530     def astype(self, dtype, **kwargs):
--> 531         return self.apply('astype', dtype=dtype, **kwargs)
    532 
    533     def convert(self, **kwargs):

/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py in apply(self, f, axes, filter, do_integrity_check, consolidate, **kwargs)
    393                                             copy=align_copy)
    394 
--> 395             applied = getattr(b, f)(**kwargs)
    396             result_blocks = _extend_blocks(applied, result_blocks)
    397 

/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py in astype(self, dtype, copy, errors, values, **kwargs)
    532     def astype(self, dtype, copy=False, errors='raise', values=None, **kwargs):
    533         return self._astype(dtype, copy=copy, errors=errors, values=values,
--> 534                             **kwargs)
    535 
    536     def _astype(self, dtype, copy=False, errors='raise', values=None,

/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py in _astype(self, dtype, copy, errors, values, **kwargs)
    631 
    632                     # _astype_nansafe works fine with 1-d only
--> 633                     values = astype_nansafe(values.ravel(), dtype, copy=True)
    634 
    635                 # TODO(extension)

/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py in astype_nansafe(arr, dtype, copy, skipna)
    700     if copy or is_object_dtype(arr) or is_object_dtype(dtype):
    701         # Explicit copy, or required since NumPy can't view from / to object.
--> 702         return arr.astype(dtype, copy=True)
    703 
    704     return arr.view(dtype)

ValueError: could not convert string to float: '1,130.74'
In [17]:
# There is a string error. We can't convert A to a number so, let's see what the problem is
In [18]:
# After inspecting '1,130.74' it seems that the comma is problematic, so let's strip it out
In [19]:
# Converting to float but also stipping the comma so we get a string number that we can convert
df['Mkt_Cap'] = df.Mkt_Cap.str.replace(',', '').astype(float)
In [20]:
# No error is a good sign
In [21]:
# Quality check proves we have sucessfully converted Mkt_Cap object/tring to float/number
df.dtypes
Out[21]:
Company_Name        object
Date                object
Ticker              object
Returns            float64
Capitalization       int64
Mkt_Cap            float64
FCF_Yield           object
Momentum            object
Sales_Growth_1Y     object
Sales_Growth_5Y     object
EPS_Revision        object
dtype: object
In [22]:
df['FCF_Yield'] = df.FCF_Yield.str.replace(',', '').astype(float)
In [23]:
df['Momentum'] = df.Momentum.str.replace(',', '').astype(float)
In [24]:
df['Sales_Growth_1Y'] = df.Sales_Growth_1Y.str.replace(',', '').astype(float)
In [25]:
df['Sales_Growth_5Y'] = df.Sales_Growth_5Y.str.replace(',', '').astype(float)
In [26]:
df['EPS_Revision'] = df.EPS_Revision.str.replace(',', '').astype(float)
In [27]:
# Quality check = great
df.dtypes
Out[27]:
Company_Name        object
Date                object
Ticker              object
Returns            float64
Capitalization       int64
Mkt_Cap            float64
FCF_Yield          float64
Momentum           float64
Sales_Growth_1Y    float64
Sales_Growth_5Y    float64
EPS_Revision       float64
dtype: object
In [28]:
# Now we can perform some statistical checks, let's try it with describe function
df.describe()
Out[28]:
Returns Capitalization Mkt_Cap FCF_Yield Momentum Sales_Growth_1Y Sales_Growth_5Y EPS_Revision
count 379119.000000 379119.0 379119.000000 379119.000000 379119.000000 3.791190e+05 3.791190e+05 3.791190e+05
mean 0.768096 1.0 6231.498876 1.435006 11.024613 2.542597e+03 4.103473e+03 -2.240330e+02
std 13.879471 0.0 23489.381721 75.938681 67.903766 4.414335e+05 4.899940e+05 5.258837e+05
min -98.980000 1.0 0.000000 -10296.930000 -99.640000 -1.812420e+03 -1.514150e+03 -9.663676e+07
25% -5.420000 1.0 384.810000 -0.040000 -15.710000 -1.350000e+00 2.270000e+00 9.400000e-01
50% 0.630000 1.0 1087.080000 4.020000 4.600000 6.700000e+00 9.070000e+00 1.000000e+00
75% 6.600000 1.0 3498.320000 7.840000 28.130000 1.925000e+01 2.097000e+01 1.010000e+00
max 890.390000 1.0 796466.150000 2769.110000 9580.560000 7.846261e+07 7.846261e+07 1.127429e+08
In [36]:
# We can see that we're finally operating with numbers, so math and stats operations are now possible
In [37]:
# Convert string to DateTime
df['Date'] = pd.to_datetime(df.Date)
In [38]:
# Counting number of records per year
pd.to_datetime(df['Date']).dt.year.value_counts()
Out[38]:
2015    36155
2014    36112
2016    35758
2013    35560
2010    35511
2012    35450
2011    35287
2009    35257
2008    35207
2007    35137
2017    20726
2006     2959
Name: Date, dtype: int64
In [39]:
# We can clearly see that year 2006 is incomplete with records
# Also 2017 looks the same, probably the data was gathered mid-2017
# 2006 should be excluded from analysis because it will give false information/trends/
# 2017 could be used as a trend and compared to previous years but for simplicity I will exclude it from analysis
In [40]:
# For curiosity I will check if all companies have the same # of records/data
df.groupby(df.Ticker.str.strip("'"))['Date'].nunique()
Out[40]:
Ticker
@NA          2
A          128
AA         128
AABA       126
AAC         32
AACC        66
AAI.XX1     53
AAL         44
AAMC        24
AAMRQ       61
AAN         80
AAN.XX1     48
AAOI        44
AAON       128
AAP        128
AAPL       128
AAT         76
AATI        61
AAWW       128
AAXN       128
ABAT        36
ABAX       128
ABBC        58
ABBV        55
ABC        128
ABCB       128
ABCD        53
ABCO       128
ABCW        17
ABCWQ       30
          ... 
ZBRA       128
ZEN         38
ZEP         91
ZEUS       128
ZFGN        32
ZG          70
ZGEN        46
ZGNX        80
ZIGO        90
ZINCQ      100
ZION       128
ZIOP        86
ZIP         21
ZIXI        98
ZLC         64
ZLTQ        64
ZN          24
ZNGA        64
ZNT         41
ZOES        38
ZOLL        64
ZOLT        87
ZQKSQ      105
ZRAN        57
ZSPH        14
ZTS         52
ZU          22
ZUMZ       128
ZYNE        10
ZZ          75
Name: Date, Length: 5495, dtype: int64
In [41]:
# An interesting finding, there are differences in a number of records per company, could be for many reasons
# we can use this knowledge for refinement in deeper analysis, for the time being let's just be aware of this
In [42]:
# Quick check on correlation across the data
df.corr(method ='pearson') 
Out[42]:
Returns Capitalization Mkt_Cap FCF_Yield Momentum Sales_Growth_1Y Sales_Growth_5Y EPS_Revision
Returns 1.000000 NaN -0.001339 -0.013090 -0.001583 0.001224 0.001000 0.002010
Capitalization NaN NaN NaN NaN NaN NaN NaN NaN
Mkt_Cap -0.001339 NaN 1.000000 0.013302 0.016542 -0.001458 -0.002078 -0.000203
FCF_Yield -0.013090 NaN 0.013302 1.000000 0.024217 -0.005492 -0.005630 -0.000007
Momentum -0.001583 NaN 0.016542 0.024217 1.000000 0.000167 0.000215 0.000746
Sales_Growth_1Y 0.001224 NaN -0.001458 -0.005492 0.000167 1.000000 0.900873 0.000002
Sales_Growth_5Y 0.001000 NaN -0.002078 -0.005630 0.000215 0.900873 1.000000 0.000004
EPS_Revision 0.002010 NaN -0.000203 -0.000007 0.000746 0.000002 0.000004 1.000000
In [43]:
# Capitalization has a value 1, so we can disregard it, but I will not drop it for file-originality reasons.
In [44]:
# I am happy how everything looks now and let's save this work to csv file.
df.to_csv('Cleaned_equity_data.csv')
In [45]:
# I can visualize in Matplotlib or Seaborn, but our file is fairly small so I will use Tableau
# END.

Visualization using Tableau

Source file: Tableau workbook

Equity analysis by Milan Tomin
Equity analysis by Milan Tomin
Equity analysis by Milan Tomin
Equity analysis by Milan Tomin