In this project I analyze 15 top-performing tech stocks in an attempt to determine an optimal portfolio to recommend to potential investors. The analysis window covers the trailing 365 days of stock performance and compares a number of statistical measures, including variance, standard deviation, covariance, and correlation. This notebook can be rerun to collect the most recent window of stock data.
The data is gathered with pandas and pandas-datareader, Python libraries that I use to request data from Yahoo Finance's API and then wrangle it and develop models. All of my code lives in a Jupyter Notebook environment, so the data does not need to be stored or maintained in a database. If a database were needed for deployment in the future, I would most likely host the project on a Heroku dyno and set up Postgres as the managed database. That would let me build an interactive online tool for investors.
Importing all of the required dependencies.
import datetime as dt
import emoji
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import plotly.graph_objs as go
import plotly.offline as offline_py
from rf import return_portfolios, optimal_portfolio
offline_py.init_notebook_mode(connected=True)
%matplotlib inline
Using pandas-datareader to pull stock data from the Yahoo Finance API: create the date range, retrieve the data, and view it.
symbols = ["MSFT", "AMZN", "AAPL", "GOOG", "FB",
"CRM", "CSCO", "NVDA", "AMD", "NFLX",
"DOCU", "SQ", "ORCL", "TSLA", "TWTR"]
delta = dt.timedelta(days=365)
end_date = dt.datetime.now()
start_date = dt.datetime.now() - delta
stock_data = web.get_data_yahoo(symbols, start_date, end_date)
Focusing on Adjusted Close for analysis
close = stock_data['Adj Close']
close.head()
Here's our Tech Stock Universe:
close.plot()
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
plt.title('Daily Prices');
In order to demonstrate the use of log returns and resampling, I've used Google's stock (GOOG). However, these transformations are applied to all assets in my stock universe.
color_scheme = {
'index': '#B6B2CF',
'etf': '#2D3ECF',
'tracking_error': '#6F91DE',
'df_header': 'silver',
'df_value': 'white',
'df_line': 'silver',
'heatmap_colorscale': [(0, '#6F91DE'), (0.5, 'grey'), (1, 'red')],
'background_label': '#9dbdd5',
'low_value': '#B6B2CF',
'high_value': '#2D3ECF',
'y_axis_2_text_color': 'grey',
'shadow': 'rgba(0, 0, 0, 0.75)',
'major_line': '#2D3ECF',
'minor_line': '#B6B2CF',
'main_line': 'black'}
def _generate_stock_trace(prices):
    return go.Scatter(
        name='Index',
        x=prices.index,
        y=prices,
        line={'color': color_scheme['major_line']})
def plot_stock(prices, title):
    # generate_config() was never defined and its result was unused, so the call is dropped
    layout = go.Layout(title=title)
    stock_trace = _generate_stock_trace(prices)
    offline_py.iplot({'data': [stock_trace], 'layout': layout})
goog_ticker = 'GOOG'
plot_stock(close[goog_ticker], f'{goog_ticker} Stock')
Resampling the daily adjusted closing prices into monthly buckets, and selecting the last observation of each month.
def resample_prices(close_prices, freq='M'):
    """
    Resample close prices for each ticker and return month end prices.
    """
    return close_prices.resample(freq).last()
def _generate_traces(name_df_color_data):
    traces = []
    for name, df, color in name_df_color_data:
        traces.append(go.Scatter(
            name=name,
            x=df.index,
            y=df,
            mode='lines',  # plotly expects 'lines', not 'line'
            line={'color': color}))
    return traces
def plot_resampled_prices(df_resampled, df, title):
    # generate_config() was never defined and its result was unused, so the call is dropped
    layout = go.Layout(title=title)
    traces = _generate_traces([
        ('Monthly Close', df_resampled, color_scheme['major_line']),
        ('Close', df, color_scheme['minor_line'])])
    offline_py.iplot({'data': traces, 'layout': layout})
monthly_close = resample_prices(close)
plot_resampled_prices(
monthly_close.loc[:, goog_ticker],
close.loc[:, goog_ticker],
f'{goog_ticker} Stock - Close Vs Monthly Close')
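To see what month-end resampling does on a small scale, here is a minimal sketch on made-up prices (the dates and values are illustrative only, not taken from the dataset). It groups by calendar month with `to_period('M')` instead of calling `resample`, so the result is indexed by period rather than by month-end timestamp, but the selected values are the same last observation of each month.

```python
import pandas as pd

# Hypothetical daily prices spanning two calendar months
dates = pd.to_datetime(['2020-01-30', '2020-01-31', '2020-02-03', '2020-02-04'])
prices = pd.Series([100.0, 102.0, 101.0, 105.0], index=dates)

# Keep the last observation within each calendar month
month_end = prices.groupby(prices.index.to_period('M')).last()
print(month_end)
```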
def compute_log_returns(prices):
    """
    Compute log returns for each ticker.
    """
    return np.log(prices) - np.log(prices.shift(1))
def plot_returns(returns, title):
    layout = go.Layout(title=title)
    traces = _generate_traces([
        ('Returns', returns, color_scheme['major_line'])])
    offline_py.iplot({'data': traces, 'layout': layout})
monthly_close_returns = compute_log_returns(monthly_close)
plot_returns(
monthly_close_returns.loc[:, goog_ticker],
f'Log Returns of {goog_ticker} Stock (Monthly)')
Log returns are used because they are additive over time and are commonly modeled as approximately normally distributed, which makes them convenient for building probability distributions.
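One practical property of log returns is that they add up across periods: the sum of the per-period log returns equals the log of the overall price ratio. A quick check on made-up prices (the numbers are illustrative, not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical price path
prices = pd.Series([100.0, 110.0, 99.0, 120.0])

# Same formula as compute_log_returns: log(P_t) - log(P_{t-1})
log_returns = np.log(prices) - np.log(prices.shift(1))

# Summing skips the leading NaN produced by the shift
total_from_sum = log_returns.sum()
total_direct = np.log(prices.iloc[-1] / prices.iloc[0])
print(total_from_sum, total_direct)
```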
monthly_close_returns
def shift_returns(returns, shift_n):
    """
    Generate shifted returns
    """
    return returns.shift(shift_n)
prev_returns = shift_returns(monthly_close_returns, 1)
Here I have shifted the monthly log returns forward by one month, so that each month's row holds the previous month's return, which serves as our prediction:
prev_returns
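As a small illustration of what the shift does, the sketch below uses three made-up monthly returns. After `shift(1)`, each row holds the previous row's value and the first row becomes NaN.

```python
import pandas as pd

# Hypothetical monthly returns
returns = pd.Series([0.02, -0.01, 0.03], index=['Jan', 'Feb', 'Mar'])

# Move every value one row down; the index stays in place
prev = returns.shift(1)
print(prev)
```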
Below is a function that I have written to identify the stocks that have increased the most over the past year, i.e. the stocks most often selected for a long position over the given timeframe.
def get_top_n(prev_returns, top_n):
    """
    Select the top performing stocks
    """
    res = pd.DataFrame(columns=prev_returns.columns)
    for index, row in prev_returns.iterrows():
        curr_month = row
        curr_top = pd.Series(curr_month).nlargest(top_n)
        top = list(curr_top.index.values)
        for col in res.columns:
            if(col in top):
                res.loc[index, col] = True
            else:
                res.loc[index, col] = False
    for index, row in res.iterrows():
        res.loc[index] = res.loc[index].astype('int64')
    #print(res.head())
    return res
def print_top(df, name, top_n=5):
    print('{} Most {}:'.format(top_n, name))
    print(', '.join(df.sum().sort_values(ascending=False).index[:top_n].values.tolist()))
By simply multiplying the returns by -1, I reuse the same function to find the most-shorted stocks over the given timeframe.
top_bottom_n = 10
df_long = get_top_n(prev_returns, top_bottom_n)
df_short = get_top_n(-1*prev_returns, top_bottom_n)
print_top(df_long, 'Longed Stocks')
print_top(df_short, 'Shorted Stocks')
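The row-by-row selection in `get_top_n` can also be expressed with a per-row ranking. The sketch below is an alternative, not the notebook's implementation: `top_n_mask` is a hypothetical helper name and the returns are made up. Rows that are entirely NaN (such as the first rows after shifting) come out as all zeros.

```python
import numpy as np
import pandas as pd

def top_n_mask(returns, top_n):
    """Return a 0/1 frame flagging the top_n largest values in each row."""
    # Rank 1 = largest value in the row; NaN values get a NaN rank
    ranks = returns.rank(axis=1, ascending=False, method='first')
    return (ranks <= top_n).astype(int)

# Hypothetical shifted monthly returns for three tickers
returns = pd.DataFrame(
    {'A': [np.nan, 0.05, -0.02],
     'B': [np.nan, 0.01, 0.04],
     'C': [np.nan, -0.03, 0.00]},
    index=['m1', 'm2', 'm3'])

longs = top_n_mask(returns, 2)
# Negating the returns flips the ranking, so the same helper picks the shorts
shorts = top_n_mask(-1 * returns, 2)
print(longs)
print(shorts)
```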
long_stocks = ["MSFT", "AAPL", "NVDA", "FB", "DOCU"]
stock_data_daily_returns = stock_data['Adj Close'][long_stocks].pct_change()
stock_data_daily_returns.plot()
plt.xlabel("Date")
plt.ylabel("ROR")
plt.title("Daily Simple Rate of Return Over time")
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.);
["MSFT", "AAPL", "NVDA", "FB", "DOCU"]
fig = plt.figure(figsize=(15,15))
ax1 = fig.add_subplot(321)
ax2 = fig.add_subplot(322)
ax3 = fig.add_subplot(323)
ax4 = fig.add_subplot(324)
ax5 = fig.add_subplot(325)
ax1.plot(stock_data['Adj Close']['MSFT'].pct_change())
ax1.set_title("Microsoft")
ax2.plot(stock_data['Adj Close']['AAPL'].pct_change())
ax2.set_title("Apple")
ax3.plot(stock_data['Adj Close']['NVDA'].pct_change())
ax3.set_title("Nvidia")
ax4.plot(stock_data['Adj Close']['FB'].pct_change())
ax4.set_title("Facebook")
ax5.plot(stock_data['Adj Close']['DOCU'].pct_change())
ax5.set_title("Docusign")
plt.tight_layout()
plt.show()
# calculate daily mean
daily_mean = stock_data_daily_returns.mean()
daily_mean
# daily mean index for the x axis
daily_mean.keys()
# grab each daily mean value for the y axis
height = []
for key in daily_mean.keys():
    height.append(daily_mean[key])
# arrange keys on x axis based on length
x_pos = np.arange(len(daily_mean.keys()))
# plot bars
plt.bar(x_pos, height)
# create names on the x-axis
plt.xticks(x_pos, daily_mean.keys())
# label chart
plt.xlabel("Tech Stocks")
plt.ylabel("Mean")
plt.title("Daily Mean Rate of Return")
plt.show()
# calculate variance
daily_var = stock_data_daily_returns.var()
daily_var
# variance index for the x axis
daily_var.keys()
# grab each variance value for the y axis
height = []
for key in daily_var.keys():
    height.append(daily_var[key])
# plot bars
plt.bar(x_pos, height)
# create names on the x-axis
plt.xticks(x_pos, daily_var.keys())
# label chart
plt.xlabel("Tech Stocks")
plt.ylabel("Variance")
plt.title("Daily Variance")
# show graphic
plt.show()
# calculate standard deviation
daily_std = stock_data_daily_returns.std()
daily_std
# grab each standard deviation value for the y axis
height = []
for key in daily_std.keys():
    height.append(daily_std[key])
# plot bars
plt.bar(x_pos, height)
# create names on the x-axis
plt.xticks(x_pos, daily_std.keys())
# label chart
plt.xlabel("Tech Stocks")
plt.ylabel("Std. Dev.")
plt.title("Daily Standard Deviation")
# show graphic
plt.show()
corr = stock_data_daily_returns.corr()
corr
# plot the correlation matrix (cov is not defined until the next cell)
plt.imshow(corr)
plt.colorbar()
plt.xticks(rotation='horizontal')
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns);
cov = stock_data_daily_returns.cov()
cov
plt.imshow(cov)
plt.colorbar()
plt.xticks(rotation='horizontal')
plt.xticks(range(len(cov)), cov.columns)
plt.yticks(range(len(cov)), cov.columns);
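Correlation is covariance rescaled by the two standard deviations: corr(i, j) = cov(i, j) / (std_i * std_j). A minimal check of that relationship on made-up returns (random data with a fixed seed; the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for three assets
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(0, 0.02, size=(50, 3)), columns=['X', 'Y', 'Z'])

cov = returns.cov()
std = returns.std()

# Divide each covariance by the product of the two standard deviations
corr_from_cov = cov / np.outer(std, std)
print(np.allclose(corr_from_cov, returns.corr()))
```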
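# monthly log returns for the selected stocks (drop the first row, which is NaN)
selected_returns = monthly_close_returns[long_stocks][1:]
# use the covariance
cov_monthly = selected_returns.cov()
# find the expected return (mean monthly log return per stock, not the mean of the covariance)
expected_returns = selected_returns.mean()
# create a set of random portfolios
random_portfolios = return_portfolios(expected_returns, cov_monthly)
# plot the set of random portfolios
random_portfolios.plot.scatter(x='Volatility', y='Returns', fontsize=12)
# calculate the set of portfolios on the EF
# (assumption: rf.optimal_portfolio expects a returns DataFrame rather than a covariance matrix)
weights, returns, risks = optimal_portfolio(selected_returns)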
Above I have generated a pool of 5000 random portfolios made up of different weightings of the five most-longed stocks.
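The `rf` module is not shown in this notebook, so the following is only a sketch of how the return and volatility of one random portfolio are typically computed, not the module's actual code. Weights are drawn at random and normalized to sum to 1; the expected return is the weighted mean `w @ mu` and the volatility is `sqrt(w @ sigma @ w)`. The inputs `mu` and `sigma` and the helper name `random_portfolio` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

mu = np.array([0.02, 0.01, 0.03])            # hypothetical expected monthly returns
sigma = np.array([[0.0040, 0.0010, 0.0012],  # hypothetical covariance matrix
                  [0.0010, 0.0025, 0.0008],
                  [0.0012, 0.0008, 0.0060]])

def random_portfolio(mu, sigma, rng):
    """Draw random long-only weights and return (weights, return, volatility)."""
    w = rng.random(len(mu))
    w /= w.sum()                        # weights sum to 1
    port_return = w @ mu                # weighted mean return
    port_vol = np.sqrt(w @ sigma @ w)   # portfolio standard deviation
    return w, port_return, port_vol

portfolios = [random_portfolio(mu, sigma, rng) for _ in range(1000)]
rets = np.array([p[1] for p in portfolios])
vols = np.array([p[2] for p in portfolios])
print(rets.min(), rets.max(), vols.min(), vols.max())
```

Plotting `vols` against `rets` would give a cloud of points similar in spirit to the scatter above, with the efficient frontier along its upper-left edge.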
The following line (if uncommented) will generate a CSV file called "all_five.csv" that contains the risk and return values for all data points associated with our long portfolio. The line after it reads that file back in, so the CSV must already exist when it runs.
# pd.DataFrame({'Risks': risks, 'Returns': returns}).to_csv('all_five.csv', index=False)
all_five_EF = pd.read_csv('all_five.csv')
# plot the set of portfolios on the EF
plt.plot(risks, returns, 'y-o')
plt.ylabel('Expected Returns',fontsize=14)
plt.xlabel('Volatility (Std. Deviation)',fontsize=14)
plt.title('Efficient Frontier')
single_asset_std=np.sqrt(np.diagonal(cov_monthly))
plt.scatter(single_asset_std,expected_returns,marker='X',color='red',s=200)
plt.plot(all_five_EF['Risks'], all_five_EF['Returns'], 'g-o')
plt.show();
I have analyzed 15 of the most valuable tech stocks and developed a method for determining the optimal mix of assets in our portfolio.
Although I have taken a long-investor approach, the final functions provided can be used to cycle through various mixes of any stocks an individual chooses.
In the end, I have plotted the volatility and expected return of each of the five long stocks I have chosen, along with the efficient frontier (green) that maximizes portfolio return for each level of volatility, based on the input parameters (monthly mean log return and volatility measured as standard deviation).