Building an Equal-Weight Portfolio Allocation Strategy with Python

Introduction

This article shows how to build an equal-weight portfolio allocation strategy with Python. If you had $10,000 to invest in the fifty top-performing companies in the S&P 500 index, how would you allocate capital across those stocks? You will learn how to tabulate each stock's ticker, current trading price, and 1-year % return, and then calculate the number of shares to buy for each of the 50 top-performing stocks.
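The allocation arithmetic behind that question can be sketched in a few lines. This is only an illustration: the tickers and prices are made up, the helper name `equal_weight_shares` is hypothetical, and it assumes only whole shares can be bought.

```python
import math


def equal_weight_shares(capital, prices):
    """Split capital equally across tickers and return whole shares per ticker."""
    position_size = capital / len(prices)  # same dollar amount for every stock
    return {
        ticker: math.floor(position_size / price)
        for ticker, price in prices.items()
    }


# Hypothetical prices, for illustration only (not real quotes)
sample_prices = {"AAA": 50.0, "BBB": 125.0, "CCC": 400.0, "DDD": 20.0}
shares = equal_weight_shares(10_000, sample_prices)
print(shares)
```

With fifty stocks and $10,000, each position is $200, so under the whole-share assumption any stock trading above $200 would receive zero shares.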

Creating a new Jupyter Notebook.

Google Colab is a cloud-hosted Jupyter notebook environment that lets you write and run Python code in the browser.

Importing relevant Tickers and libraries.

Follow this link to download the S&P500 ticker symbols. This will make it easy for you to extract the data associated with each ticker.
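As a rough sketch of how the ticker file might be loaded, assuming it has a column named `Ticker` (the column the code later in this article reads). The in-memory CSV here is a stand-in for the downloaded file, and the hyphen replacement is an assumption about how Yahoo Finance usually writes share classes.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; with the real file, pass its path instead
sample_csv = io.StringIO("Ticker\nAAPL\nMSFT\nBRK.B\n")

# Read the Ticker column, drop blanks, and strip stray whitespace
tickers = pd.read_csv(sample_csv)["Ticker"].dropna().astype(str).str.strip().tolist()

# Assumption: Yahoo Finance symbols use a hyphen for share classes (BRK.B -> BRK-B)
yahoo_tickers = [t.replace(".", "-") for t in tickers]
print(yahoo_tickers)
```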

Installing Yahoo Finance and importing libraries.

Run the following command to install the yfinance package, which downloads data from Yahoo Finance.

```python
!pip install yfinance
```


When you are done, import the following libraries.

```python
import pandas as pd
import numpy as np
import os
from datetime import datetime, timedelta
import yfinance as yf  # needed for the yf.download and yf.Ticker calls used later
```


Pandas presents your data as DataFrames and Series, allowing you to clean, manipulate, and analyze it with built-in functionality. NumPy is a library for working with arrays and general mathematical functions. The os module helps you work with file paths, which are used later in the project. Finally, the datetime class helps you work with dates and times, and timedelta, as the name implies, represents the duration between the beginning and end of a time period.
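To make the datetime and timedelta description concrete, here is a small sketch. The dates are arbitrary examples chosen for illustration, not values used elsewhere in this article.

```python
from datetime import datetime, timedelta

# An arbitrary end date and a one-year look-back window
end = datetime(2023, 12, 31)
start = end - timedelta(days=365)

# Subtracting two datetimes gives a timedelta, i.e. a duration
duration = end - start
print(start.date(), end.date(), duration.days)
```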

Extracting the 1-year, 6-month, 3-month, and monthly returns of each stock in the S&P 500 index.

The function below computes each stock's return over each of these time frames. Comments are added to each chunk of code to explain what is happening.

```python
def get_first_last_trading_days(stocks_file, years):
    # Initialize an empty dictionary to store data
    data = {}

    # Read the stock tickers from the CSV file.
    stocks = pd.read_csv(stocks_file)['Ticker'].tolist()

    # Create the output directory if it does not exist yet
    if not os.path.exists('stockss_dfs'):
        os.makedirs('stockss_dfs')

    def rating(df, startdate, enddate, freq):
        # Define the look-back offset based on the time frequency
        if freq == 'Y':
            offset = '366 days'
        elif freq == 'M':
            offset = '31 days'
        elif freq == '3M':
            offset = '93 days'
        elif freq == '6M':
            offset = '183 days'
        else:
            raise ValueError("Frequency not supported. Use 'Y', 'M', '3M', or '6M'.")

        # Filter the dataframe to the window, keep the last row of each period,
        # and calculate the % change between consecutive period-end closes
        dff = df.loc[(df.index >= pd.Timestamp(startdate) - pd.Timedelta(offset)) & (df.index <= pd.Timestamp(enddate))]
        dfy = dff.groupby(pd.Grouper(level='Date', freq=freq)).tail(1)
        ratio = (dfy['Close'] / dfy['Close'].shift() - 1) * 100

        return ratio

    # For the sake of scalability, the years are passed in as a parameter instead of being hardcoded.
    for year in years:
        # Start and end dates for the year
        start_date = f"{year}-01-01"
        end_date = f"{year}-12-31"

        # Loop through each stock ticker
        for stock in stocks:
            # Download the data for each S&P 500 stock and create a file for it if one is not already available.
            file_path = f'stockss_dfs/{stock}_{year}.csv'

            if not os.path.exists(file_path):
                try:
                    df = yf.download(stock, start=start_date, end=end_date)
                    df.index = pd.to_datetime(df.index)
                    df.index = df.index.tz_localize(None)

                    if not df.empty:
                        period_rating = rating(df, start_date, end_date, freq='Y')
                        period_rating_monthly = rating(df, start_date, end_date, freq='M')
                        period_rating_3months = rating(df, start_date, end_date, freq='3M')
                        period_rating_6months = rating(df, start_date, end_date, freq='6M')

                        # Store the ratings in the data dictionary
                        data[stock] = {
                            'Yearly': period_rating,
                            'Monthly': period_rating_monthly,
                            '3 Months': period_rating_3months,
                            '6 Months': period_rating_6months
                        }

                        # Save the results to CSV
                        df_results = pd.DataFrame({
                            'Yearly': period_rating,
                            'Monthly': period_rating_monthly,
                            '3 Months': period_rating_3months,
                            '6 Months': period_rating_6months
                        })

                        df_results.to_csv(file_path, index=True)

                except Exception as e:
                    print(f"Error processing {stock} for year {year}: {e}")
                    continue

    return data

# Take the current year and build the list of years to process (here, only the previous year)
current_year = datetime.now().year
years = [current_year - i for i in range(1, 2)]

# Retrieve the data
stocks_file = '/content/sp_500_stocks.csv'
data = get_first_last_trading_days(stocks_file, years)
```
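The period-end return calculation inside `rating` can be seen in isolation on synthetic data. This sketch is not the article's exact code: it groups by calendar month with `to_period("M")` rather than `pd.Grouper`, and the prices are invented (a steadily rising series on business days), so treat it as an illustration of the idea only.

```python
import pandas as pd

# Synthetic daily closes (not market data): 100, 101, 102, ... on business days
dates = pd.bdate_range("2023-01-02", "2023-03-31")
close = pd.Series(range(100, 100 + len(dates)), index=dates, name="Close", dtype=float)

# Keep the last close of each calendar month
month_end_close = close.groupby(close.index.to_period("M")).tail(1)

# Percent change between consecutive month-end closes
monthly_return = (month_end_close / month_end_close.shift() - 1) * 100
print(monthly_return.round(2))
```

The first value is NaN because the first month has no earlier month-end close to compare against. The same thing happens to any frequency that yields only a single period end within the downloaded window.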


When you run the code, you will get the following:

In the image above, you will observe that the 1-year column is empty. A return is the change between two consecutive period-end prices, and the downloaded data covers only one year, so there is no previous year-end close to compare against. As a workaround, we can add up the monthly returns over the 12-month cycle to approximate the 1-year return (an approximation, because returns compound rather than add). The following code does just that for us.
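To get a feel for how close the summed figure is to a compounded one, here is a short comparison. The monthly returns are hypothetical numbers chosen for illustration; they do not come from the downloaded data.

```python
import numpy as np

# Hypothetical monthly returns in percent, for illustration only
monthly_pct = np.array([2.0, -1.0, 3.0, 1.5, -2.5, 4.0, 0.5, 1.0, -3.0, 2.0, 1.0, 2.5])

# Simple sum of the monthly returns (the approach used in this article)
summed = monthly_pct.sum()

# Compounded return: multiply the monthly growth factors together
compounded = (np.prod(1 + monthly_pct / 100) - 1) * 100

print(f"sum: {summed:.2f}%  compounded: {compounded:.2f}%")
```

For small monthly moves the two figures stay close, which is why the sum can work for ranking stocks, but the gap widens as the monthly returns get larger.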

```python
def extract_sum_of_1_year_return(directory):
    all_instruments = []
    # List all the files in the directory
    files = os.listdir(directory)
    # Iterate over the files
    for file in files:
        file_path = os.path.join(directory, file)
        if os.path.isfile(file_path):
            df = pd.read_csv(file_path)
            df_sum = df['Monthly'].sum()

            # Take the ticker from the file name, which follows the pattern {ticker}_{year}.csv
            ticker_name = os.path.basename(file_path).split('_')[0]

            all_instruments.append({
                "ticker": ticker_name,
                "yearly_sum": df_sum
            })

    return all_instruments

# Directory containing the CSV files
directory = "/content/stockss_dfs"
# Call the function and store the result
results = extract_sum_of_1_year_return(directory)

def get_stocks(results):
    # Initialize a list to hold the stock data that was successfully processed
    successful_stocks = []
    # Loop through every stock in the results
    for stock in results:
        try:
            api_url = yf.Ticker(stock['ticker'])
            stock_instrument = api_url.info
            current_price = stock_instrument.get('currentPrice', None)
            # Only add to successful_stocks if a current price is available
            if current_price is not None:
                successful_stocks.append({
                    'ticker': stock['ticker'],
                    'current_price': current_price,
                    'yearly_sum': stock['yearly_sum']
                })
        except Exception as e:
            # Skip tickers whose lookup fails
            continue

    return successful_stocks

final_stocks = get_stocks(results)
```
                <span>})</span>
        <span>except</span> <span>Exception</span> <span>as</span> <span>e</span><span>:</span>
            <span>continue</span>
    <span>return</span> <span>successful_stocks</span>

<span>final_stocks</span> <span>=</span> <span>get_stocks</span><span>(</span><span>results</span><span>)</span>


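The extraction step can be tried on a couple of made-up files before running it on the real folder. This is only a sketch: the function name `sum_monthly_returns`, the file names, and the numbers are assumptions for illustration, and it assumes each file is named with the ticker before the first underscore and has a `Monthly` column.

```python
import os
import tempfile

import pandas as pd


def sum_monthly_returns(directory):
    """Return one {'ticker', 'yearly_sum'} dict per CSV file in directory."""
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        monthly = pd.read_csv(path)["Monthly"]
        # The ticker is the part of the file name before the first underscore
        rows.append({"ticker": name.split("_")[0], "yearly_sum": float(monthly.sum())})
    return rows


# Two hypothetical files with made-up monthly returns, written to a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"Monthly": [1.0, 2.0, -0.5]}).to_csv(
        os.path.join(tmp, "AAA_monthly.csv"), index=False
    )
    pd.DataFrame({"Monthly": [0.5, 0.5, 0.5]}).to_csv(
        os.path.join(tmp, "BBB_monthly.csv"), index=False
    )
    demo_results = sum_monthly_returns(tmp)

print(demo_results)
```

Reading the ticker from the file name rather than from a fixed position in the full path means the same code keeps working if the folder is moved.
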
Selecting the top 50 performing stocks.

You will calculate the number of shares of each stock that you can buy with a given amount of capital. First, select the 50 stocks with the highest return over the one-year window.

# Build a DataFrame from the list returned by get_stocks
final_stocks_df = pd.DataFrame(final_stocks)

# Ensure the 'yearly_sum' column is numeric
final_stocks_df['yearly_sum'] = pd.to_numeric(final_stocks_df['yearly_sum'], errors='coerce')

# Drop rows with NaN values in 'yearly_sum'
final_stocks_df = final_stocks_df.dropna(subset=['yearly_sum'])

# Sort the dataframe by 'yearly_sum' in descending order
final_stocks_df = final_stocks_df.sort_values('yearly_sum', ascending=False)

# Select the top 50 rows and renumber them from 0
final_stocks_df = final_stocks_df.head(50).reset_index(drop=True)

# Drop the 'level_0' column if an earlier reset_index left one behind
final_stocks_df = final_stocks_df.drop(columns=['level_0'], errors='ignore')

# Display the dataframe
final_stocks_df


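The clean, sort, and slice steps can also be collapsed with pandas' `nlargest`. The sketch below runs on a tiny made-up table so the result is easy to check by eye. The helper name `top_n_by_return` and the sample tickers and values are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample data: one row has a missing return
demo_df = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "current_price": [10.0, 20.0, 30.0, 40.0],
    "yearly_sum": [5.0, None, 12.5, -3.0],
})


def top_n_by_return(df, n):
    """Return the n rows with the highest 'yearly_sum', ignoring missing values."""
    cleaned = df.copy()  # leave the caller's DataFrame untouched
    cleaned["yearly_sum"] = pd.to_numeric(cleaned["yearly_sum"], errors="coerce")
    cleaned = cleaned.dropna(subset=["yearly_sum"])
    # nlargest sorts in descending order and keeps only the first n rows
    return cleaned.nlargest(n, "yearly_sum").reset_index(drop=True)


top_two = top_n_by_return(demo_df, 2)
print(top_two)
```

With the real data, the equivalent call would be `top_n_by_return(final_stocks_df, 50)`.
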
Calculating portfolio amount

Here you choose a starting balance for your portfolio. This amount will be split equally across all the selected stocks.

def portfolio_input():
    global portfolio_size
    portfolio_size = input('Enter the value of your portfolio: ')
    # Keep asking until the input can be converted to a number
    while True:
        try:
            portfolio_size = float(portfolio_size)
            break
        except ValueError:
            print("That's not a number! \nPlease try again:")
            portfolio_size = input('Enter the value of your portfolio: ')

portfolio_input()
print(portfolio_size)


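Because `input()` is awkward to test, one option is to move the validation into a small pure function and call it from the prompt loop. The sketch below is an assumption about how that could look. The name `parse_portfolio_size` is hypothetical, and stripping `$` and `,` assumes US-style formatting such as `$10,000`.

```python
def parse_portfolio_size(text):
    """Return the amount as a positive float, or None if the text is not usable."""
    # Assumption: '$' and ',' are only decoration, as in '$10,000'
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        return None
    # Reject zero, negatives, and special floats such as 'nan' or 'inf'
    if not (0 < value < float("inf")):
        return None
    return value


# A few sample inputs and what the parser makes of them
for sample in ["$10,000", " 2500.50 ", "abc", "-5", "nan"]:
    print(repr(sample), "->", parse_portfolio_size(sample))
```

A prompt loop can then repeat `input()` until the function returns something other than `None`.
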
Calculating the number of shares to buy

Divide the portfolio size by the number of selected stocks (the top 50, not the whole S&P 500 index) to get the amount of capital available for each position. Then calculate the number of shares to buy by dividing that amount by the stock's current trading price and rounding down to a whole share.

# Split the portfolio equally across the selected stocks.
position_size = float(portfolio_size) / len(final_stocks_df.index)

# Store position_size / current_price, rounded down, in the 'Number of Shares to Buy' column.
final_stocks_df['Number of Shares to Buy'] = np.floor(position_size / final_stocks_df['current_price']).astype(int)

final_stocks_df


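Rounding down to whole shares means some cash is always left over, and any stock priced above the position size gets zero shares. The sketch below makes both effects visible on made-up prices. The function name `equal_weight_allocation`, the `amount_invested` column, and the sample tickers and prices are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def equal_weight_allocation(prices, portfolio_size):
    """Return the allocation table and the cash left after buying whole shares."""
    df = prices.copy()  # leave the caller's DataFrame untouched
    # Every stock gets the same slice of the portfolio
    position_size = portfolio_size / len(df)
    # Whole shares only, so round down
    df["Number of Shares to Buy"] = np.floor(position_size / df["current_price"]).astype(int)
    df["amount_invested"] = df["Number of Shares to Buy"] * df["current_price"]
    leftover = portfolio_size - df["amount_invested"].sum()
    return df, leftover


# Hypothetical prices; DDD costs more than one position can afford
demo_prices = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD"],
    "current_price": [30.0, 120.0, 250.0, 600.0],
})

allocation, leftover_cash = equal_weight_allocation(demo_prices, 1000.0)
print(allocation)
print("Cash left uninvested:", leftover_cash)
```

Here each position is $250, so the stock priced at $600 receives no shares and $270 of the $1,000 stays uninvested. With 50 stocks and $10,000, each position is $200, so the same thing happens to any stock trading above $200.
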
Conclusion.

In this article, you learned how to allocate capital equally among the top 50 performing stocks in the S&P 500. You cleaned the data by dropping NaN (not a number) values that would have distorted the results. This article was inspired by freeCodeCamp's tutorial (https://www.youtube.com/watch?v=xfzGZB4HhEE), but since much original thought went into writing the code, I decided to write and publish it. I hope you learned a thing or two. See you next time.
