Tick Data
Tick data is the stream of individual trades executed on an exchange (usually a stock exchange) with each ‘tick’ representing a single trade.
Typically each tick contains a timestamp, trade price, volume and the exchange the trade was executed on. For example, below is a series of ticks for Apple AAPL:
2021-02-01 04:00:02:533,133.65,1,ARCX
2021-02-01 04:00:02:533,133.7,4,ARCX
2021-02-01 04:00:03:713,133.71,50,XNGS
2021-02-01 04:00:03:713,134,50,XNGS
2021-02-01 04:00:03:713,133.7,50,ARCX
2021-02-01 04:00:03:932,134,200,XNGS
Enter fullscreen mode Exit fullscreen mode
(note the timestamp includes milliseconds)
Resampling Tick Data
Tick data is the highest resolution form of market data and can give a lot of insight into a market’s microstructure over very short timeframes. However, it is extremely large in size and the sheer volume of the data can make it unwieldy for analysis for longer timeframe analysis (such as over 1 week). For analysis of longer timeframes, intraday bars (or ‘candles’) are the preferred data format.
A bar, is a single data-point for a timeframe which includes the open, close, high and low prices. For example: below is a series of 1-minute bars for Apple AAPL:
2021-01-04 09:30:00,133.52,133.612,132.95,133.15,2328651
2021-01-04 09:31:00,133.13,133.45,133.08,133.335,486524
2021-01-04 09:32:00,133.345,133.36,132.99,133.11,471947
2021-01-04 09:33:00,133.11,133.15,132.71,132.746,477518
Enter fullscreen mode Exit fullscreen mode
(format : timestamp, high, low, open, close, volume)
Therefore, a common requirement is to resample tick data into intraday bars. Fortunately, the Pandas Python library has several inbuilt functions to perform this task very efficiently.
Worked Example
Starting with a tick dataset for AAPL, you can get a sample tick dataset for AAPL at TickHistory.
If you do not already have Python and Pandas installed, a simple solution is to install Anaconda and then use Anaconda to install Pandas.
Once in Python, import the Pandas package
import pandas as pd
Enter fullscreen mode Exit fullscreen mode
Next, load the data into a dataframe
aapl_df = pd.read_csv('AAPL_2020_10.txt', names=['timestamp', 'trade_price', 'volume', 'exchange'], index_col=0, parse_dates=True)
Enter fullscreen mode Exit fullscreen mode
This statement reads the csv formatted file (note that it can also read directly from a zip file), names the columns, parses the timestamp into a date_time, and finally sets the index to the timestamp column.
Once the data has been loaded, we can quickly review the dataframe to ensure it has correctly loaded using the head()
function:
aapl_df.head()
Enter fullscreen mode Exit fullscreen mode
This should output :
Timestamp trade_price volume exchange
2021-02-01 04:00:02:533 133.65 1 ARCX
2021-02-01 04:00:02:533 133.7 4 ARCX
2021-02-01 04:00:03:713 133.71 50 XNGS
2021-02-01 04:00:03:713 134 50 XNGS
2021-02-01 04:00:03:713 133.7 50 ARCX
Enter fullscreen mode Exit fullscreen mode
To resample the data, we will use the Pandas resample()
function. This needs to be repeated for each of the high, low, open, close, volume datapoints in the bar:
aapl_1hour_open_df = aapl_df.resample("1H").agg({'trade_price': 'first’})
aapl_1hour_high_df = aapl_df.resample("1H").agg({'trade_price': 'high’})
aapl_1hour_low_df = aapl_df.resample("1H").agg({'trade_price': 'low’})
aapl_1hour_close_df = aapl_df.resample("1H").agg({'trade_price': 'last’})
aapl_1hour_volume_df = aapl_df.resample("1H").agg({'volume': 'sum’})
Enter fullscreen mode Exit fullscreen mode
Now we have a separate dataframe for each of the open, high, low, close, volume datapoints. We now need to combine these into a single dataframe using the Pandas concat()
function:
aapl_1hour_df =pd.concat([aapl_1hour_open_df, aapl_1hour_high_df, aapl_1hour_low_df, aapl_1hour_close_df, aapl_1hour_volume_df], axis=1, keys=['open', 'high', 'low', 'close', 'volume'])
Enter fullscreen mode Exit fullscreen mode
Finally, we need to remove the zero volume bars as the resample function will include a bar for every timeframe during the 24-hour day and not just the trading hours. This can be done by filtering for volumes above 0.
aapl_1hour_df = aapl_1hour_df[aapl_1hour_df.volume > 0]
Enter fullscreen mode Exit fullscreen mode
The resampling from ticks to 1-hour bars is now complete and the file can be created using the Pandas to_csv()
function:
aapl_1hour_df.to_csv('file_path')
Enter fullscreen mode Exit fullscreen mode
暂无评论内容