Market Raker Progress Report: A Step Closer to Noise-Free Stock Picking

MarketRaker AI
Jun 3, 2023


1. Introduction

To the skeptic, stock picking and price prediction have mostly been relegated to the realm of sophisticated guesswork. Famously, Orlando the Cat did better at picking stocks than actual analysts [1], and the general consensus among the time-series forecasting crowd is that changes in stock prices are little better than stochastic noise. However, there is light at the end of the tunnel! It has been conjectured, and supported, that stock prices are not completely a 'Random Walk'. This, besides the fact that there are countless careers built on the ability to at least analyse a stock, means that it is not all noise. From the barrel into the ditch (if that is even an expression) — here we present our most recent progress report in our mission to have *no more noise* in your stock picking process.

2. Exploring the data

All our data is sourced from TradingView, a platform that you likely already use regularly. The stocks we are currently modeling are tech-services stocks from the NASDAQ index, including Microsoft and NVIDIA among others; 13 stocks are currently selected for our prototyping process. Along with the raw time series (opening, high, low, and closing price at a two-hour resolution), we download some technical indicators freely available on TradingView. The first task in any modeling effort is to explore and analyse the data. We converted each of the relevant indicators to percentage-change versions of themselves, and included a bull-bear indication flag based on Ehler's autocorrelation. Besides this and some other housekeeping tasks, like handling NaN values and computing summary statistics, we did the following (a brief code sketch follows the list):
• Plotted the percentage change in closing price alongside selected input features to attempt to visually determine correlation (See Figure 1 for an example with Ehler’s autocorrelation).
• Calculated the Gini coefficient [3] for each of the input features, which indicates how well you can discriminate the target based on that feature (See Figure 2).
• Created lag plots for the percentage change in closing price at different degrees of time lag (See Figure 3).
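For concreteness, here is a minimal sketch of the first and third steps in Pandas. The file name and the indicator column names (`ema_20`, `momentum`) are illustrative placeholders rather than our actual schema.

```python
# Minimal exploration sketch; the CSV export and column names are assumptions.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("msft_2h.csv", index_col="time", parse_dates=True)

# Convert the closing price and selected indicators to percentage-change versions.
pct = df[["close", "ema_20", "momentum"]].pct_change().dropna() * 100

# Percentage change in close plotted alongside a selected input feature (cf. Figure 1).
pct.plot(y=["close", "ema_20"], subplots=True, figsize=(10, 4))

# Lag plots of the percentage change in close at 1x to 9x lags (cf. Figure 3).
fig, axes = plt.subplots(3, 3, figsize=(9, 9))
for lag, ax in enumerate(axes.ravel(), start=1):
    pd.plotting.lag_plot(pct["close"], lag=lag, ax=ax)
    ax.set_title(f"{lag}x lag ({2 * lag} hours)")
plt.tight_layout()
plt.show()
```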

Figure 1: Percentage change in closing price every two hours vs Ehler’s autocorrelation. Percentage change in closing price is highly noisy and it is difficult to ascertain a clear relationship visually.

Figure 2: Gini coefficient of each feature vs. percentage change in closing price. A Gini of at least 0.2 is considered predictive. As can be seen, exponential moving averages of the closing price are moderately to strongly predictive (0.4+), with momentum and some other technical indicators at least somewhat predictive (0.2+).

Figure 3: Lag plot of percentage change in closing price every two hours lagged 1x to 9x. Mostly weak auto-correlation in all cases, indicating much noise and weak relationships in the data itself. Lags that appear closest to the original star-shape of the 1x autocorrelation happen every 3x, or every 6 hours.

Having settled on the input features, we also settled on using a TCN (temporal convolutional network) [4] as a prototype model. TCNs are relatively simple time-series neural network (NN) models that are still quite powerful in comparison to traditional LSTM and RNN models. For our prototype model, we will use the input features over a time window of the preceding 24 hours (x12 lag) to predict the percentage change in closing price in the next 6 hours (x3 lag). Data is split into a training, validation, and testing set based on time.
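In code, the windowing and chronological split look roughly like the sketch below. The array names and split ratios are illustrative, and the target is taken here as the percentage change from the current close to the close three steps (6 hours) ahead.

```python
# Rough sketch of window construction and the time-based split.
# `features` is an (N, num_features) array of percentage-change inputs and
# `close` is the aligned (N,) array of closing prices; both are placeholders.
import numpy as np

WINDOW = 12   # preceding 24 hours at a 2-hour resolution
HORIZON = 3   # predict 6 hours (3 steps) ahead

def make_windows(features: np.ndarray, close: np.ndarray):
    X, y = [], []
    for t in range(WINDOW, len(features) - HORIZON):
        X.append(features[t - WINDOW:t])                         # (WINDOW, num_features)
        y.append(100.0 * (close[t + HORIZON] / close[t] - 1.0))  # % change over next 6h
    return np.stack(X), np.array(y)

X, y = make_windows(features, close)

# Chronological split: oldest data for training, most recent for testing.
n = len(X)
train_end, val_end = int(0.7 * n), int(0.85 * n)  # split ratios are illustrative
X_train, y_train = X[:train_end], y[:train_end]
X_val, y_val = X[train_end:val_end], y[train_end:val_end]
X_test, y_test = X[val_end:], y[val_end:]
```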

3. The modeling pipeline

According to Francois Chollet, the creator of Keras, and Ian Goodfellow, the author of THE Deep Learning book, the first thing to do is to set up a working end-to-end modeling pipeline. This we have done in Python, using Pandas and NumPy for the majority of the data preprocessing, TensorFlow and the Keras functional API for the actual modeling, and SciPy for the post-training model evaluation. Model training and logging is managed by Weights and Biases, and hyperparameters are chosen through Bayesian optimisation (also managed by Weights and Biases). Using Keras' functional API, we implement a TCN by layering 1D convolutions, dropout, and a residual connection in a residual block as depicted in Figure 4 [4]; a code sketch of such a block is given below the figure.

Figure 4: Depiction of a TCN residual block. Each residual block acts as an individual layer. The length of the time-window that the TCN is able to process is linked to its depth (number of residual blocks) and kernel size.
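To make Figure 4 concrete, below is a minimal sketch of such a residual block and a small TCN built with the Keras functional API. The layer widths, number of input features, and dropout value are illustrative, and weight normalisation (which we apply around each convolution) is left out to keep the sketch short.

```python
# Minimal TCN sketch with the Keras functional API; sizes are placeholders.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, kernel_size, dilation_rate, dropout):
    skip = x
    for _ in range(2):  # two dilated causal convolutions per block
        x = layers.Conv1D(filters, kernel_size,
                          padding="causal",
                          dilation_rate=dilation_rate)(x)
        # (In our pipeline each convolution is additionally weight-normalised.)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(dropout)(x)
    # Match channel dimensions with a 1x1 convolution before the residual add.
    if skip.shape[-1] != filters:
        skip = layers.Conv1D(filters, 1)(skip)
    return layers.Add()([x, skip])

def build_tcn(window=12, num_features=16, filters=32, kernel_size=3,
              num_blocks=2, dropout=0.1):
    inputs = layers.Input(shape=(window, num_features))
    x = inputs
    for i in range(num_blocks):
        x = residual_block(x, filters, kernel_size,
                           dilation_rate=2 ** i, dropout=dropout)
    x = layers.Lambda(lambda t: t[:, -1, :])(x)  # keep only the last time step
    outputs = layers.Dense(1)(x)                 # predicted % change in close
    return tf.keras.Model(inputs, outputs)

model = build_tcn()
model.compile(optimizer="adam", loss="mse")
```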

4. The model selection and training

We need the TCN to be able to process a 24-hour time window, or an input length of 12. Fixing the TCN stage depth and kernel size, we have a smaller space to search for hyperparameters in our prototype. The input length that a TCN is able to process is equal to its receptive field, which is calculated with the following formula:

R = 1 + 2(K - 1)(2^i - 1)

where R is the receptive field, K is the kernel size, and i is the number of TCN layers in the model. With two TCN layers and a kernel size of 3, our receptive field would be:

R = 1 + 2(3 - 1)(2^2 - 1) = 13,

which is sufficient for our input length. Using Weights and Biases, we conduct two 'Bayesian sweeps'. A Bayesian sweep uses Bayesian optimisation to select hyperparameters for a model from a provided configuration space. The sweeps search over different learning rates, weight decay values, dropout values, batch sizes, and MLP stage sizes. The loss that we are optimising is Mean Squared Error (MSE), and we use the Adam optimiser with its default settings. Weight normalisation is used after each convolutional layer in the residual blocks. Each model in the sweep is trained for 100 epochs. Below, we share a selection of the learning curves of the models trained during the second sweep (Figure 5), as well as the relative hyperparameter importances as calculated by Weights and Biases (Figure 6) and the sweep graph generated by the same (Figure 7).
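As an illustration of what such a sweep looks like in code, here is a minimal Weights and Biases sweep configuration along these lines; the parameter ranges, the project name, and the run count are placeholders rather than our exact settings.

```python
# Sketch of a Bayesian sweep with Weights and Biases; values are illustrative.
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values",
                          "min": 1e-4, "max": 1e-2},
        "weight_decay": {"distribution": "log_uniform_values",
                         "min": 1e-6, "max": 1e-3},
        "dropout": {"values": [0.0, 0.1, 0.2, 0.3]},
        "batch_size": {"values": [32, 64, 128]},
        "mlp_units": {"values": [16, 32, 64]},
    },
}

def train():
    run = wandb.init()
    cfg = wandb.config
    # ... build the TCN with cfg.dropout / cfg.mlp_units, compile with Adam and
    # MSE using cfg.learning_rate, train for 100 epochs, and log val_loss ...
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="marketraker-tcn")
wandb.agent(sweep_id, function=train, count=20)
```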

Figure 5: The learning/loss curves of the first 10 models trained during the second sweep. On the left is the MSE curve for the training steps, and on the right is the MSE curve for the validation steps.

Figure 6: Relative hyperparameter importances as calculated by Weights and Biases. We see that learning rate is one of the most important parameters.

Figure 7: Sweep graph showing the different hyperparameter choices for the various models trained and their resulting validation MSE loss as generated by Weights and Biases.

5. Results

The best resulting model from the second sweep is selected for evaluation. The test-set partition of each stock in question is run through the model to generate predictions. The predicted percentage change in closing price 6 hours into the future at each time step is applied to the current closing price at that time step to create a 'predicted closing price'. This 'predicted' closing price is plotted alongside the true closing price, as well as the predicted and true percentage change in closing price, for the test sets of Apple, NVIDIA, and Microsoft in Figure 8.

Table 1: Model performance statistics for Apple (AAPL), NVIDIA (NVDA), and Microsoft (MSFT). Here we show the mean absolute error (MAE) on the (unscaled) output and the % of time that the prediction is within 1, 3, and 5 percentage points of the true value. The average % change in closing price for each stock is also listed.

In addition, some performance statistics are calculated. Besides the Mean Absolute Error (MAE) on each of the test sets, we calculate the 'percentage of time' that each test set's predicted percentage change is within 1, 3, and 5 percentage points of the truth. The mean true percentage change is provided for comparison. These calculations for Apple, NVIDIA, and Microsoft are shown in Table 1 (a short sketch of these calculations follows Figure 8).

(a) Apple

(b) NVIDIA

(c) Microsoft

Figure 8: Predicted percentage change in closing price vs. true percentage change in closing price, portrayed as predicted closing price vs. true closing price. The results are visually quite noisy in either direction, with some periods exhibiting much less noise than others. Typically, smaller changes appear to be more accurately modeled than larger changes.
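The statistics in Table 1 boil down to a few lines of NumPy; the sketch below assumes `y_true` and `y_pred` hold the true and predicted percentage changes on a stock's test set, and `close` holds the corresponding closing prices.

```python
# Sketch of the Table 1 statistics; array names are placeholders.
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    abs_err = np.abs(y_true - y_pred)
    return {
        "mae": abs_err.mean(),                          # in percentage points
        "within_1pp": (abs_err <= 1.0).mean() * 100,    # % of time within 1 pp
        "within_3pp": (abs_err <= 3.0).mean() * 100,
        "within_5pp": (abs_err <= 5.0).mean() * 100,
        "mean_abs_true_change": np.abs(y_true).mean(),  # for comparison
    }

# 'Predicted closing price' for plotting (cf. Figure 8): apply the predicted
# percentage change to the current close at each time step.
predicted_close = close * (1.0 + y_pred / 100.0)
```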

6. Conclusion and next steps

Although we were able to get a model to converge on this data set, and to establish that at least some input features have predictive power, the output of the model on the evaluation set is quite noisy. We see that the model is within 1 percentage point of the true percentage change in closing price 20% of the time, with a mean absolute error of 4.5 to 5 percentage points against an average percentage change of 3.5 to 4%. Visually inspecting the result graphs, there are some stretches of time where the model is able to predict the percentage change in closing price quite accurately, whereas in other areas it misses the mark entirely. This indicates that there are, on the whole, at least some scenarios where patterns can be effectively ascertained from the data. It is true that as the sampling rate increases, noise increases with it. For our next steps, we would like to:
• Begin modeling at coarser resolutions (weekly or daily price data rather than 2-hourly price data) over longer periods of time.
• Include previous changes in closing price as an auto-regressive input feature.
• Be more creative with input feature pre-processing and selection.

References

  1. Allen FE. Cat Beats Professionals at Stock Picking. 2013. url: https://www.forbes.com/sites/frederickallen/2013/01/15/cat-beats-professionals-at-stock-picking/.
  2. Hyndman RJ and Athanasopoulos G. Forecasting: Principles and Practice. 2021.
  3. Schatz I. Using the Gini coefficient to evaluate the performance of credit score models. 2020. url: https://towardsdatascience.com/using-the-gini-coefficient-to-evaluate-the-performance-of-credit-score-models-59fe13ef420.
  4. Bai S, Kolter JZ, and Koltun V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. ArXiv 2018;abs/1803.01271.
