Best Algorithm For Stock Prediction

Intraday trading is one of the common practices in the stock market: you buy shares when the market opens and sell them before it closes on the same day.

Today we work with one data set: about seven years of daily prices, from 2014 to 2021.

We are going to use a simple machine learning algorithm (the prophet package in R) to explore the data, analyze it, and make predictions.

In this tutorial, we picked one stock at random for analysis and prediction. You can try other stocks based on your own view and interest.

One suggestion: watch a stock closely for at least three months, and draw your own conclusions with the help of these predictions.

You can use these prediction ideas for long-term investment. For intraday trading, you also need a strategy.

If you trade against the market trend, your chances of losing money are higher. We explained one simple strategy in an earlier post.

Load Library

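# pacman::p_load() installs (if needed) and attaches every listed package,
# so separate library() calls for prophet, lubridate and ggplot2 are not required
library(pacman)
pacman::p_load(prophet, BatchGetSymbols, ggplot2, lubridate,
               data.table, fixest, finreportr)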

Set parameters

first.date <- Sys.Date() - 2500
last.date <- Sys.Date()
freq.data <- "daily"
tickers <- c("BALKRISIND.NS")

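Run on 2021-05-05, these settings give daily data from 2014-07-01 to 2021-05-05. Because the window is defined relative to Sys.Date(), running the code on a different day returns a different date range.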
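To reproduce the exact range used in this post, one option is to pin both dates instead of relying on Sys.Date(). The sketch below uses the two dates quoted in the text and checks that the pinned window is still 2500 days long; it uses base R only.

```r
# Pin the date window so the analysis is reproducible
first.date <- as.Date("2014-07-01")
last.date  <- as.Date("2021-05-05")

# The original code used Sys.Date() - 2500; confirm the pinned window matches
window.days <- as.numeric(last.date - first.date)
print(window.days)
```

These two objects can be passed to BatchGetSymbols() exactly like the Sys.Date()-based versions.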

Getting Data

stocks <- BatchGetSymbols(tickers = tickers,
                          first.date = first.date,
                          last.date = last.date,
                          freq.data = freq.data,
                          do.cache = FALSE,
                          thresh.bad.data = 0)
data <- stocks$df.tickers   # price data for the requested tickers
data <- na.omit(data)       # drop rows that contain missing values
head(data)

The resulting data frame contains the following columns.

price.open price.high price.low price.close volume price.adjusted   ref.date
2   367.2603   371.1934  357.9189    364.2366 124870       342.4004 2014-07-02
3   363.8187   367.3586  358.1648    361.9259  30469       340.2281 2014-07-03
4   359.0743   369.2268  359.0743    365.3428  29728       343.4402 2014-07-04
5   362.8600   367.9485  354.9691    358.0419  74821       336.5770 2014-07-07
6   356.7390   360.8688  347.5944    348.8972  79854       327.9806 2014-07-08
7   346.1194   348.5285  330.4850    341.2030 402494       320.7476 2014-07-09
         ticker ret.adjusted.prices ret.closing.prices
2 BALKRISIND.NS         0.004678743        0.004678642
3 BALKRISIND.NS        -0.006344423       -0.006344118
4 BALKRISIND.NS         0.009441102        0.009441052
5 BALKRISIND.NS        -0.019983830       -0.019983871
6 BALKRISIND.NS        -0.025540681       -0.025540653
7 BALKRISIND.NS        -0.022053048       -0.022053126
str(data)

The dataset contains a total of 1680 observations and 10 variables.
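The ret.closing.prices column in the output above looks like a simple (arithmetic) daily return on the closing price. The sketch below recomputes it in base R from the closing prices shown in head(data). It assumes the returns are arithmetic, which matches the printed values up to rounding. It also suggests why the row numbers start at 2: the first observation has no previous price, so its return is NA and na.omit() would drop it.

```r
# Closing prices copied from the head(data) output above
close <- c(364.2366, 361.9259, 365.3428, 358.0419, 348.8972, 341.2030)

# Simple (arithmetic) daily return: p_t / p_{t-1} - 1
ret <- c(NA, close[-1] / close[-length(close)] - 1)

# The first return is NA because there is no previous price
print(round(ret, 6))
```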

Q plot

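Let’s plot the closing prices to get a feel for the data.

# qplot() is deprecated in ggplot2 >= 3.4.0; ggplot() + geom_point() is the modern equivalent
qplot(ref.date, price.close, data = data)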

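The plot makes it clear that the series is not stationary: the price level trends upward over time. A log transformation does not make the series stationary by itself, but it stabilizes the variance and turns percentage changes into additive ones, which suits the additive trend and seasonality model that prophet fits.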
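To see what the log transformation does and does not do, here is a small base R sketch on a hypothetical series (not market data) that grows by a constant rate per day. After taking logs the series is still trending, only linearly; it is the first difference of the logs (the log return) that is constant.

```r
# Hypothetical price series growing at a constant daily rate (not real market data)
day   <- 1:500
price <- 100 * exp(0.001 * day)

log_price  <- log(price)        # linear in day: still trending
log_return <- diff(log_price)   # first difference removes the linear trend

print(range(log_price))
print(range(log_return))
```

Prophet models the trend explicitly, so the series passed to it does not have to be stationary; the log is used here to tame the growing scale of the prices.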

Log transformation

ds <- data$ref.date
y <- log(data$price.close)
df <- data.frame(ds, y)
head(df)

After the log transformation, the data set should look like this.

      ds        y
 1 2014-07-03 5.891439
 2 2014-07-04 5.900836
 3 2014-07-07 5.880650
 4 2014-07-08 5.854777
 5 2014-07-09 5.832478
 6 2014-07-10 5.854777

Stock forecasting with the prophet package

m <- prophet(df)
future <- make_future_dataframe(m, periods = 30)

periods is the number of periods (days, by default) to forecast beyond the end of the historical data.
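With the default daily frequency, make_future_dataframe() appends calendar days, so a 30-day horizon also contains weekends, when the market is closed. The base R sketch below shows one way to count and keep only weekdays in such a horizon. The start date assumes the history ends on 2021-05-05 as in this post, and exchange holidays are not handled.

```r
# 30 calendar days following the last observation (2021-05-05 in this post)
future_days <- seq(as.Date("2021-05-06"), by = "day", length.out = 30)

# "%u" gives the ISO weekday number as a string: "1" = Monday ... "7" = Sunday
is_weekend   <- format(future_days, "%u") %in% c("6", "7")
trading_days <- future_days[!is_weekend]

print(length(trading_days))
```

The same filter could be applied to the ds column of the future data frame before calling predict(), if you prefer not to forecast for weekend dates.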

forecast <- predict(m, future)

Model performance & Stock Prediction

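pred <- forecast$yhat[seq_len(nrow(df))]   # fitted values for the historical rows only
actual <- m$history$y                      # observed log prices used to fit the model
plot(actual, pred)
summary(lm(pred ~ actual))                 # regress fitted on observed values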

Call:
 lm(formula = pred ~ actual)
 Residuals:
      Min       1Q   Median       3Q      Max 
 -0.25281 -0.03433  0.00118  0.03647  0.32375 
 Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
 (Intercept) 0.096114   0.017839   5.388 8.15e-08 ***
 actual      0.985297   0.002719 362.360  < 2e-16 ***
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 Residual standard error: 0.06204 on 1678 degrees of freedom
 Multiple R-squared:  0.9874,    Adjusted R-squared:  0.9874 
 F-statistic: 1.313e+05 on 1 and 1678 DF,  p-value: < 2.2e-16

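The adjusted R-squared is 0.9874 (about 99%), so the model fits the historical data closely. Keep in mind that this is an in-sample fit on log prices; it does not by itself tell you how accurate the 30-day forecast will be.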
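R-squared on the log scale is hard to relate to money, so it can help to express the error in price units as well. The sketch below back-transforms log values and computes RMSE and MAPE in base R. The three value pairs are hypothetical stand-ins for m$history$y and forecast$yhat, chosen only to keep the example self-contained.

```r
# Hypothetical log-scale values standing in for m$history$y and forecast$yhat
actual_log <- log(c(100, 110, 120))
pred_log   <- log(c(102, 108, 123))

# Back-transform to prices before measuring the error
actual <- exp(actual_log)
pred   <- exp(pred_log)

rmse <- sqrt(mean((pred - actual)^2))            # root mean squared error, in price units
mape <- mean(abs(pred - actual) / actual) * 100  # mean absolute percentage error

print(c(rmse = rmse, mape = mape))
```

Computing the same two numbers on dates the model has not seen (for example, the last few months held back from fitting) would give a more honest picture than the in-sample fit.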

When you are dealing with time series, it helps to look at recurring patterns such as weekly and yearly seasonality.

Plot forecast

prophet_plot_components(m, forecast)

For comparison, we also plotted the components for the original data without the log transformation (the code for that second set of plots is not included here). Those plots show the patterns more clearly.

For prediction, all of this information is valuable. The weekly component shows prices tending to be lower on Mondays and higher on Thursdays and Fridays, and the yearly component shows a seasonal pattern across the year.
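Because the model is fitted on log prices, the seasonal components are on the log scale too: an additive term w corresponds to multiplying the price by exp(w). The base R sketch below converts a few weekly component values, copied from the forecast output in this post, into approximate percentage effects on price.

```r
# Weekly component values copied from the forecast output in this post (log scale)
weekly <- c(0.01570607, 0.01255691, 0.01296090, 0.01493960, 0.01687311)

# On a log-price model, an additive term w scales the price by exp(w)
pct_effect <- (exp(weekly) - 1) * 100

print(round(pct_effect, 2))
```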

plot(m, forecast)

The plot shows an increasing trend for the next 30 days. You can convert the log values back to the original price scale with the exp() function in R.

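# Back-transform the log-scale columns to prices, then inspect the last rows
forecast$yhat <- exp(forecast$yhat)
forecast$trend <- exp(forecast$trend)
forecast$trend_upper <- exp(forecast$trend_upper)
forecast$trend_lower <- exp(forecast$trend_lower)
tail(forecast)

The predicted prices are in the yhat column. Note that yhat_lower, yhat_upper, and the seasonal components are not transformed by this code, so they remain on the log scale in the output below.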

ds trend additive_terms additive_terms_lower additive_terms_upper
1677 2021-04-30 1797.005 -0.01183780 -0.01183780 -0.01183780
1678 2021-05-03 1804.474 -0.02116647 -0.02116647 -0.02116647
1679 2021-05-04 1806.971 -0.02292851 -0.02292851 -0.02292851
1680 2021-05-05 1809.471 -0.02313039 -0.02313039 -0.02313039
1681 2021-05-06 1811.975 -0.02336908 -0.02336908 -0.02336908
1682 2021-05-07 1814.482 -0.02667740 -0.02667740 -0.02667740
weekly weekly_lower weekly_upper yearly yearly_lower yearly_upper
1677 0.01570607 0.01570607 0.01570607 -0.02754387 -0.02754387 -0.02754387
1678 0.01255691 0.01255691 0.01255691 -0.03372338 -0.03372338 -0.03372338
1679 0.01296090 0.01296090 0.01296090 -0.03588941 -0.03588941 -0.03588941
1680 0.01493960 0.01493960 0.01493960 -0.03806999 -0.03806999 -0.03806999
1681 0.01687311 0.01687311 0.01687311 -0.04024219 -0.04024219 -0.04024219
1682 0.01570607 0.01570607 0.01570607 -0.04238347 -0.04238347 -0.04238347
multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper
1677 0 0 0
1678 0 0 0
1679 0 0 0
1680 0 0 0
1681 0 0 0
1682 0 0 0
yhat_lower yhat_upper trend_lower trend_upper yhat
1677 7.401588 7.558997 1797.005 1797.005 1775.858
1678 7.398457 7.561202 1804.474 1804.474 1766.681
1679 7.400651 7.563982 1806.971 1806.971 1766.011
1680 7.397345 7.555723 1809.471 1809.471 1768.098
1681 7.405778 7.561941 1811.975 1811.975 1770.122
1682 7.392498 7.553050 1814.482 1814.482 1766.716
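The output mixes two scales: trend and yhat are prices, while additive_terms, yhat_lower, and yhat_upper are still logs. The base R sketch below uses the first row of the output (2021-04-30) to show how the pieces relate, and how the interval bounds can be put on the price scale with exp(). The relation log(yhat) = log(trend) + additive_terms assumes additive seasonality, which is consistent with the multiplicative_terms column being 0.

```r
# First row of the output above (2021-04-30)
trend          <- 1797.005     # already back-transformed to price
additive_terms <- -0.01183780  # still on the log scale
yhat_lower_log <- 7.401588     # still on the log scale
yhat_upper_log <- 7.558997     # still on the log scale

# log(yhat) = log(trend) + additive_terms, so on the price scale:
yhat <- trend * exp(additive_terms)

# Put the uncertainty interval on the same scale as yhat
yhat_lower <- exp(yhat_lower_log)
yhat_upper <- exp(yhat_upper_log)

print(c(lower = yhat_lower, yhat = yhat, upper = yhat_upper))
```

In the full workflow, the equivalent step would be to apply exp() to forecast$yhat_lower and forecast$yhat_upper alongside the other columns.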

Applying the knowledge of machine learning and algorithms to daily life allows us to make better decisions instead of random guesses. 

Disclaimer: For any kind of investment, please consult your financial advisor. We are not recommending any stocks or trading ideas.

2 Responses

  1. James Donovan says:

    Are you missing a piece of code that produces the “plot based on original data” in the Plot Forecast section? No code to create second set of graphs.
