Machine Learning and Quantitative Investing: 2. Data Processing

1. Load the training data

import pandas as pd
data = pd.read_csv('data.csv')

2. Feature engineering: keep only the Time and Close columns

data = data[['Time', 'Close']]
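The lag features built in a later step assume the rows are in time order, so it may help to parse Time as dates and sort before going further. This is a minimal sketch, assuming Time holds date strings that pandas can parse. The toy frame and the helper name select_time_and_close are hypothetical and only stand in for data.csv.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; the real file's columns may differ.
raw = pd.DataFrame({
    'Time': ['2024-01-03', '2024-01-01', '2024-01-02'],
    'Close': [102.0, 100.0, 101.0],
    'Volume': [30, 10, 20],
})

def select_time_and_close(df):
    """Keep Time and Close, parse Time as datetime, and sort oldest first."""
    out = df[['Time', 'Close']].copy()
    out['Time'] = pd.to_datetime(out['Time'])
    return out.sort_values('Time').reset_index(drop=True)

toy = select_time_and_close(raw)
print(toy)
```

If the file is already sorted and Time is already a datetime column, this step changes nothing.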

3. Install matplotlib into the project's Python environment

sudo /home/skka3134/folder/bot/bin/python -m pip install matplotlib

4. Use matplotlib, a 2D plotting library, to plot a line graph of the closing price against time.

import matplotlib.pyplot as plt
plt.plot(data['Time'], data['Close'])
plt.show()

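5. Prepare a dataframe for the LSTM neural network. A recurrent neural network (RNN) is an artificial neural network designed for sequential or time-series data. Long short-term memory (LSTM) is a special type of RNN designed to mitigate the vanishing and exploding gradient problems that arise when training on long sequences. For each row we add the previous n_steps closing prices as extra columns, so that every row holds one input window together with its target.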

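from copy import deepcopy as dc

def prepare_dataframe_for_lstm(df, n_steps):
    # Work on a copy so the caller's dataframe is left untouched
    df = dc(df)

    df.set_index('Time', inplace=True)

    # Close(t-i) holds the closing price from i rows earlier
    for i in range(1, n_steps + 1):
        df[f'Close(t-{i})'] = df['Close'].shift(i)

    # The first n_steps rows have no full history, so drop them
    df.dropna(inplace=True)

    return df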

lookback = 7
shifted_df = prepare_dataframe_for_lstm(data, lookback)
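To see what the shifted columns look like, the function can be run on a tiny frame. This is an illustrative sketch: the five-row toy data and the lookback of 2 are made up for the example, and the function is repeated so the snippet runs on its own.

```python
from copy import deepcopy as dc
import pandas as pd

def prepare_dataframe_for_lstm(df, n_steps):
    df = dc(df)
    df.set_index('Time', inplace=True)
    for i in range(1, n_steps + 1):
        df[f'Close(t-{i})'] = df['Close'].shift(i)
    df.dropna(inplace=True)
    return df

# Hypothetical toy data: five consecutive days of closing prices
toy = pd.DataFrame({
    'Time': pd.date_range('2024-01-01', periods=5, freq='D'),
    'Close': [10.0, 11.0, 12.0, 13.0, 14.0],
})

toy_shifted = prepare_dataframe_for_lstm(toy, 2)
print(toy_shifted)
```

The first two rows are dropped because they have no complete two-day history. In each remaining row, Close is the value to predict and the Close(t-i) columns are the inputs.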

6. Convert the dataframe to a NumPy array

shifted_df_as_np = shifted_df.to_numpy()

7. Install scikit-learn. We use its MinMaxScaler to bring all values into the same small range, which generally helps the network train more stably.

sudo /home/skka3134/folder/bot/bin/python -m pip install scikit-learn

8. Scale the data to the range -1 to 1. Keep the fitted scaler, because it is needed later to inverse-scale values back to the original price range.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(-1, 1))
shifted_df_as_np = scaler.fit_transform(shifted_df_as_np)
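The arithmetic behind this kind of scaling can be sketched by hand in NumPy. This is not scikit-learn's code, only an illustration of the column-wise min-max formula and its inverse. The helper names minmax_scale and minmax_inverse are hypothetical, and the sketch assumes no column is constant (that would divide by zero). With the real scaler, the reverse mapping is done with scaler.inverse_transform.

```python
import numpy as np

def minmax_scale(values, lo=-1.0, hi=1.0):
    """Scale each column to [lo, hi]; also return the per-column min and max."""
    col_min = values.min(axis=0)
    col_max = values.max(axis=0)
    # Assumes col_max > col_min for every column
    scaled = (values - col_min) / (col_max - col_min) * (hi - lo) + lo
    return scaled, col_min, col_max

def minmax_inverse(scaled, col_min, col_max, lo=-1.0, hi=1.0):
    """Undo minmax_scale using the stored per-column min and max."""
    return (scaled - lo) / (hi - lo) * (col_max - col_min) + col_min

# Hypothetical toy matrix: two columns with different ranges
toy = np.array([[10.0, 1.0],
                [20.0, 3.0],
                [30.0, 5.0]])

scaled, col_min, col_max = minmax_scale(toy)
restored = minmax_inverse(scaled, col_min, col_max)
print(scaled)
print(restored)
```

Each column is scaled independently, so a single column can only be mapped back to prices with the minimum and maximum that belong to that column.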

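9. Create the input X and the target y. Column 0 is the closing price to predict, and the remaining columns are the lagged closing prices.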

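import numpy as np
# Column 0 is Close at time t (the target); the rest are the lagged closes
X = shifted_df_as_np[:, 1:]
y = shifted_df_as_np[:, 0]
# Reverse the lag columns so each row runs from oldest, Close(t-7), to newest, Close(t-1)
X = dc(np.flip(X, axis=1))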
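The effect of the flip is easier to see on a small array. A sketch with made-up numbers and a lookback of 3, where each row is laid out as Close, Close(t-1), Close(t-2), Close(t-3):

```python
import numpy as np

# Hypothetical rows: [Close, Close(t-1), Close(t-2), Close(t-3)]
toy = np.array([
    [13.0, 12.0, 11.0, 10.0],
    [14.0, 13.0, 12.0, 11.0],
])

X_toy = toy[:, 1:]   # lag columns, newest first
y_toy = toy[:, 0]    # value to predict

# Reverse the column order so each row reads oldest to newest;
# .copy() gives a contiguous array instead of a reversed view
X_toy_flipped = np.flip(X_toy, axis=1).copy()

print(X_toy_flipped)
print(y_toy)
```

After the flip, each row reads in chronological order and the target is the value that comes next in the sequence.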

10. Split the data, using the first 95% for training and the remaining 5% for testing. The rows stay in time order, so the test set is the most recent data.

split_index = int(len(X) * 0.95)

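11. Split the arrays at that index, reshape X to (samples, lookback, 1), one feature per time step, and y to (samples, 1), then convert everything to PyTorch tensors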

X_train = X[:split_index]
X_test = X[split_index:]
y_train = y[:split_index]
y_test = y[split_index:]

X_train = X_train.reshape((-1, lookback, 1))
X_test = X_test.reshape((-1, lookback, 1))
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))

import torch

# Convert to float32 tensors, the default dtype of PyTorch layers
X_train = torch.tensor(X_train).float()
y_train = torch.tensor(y_train).float()
X_test = torch.tensor(X_test).float()
y_test = torch.tensor(y_test).float()
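The split and reshape steps above can be checked on a toy array before running them on real data. This sketch uses NumPy only. The 20 samples and the lookback of 3 are made up, and the final conversion to tensors is left out.

```python
import numpy as np

lookback = 3
n_samples = 20

# Hypothetical inputs: 20 windows of 3 values each, with one target per window
X_toy = np.arange(n_samples * lookback, dtype=float).reshape(n_samples, lookback)
y_toy = np.arange(n_samples, dtype=float)

# Chronological split: the earliest 95% of rows train, the latest 5% test
split_index = int(len(X_toy) * 0.95)

X_train, X_test = X_toy[:split_index], X_toy[split_index:]
y_train, y_test = y_toy[:split_index], y_toy[split_index:]

# (samples, lookback) -> (samples, lookback, 1): one feature per time step
X_train = X_train.reshape((-1, lookback, 1))
X_test = X_test.reshape((-1, lookback, 1))
# (samples,) -> (samples, 1): one target value per sample
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Because the rows are not shuffled, every test sample comes after every training sample in time, which is usually what you want when evaluating a forecasting model.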