Machine Learning and Quantitative Investing: 3. Creating a Dataset with PyTorch

1. Install PyTorch. PyTorch is a Python-first deep learning framework; a model trained with it can automatically combine factors into strategies. CUDA GPU training requires an NVIDIA card, so CPU mode is chosen here (install options are listed at https://pytorch.org/). torchvision (image processing) and torchaudio (audio processing) are not needed, so they are not installed.

sudo /home/skka3134/folder/bot/bin/python -m pip install torch
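A quick way to confirm the installation worked is to import the package and run a trivial tensor operation. This is only a minimal sanity check, assuming it is run with the same Python interpreter that pip installed into.

```python
import torch

# Report the installed version and whether a CUDA device is visible.
print(torch.__version__)
print(torch.cuda.is_available())  # False is expected on a CPU-only setup

# A tiny tensor operation confirms the library actually runs.
t = torch.ones(2, 3)
total = t.sum().item()
print(total)
```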

2. Define the dataset: subclass the Dataset class to create a TimeSeriesDataset that returns one (X, y) sample per index.

from torch.utils.data import Dataset

class TimeSeriesDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        return self.X[i], self.y[i]
    
train_dataset = TimeSeriesDataset(X_train, y_train)
test_dataset = TimeSeriesDataset(X_test, y_test)
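X_train, y_train, X_test and y_test are expected to exist before this point. As an illustration only, the sketch below builds them from a synthetic series using a sliding window. The helper make_windows, the lookback of 5, the 80/20 chronological split and the trailing feature dimension are all assumptions made for this example, not part of the original pipeline.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class TimeSeriesDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, i):
        return self.X[i], self.y[i]


def make_windows(series, lookback):
    """Hypothetical helper: each sample holds `lookback` past values,
    and the target is the value that follows the window."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y


# Synthetic stand-in for a price or factor series (assumption).
series = np.arange(20, dtype=np.float32)
X, y = make_windows(series, lookback=5)      # X: (15, 5), y: (15,)

# Chronological split so the test set lies strictly after the training set.
split = int(len(X) * 0.8)                    # 12 training samples, 3 test samples
X_train = torch.from_numpy(X[:split]).unsqueeze(-1)   # (12, 5, 1)
y_train = torch.from_numpy(y[:split]).unsqueeze(-1)   # (12, 1)
X_test = torch.from_numpy(X[split:]).unsqueeze(-1)    # (3, 5, 1)
y_test = torch.from_numpy(y[split:]).unsqueeze(-1)    # (3, 1)

train_dataset = TimeSeriesDataset(X_train, y_train)
test_dataset = TimeSeriesDataset(X_test, y_test)
print(len(train_dataset), len(test_dataset))
```

Splitting by position rather than at random keeps future values out of the training set, which matters for time series.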

3. Load the dataset in batches with DataLoader.

from torch.utils.data import DataLoader

batch_size = 16    # samples per batch; a larger value such as 128 may suit GPU training
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)    # reshuffle the training samples every epoch
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)    # keep the test samples in their original order
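To see how batch_size decides the number of batches, the sketch below runs a DataLoader over 100 synthetic samples. TensorDataset is used in place of TimeSeriesDataset purely to keep the example short, and the sample count and shapes are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data: 100 samples, 5 time steps, 1 feature (assumption).
X = torch.randn(100, 5, 1)
y = torch.randn(100, 1)
dataset = TensorDataset(X, y)

# 100 samples at 16 per batch gives 6 full batches plus a final batch of 4.
loader = DataLoader(dataset, batch_size=16, shuffle=False)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(len(loader))      # 7
print(batch_sizes)      # [16, 16, 16, 16, 16, 16, 4]

# drop_last=True discards the incomplete final batch.
loader_drop = DataLoader(dataset, batch_size=16, shuffle=False, drop_last=True)
print(len(loader_drop)) # 6
```

A smaller final batch is usually harmless, but drop_last=True can be handy when a model or loss expects a fixed batch size.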

4. Inspect one batch to check the tensor shapes.

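import torch
device = torch.device("cpu")    # CPU mode, as chosen at installation

# Print the shape of the first batch as a sanity check, then stop
for x_batch, y_batch in train_loader:
    x_batch, y_batch = x_batch.to(device), y_batch.to(device)
    print(x_batch.shape, y_batch.shape)
    break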
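If the same script may later run on a machine with an NVIDIA GPU, the device can be chosen at runtime instead of being hard-coded. The following is a minimal sketch on synthetic data; the tensor shapes and the use of TensorDataset are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Use CUDA when available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Synthetic stand-in data: 40 samples, 5 time steps, 1 feature (assumption).
X = torch.randn(40, 5, 1)
y = torch.randn(40, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

# Take the first batch and move it to the selected device.
x_batch, y_batch = next(iter(loader))
x_batch, y_batch = x_batch.to(device), y_batch.to(device)
print(x_batch.shape, y_batch.shape)   # torch.Size([16, 5, 1]) torch.Size([16, 1])
print(x_batch.device.type)
```

The leading dimension of each shape is the batch size; the remaining dimensions come from how the samples were built.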