Game Character Dataset
Create a PyTorch dataset
This step is unnecessary if you already have the images you want to train on. Since we are constructing the dataset from scratch, we procedurally generate images from a sample space of base images. We start with a set of images for each action that a character of a given type can perform. For example, if the character is Pikachu, its types would be its different evolutions.
Each type of Pikachu performs an action in its own way. For instance, each may walk or attack differently, as depicted below.
Each image shows two characters interacting with each other. One is the actor, who instigates the interaction, and the other is the reactor, who reacts to the actor. Hence, we procedurally generate images for every combination of character types in the actor and reactor roles, performing different sets of actions. Our dataset has two characters, Satyr and Golem. Here are a few images from our dataset.
The dataset should reflect the distribution of the images you want the model to generate. If a certain class is underrepresented in your dataset, the generative model doesn't learn its representations well.
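The combinations described above can be enumerated with a few lines of Python. This is only a sketch: the character names come from the text, but the type, action, and reaction names are hypothetical placeholders, and it assumes the actor and reactor are always different characters. It produces label combinations only; composing the actual images is left out.

```python
from itertools import permutations, product

# Characters come from the text; all other names are hypothetical placeholders.
CHARACTERS = ["Satyr", "Golem"]
TYPES = ["type_1", "type_2", "type_3"]  # assumed type names
ACTIONS = ["walk", "attack"]            # assumed actor actions
REACTIONS = ["idle", "hurt"]            # assumed reactor reactions


def enumerate_scenes():
    """Yield one label dictionary per actor/reactor combination."""
    # permutations(..., 2) puts each character in the actor role once
    # and in the reactor role once (assumption: the two always differ).
    for actor, reactor in permutations(CHARACTERS, 2):
        for actor_type, reactor_type, action, reaction in product(
            TYPES, TYPES, ACTIONS, REACTIONS
        ):
            yield {
                "actor_character": actor,
                "reactor_character": reactor,
                "actor_type": actor_type,
                "reactor_type": reactor_type,
                "actor_action": action,
                "reactor_reaction": reaction,
            }


scenes = list(enumerate_scenes())
# 2 orderings * 3 * 3 * 2 * 2 = 72 with these placeholder lists
print(len(scenes))
```

The count depends entirely on the placeholder lists, so it is not expected to match the size of the real dataset.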
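One quick way to spot an underrepresented class is to count how often each label value occurs. The sketch below uses a made-up list of actor actions and an arbitrary threshold, both of which are assumptions for illustration.

```python
from collections import Counter

# Hypothetical labels for a tiny dataset; real labels would come
# from the generated images.
actor_actions = ["walk", "walk", "walk", "attack", "walk", "attack"]

counts = Counter(actor_actions)
total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}

# Flag any class whose share falls below an arbitrary threshold (assumption).
MIN_SHARE = 0.4
underrepresented = [label for label, share in shares.items() if share < MIN_SHARE]
print(underrepresented)  # ['attack']
```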
To create a dataset compatible with PyTorch, we need to create a dataset class and override a few methods. In this section, we'll go over this in detail. The overall code structure is given below. The new class subclasses PyTorch's Dataset class and overrides the __getitem__ and __len__ methods.
The __getitem__ method takes an index parameter and returns the image and its labels at that index. The __len__ method returns the number of samples in the dataset.
A DataLoader is used to batch the dataset. If we have a huge dataset and process all of it before each weight update, the weights change very rarely during training. We choose an appropriate batch size so that backpropagation and a weight update take place after every batch. Usually, the batch size is much smaller than the number of samples in the dataset.
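A minimal sketch of such a class might look like the following. The class name `GameCharacterDataset` and the choice to keep images and labels as in-memory tensors are assumptions made for illustration; a real dataset would more likely load image files from disk inside `__getitem__`.

```python
import torch
from torch.utils.data import Dataset


class GameCharacterDataset(Dataset):
    """Hypothetical dataset holding images and labels already in memory."""

    def __init__(self, images, labels):
        # images: tensor of shape (N, C, H, W); labels: tensor of shape (N, 6)
        if len(images) != len(labels):
            raise ValueError("images and labels must have the same length")
        self.images = images
        self.labels = labels

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.images)

    def __getitem__(self, index):
        # Return the image and its labels at the given index.
        return self.images[index], self.labels[index]


# Random stand-in data: 10 RGB images of 64x64 pixels, 6 integer labels each.
images = torch.rand(10, 3, 64, 64)
labels = torch.randint(0, 2, (10, 6))
dataset = GameCharacterDataset(images, labels)

image, label = dataset[0]
print(len(dataset), image.shape, label.shape)
```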
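The batching described above can be sketched with `torch.utils.data.DataLoader`. The stand-in tensors, the batch size of 32, and the 80/20 split into training and test subsets are all assumptions; only the total of 432 samples mirrors the size of this dataset.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in data: 432 small random "images" with 6 integer labels each.
images = torch.rand(432, 3, 16, 16)
labels = torch.randint(0, 2, (432, 6))
dataset = TensorDataset(images, labels)

# Assumed 80/20 split; the split ratio is not taken from the original text.
train_size = int(0.8 * len(dataset))
test_size = len(dataset) - train_size
generator = torch.Generator().manual_seed(0)
train_set, test_set = random_split(
    dataset, [train_size, test_size], generator=generator
)

# One loader per split; only the training loader shuffles.
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)

for batch_images, batch_labels in train_loader:
    # In a training loop, the forward pass, backward pass, and
    # optimizer step for this batch would go here.
    print(batch_images.shape, batch_labels.shape)
    break

# Number of batches (and therefore weight updates) per epoch.
print(len(train_loader), len(test_loader))
```

With these numbers, one pass over the training split yields several weight updates instead of a single one, which is the point of batching.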
We can create separate data loaders for training and testing. More information on how to create a custom dataset in PyTorch can be found here,
We end up with 432 images, and this number is increased further using data augmentation techniques. Every image has the following class labels:
Actor Character
Reactor Character
Actor Type
Reactor Type
Actor Action
Reactor Reaction
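Since a model needs numeric targets, the six labels above are typically mapped to integer indices. The sketch below shows one way to do that; the character names come from the text, while every other vocabulary entry and the helper `encode_labels` are hypothetical.

```python
# Character names come from the text; every other entry is a placeholder.
VOCAB = {
    "actor_character": ["Satyr", "Golem"],
    "reactor_character": ["Satyr", "Golem"],
    "actor_type": ["type_1", "type_2", "type_3"],
    "reactor_type": ["type_1", "type_2", "type_3"],
    "actor_action": ["walk", "attack"],
    "reactor_reaction": ["idle", "hurt"],
}


def encode_labels(sample):
    """Map the six string labels of one image to a list of integer indices."""
    return [VOCAB[field].index(sample[field]) for field in VOCAB]


sample = {
    "actor_character": "Golem",
    "reactor_character": "Satyr",
    "actor_type": "type_3",
    "reactor_type": "type_1",
    "actor_action": "attack",
    "reactor_reaction": "hurt",
}
print(encode_labels(sample))  # [1, 0, 2, 0, 1, 1]
```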
The more images you have that reflect the true distribution of the problem, the better. The number of images used here is significantly smaller than what is used in real-world applications, even with data augmentation.
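As an illustration of data augmentation, the sketch below applies a random horizontal flip and a random brightness change to an image tensor. These particular transforms are assumptions, not necessarily the ones used for this dataset, and the function `augment` is hypothetical. It also assumes the labels do not depend on which side of the image each character stands on.

```python
import torch


def augment(image, generator=None):
    """Return a randomly flipped, brightness-scaled copy of a (C, H, W) image.

    Pixel values are assumed to lie in [0, 1].
    """
    out = image.clone()
    # Flip along the width axis with probability 0.5.
    if torch.rand(1, generator=generator).item() < 0.5:
        out = torch.flip(out, dims=[2])
    # Scale brightness by a random factor in [0.8, 1.2).
    scale = 0.8 + 0.4 * torch.rand(1, generator=generator).item()
    return (out * scale).clamp(0.0, 1.0)


image = torch.rand(3, 64, 64)
augmented = augment(image, torch.Generator().manual_seed(0))
print(augmented.shape)
```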