Rory Nicholas

University College London - Computer Science BSc


Sketch Completion with Machine Learning

There is a great deal of recent research in the field of image generation using machine learning. However, the vast majority of these projects focus on pixel-based image formats. In contrast, this project aimed to build upon vector-based image generation research, on the premise that this format follows human intuition more closely. This in turn could enable tools that work alongside people, offering assistance and inspiration during the sketching process.

Specifically, such tools may assist those learning to draw or who otherwise find sketching difficult; in particular, those with a physical or mental disability that makes it hard to put onto paper exactly the picture they have in mind (this project was completed as a dissertation with the assistance of the UCL Global Disability Innovation Hub).

The project was built by adapting the Sketch-RNN framework.

The model used in this project was a bidirectional variational autoencoder (VAE) built from long short-term memory (LSTM) recurrent neural networks (RNNs). The input to the network was a sequence of five-dimensional vectors representing strokes. After encoding, Gaussian noise and normalisation were applied to the latent vector to give a well-formed, regularised latent space. The output of the model was a sequence of six-dimensional vectors parameterising a Gaussian mixture model (GMM), which provides a probability distribution over the stroke to be drawn at each step of the output sequence.
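For concreteness, here is a minimal NumPy sketch of the stroke format and the latent noise step, following the conventions of the Sketch-RNN paper; the function names are illustrative rather than taken from the project's code.

```python
import numpy as np

# Each input step is a five-dimensional stroke vector (dx, dy, p1, p2, p3):
# dx/dy are pen offsets, and the one-hot pen state marks pen-down,
# pen-up, and end-of-sketch respectively.
example_stroke = np.array([12.0, -3.0, 1.0, 0.0, 0.0])

def sample_latent(mu, log_sigma, rng=np.random.default_rng()):
    """VAE reparameterisation: z = mu + sigma * eps, with eps ~ N(0, I).
    Injecting this Gaussian noise (alongside the KL term in the loss)
    is what regularises the latent space."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps
```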

The idea behind this tool is that a user inputs some strokes into the sketching window. They may then request that their sketch be either re-interpreted or completed by a pre-trained model (in this case, one trained on owls). The user is then shown a grid of resulting sketches, each sampled around a focal point in the model's latent space.
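A rough sketch of that sampling step, with illustrative parameter values (each returned latent vector would be passed to the trained decoder to produce one candidate sketch):

```python
import numpy as np

def sample_candidates(focal_z, n=9, spread=0.3, rng=np.random.default_rng()):
    """Draw n latent vectors clustered around the current focal point;
    decoding each one yields a candidate sketch for the grid. The spread
    controls how varied the grid of suggestions is."""
    return focal_z + spread * rng.standard_normal((n, focal_z.shape[0]))
```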

The user may then select their favourite sketch, and the focal point is moved toward it so that newly generated sketches resemble it more closely. This process may be repeated until the user finds a satisfactory sketch, at which point it may be saved as an SVG file.
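The focal-point update can be read as a simple interpolation in latent space; a minimal sketch, with an assumed step size:

```python
def move_focal(focal_z, chosen_z, step=0.5):
    """Move the focal point part of the way toward the latent vector of
    the user's favourite sketch, so that the next grid of samples
    resembles it more closely. Repeating this converges on a region of
    latent space matching the user's preference."""
    return focal_z + step * (chosen_z - focal_z)
```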

In addition, this project explored the viability of transfer learning within the Sketch-RNN framework by comparing the training results of models initialised from pre-trained weights against those of freshly initialised models.
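The shape of that comparison can be illustrated with a toy Keras example (a small dense network and synthetic data standing in for the actual Sketch-RNN model and datasets): pre-train a source model, copy its weights into a new model and fine-tune, then train an identical model from scratch as the baseline.

```python
import numpy as np
import tensorflow as tf

def make_model():
    # Toy stand-in for the Sketch-RNN network.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="tanh", input_shape=(5,)),
        tf.keras.layers.Dense(1),
    ])

rng = np.random.default_rng(0)
x_src, y_src = rng.normal(size=(256, 5)), rng.normal(size=(256, 1))  # "source" class
x_new, y_new = rng.normal(size=(256, 5)), rng.normal(size=(256, 1))  # "target" class

# Pre-train on the source data.
source = make_model()
source.compile(optimizer="adam", loss="mse")
source.fit(x_src, y_src, epochs=5, verbose=0)

# Transfer: start the new model from the pre-trained weights.
warm = make_model()
warm.set_weights(source.get_weights())
warm.compile(optimizer="adam", loss="mse")
warm.fit(x_new, y_new, epochs=5, verbose=0)

# Baseline: an identical model trained fresh on the same data.
fresh = make_model()
fresh.compile(optimizer="adam", loss="mse")
fresh.fit(x_new, y_new, epochs=5, verbose=0)
```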

The final application was able to generate high-quality images while adapting their features to the user's preferences. One caveat was that successive user selections could drive the focal point toward extreme latent values, occasionally producing low-quality images. In hindsight, this flaw could have been fixed by normalising latent vectors to deter focal points with extreme values.
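A minimal sketch of that fix, assuming an illustrative norm threshold:

```python
import numpy as np

def clamp_latent(z, max_norm=3.0):
    """Rescale latent vectors whose norm exceeds a threshold, keeping
    focal points inside the well-trained region of the latent space
    and deterring the extreme values that produced low-quality output."""
    norm = np.linalg.norm(z)
    return z if norm <= max_norm else z * (max_norm / norm)
```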

As for the transfer learning part of the project, this proved successful and potentially useful for certain applications. Transfer learning gave models a large head start, although fresh models caught up after around three hours of training. It could therefore facilitate the rapid development of models for different styles and subjects, but it is not necessary for regular use.

Overall, this project was marked as a first-class dissertation. The paper written for this project may be found here, and the GitHub repository for the application and transfer learning module may be found here.