Enabling configurable projects for ML experiments through GIN

Motivation

We often find ourselves in a highly experimental setup. The task at hand already has many published approaches in the form of papers, and on top of that we have our own experience and gut feeling about what could help achieve better results. There are plenty of tools out there to track experiments and log hyper-parameters. But is that enough? Two questions come to my mind:

  1. Experiment tracking tools log metrics, losses and maybe some hyper-parameters. But these are not the only configurables, right? What if I want to change the architecture of the generator network?
  2. What if I have changed the dataset by adding 100k images to it?

One desirable way of combining a good configurable project with an experiment tracking tool is to:

  1. keep all configurables in a configuration file,
  2. use the experiment tracking tool to track metrics, losses and maybe some hyper-parameters,
  3. upload the configuration file as an artifact.

This approach addresses the first concern from above - now we can literally keep all configurable aspects of the project in the configuration file, and log everything we want to track over time.

Now, how do we make the configuration this flexible without harming the code-base?

Let’s go over an example…

PyTorch Lightning example - vanilla

Here I would like to talk about gin-config, a library that does not get the attention it deserves. Maybe it is too complicated, or maybe the Google folks were not able to motivate it enough in their “documentation”. So I have decided to spend some time creating a walkthrough of a “real world example” and show where it shines. “Real world” is a bit of a stretch here, cause it’s gonna be… the good old MNIST example.

The basic-gan example can serve as our guinea pig to understand the benefits of having gin-config around. Instead of having one single file containing all the code from the example, I have tried to create a project-like structure:

run.py - the entry point of the application
data_module.py - the PyTorch Lightning DataModule, responsible for DataSet and DataLoader creation
model.py - contains the PyTorch Lightning module and the training logic
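To make the structure concrete, here is a minimal sketch of what the vanilla data module and entry point might look like, loosely following the basic-gan example (names and defaults are illustrative; the GAN module itself is assumed to stay as in that example):

```python
# data_module.py - a plain LightningDataModule with hard-coded defaults (sketch)
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import MNIST


class MNISTDataModule(pl.LightningDataModule):
    def __init__(self, data_dir: str = "./data", batch_size: int = 64, num_workers: int = 4):
        super().__init__()
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.transform = transforms.Compose(
            [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
        )

    def setup(self, stage=None):
        self.mnist_train = MNIST(self.data_dir, train=True, download=True, transform=self.transform)

    def train_dataloader(self):
        return DataLoader(self.mnist_train, batch_size=self.batch_size, num_workers=self.num_workers)
```

```python
# run.py - vanilla entry point: everything is wired together with hard-coded values
import pytorch_lightning as pl

from data_module import MNISTDataModule
from model import GAN  # the LightningModule from the basic-gan example

if __name__ == "__main__":
    dm = MNISTDataModule(batch_size=64)
    model = GAN(latent_dim=100, lr=0.0002)
    trainer = pl.Trainer(max_epochs=5)
    trainer.fit(model, datamodule=dm)
```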

The red parts above are the ones I would ideally like to change through configuration. One can argue that this is already a good setup: when someone changes something, they can commit the changes and keep track of them through git. Yes, and with the same luck you can try to convince me that there are dragons. In practice, what happens in experimental mode is that many ideas are tried out together and the code is not committed until a “stable” version is reached (whatever “stable” means here).

So how do we do it?

Option 1: Command line arguments

Let’s try to imagine the number of arguments we would need to achieve the flexibility we want - that is, making the red areas configurable. For the sake of demonstration I will only show what the steps might look like for making the DataModule configurable, plus some aspects of the model training. Hopefully this will demonstrate the shortcomings.

run.py - the entry point of the application
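A sketch of what such an argparse-based entry point might look like (the flag names and defaults are illustrative, not the exact ones from the example):

```python
# run.py - CLI entry point: every configurable becomes a flag
import argparse

import pytorch_lightning as pl

from data_module import MNISTDataModule
from model import GAN


def parse_args():
    parser = argparse.ArgumentParser()
    # DataModule arguments
    parser.add_argument("--data_dir", type=str, default="./data")
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--num_workers", type=int, default=4)
    # Model / training arguments
    parser.add_argument("--latent_dim", type=int, default=100)
    parser.add_argument("--lr", type=float, default=0.0002)
    parser.add_argument("--b1", type=float, default=0.5)
    parser.add_argument("--b2", type=float, default=0.999)
    parser.add_argument("--max_epochs", type=int, default=5)
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    dm = MNISTDataModule(
        data_dir=args.data_dir,
        batch_size=args.batch_size,
        num_workers=args.num_workers,
    )
    model = GAN(latent_dim=args.latent_dim, lr=args.lr, b1=args.b1, b2=args.b2)
    trainer = pl.Trainer(max_epochs=args.max_epochs)
    trainer.fit(model, datamodule=dm)
```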

model.py - contains the PyTorch Lightning module and the training logic
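And a sketch of the model side, which now has to accept and carry every one of those values (the tiny generator/discriminator here are stand-ins for the real networks of the basic-gan example):

```python
# model.py - the LightningModule has to accept (and store) every value it needs
import pytorch_lightning as pl
import torch


class GAN(pl.LightningModule):
    def __init__(self, latent_dim: int = 100, lr: float = 0.0002, b1: float = 0.5, b2: float = 0.999):
        super().__init__()
        self.lr, self.b1, self.b2 = lr, b1, b2
        # tiny stand-ins for the real generator/discriminator of the basic-gan example
        self.generator = torch.nn.Sequential(torch.nn.Linear(latent_dim, 28 * 28), torch.nn.Tanh())
        self.discriminator = torch.nn.Sequential(torch.nn.Linear(28 * 28, 1), torch.nn.Sigmoid())

    def configure_optimizers(self):
        # the optimizer class and its arguments are hard-coded here; switching
        # optimizers means touching the code and adding yet another set of flags
        opt_g = torch.optim.Adam(self.generator.parameters(), lr=self.lr, betas=(self.b1, self.b2))
        opt_d = torch.optim.Adam(self.discriminator.parameters(), lr=self.lr, betas=(self.b1, self.b2))
        return [opt_g, opt_d], []
```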

Pros

Now, to understand the cons, let’s look at the contents of run.py above.

Cons

Option 2: Config yamls, tomls…

Let’s see what gets better and what gets worse…

config.yaml
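An illustrative layout for such a config.yaml; the keys simply mirror the constructor arguments:

```yaml
# config.yaml - illustrative keys, mirroring the constructor arguments
data:
  data_dir: ./data
  batch_size: 64
  num_workers: 4
model:
  latent_dim: 100
  lr: 0.0002
  b1: 0.5
  b2: 0.999
trainer:
  max_epochs: 5
```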

run.py - the entry point of the application
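A sketch of the matching entry point, assuming PyYAML is used for parsing; note that every value still has to be plumbed through by hand:

```python
# run.py - YAML entry point: load the config, then fan the values out manually
import yaml
import pytorch_lightning as pl

from data_module import MNISTDataModule
from model import GAN

if __name__ == "__main__":
    with open("config.yaml") as f:
        cfg = yaml.safe_load(f)

    # the config only replaces the flags; the wiring code stays the same
    dm = MNISTDataModule(**cfg["data"])
    model = GAN(**cfg["model"])
    trainer = pl.Trainer(**cfg["trainer"])
    trainer.fit(model, datamodule=dm)
```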

Pros

Cons

Besides the first point, nothing has really improved compared to command line arguments.

Solution: gin-configs

Now let’s see how gin-configs help with the above problems…

config.gin
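An illustrative config.gin; each binding targets a constructor argument of a @gin.configurable class (this assumes MNISTDataModule and GAN are both decorated, as sketched below):

```
# config.gin - each line binds a constructor argument of a configurable
MNISTDataModule.data_dir = "./data"
MNISTDataModule.batch_size = 64
MNISTDataModule.num_workers = 4

GAN.latent_dim = 100
GAN.lr = 0.0002
GAN.b1 = 0.5
GAN.b2 = 0.999

train.max_epochs = 5
```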

run.py - the entry point of the application
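A sketch of the gin entry point; the only boilerplate left is parsing the config file once:

```python
# run.py - GIN entry point: parse the config once, then just build the objects
import gin
import pytorch_lightning as pl

from data_module import MNISTDataModule
from model import GAN


@gin.configurable
def train(max_epochs: int = 5):
    dm = MNISTDataModule()   # constructor arguments are injected from config.gin
    model = GAN()            # same here
    trainer = pl.Trainer(max_epochs=max_epochs)
    trainer.fit(model, datamodule=dm)


if __name__ == "__main__":
    gin.parse_config_file("config.gin")
    train()
```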

model.py - contains the PyTorch Lightning module and the training logic
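And a sketch of the model side; only the decorator is added, the defaults stay in the signature and gin overrides them from config.gin:

```python
# model.py - only the @gin.configurable decorator is added to the vanilla class
import gin
import pytorch_lightning as pl
import torch


@gin.configurable
class GAN(pl.LightningModule):
    def __init__(self, latent_dim: int = 100, lr: float = 0.0002, b1: float = 0.5, b2: float = 0.999):
        super().__init__()
        self.lr, self.b1, self.b2 = lr, b1, b2
        # tiny stand-ins for the real generator/discriminator of the basic-gan example
        self.generator = torch.nn.Sequential(torch.nn.Linear(latent_dim, 28 * 28), torch.nn.Tanh())
        self.discriminator = torch.nn.Sequential(torch.nn.Linear(28 * 28, 1), torch.nn.Sigmoid())
        # training_step / configure_optimizers stay exactly as in the vanilla version
```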

Pros (adding to the pros from yaml)

Cons

Above you have seen a nasty code snippet with the optimizers… let’s see how we can handle this elegantly with gin.

config.gin
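An illustrative config.gin for this; the optimizer class itself becomes a configurable reference, and its hyper-parameters are bound right next to it (this assumes the torch optimizers are registered with gin.external_configurable, as sketched below):

```
# config.gin - the optimizer class and its hyper-parameters now live in the config
GAN.latent_dim = 100
GAN.optimizer_cls = @torch.optim.Adam

# bindings applied whenever the injected optimizer class is instantiated
torch.optim.Adam.lr = 0.0002
torch.optim.Adam.betas = (0.5, 0.999)

train.max_epochs = 5
```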

model.py - contains the PyTorch Lightning module and the training logic
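And a sketch of the model side; configure_optimizers no longer hard-codes Adam or its hyper-parameters:

```python
# model.py - the concrete optimizer is injected through gin instead of being hard-coded
import gin
import pytorch_lightning as pl
import torch

# register the torch optimizers we want to be able to reference from config.gin
gin.external_configurable(torch.optim.Adam, module="torch.optim")
gin.external_configurable(torch.optim.SGD, module="torch.optim")


@gin.configurable
class GAN(pl.LightningModule):
    def __init__(self, latent_dim: int = 100, optimizer_cls=torch.optim.Adam):
        super().__init__()
        self.optimizer_cls = optimizer_cls
        self.generator = torch.nn.Sequential(torch.nn.Linear(latent_dim, 28 * 28), torch.nn.Tanh())
        self.discriminator = torch.nn.Sequential(torch.nn.Linear(28 * 28, 1), torch.nn.Sigmoid())

    def configure_optimizers(self):
        # which optimizer is used, and with which lr/betas, is decided entirely in config.gin
        opt_g = self.optimizer_cls(self.generator.parameters())
        opt_d = self.optimizer_cls(self.discriminator.parameters())
        return [opt_g, opt_d], []
```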

Notes