I just finished helping out at the Machine Learning Summer School (MLSS) 2019, and thought it'd be good to write down what I've learned and what I will take away from the experience.
MLSS is a series of summer schools held all over the world; it's quite amazing to see the variety, and you can check it out here. It is aimed at researchers within ML and neighbouring fields (we had people from data mining, control theory and neuroscience, to name just a few!), giving them exposure to other areas of machine learning and a chance to form collaborations and friendships.
Tutorials and lectures
The tutorials and lectures were superb. The cohort came with different experiences and strengths, so everyone managed to be pretty advanced in some lectures while coming at others completely fresh. This was great, since it enabled people to help each other out, explain concepts and ask thoughtful questions of the lecturers. In many cases the lecture notes and tutorials came in the day before (the horror for me, who had to finalise the tutorials, but great in that the material really was cutting edge in most cases), and many of the lecturers travelled far in order to teach, so a big thank you to all of them!
The lectures were very varied. I come from a maths background and like to bury myself in theory and definitions, but I cherish the fact that the lectures spanned a large number of fields that varied in how applied they were. On one end there were lectures on learning theory, optimisation and kernels; on the other end we got very interesting lectures on interpretability, fairness and ML in biology. It's pretty awesome that we managed to cover so much machine-learning ground in just two weeks, and I at least found some new nuggets of understanding I didn't have before.
I've been to conferences and settings before where you put young people with a common cause in a room, and it always seems to work wonders: people start to make friends, learn from each other and discuss how to improve on the problems they are facing.
In the case of MLSS, a good aspect of the two weeks is that it was long enough to let people find their place. After the first week most people seemed to have found a group and talked to most of the others about their research, where they come from and whether they wanted to collaborate. It helped that there was food served at the Gatsby Institute, which let all of us talk in a less formal setting and enjoy the (unusually warm) London sunshine.
A subset of the lectures dealt with issues such as how to make ML fair, whether we can make ML fair at all, and how to make ML interpretable. We even had lectures given by external teams: one about the pitfalls and promises of using ML for charitable work, and one about the work done to build the African Masters in Machine Intelligence (you can find the screencast of this here; if you want to be a mentor, the African Masters in Machine Intelligence is looking for people, please contact "toubasourah (at) gmail.com").
We also had people from DSSG (Data Science For Social Good)1 talk about the projects they are doing. In short, DSSG is a summer fellowship that puts researchers and industry professionals with quantitative experience but diverse backgrounds into teams. The aim is to enable social good by delivering projects for charities and governmental institutions using data science and technology. If you think this would suit your skillset, I would advise you to apply; it's pretty competitive, so it's also a great thing to have on your CV. It's always nice when doing the right thing also furthers your career.
I was very pleased at the number of people I spoke to who had opinions on whether charity is effective, how to make sure research leads to good societal outcomes and how to behave ethically in science. Personally I think that people in machine learning are in a better position than most to influence the direction of research and opinion, and that we have an obligation to make sure it is going in a good direction and is a net benefit to society and the world.
Being a helper
Helping out was a great experience but also very tiring! Lectures ran from 9 in the morning each day, and activities often went past 6 in the evening. Together with social activities (partying…) this makes for a long two weeks. Then again, I have no one but myself to blame!
I was generally in charge of the IT infrastructure and also ordered badges and lanyards. Below I'll outline what I did, what went well and what could be improved for other summer schools. This is a good list for future helpers!
Badges and lanyards
I thought this would be a one-day job, but researching and reaching out to companies took many more hours than I expected. We had a fixed budget, so I contacted printing companies asking if they could meet our constraints. It was surprisingly hard to get a response, and when I did, many did not give a direct quote or quoted things way outside our budget.
I finally got a recommendation for https://www.go2dave.co.uk/ and he helped us get what we needed within budget, explaining our options and always being available when we had questions. I would really recommend Dave to anyone organising conferences, hackathons or summer schools.
In the future, it'd be worth thinking about the required components of prints and badges well before ordering them. Coming from the outside, I did not know which tools are needed to produce the files for the printing process. It would have saved me time to know about bleed and how to use tools such as Inkscape to get images and artwork to the proper quality and dimensions. Finally, it would have helped to keep the artwork lossless, i.e. making sure the conference saves the artwork as SVG, since this can easily be converted to other formats and plays well with Inkscape.
I would also have tried to form a group of the people involved (the artwork creator and the decision makers, in our case Marc Deisenroth and Arthur Gretton) and specify the needed components beforehand. That way I wouldn't have had to run back and forth between the printing company and take up Marc's and Arthur's time for every little detail.
Nevertheless, we made it! Thanks to Ira for the beautiful artwork (the moving image on the landing page and the badge artwork) and to Marc and Arthur for being available when I needed to confirm things with them.
I basically took it upon myself to do the IT work. This was a deliberate decision, as I like playing around with technology and am fairly techy. It was also a somewhat frustrating job (at least it made me appreciate the IT group at UCL; thanks for your good work every day), since not everyone has the same baseline understanding of technology and troubleshooting as you do.
My goal was to reduce the failure points of the tutorials to the absolute minimum. The tutorials were two hours long, which might seem like plenty, but more often than not people weren't able to finish them. It's such a waste to spend a sizable proportion of the time debugging software issues when it could be spent interacting and learning about machine learning, which is the main focus of the summer school after all.
Most of the tutorials were plain Jupyter notebooks, but some of them ran as Python scripts, and one tutorial used Julia, which made things slightly more complicated. Some tutorials also required non-trivial installation of external software, so I wanted to remove this as a potential source of stress for the participants.
We split this into three approaches: Conda environments, Google Colab and Docker.
Initially we provided Conda environment files, together with a section in the guide on how to create a Conda environment from such a file. We quickly realised that this didn't work as expected: dumping an environment file from my local Conda environment hardcoded the packages as binary builds, which differ between platforms. I got around this by writing just the name of the package instead of the full specification, for example listing numpy alone in the dependencies. You can see the differences by inspecting the mlss_gp.yml in the GitHub repo.
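As a sketch of that fix (the helper name and example specs are mine, not from the repo), stripping the platform-specific version and build pins from a dumped environment file's dependency list can be done like this:

```python
# Strip version/build pins from Conda dependency specs,
# e.g. "numpy=1.16.4=py37h95a1406_0" -> "numpy".
# Hypothetical helper, not part of the MLSS repo.

def strip_pins(dependencies):
    cleaned = []
    for spec in dependencies:
        if isinstance(spec, str):
            # Conda specs separate name, version and build with '='.
            cleaned.append(spec.split("=")[0])
        else:
            # Nested entries such as {"pip": [...]} are kept as-is.
            cleaned.append(spec)
    return cleaned

deps = ["numpy=1.16.4=py37h95a1406_0", "scipy=1.3.0=py37h921218d_0", "pip"]
print(strip_pins(deps))  # ['numpy', 'scipy', 'pip']
```

The unpinned file lets Conda resolve platform-appropriate builds on each participant's machine, at the cost of looser reproducibility.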
The main drawback of this is that if the tutorials require specific external software, you can't always install it through Conda. For example, the RL tutorials needed Malmo, which required Java and Minecraft to install properly. I had problems getting this to work on my Linux distribution, so I figured other people would too.
Google Colab is essentially a Jupyter notebook running on one of Google's servers, and it worked very well for the tutorials that used deep learning and GPUs. I think this would actually have worked fine for most of the tutorials, but I also wanted to make sure we had options for the participants.
The downside is that Colab runs headless (so anything requiring an X server or an attached screen, such as Minecraft, won't work), and getting data to it can be a bit of a pain unless you know what you are doing.
Docker was the best option in terms of reproducibility, since you can make sure the software plays nicely and everything works. With this you are guaranteed to be able to run a tutorial on someone else's computer, as long as they can get Docker working.
My workflow for Docker was as follows:
- Base the image on jupyter/base-notebook
- Install the necessary system packages
  - Note that this often results in a very big image, as the package manager pulls in the dependencies of the packages you install
- Clone the repo with the environment files and notebooks using git
  - In hindsight this is not the best approach; it would probably be better to use volumes, to avoid unnecessary files ending up in the image
- Install the necessary Conda environments
- Create a script that runs the Jupyter notebooks as a server
- Build the image
- Push the image to Docker Hub
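A minimal Dockerfile along those lines might look like this; the repo URL, environment file name and package list are placeholders, not the exact ones we used:

```dockerfile
# Sketch of the image build, with placeholder names.
FROM jupyter/base-notebook

# System packages need root; the real list depends on the tutorials.
USER root
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*
USER ${NB_UID}

# Clone the tutorial material (placeholder URL).
RUN git clone https://github.com/example/mlss2019-tutorials.git /home/jovyan/tutorials

# Create a Conda environment per tutorial (placeholder file name).
RUN conda env create -f /home/jovyan/tutorials/mlss_gp.yml && conda clean -afy

# jupyter/base-notebook's default CMD already starts the notebook server.
```

The `conda clean -afy` at the end of the install layer helps keep the image size down, which matters when many participants pull it over one network.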
In the beginning I tried to cumulatively add Conda environments to the existing image, but this ballooned it in size (to about 6 GB+), which is not ideal when you have 150 people trying to download it at the same time: it takes long and puts stress on the local network. However, it worked exactly as expected and was very robust. Problems mostly occurred when trying to download and install Docker on Windows machines, which I don't know enough about to help with.
If I were to redo it, I would create one Docker image for each tutorial and push them to Docker Hub as separate images. Keep in mind that the notebooks and tutorial material will change, often the night before, so this setup makes things a bit easier: you don't have to rebuild potentially very big images whenever you want to install a new Conda environment. You will also run into your system running out of disk space due to how Docker works; working with smaller images lets you clean up more efficiently, since pruning (removing old dangling images) is needed from time to time, and rebuilding images later takes less time.
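In commands, the per-tutorial approach I have in mind looks roughly like this (the image names and tags are made up):

```shell
# Build and push one image per tutorial (hypothetical names).
docker build -t myuser/mlss2019-gp:latest -f Dockerfile.gp .
docker push myuser/mlss2019-gp:latest

# Clean up dangling layers from old builds now and then.
docker image prune -f
```

When a tutorial changes the night before, only that tutorial's small image needs rebuilding and re-downloading.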
Things to keep in mind
Things will change the night before. This is just part of it all; lecturers have things to do, and MLSS is not their only commitment. You can mitigate this to some extent, but your workflow should be able to accommodate changing requirements and last-minute updates.
Everything will be available on or linked from the README on the GitHub repo in due time.
The tutorials and slides are already publicly available. My goal is to streamline the tutorials and provide the necessary Conda files to run them, together with the solutions. This is really a treasure trove of information, so I want it to be freely available to those who could not participate!
I will update with links to the screencast, but here is a link for now.
If you have any problems running the tutorials, do let us know by opening an issue and I will take care of it as fast as I can.
Finally, thanks everyone for making the MLSS happen. Special thanks to Arthur Gretton and Marc Deisenroth (who also wanted to acknowledge my contribution of the Slack emoji of MLSS2019!) for organising it in the first place; thank you to all of the lecturers who took the time to be here and interact with the students, and thank you to all my fellow helpers and admins. Thank you to the UCL staff for the screencast and for being available.
And of course thank you to all participants that made MLSS great.