Machine Learning for Cheminformatics

6th - 10th August 2018

The STFC Hartree Centre will be repeating its well-received course on "Machine Learning for Cheminformatics" from 6th - 10th August 2018. The venue is the Science and Technology Facilities Council's Rutherford Appleton Laboratory, Harwell, Didcot OX11 0QX, United Kingdom.

TARGET AUDIENCE The course is designed for data analysts working in the pharmaceutical industry and for trainees who are able to work independently but need guidance to solve complex problems. Attendees should be familiar with using statistical methods in a chemical research context. The intended audience is people with knowledge of chemistry, some experience of coding, and an understanding of Ordinary Least Squares Regression and Principal Component Analysis. The course will equip them with an understanding of non-linear methods for machine learning in the context of chemistry.

The course will once again be FREE for ADDoPT partners, and available for a fee (£1000 commercial rate / £150 academic rate) to non-members of ADDoPT.

PREPARATION We will prepare computers with all the necessary software and datasets installed. We will also circulate installation scripts for people who prefer to work on their own laptops. We will survey attendees to identify the skills they already have, and make a seating plan that ensures that neighbours have complementary skills. People without previous experience of using Python are encouraged to travel early in order to attend a preparatory session on Python.

Bookings now open!


11:00 Introduction to Python (optional)

12:00 Arrival and welcome.


Introduction to Jupyter, RDKit, and scikit-learn, and their use in some real-world examples. Introduction to methods for selection/transformation of variables for modelling - PCA, Isomap etc.

Exercise: do a linear regression and assess overfitting with a test set.
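
As a rough illustration of the kind of check this exercise involves, the sketch below fits an ordinary least squares model with scikit-learn and compares R² on the training half with R² on a held-out test half. It is not the course material: the data are synthetic random numbers standing in for a descriptor table, and the sizes (60 samples, 40 descriptors) are assumptions chosen so that the model has more parameters than training samples and therefore overfits visibly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a descriptor table (an assumption, not course data):
# 60 "molecules" described by 40 descriptors, of which only 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 40))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=60)

# Hold out half of the samples as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# With 30 training samples and 40 descriptors the model can fit the
# training data almost exactly, noise included.
model = LinearRegression().fit(X_train, y_train)

r2_train = model.score(X_train, y_train)  # R^2 on data the model has seen
r2_test = model.score(X_test, y_test)     # R^2 on data the model has not seen

print(f"R^2 on training set: {r2_train:.3f}")
print(f"R^2 on test set:     {r2_test:.3f}")
```

A large gap between the two scores is the usual sign of overfitting; reducing the number of descriptors (for example with PCA) before fitting would be one way to narrow it.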


Naïve Bayes classifiers

Nearest Neighbour classifier and regression

Tree classifier and regression, random forest, gradient boosting.


Assessing quality of a classifier.

Exercise: build a classifier for herbicide-or-not.
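
A minimal sketch of what building a two-class classifier and assessing its quality might look like in scikit-learn is given below. The herbicide dataset used on the course is not reproduced here, so the example assumes a synthetic, imbalanced dataset from `make_classification` in which class 1 plays the role of "herbicide"; the choice of a random forest and of the confusion matrix and ROC AUC as quality measures is likewise an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced two-class data (an assumption, not course data):
# roughly 20% of samples are in class 1, standing in for "herbicide".
X, y = make_classification(
    n_samples=400,
    n_features=20,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=0,
)

# Stratify so that both classes appear in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, clf.predict(X_test))

# ROC AUC uses the predicted probability of class 1, not the hard label.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("Confusion matrix:")
print(cm)
print(f"ROC AUC: {auc:.3f}")
```

On imbalanced data such as this, plain accuracy can look good even when the minority class is rarely found, which is why the confusion matrix and a ranking measure such as ROC AUC are reported instead.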


Neural nets for classification and regression


Exercise: build a neural net classifier for herbicide-or-not (starting from a pre-trained autoencoder)


Kernel methods:

Support vector machine (regression and classification)

Designing kernels


Combined models: SVM then boosted

Exercise: an SVC+boosted forest classifier for herbicide-or-not.


Competition: who can build the best classifier for insecticide-or-not?

Wrap up


PM: Helpdesk: bring your data (optional)


For further information contact [email protected].