5th - 9th February 2018
The STFC Hartree Centre will be holding a course on "Machine Learning for Cheminformatics" at the Brunner-Mond Training Lab, STFC Daresbury Laboratory, Cheshire, from 5th - 9th February 2018.
TARGET AUDIENCE The intended audience is people with knowledge of chemistry, some experience of coding, and an understanding of Ordinary Least Squares Regression and Principal Component Analysis. The course will equip them with an understanding of non-linear methods for machine learning in the context of chemistry. The course will be free for ADDoPT partners, and available for a fee to non-members of ADDoPT.
PREPARATION We will prepare computers with all the necessary software and datasets installed. We will also circulate installation scripts for people who prefer to work on their own laptops. We will survey attendees to identify the skills they already have, and make a seating plan that ensures that neighbours have complementary skills. People without previous experience of using Python are encouraged to travel early in order to attend a preparatory session on Python.
11:00 Introduction to Python (optional)
12:00 Arrival and welcome.
Introduction to Jupyter, RDKit, and scikit-learn, and their use in some real-world examples. Introduction to methods for selection/transformation of variables for modelling - PCA, Isomap etc.
Exercise: do a linear regression and assess overfitting with a test set.
Naïve Bayes classifiers
Nearest Neighbour classifier and regression
Tree classifier and regression, random forest, gradient boosting.
Assessing quality of a classifier.
Exercise: build a classifier for herbicide-or-not.
Neural nets for classification and regression
Exercise: build a neural net classifier for herbicide-or-not (starting from a pre-trained autoencoder)
Support vector machine (regression and classification)
Combined models: SVM followed by boosting
Exercise: an SVC+boosted forest classifier for herbicide-or-not.
Competition: who can build the best classifier for insecticide-or-not?
PM: Helpdesk: bring your data (optional)
For further information contact [email protected].