6th - 10th August 2018
The STFC Hartree Centre will be repeating its well-received course on "Machine Learning for Cheminformatics" from 6th - 10th August 2018. The course will be held at the Science and Technology Facilities Council's Rutherford Appleton Laboratory, Harwell, Didcot OX11 0QX, United Kingdom.
TARGET AUDIENCE The course is designed for data analysts working in the pharmaceutical industry and for trainees who can work independently but need guidance when solving complex problems. Attendees should be familiar with using statistical methods in a chemical research context. The intended audience is people with knowledge of chemistry, some experience of coding, and an understanding of Ordinary Least Squares regression and Principal Component Analysis. The course will equip them with an understanding of non-linear methods for machine learning in the context of chemistry.
The course will once again be FREE for ADDoPT partners, and available for a fee (£1000 commercial rate / £150 academic rate) to non-members of ADDoPT.
PREPARATION We will prepare computers with all the necessary software and datasets installed. We will also circulate installation scripts for people who prefer to work on their own laptops. We will survey attendees to identify the skills they already have, and make a seating plan that ensures that neighbours have complementary skills. People without previous experience of using Python are encouraged to travel early in order to attend a preparatory session on Python.
11:00 Introduction to Python (optional)
12:00 Arrival and welcome.
Introduction to Jupyter, RDKit, and scikit-learn, and their use in some real-world examples. Introduction to methods for selection/transformation of variables for modelling - PCA, Isomap, etc.
Exercise: do a linear regression and assess overfitting with a test set.
Naïve Bayes classifiers
Nearest Neighbour classifier and regression
Tree classifier and regression, random forest, gradient boosting.
Assessing quality of a classifier.
Exercise: build a classifier for herbicide-or-not.
Neural nets for classification and regression
Exercise: build a neural net classifier for herbicide-or-not (starting from a pre-trained autoencoder)
Support vector machine (regression and classification)
Combined models: SVM then boosted
Exercise: an SVC+boosted forest classifier for herbicide-or-not.
Competition: who can build the best classifier for insecticide-or-not?
PM: Helpdesk: bring your data (optional)
For further information contact [email protected].