Machine Learning for Cheminformatics

THIS COURSE IS BEING RUN AGAIN FROM 6th - 10th AUGUST 2018

5th - 9th February 2018

The STFC Hartree Centre will be holding a course on "Machine Learning for Cheminformatics" at the Brunner-Mond Training Lab, STFC Daresbury Laboratory, Cheshire, from 5th - 9th February 2018.

TARGET AUDIENCE The intended audience is people with knowledge of chemistry, some experience of coding, and an understanding of Ordinary Least Squares Regression and Principal Component Analysis. The course will equip them with an understanding of non-linear methods for machine learning in the context of chemistry. The course will be free for ADDoPT partners, and available for a fee to non-members of ADDoPT.

PREPARATION We will prepare computers with all the necessary software and datasets installed. We will also circulate installation scripts for people who prefer to work on their own laptops. In addition, we will survey attendees to identify the skills they already have, and make a seating plan that ensures that neighbours have complementary skills. People without previous experience of using Python are encouraged to travel early in order to attend a preparatory session on Python.



DAY 1

11:00 Introduction to Python (optional)

12:00 Arrival and welcome.

Lunch

Introduction to Jupyter, RDKit, and scikit-learn, with use in some real-world examples. Introduction to methods for selection/transformation of variables for modelling: PCA, Isomap, etc.

Exercise: do a linear regression and assess overfitting with a test set.

DAY 2

Naïve Bayes classifiers

Nearest Neighbour classifier and regression

Tree classifier and regression, random forest, gradient boosting.

Lunch

Assessing quality of a classifier.

Exercise: build a classifier for herbicide-or-not.

DAY 3

Neural nets for classification and regression

Lunch

Exercise: build a neural net classifier for herbicide-or-not (starting from a pre-trained autoencoder)

DAY 4

Kernel methods:

Support vector machine (regression and classification)

Designing kernels

Lunch

Combined models: SVM then boosted

Exercise: an SVC+boosted forest classifier for herbicide-or-not.

DAY 5

Competition: who can build the best classifier for insecticide-or-not?

Wrap up

Lunch

PM: Helpdesk: bring your data (optional)


For further information contact  [email protected].