Following my introductory presentation on machine learning to the Dublin R User Group, I again had the pleasure of being invited to speak, this time on more machine learning. The talk source and examples are in a GitHub repo. The talk was pitched at a more intermediate level than my previous introductory one. It used two datasets: computing-cluster job usage statistics (similar to data from an HPC job scheduler) and machine-level application and host usage statistics (similar to those produced by host-monitoring software such as Nagios).
The techniques covered included:
This talk focused on the two datasets and highlighted how important the business context is to consider when developing the models. It looked at using machine learning to improve the utilisation and scheduling of large-scale clusters (significant hardware resources). It recapped some earlier material and focused a little more on feature selection and feature generation, as well as applying domain expertise to the data. The two datasets provide worked examples of how to use R to select, assess and create the models.
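To give a flavour of the kind of feature generation described above, here is a minimal sketch in R. The data frame, column names and model are all invented for illustration (they are not the talk's actual datasets); the point is deriving features such as CPU efficiency from raw scheduler-style logs and then assessing them with a simple model.

```r
set.seed(42)

# Hypothetical job-scheduler records (invented schema, synthetic values)
jobs <- data.frame(
  cores_requested = sample(c(1, 2, 4, 8, 16), 200, replace = TRUE),
  run_secs        = rexp(200, rate = 1 / 600)   # wall-clock runtime
)
# Simulated CPU time actually consumed: a random fraction of the allocation
jobs$cpu_secs <- jobs$run_secs * jobs$cores_requested * runif(200, 0.2, 1)

# Feature generation: derived quantities often more useful than raw logs
jobs$cpu_efficiency <- jobs$cpu_secs / (jobs$run_secs * jobs$cores_requested)
jobs$log_run_secs   <- log1p(jobs$run_secs)

# Simple assessment: do request size and runtime explain efficiency?
fit <- lm(cpu_efficiency ~ cores_requested + log_run_secs, data = jobs)
summary(fit)$coefficients
```

A real analysis would of course use the actual scheduler fields and compare several candidate models, but the shape of the workflow (derive features, fit, inspect coefficients) is the same.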