Every year organizations generate more data, and security teams are expected to make sense not only of a greater volume of data from the myriad log sources in corporate environments, but of new sources of logs and data as well. In this talk we look at the data science methodology and some of the statistical and machine learning techniques available to defenders of corporate infrastructure. After explaining the strengths and weaknesses of the different techniques, we will walk through analyzing some data, spending some time on the Python code and on what would be needed to scale it from analyzing hundreds of thousands of data points to tens of millions. This is not a talk about SIEM and related technologies. SIEM is good at collecting logs in a central location and performing on-the-fly inspection and correlation, but it rarely has the ability to engage in deeper statistical analysis or employ machine learning techniques.
A white paper, slides and code will be prepared for this presentation.
Shawn is an information security officer at the Independent Electricity System Operator, which is responsible for operating the electrical grid providing power to one third of Canadians. Having completed Harvard University's graduate certificate in data science, he is eager to share some of the tools and techniques available to make sense of the deluge of data thrown at security teams every day. With more than a decade of cyber security experience, Shawn is a seasoned professional who has held a variety of roles across critical infrastructure, the financial sector and higher education. He enjoys using bug bounties to pay for vacations, and fermenting barley into beer and milk into cheese.