Cyber Security Powered by AI and Machine Learning

Sales Leader Big Data Analytics APAC, IBM

The latest executive report published by the IBM Institute for Business Value puts the estimated cost of cyber crime to the global economy at USD 375–575 billion per year. Reputational damage, which is hard to quantify, comes on top of that. No industry or geography has remained untouched by the recent spurt of cyber attacks.

Most companies have cyber security systems in place, and some Fortune 500 companies and government agencies run highly sophisticated ones. The difficulty is that few have built a living, dynamic cyber security strategy that can keep pace with the advances of hackers and cyber criminals. Staying ahead is the most critical component of corporate cyber security.

With an ever-evolving cyber threat landscape, it is no longer sufficient to rely only on traditional security solutions such as SIEM, network and endpoint security tools, and intrusion prevention systems. Current security tools are near perfect at identifying and preventing known attack vectors. These legacy solutions are built on known, identified rule sets with predefined responses and actions. Signature-driven security monitoring cannot scale to fully meet the demands of advanced cyber security objectives.

These solutions don't offer protection from new and emerging threat vectors, zero-day attacks, low-and-slow attacks and, on top of all that, compromised-credential attacks. These attacks will eventually show a pattern, but the tools are not yet programmed to detect it.

A more flexible mechanism is needed to explore data sets holistically and uncover otherwise unknown threats. Modern big data analytics, powered by machine learning, data science and AI capabilities, is emerging as a powerful solution. Machine learning built on adaptive baseline behavior models can be highly effective in detecting new, unknown attacks. Coupling past and current analytics (the knowns) with predictive analytics and machine intelligence will considerably strengthen cyber security.
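To make the idea of an adaptive baseline concrete, here is a minimal sketch in plain Python. It assumes a single numeric metric (for example, logins per hour for one account) and uses an exponentially weighted mean and variance as the baseline. The class name, parameters and sample numbers are all hypothetical and chosen for illustration; a production model would be considerably richer.

```python
import math


class AdaptiveBaseline:
    """Exponentially weighted baseline for one metric (hypothetical helper)."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts to new data
        self.threshold = threshold  # z-score above which a value is flagged
        self.warmup = warmup        # observations to collect before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Return True if `value` deviates from the baseline, then update it."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.var)
            if std > 0:
                anomalous = abs(value - self.mean) / std > self.threshold
            else:
                anomalous = value != self.mean
        if self.count == 0:
            self.mean = value
        elif not anomalous:
            # Only normal observations move the baseline, so one spike does
            # not immediately become the new "normal". The trade-off is that
            # a genuine, lasting shift in behavior keeps being flagged.
            diff = value - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.count += 1
        return anomalous


baseline = AdaptiveBaseline()
hourly_logins = [10, 12, 11, 9, 10, 11, 10, 11, 95, 11]  # made-up sample data
flags = [baseline.observe(v) for v in hourly_logins]
```

With this sample, only the spike of 95 is flagged. No rule for "95 logins" was ever written; the value stands out purely because it departs from the behavior learned so far.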

Cyber Security Data Lake for Applying the Power of Machine Learning

The first step in leveraging the power of machine learning and data science is creating a “Cyber Security Data Lake” that augments existing security analytics and anomaly detection solutions. It should incorporate additional data sets that are valuable for security intelligence yet difficult to handle with SIEM and other traditional security tools.

Building machine learning and AI models requires a lot of data that typically no system in the organization is capturing today. The cyber security data lake will facilitate tasks such as profile persistence, log ingestion and IoT data capture. The next, advanced layer, enabled by Apache Spark, will support building machine learning models, developing algorithms for forensics and pattern detection, providing discovery analytics, and automating alerting.

With these advanced cyber security data lakes, enterprises can move from a reactive break/fix mode to proactive models built from larger data sets for countering unknown attacks.

The majority of security data falls into the category of time-series data or log data. Common examples come from firewalls, intrusion detection systems, antivirus software, operating systems, proxies, and web servers.
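As a rough illustration of why log data is naturally treated as a time series, the sketch below parses a few firewall-style lines and counts denied connections per source IP per minute. The log format, field names and sample lines are invented for this example; real firewall logs vary by vendor and would need their own parsers.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, simplified log lines: "<ISO timestamp> <action> <src IP> <dst port>"
RAW_LINES = [
    "2017-06-01T10:00:05 DENY 10.0.0.7 22",
    "2017-06-01T10:00:40 ALLOW 10.0.0.8 443",
    "2017-06-01T10:01:10 DENY 10.0.0.7 22",
    "2017-06-01T10:01:30 DENY 10.0.0.7 23",
]


def parse_line(line):
    """Turn one raw log line into a structured event."""
    ts, action, src_ip, dst_port = line.split()
    return {
        "ts": datetime.fromisoformat(ts),
        "action": action,
        "src_ip": src_ip,
        "dst_port": int(dst_port),
    }


def denies_per_minute(lines):
    """Count DENY events per (source IP, minute) bucket."""
    counts = Counter()
    for line in lines:
        event = parse_line(line)
        if event["action"] == "DENY":
            minute = event["ts"].replace(second=0, microsecond=0)
            counts[(event["src_ip"], minute)] += 1
    return counts


deny_counts = denies_per_minute(RAW_LINES)
```

Per-minute counts like these are the kind of series a baseline model can be trained on.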

The other important data type to ingest in these data lakes is contextual security data. Contextual data can be data from vulnerability scans, asset databases, configuration management systems, directories, or special-purpose applications. Contextual data in the form of threat intelligence is becoming more common.

Contextual data is handled separately from log records because it requires a different storage model. It is mostly stored in a key-value store such as HBase in the Hadoop ecosystem to allow for quick lookups.
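The lookup pattern can be sketched as follows. A plain Python dictionary stands in for the key-value store here; it is not the HBase API, and the keys, field names and values are hypothetical. The point is only that each log event is enriched by a single keyed lookup rather than a scan.

```python
# In-memory stand-in for a key-value store; keys are asset IP addresses.
ASSET_CONTEXT = {
    "10.0.0.7": {"owner": "finance", "criticality": "high", "open_vulns": 3},
    "10.0.0.8": {"owner": "lab", "criticality": "low", "open_vulns": 0},
}

# Default context used when an asset is missing from the store.
UNKNOWN_ASSET = {"owner": "unknown", "criticality": "unknown", "open_vulns": None}


def enrich(event, context_store=ASSET_CONTEXT):
    """Attach asset context to a log event using one keyed lookup."""
    context = context_store.get(event["dst_ip"], UNKNOWN_ASSET)
    enriched = dict(event)
    for key, value in context.items():
        enriched["asset_" + key] = value
    return enriched


known = enrich({"dst_ip": "10.0.0.7", "action": "DENY"})
unknown = enrich({"dst_ip": "192.168.1.1", "action": "ALLOW"})
```

A denied connection to a high-criticality asset with open vulnerabilities can then be prioritized over the same event against a lab machine.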

Machine Learning Cyber Security Models Powered by the Open Source Framework Apache Spark

In addition to the problem of scalability, openness is an issue with traditional tools like SIEMs. They were not built to let other products or sophisticated machine learning models reuse the data they collect. Enterprises can leverage an open source ecosystem to break down the traditional, expensive cyber security analytics stack and its data constraints in order to detect a new breed of sophisticated attacks.

Apache Spark provides a strong framework that can perform batch processing to build a machine learning model from scratch or to leverage existing models from GitHub. Spark's streaming functionality can then apply that intelligence in real time.
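The split between a batch step that learns a model and a streaming step that applies it can be sketched in plain Python. This is not Spark code: the function names are hypothetical, the "model" is just a per-user mean and standard deviation, and a generator stands in for a stream. In a Spark deployment the same two roles would be played by a batch job and a streaming job.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_batch(history):
    """Batch step: learn per-user mean and std deviation from past records."""
    by_user = defaultdict(list)
    for user, value in history:
        by_user[user].append(value)
    return {user: (mean(vals), pstdev(vals)) for user, vals in by_user.items()}


def score_stream(model, events, threshold=3.0):
    """Streaming step: yield (user, value, is_anomaly) as events arrive."""
    for user, value in events:
        if user not in model:
            # A user with no history cannot be scored; flag for review.
            yield user, value, True
            continue
        mu, sigma = model[user]
        if sigma > 0:
            is_anomaly = abs(value - mu) / sigma > threshold
        else:
            is_anomaly = value != mu
        yield user, value, is_anomaly


# Made-up history of (user, megabytes transferred) records.
history = [("alice", 100), ("alice", 110), ("alice", 90),
           ("bob", 500), ("bob", 520), ("bob", 480)]
model = fit_batch(history)

incoming = [("alice", 105), ("bob", 5000), ("carol", 10)]
results = list(score_stream(model, incoming))
```

Here the transfer of 5000 by bob and the event from the previously unseen user carol are flagged, while alice's 105 is not.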

Another important aspect of Apache Spark is its powerful libraries, such as GraphX. GraphX supports constructing a dynamic relationship graph of entities, which makes it possible to build baseline patterns of normalcy, flag anomalies on the fly, analyze the context of an event in depth, and eventually identify and protect against new, unknown cyber threats.
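The relationship-graph idea can be illustrated with a small sketch. It does not use GraphX; a dictionary of sets serves as the graph, and the entities (users and hosts), function names and sample data are hypothetical. Edges seen during a baseline period define "normal", and any edge that later appears for the first time is surfaced for review.

```python
from collections import defaultdict


def build_baseline_graph(events):
    """Adjacency map: user -> set of hosts accessed during the baseline period."""
    graph = defaultdict(set)
    for user, host in events:
        graph[user].add(host)
    return graph


def new_relationships(graph, events):
    """Return (user, host) edges in `events` that the baseline graph lacks."""
    return [(user, host) for user, host in events
            if host not in graph.get(user, set())]


baseline_events = [("alice", "mail"), ("alice", "wiki"), ("bob", "build")]
graph = build_baseline_graph(baseline_events)

todays_events = [("alice", "mail"), ("bob", "payroll-db")]
suspicious = new_relationships(graph, todays_events)
```

In this sample, bob reaching payroll-db is the only new edge. A compromised credential often shows up this way: the login itself is valid, but the relationship it creates is not part of the account's normal graph.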

Bringing a Powerful Data Science Engine to an Open and Connected Data Platform

IBM Data Science Experience, powered by Apache Spark, enables the detection of patterns and outliers to identify and eliminate emerging cyber threats. It supports building faster, more accurate fraud models on the leading open source Hadoop platform, Hortonworks. Enterprise security teams can efficiently build machine learning models and apply them to unstructured and structured data to focus on discovering unknown attack vectors. Coupled with seamless access to data through the industry-leading SQL engine, IBM Big SQL, it enables security analysts to deliver the insights and data points needed to build signatures of abnormal behavior beyond what traditional security tools offer.