Avoiding the Big Data Games of UEBA

Posted by Eran Cohen on Dec 15, 2016 6:21:46 AM

When thinking about some traditional User and Entity Behavior Analytics (UEBA) solutions today, I can’t help but think of a Rube Goldberg machine: an over-engineered contraption that performs a seemingly simple task.

One of my favorites is “The Page Turner.” And I’ll admit it: I like playing with these useless contraptions, and even building them. Judging by the high view count on that video, I’m not alone in enjoying them. But it does make me wonder what this says about us. Why do we build overly complicated systems to complete simple tasks so inefficiently?

This brings me back to the current state of the traditional UEBA world, where big data and security join forces to detect and mitigate threats.

I have witnessed common misconceptions when it comes to “big data.” This often happens in the industry when a new technology enters the hype cycle: the climb toward the peak of the curve introduces complexity, which is later simplified during the Trough of Disillusionment (I call it realization).

As a product manager, I have the opportunity to speak with many customers, and I have found there are two primary misconceptions that relate to UEBA. The first is the most common: most people believe that the more data they examine, the more advanced their analytics will be.

The second misconception is equating big data directly with unstructured data. What defines big data is volume and velocity, not structure. Yes, it can be confusing. And unfortunately, some vendors take advantage of this confusion (or are confused themselves) and build an over-engineered solution.

The truth is that big data analytics is agnostic to the data type or its source.  

The key factor for success is to use data that matters by clearly defining what you seek to find. In other words, more is not more. “More” can clog things up and make what you’re looking for harder to find.

You don’t need to take my word for it; smarter and more experienced experts agree. Here are two examples:

  • In his book “Using SMART Big Data,” Bernard Marr explains that to reap the benefits of big data, you must be clear about what data is needed and build the smallest database possible. “Start with strategy,” he writes.

  • Professor Ray Cooksey, an award-winning expert in research methods and statistics, points out that having more data means there is more room for statistical mistakes.
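Cooksey’s point is easy to demonstrate. Below is a minimal Python sketch (my own illustration, not drawn from his work) of the multiple-comparisons effect: each benign, irrelevant variable occasionally crosses an anomaly threshold by pure chance, so the number of false alerts grows roughly linearly with the number of variables fed into the engine.

```python
import random

random.seed(42)

ALERT_THRESHOLD = 3.0   # flag anything beyond 3 "standard deviations"
DAYS = 365              # one year of daily observations per variable

def false_alerts(num_variables: int) -> int:
    """Count alerts raised by purely benign variables.

    Each variable is random noise from a standard normal distribution,
    so every alert it triggers is, by construction, a false positive.
    """
    alerts = 0
    for _ in range(num_variables):
        for _ in range(DAYS):
            if abs(random.gauss(0, 1)) > ALERT_THRESHOLD:
                alerts += 1
    return alerts

for n in (10, 100, 1000):
    print(f"{n:5d} benign variables -> ~{false_alerts(n)} false alerts/year")
```

At a three-sigma threshold, each irrelevant variable contributes roughly one false alert per year, so an engine watching a thousand noisy variables hands the security team on the order of a thousand bogus alerts annually. More data, more mistakes.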

To strengthen this argument even further, here is a quote from the 2016 Market Guide for User and Entity Behavior Analytics, published by the analyst firm Gartner:

“The quality of the predefined analytics is more critical to success than the variety of data sources fed to the UEBA tools. The effectiveness of an analytics engine greatly depends on:

  • Knowing which data and variables need to be analyzed.
  • Making sure it's reading the "right" data sources that will give it the full picture.
  • Knowing how much weight to give to key variables that are analyzed via risk rating functions.

Therefore, users should be highly selective about the entities and data they incorporate into analytics, in order to reduce unnecessary noise that the detection engine must filter out.”
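Gartner’s third bullet, the weighting of key variables, is worth making concrete. Here is a minimal sketch of a risk rating function (a hypothetical example of mine; the variable names and weights are invented, not Gartner’s or any vendor’s): a handful of carefully chosen, normalized signals are combined into a single score, and tuning the weights matters far more than piling on extra feeds.

```python
# Hypothetical weights for a few high-signal variables; in a real
# UEBA deployment these would be tuned to the environment.
RISK_WEIGHTS = {
    "failed_logons": 0.50,      # repeated authentication failures
    "new_device": 0.30,         # account seen on an unfamiliar device
    "off_hours_access": 0.15,   # activity outside the user's usual hours
    "privilege_change": 0.05,   # recent elevation of privileges
}

def risk_score(signals: dict) -> float:
    """Combine normalized signals (each in [0, 1]) into one score in [0, 1]."""
    return sum(weight * signals.get(name, 0.0)
               for name, weight in RISK_WEIGHTS.items())

# Many failed logons from a brand-new device score high even though
# only two of the four variables fired.
print(risk_score({"failed_logons": 0.9, "new_device": 1.0}))  # 0.75
```

Note that the engine’s quality lives in the choice and weighting of those four variables, not in how many raw sources were ingested to compute them.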

It’s common knowledge that big data is valuable; I don’t know many people who would argue against this. However, given the arguments above, it is clear that if it is not handled correctly, or if one builds an over-complicated solution, big data can introduce inefficiency and uncertainty rather than reduce them. In particular:

  1. While big data itself isn’t considered expensive, costs are relative to value. Take a close look at your data growth and be sensitive to its scale and pace; complexity has hidden costs.
  2. Without clear KPIs, it’s easy to build an inefficient solution, and harder to detect threats.
  3. Using the “right” data sources helps you reduce statistical mistakes and eliminate false positives.

So, when evaluating UEBA, organizations should be highly selective about the entities and data they incorporate into analytics, in order to reduce the unnecessary noise the detection engine must filter out. Once the entities and variables are selected (something UEBA vendors help with), the more information extracted from those sources, the better; it helps pinpoint “bad behaviors” and increase detection rates.

Here are the key goals your UEBA solution should deliver. It must be able to detect threats in a way that:

    • Is precise
    • Doesn’t introduce false positives
    • Uses the best-quality, simplest, most widely available data
    • Doesn’t manipulate or abstract the data set
    • Is low on maintenance
    • Helps the security team instead of adding to its burden

At Preempt, we have consciously chosen to focus on smart data, using a core data source that is widely available on all networks and then extending it with relevant data sources based on the use case and need. Simple to deploy, simple to use, simple to maintain.

You can learn more about the Preempt Behavioral Firewall here.  

Topics: ueba, big data