Cybersecurity: Without human intelligence, artificial intelligence is useless


There is an assumption circulating in the cybersecurity industry: to ensure security, you need an AI-based solution that acts completely autonomously, and to achieve that, humans must be kept away from the AI.

Dr. Sven Krasser, Chief Scientist at CrowdStrike

As someone who has dealt with AI and cybersecurity for years, I find this claim strange, because it is precisely human know-how that makes AI particularly effective. But where does this human-averse view of AI come from, and what distinguishes a well-designed AI system?


Two errors in thinking underlie this human-averse view of AI. First, artificial intelligence is not actually intelligent. Any conversation with a smart speaker will prove this. Artificial intelligence is a set of algorithms and techniques that often produce useful results, but sometimes fail in strange and unintuitive ways. Artificial intelligence even has its own attack surface, which attackers can exploit if it is left unprotected. It is therefore dangerous to regard AI as a panacea for our industry's problems.

Secondly, we are all still shaped by the era of signatures. Back then, signatures initially stopped threats but then missed new ones, so people wrote new signatures and the cycle started over the next day. You cannot win with this approach, because the model is not only purely reactive but also significantly limited in speed by human reaction time. Modern AI models are, of course, not used this way to defend against threats. For models such as those in the CrowdStrike Falcon platform, no human interaction is required to stop a threat immediately. Here, AI is used specifically to detect threats that no one has thought of before, without any updates being required.

Data, data, data

But what does it take to successfully train an AI model? First of all, it needs data, and a lot of it. The CrowdStrike Security Cloud alone processes over a trillion events from endpoint sensors per day. For comparison: a ream of 500 sheets of printer paper is about 50 millimeters thick. If we were to print every event on an A4 sheet, these pages would pile up about 100,000 kilometers high after a single day. That would be enough miles to earn gold status on most airlines every day. Yet at normal cruising speed, a plane would need about four days to cover this distance, and in those four days the pile of paper would long since have reached the moon.
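The back-of-the-envelope arithmetic can be checked in a few lines of Python, assuming one event per A4 sheet and the ream thickness given above (50 mm per 500 sheets):

```python
# Back-of-the-envelope check of the paper-stack comparison.
EVENTS_PER_DAY = 1_000_000_000_000   # over a trillion events per day
SHEET_THICKNESS_M = 0.050 / 500      # a 500-sheet ream is ~50 mm thick

stack_height_km = EVENTS_PER_DAY * SHEET_THICKNESS_M / 1000
print(f"Stack height: {stack_height_km:,.0f} km")   # 100,000 km

CRUISE_SPEED_KMH = 900               # typical airliner cruising speed
days = stack_height_km / CRUISE_SPEED_KMH / 24
print(f"Flight time: {days:.1f} days")              # ~4.6 days
```

At 900 km/h the flight works out to roughly four and a half days, consistent with the "about four days" above; meanwhile the stack grows another 100,000 km per day, passing the moon's distance of roughly 384,000 km.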

However, it should be borne in mind that this metaphorical pile is not only tall. In our example, the data covers very different facets, such as endpoint security, cloud security, identity protection, threat intelligence and much more. For each of these facets, complex and nuanced data sets are contextualized and correlated. To process these data volumes effectively and meaningfully, we designed the Falcon platform as a cloud-native system from the very beginning. None of this is possible on an appliance, and none of it is possible with hybrid cloud solutions, i.e. clouds that consist merely of stacked vendor-managed appliances.

More data also allows us to detect weaker signals. Suppose you start plotting the longitude and latitude of European cities on graph paper. At first you will see a few randomly scattered dots, but if you plot a larger number of cities, the familiar shape of Europe slowly emerges from the cloud of dots. This does not work if everyone has a "local" piece of graph paper with only a handful of nearby cities on it. With a global view, however, the combination of cloud and AI really comes into its own.

Structure and ground truth

Where do humans fit into this picture? If so much information is piled up on our metaphorical stack of printer paper that even an airliner could not keep up with it, what chance do humans have to make a difference?

There are two options. First, stacking the sheets is not the most effective way to organize them. Laid out flat next to each other, they form a square of paper about 250 by 250 kilometers on a side. That is far more manageable: such an area could be mapped. But if we instead arrange the paper into a cube, its edges measure about 180 by 180 by 180 meters. Note that these are now meters, no longer kilometers, which makes the whole thing much more compact and mappable. For the same reason, libraries store books on floors, in aisles and on shelves instead of stacking loose pages. Skillful organization lets you find the right data faster, and in the cloud we have the advantage of not being limited to three dimensions like a library.
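The square-versus-cube comparison follows directly from standard A4 dimensions (210 mm × 297 mm) and the 0.1 mm sheet thickness used earlier; a quick sketch:

```python
# Square vs. cube arrangement of a trillion A4 sheets.
import math

SHEETS = 1_000_000_000_000
A4_AREA_M2 = 0.210 * 0.297     # A4 sheet: 210 mm x 297 mm
SHEET_THICKNESS_M = 0.0001     # 0.1 mm per sheet (500 sheets per 50 mm)

# Laid flat: total area -> side length of a square.
side_km = math.sqrt(SHEETS * A4_AREA_M2) / 1000
print(f"Flat square: {side_km:.0f} km per side")    # ~250 km

# Stacked into a cube: total volume -> edge length.
edge_m = (SHEETS * A4_AREA_M2 * SHEET_THICKNESS_M) ** (1 / 3)
print(f"Cube edge: {edge_m:.0f} m")                 # ~184 m
```

The cube comes out at about 184 meters per edge, which the text rounds to 180.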

Secondly, not all data is created equal. There is another type of data that humans can readily contribute. We call it "ground truth", and it has a significant impact on the training of AI models. Ground truth is the kind of data that describes how an AI model should behave on certain inputs. For our metaphorical paper stack, an example of ground truth would be whether a sheet corresponds to a threat (say, a red-colored sheet) or a benign activity (a green-colored sheet). If you organize your data sensibly, as described above, you only need a few colored sheets to derive information about entire stacks of paper. Imagine that somewhere in our paper cube you pull a sheet out of a stack, and it happens to be red. The other sheets in that stack are probably also red, and some of the neighboring stacks will also contain predominantly red paper. This is how certain types of AI learn: based on the ground truth, they figure out how to respond to similar (adjacent) inputs. This is called supervised learning.
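The "colored sheets" intuition is essentially nearest-neighbor classification: a few labeled points of ground truth let us label nearby unlabeled ones. A minimal sketch, with made-up coordinates and labels standing in for real feature vectors:

```python
# Nearest-neighbor sketch of the "colored sheets" intuition.
# Coordinates and labels are invented for illustration only.
import math

ground_truth = [
    ((1.0, 1.2), "red"),    # known threat
    ((1.1, 0.9), "red"),
    ((8.0, 7.5), "green"),  # known benign
    ((7.8, 8.1), "green"),
]

def classify(point, labeled, k=3):
    """Label a point by majority vote of its k nearest labeled neighbors."""
    by_distance = sorted(labeled, key=lambda item: math.dist(point, item[0]))
    votes = [label for _, label in by_distance[:k]]
    return max(set(votes), key=votes.count)

print(classify((1.3, 1.0), ground_truth))  # red: it sits in the threat cluster
```

Well-organized data (the cube from the previous section) is what makes finding those nearest neighbors fast at scale.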

Since ground truth is rarer than other data, other techniques mix the two approaches. In semi-supervised learning, an AI is first trained unsupervised on large amounts of data and then fine-tuned by supervised training on a smaller amount of ground truth. In self-supervised learning, the AI draws clues from the structure of the data itself.

People, people, people

Ideally, systems are designed to generate as much ground truth as possible. For example, when threat hunters find an adversary on the network or classify suspicious activity as benign, these findings become new ground truth. These data points help train and evaluate AI systems.

AI systems can also flag incidents where the facts are thinner and the degree of uncertainty is higher. While AI can still stop threats without delay under these circumstances, the flagged data can later be reviewed by humans to increase the amount of available ground truth, especially in areas where it is scarce. Alternatively, other means can provide additional data, such as detonating a sample inside a sandbox to observe the threat's behavior in a controlled environment. Such solutions are based on a paradigm called active learning.

Active learning is a useful way to apply the limited resource of human attention where it matters most. AI decisions are not slowed down: the AI continues to analyze threats and stop them immediately. We call this the "fast loop". Among other things, experts analyze what our AI systems surface and provide assessments, which we feed back into our AI algorithms. In this way, our AI models receive a constant stream of feedback on where they succeeded and where new attacks were detected and stopped by other means. The AI learns from this feedback and incorporates it into future detections. We call this part "the long loop". As a result, our AI keeps improving, because new data and new ground truth constantly flow into the system.
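One common form of active learning is uncertainty sampling: confident verdicts are handled automatically, while borderline cases are queued for human review, whose verdicts become new ground truth. The sketch below is illustrative only; the score function, thresholds, and sample names are assumptions, not CrowdStrike's implementation:

```python
# Uncertainty-sampling sketch: confident cases are auto-handled (fast loop),
# uncertain cases go to analysts, whose verdicts feed back in (long loop).
# Scores, thresholds, and file names are invented for illustration.

def triage(samples, model_score, low=0.2, high=0.8):
    """Split samples into auto-handled verdicts and a human-review queue."""
    auto, review = [], []
    for sample in samples:
        score = model_score(sample)    # model's estimated P(threat)
        if low < score < high:         # uncertain: route to human analysts
            review.append(sample)
        else:                          # confident: act immediately
            auto.append((sample, score >= high))
    return auto, review

scores = {"a.exe": 0.97, "b.dll": 0.55, "c.sh": 0.03}
auto, review = triage(scores, scores.get)
print(review)  # ['b.dll'] -- only the uncertain case consumes human attention
```

The thresholds control how much human attention is spent: narrowing the uncertain band sends fewer cases to review at the cost of less new ground truth.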

Final considerations

AI is increasingly becoming an everyday tool for stopping cyber threats, but it is important to look beyond the mere presence of an AI algorithm somewhere in the data flow. To assess the effectiveness of an AI system, one must understand where the data comes from, including the necessary ground truth. Artificial intelligence can only learn if new facts are constantly fed into the system at scale, which is why well-designed AI systems keep humans in the feedback loop.
