Tue, Jun 18, 2024

Machine Learning in Cybersecurity: Models, Marketplaces and More

By 2026, more than 80% of enterprises will have used generative artificial intelligence (“GenAI”) APIs or models and/or deployed GenAI-enabled applications in production environments. With this fast pace of adoption, it is no wonder that artificial intelligence (AI) application security tools are already in use by 34% of organizations, a number that will no doubt increase.

Applications are required to go through rigorous security testing to ensure that they do not introduce security risks that could compromise the application or its users. But how do you ensure that an AI system has been thoroughly tested when the risk landscape is evolving as rapidly as the technology itself? Machine learning is central to many new AI systems, and the security evaluation of these systems differs from other types of system security review. A thorough security review starts with an understanding of the components involved in the system.

The following is an introduction to some machine learning concepts relevant to cybersecurity professionals. This material will be useful for security practitioners, including DevOps engineers, looking for an introduction to AI security. It also serves as relevant background for our other AI-focused articles; see “AI Security Risks and Recommendations: Demystifying the AI Box of Magic”.

What Is Machine Learning?

Machine learning (ML) is a subset of AI that empowers computers to execute tasks without explicit programming. It is built upon algorithms and statistical models designed to recognize patterns and relationships in data, iteratively improving performance. This iterative learning process involves training models with labeled data, enabling them to adjust internal parameters to minimize the difference between predicted and actual outputs. Through this approach, models learn to generalize from training data, facilitating accurate predictions on new or unseen data. ML encompasses various techniques, including supervised learning, unsupervised learning and reinforcement learning, each tailored to different data scenarios. Supervised learning trains models on labeled datasets, while unsupervised learning infers patterns from unlabeled data. Reinforcement learning enables agents to make sequential decisions by interacting with the environment and receiving feedback. An effective ML model can automate tasks, make predictions and uncover insights across diverse domains.
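
To make this concrete, the following is a minimal sketch of supervised learning in PyTorch, using an invented toy dataset: a small model's parameters are iteratively adjusted to reduce the difference between predicted and actual outputs.

```python
# Minimal supervised learning sketch in PyTorch with an invented toy dataset.
import torch
import torch.nn as nn

# Labeled training data: inputs X and the "correct" outputs y the model should learn.
X = torch.randn(100, 4)                 # 100 samples, 4 features each
y = X.sum(dim=1, keepdim=True)          # target value for each sample

model = nn.Linear(4, 1)                 # internal parameters (weights) to be learned
loss_fn = nn.MSELoss()                  # difference between predicted and actual outputs
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):                # iterative learning process
    prediction = model(X)
    loss = loss_fn(prediction, y)
    optimizer.zero_grad()
    loss.backward()                     # compute how each weight should change
    optimizer.step()                    # adjust the weights to reduce the loss

# The trained model can now generalize to new, unseen data.
print(model(torch.randn(1, 4)))
```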

Large language models (LLMs) are a specific type of machine learning model that has come to lead the AI industry due to its capacity to understand and generate human-like text. LLMs, such as OpenAI's GPT series, are trained on vast amounts of text data and use deep learning architectures, typically based on transformer models. These models learn to predict the next word or token in a sequence, based on the context provided by the preceding words. What distinguishes LLMs is their ability to capture complex linguistic structures, semantics and context, enabling them to generate contextually relevant text responses. LLMs have demonstrated proficiency in various language-related tasks, including language translation, text summarization, question answering and text generation.
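
As a hedged illustration of next-token prediction, the sketch below uses the open source Hugging Face transformers library with the small, publicly available GPT-2 model (chosen here purely as an example, not a recommendation):

```python
# Hedged sketch: next-token text generation with an open model via Hugging Face transformers.
from transformers import pipeline

# "gpt2" is a small, freely available model used here only as an example.
generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the next token given the preceding context.
result = generator("Machine learning in cybersecurity is", max_new_tokens=20)
print(result[0]["generated_text"])
```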

Whether you use custom-trained ML models running on dedicated inference infrastructure or integrate with a third-party LLM provider, ML can open up many cybersecurity risks and create weaknesses in the overall security architecture.

What Is A Machine Learning Model?

An ML model is a file that has been created by learning patterns from previously analyzed data. The model uses what it has learned to make predictions for any new data that is passed to it. The model file contains weights structured as vectors, which can be thought of as an extremely complicated graph onto which new data is plotted. More practically, a model is a raw file on a filesystem that can be used for inference.
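
As a simple illustration, the sketch below (using PyTorch, with an arbitrary file name) shows that a model ultimately amounts to a file of learned weights sitting on a filesystem:

```python
# Illustrative sketch (PyTorch): a model is ultimately a raw file of learned weights.
# The file name "model.pt" is arbitrary.
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
torch.save(model.state_dict(), "model.pt")   # write the weights to the filesystem

weights = torch.load("model.pt")             # the file contains structured weight tensors
for name, tensor in weights.items():
    print(name, tuple(tensor.shape))         # e.g. weight (1, 4), bias (1,)
```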

Some model files can be packaged as model archives. A model archive contains the model file as well as the code used to transform or tokenize data before it reaches the model. This code transforms input data into vectors that are compatible with the model, and converts output from the model back into the desired format.
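
The sketch below gives a flavor of the kind of transformation code bundled into a model archive. It follows TorchServe's custom handler conventions; the class name and the placeholder "tokenizer" are illustrative assumptions, not production code:

```python
# Hedged sketch of archive "glue" code, following TorchServe custom handler conventions.
# The class name and the placeholder tokenization are illustrative assumptions only.
import torch
from ts.torch_handler.base_handler import BaseHandler

class ExampleTextHandler(BaseHandler):
    def preprocess(self, data):
        # Transform the raw request payload into vectors compatible with the model.
        text = data[0].get("body") or data[0].get("data")
        tokens = [ord(c) % 256 for c in str(text)][:128]   # stand-in for a real tokenizer
        return torch.tensor([tokens], dtype=torch.float32)

    def postprocess(self, inference_output):
        # Transform the model output back into the desired response format.
        return [{"score": float(inference_output.max())}]
```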

A model can be created by using one of the many different ML libraries. Some of the most popular libraries are:

  • PyTorch
  • TensorFlow
  • JAX

Understanding the components and risks associated with an ML framework and model format is part of the due diligence needed to ensure that vulnerabilities or unintended consequences are not introduced through security flaws in an ML deployment.

What Is Inference?

Inference refers to the task of interacting with a model and obtaining a prediction from it. After you have gone through the steps to train a model (or obtained it from an external source), you will have a static file or package that represents the model being used. Normally, you would interact with a model using the same framework that generated it. For example, if a model was trained with PyTorch, you would use that same library to load the model and perform inference.
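
As a minimal example, and assuming the model file from the earlier sketch, local inference with PyTorch looks roughly like this:

```python
# Minimal local inference sketch, assuming a model trained and saved with PyTorch
# (the "model.pt" file from the earlier example).
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
model.load_state_dict(torch.load("model.pt"))   # load the weights produced during training
model.eval()                                    # switch the model to inference mode

with torch.no_grad():                           # no weight updates happen during inference
    new_data = torch.randn(1, 4)                # previously unseen input
    prediction = model(new_data)
print(prediction)
```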

At scale and under production loads, there are many reasons why running inference locally within the application is hard:

  • Models require GPUs or other expensive hardware to run fast, while the application code outside the models does not.
  • You may have multiple applications interacting with the same models, and reproducing the models across systems is non-trivial.
  • The model weights and files could be sensitive, and hosting them on web-accessible hosts can pose a threat.
  • Complex workflows may require multiple models or concurrent processing, which is hard to implement inline within applications.
  • Applications may be written in a language that is different from the language used to develop and run the models.
  • The members of your team writing the application stack may be different from the people writing the code that interacts with machine learning models.

An inference server helps solve some of these problems. At a high level, it loads the models and exposes a set of APIs that can be consumed by upstream services. This lets developers keep their application code running on their application servers and simply make an API call to interact with a model.
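
As a rough sketch of this pattern, application code might call an inference server over HTTP rather than loading the model itself. The endpoint below follows TorchServe's default /predictions/<model_name> convention; the hostname, port and model name are assumptions:

```python
# Hedged sketch: application code calls an inference server over HTTP instead of
# loading the model itself. The URL follows TorchServe's default
# /predictions/<model_name> convention; host, port and model name are assumptions.
import requests

response = requests.post(
    "http://inference.internal:8080/predictions/my_model",
    json={"data": "input text to classify"},   # raw input; the server's handler transforms it
    timeout=10,
)
print(response.json())                          # the model's prediction, returned by the server
```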

Open Source Machine Learning Model Marketplaces

Various open source machine learning model marketplaces have emerged. They allow users to share pre-trained models, along with the pipelines required to interact with them. While not the only one, the Hugging Face ecosystem is one of the most popular of these platforms. The platform provides clear information about the training data for a given model and the required input and output formats. It also provides an inference API that lets you invoke a given model directly on the platform.
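
For illustration, the sketch below pulls a small, publicly available sentiment classification model and its tokenizer from the Hugging Face Hub (the model name is only an example):

```python
# Hedged sketch: downloading a pre-trained model and its tokenizer from the Hugging Face Hub.
# The model name is a small, publicly available sentiment classifier used only as an example.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_id)                    # input preprocessing
model = AutoModelForSequenceClassification.from_pretrained(model_id)   # pre-trained weights

inputs = tokenizer("This alert looks suspicious", return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.softmax(dim=-1)
print(scores)                                   # probabilities for negative / positive
```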

Open Source Inference Servers

Open source inference servers are an option if you want to expose an inference endpoint to an application while running the inference server on your own infrastructure. TorchServe is an example of an open source inference server that is part of the PyTorch ecosystem; the TorchServe website states that the server is a good choice for production workloads. Another common open source inference server is the NVIDIA Triton Inference Server.

Inference services are also available from the general cloud service providers AWS, Azure and GCP, as well as specialized AI inference service providers, such as Groq, Hugging Face and OpenAI.

If an organization is building technology to leverage the exciting capabilities of AI, ML or LLMs, there are many security risks to consider. Machine learning, in a cybersecurity context, introduces new attack surfaces, such as inference servers, and new attack techniques, such as prompt injection, as well as well-known application security risks that are now resurfacing in new AI systems and tools.

For a more comprehensive view of common AI security risks and recommendations when building an AI architecture, read our recent article, “Demystifying the AI Box of Magic”.

Also of interest may be our AI Security Testing Services, which outline the services we provide to assess the cyber resilience of AI systems.

