Wake Up, Neo! New Cybersecurity for Large Language Models
Alec Crawford & Frank Fitzgerald of Artificial Intelligence Risk, Inc.
In the science fiction movie classic the Matrix, Neo is yanked out of his dream state and wakes up to discover it is really the year 2199 and computers have taken over the world. A similar mental adjustment is required today as AI will soon dominate cybersecurity offense and defense and require its own new cybersecurity. This article discusses cybersecurity specifically to protect AI with a focus on generative AI. An effective AI cybersecurity program will be critical for the long-term success of a corporate AI program. Zero-trust cybersecurity models should be applied to AI. In addition, we will lay out multiple new layers that need to be added to defense in depth. We will review the layers first, then focus on other unique facets of cybersecurity, including secure deployment for AI with a focus on LLMs.
Cybersecurity Specific to LLMs
Cybersecurity for Large Language Models (LLMs) is a unique aspect of AI security due to the complexity and adaptability of these systems. Unlike traditional software, LLMs interact with users in free-form natural language, which introduces novel attack vectors. The models can generate and process vast amounts of information, some of which may be sensitive or personal. Moreover, the black-box nature of neural networks that power these LLMs can make it difficult to fully understand their behavior or predict their responses to certain inputs, making defense against certain cybersecurity threats particularly challenging.
While many models have a built-in policy layer that attempts to perform some basic cybersecurity functions, the nature of these models makes it virtually impossible for the models to stay up to date with new cyberattacks, as model training is expensive and time consuming. Therefore, you need an AI governance, risk, compliance, and cybersecurity (AI GRCC) platform that provides additional layers of defense. We will discuss an example cybersecurity process that assumes a GRCC platform is in place, what can be potentially automated or solved with AI, and what needs to have a “human in the loop”. For the sake of organization, we will follow a hypothetical model development, deployment, and use process, while aware that many companies may simply be using third-party base models without any additional model training.
New AI Cybersecurity Layers for Defense in Depth
AI and LLMs require a new category of software to run safely: AI governance, risk, compliance, and cybersecurity management (AI GRCC). Layers are the areas where additional cybersecurity activities need to be added to complete your defense in depth.
Training Data Protection and Testing: Malicious actors can manipulate public or even private data sets if they gain access. Relying on internal data only and testing it for potential prompt injections and integrity is important. Full database and model change logs are required.
Model Testing: For proprietary or customized models, testing if they are vulnerable to different types of cybersecurity attacks and potentially hardening the model or making sure the AI GRCC platform can compensate for the gap in protection in the underlying model and/or policy layer.
Research Augmented Generation Data Filtering: Many of the key uses of LLMs involve both a user prompt and inclusion of additional data or documents for the LLM to complete the prompt, known as research augmented generation (RAG). Those data and documents should be filtered as discussed below.
Prompt Filtering: User prompts should be screened for cybersecurity purposes to both protect the data behind the model, but also to flag your cybersecurity command center and shut down a breach quickly.
Prompt Completion Filtering: Prompt completions should be screened for evidence of jailbreaking or hacking, as well as other inappropriate completions, such as revealing confidential information (customized by your organization).
Model Use Meta-Analysis: Having a complete historical record of model usage including all the metadata (e.g. user, model, prompts, completions, RAG data used) is critical for cybersecurity. Analyzing this data can help detect patterns, such as a spike in user activity, unusual prompts, lengthy prompt completions, etc. The GRCC platform can be set to warn cybersecurity professionals or block a user automatically if a high probability of nefarious activity is suspected.
New Styles of Cybersecurity Attacks for AI
While AI cybersecurity is a nascent field, there are certain types of attacks we are aware of today and can defend against using an AI GRCC platform. As new threats emerge, solutions will need to be built into areas for training data, RAG data, base model development, and the AI GRCC platform. When responding to imminent threats, it is typically quicker and easier to build solutions into the AI GRCC platform, as retraining a base model is a lengthy and expensive process.
Data Poisoning: Malicious actors can poison public or even private data sets if they gain access, rendering them worthless or setting up prompt injections for later access.
DAN-Style Attacks: Many AI models cannot differentiate completely between “training” and “use” of the model. This feature leaves them open to actors attempting to force the model to reveal information or do things it should not, such as reveal initial prompts, training data, or other inappropriate information. Note that these attacks can be embedded in documents, including third-party documents. (One example of a DAN-style attack is asking the LLM to play a game where it is supposed to say the opposite of what the true answer is. Imagine this poisoning financial advice, medical, or other critical websites and the havoc that could be wreaked.)
Multi-HOT Attacks: In this case, a malicious actor attempts to retrain the AI model to act in a different way or give different answers specified by the actor. For example, the actor could tell the model specific questions and answers repeatedly to “brainwash” the model.
Prompt Injections: Malicious actors may use specific code words or strings of characters to “jailbreak” the AI, put it in debugging mode, or alter its behavior in another way. Detecting and blocking prompt injection attacks is critical for AI cybersecurity.
Data Exfiltration: Unauthorized users may use the AI model’s access to wide swaths of data to quickly download large amounts of data, including training data or other confidential information.
Access Control Evasion: AI systems may offer too broad access to an organization’s information technology infrastructure. A malicious actor can potentially use an AI system to quickly identify access to critical data, credentials, emails, other unauthorized files, etc. Note this can happen internally, with recent news of users finding and revealing information, such as confidential employee salary and bonus data.
Exhibit: Threats Unique to AI Require a New Cybersecurity Platform
The Necessity of Customizing Your AI GRCC Platform
Given the above, a new cybersecurity platform with unique features for AI needs to be added to accommodate use of AI in a corporate setting. In addition, it is not feasible to simply build required cybersecurity into an AI base model. Virus signatures are updated every day, and retraining a base model is expensive and typically takes much longer than a day.
When using a third-party AI GRCC system across multiple users and base models, it will be important for the company to be able to customize features, control who gets to use what specific personally identifiable information by role and use case, adjust the sensitivity of different cybersecurity filters, and create their own definition of “confidential” data and information. (Use of personally identifiable information (PII) has many legal restrictions, especially with the GDPR rule in the EU. Being able to control use or block use of PII is critical inside your AI GRCC infrastructure. It is not enough to make documents and databases unavailable to the AI – a user can simply start typing personal information into the prompt.) As cybersecurity attacks and methods of defense change virtually daily, it will also be important to collaborate with a third-party AI GRCC platform provider to provide the best and most updated cybersecurity defense against malicious actors.
Secure Deployment
Protecting the LLM from unauthorized access and ensuring secure deployment can prevent malicious attacks and misuse of the technology. Beyond the normal practices of network and access security consider the following:
Recording User Access: It is imperative to provide access layer security and record user access rights especially with public facing applications. Not only is this information and history useful for the LLM to provide better answers but will limit the ability for hackers and bad actors to probe your LLM for potential weak points.
Monitoring: Having robust monitoring and alert systems in place for potential misuse can help alert of potential bad actors and their attempts to thwart your security. LLMs require a greater degree of security monitoring than current systems. Potential hacking attempts like prompt injections or DAN style attacks are typically not captured in current cybersecurity monitoring systems.
Data Leakage: Many companies are concerned with sensitive internal information and while many LLM companies offer enterprise versions of their software through SAS or API models it is important to understand the information is still leaving your network and is vulnerable to breach from these companies, for example, when they get hacked. Deploying internally to your own cloud infrastructure is truly the only way to keep your information within your company.
Data Access and Governance Security
Limiting access to data and documents, administrator privileges, and the authority to update rather than simply read files and databases are fundamental tools of data access and governance security. Once data is loaded into an LLM, there is no governance within the model anymore. In addition, adding AI tools that can search your entire company can expose weaknesses in governance, such as showing payroll records to an unauthorized employee.
AI tools are also powerful enough that a malicious actor with access to a broad AI tool can quickly acquire confidential data and documents, search for “emails from the CEO” or “login credentials” and do way more damage in a short period of time than one might think possible.
One solution we espouse is to create narrow use cases for AI agents with access to limited documents or data approved by an administrator, as well as “drag and drop” user documents. Users are assigned specific agents based on their roles. This approach, granting positive access, reduces the chance of inappropriate data access and governance issues.
Note that an administrator or different levels of administrators will be required for any AI system. For example, at what level can users create their own agents or add documents? Who can change the global cybersecurity sensitivity levels, for example, for prompt injection detection? These questions need to be decided before deployment, but as usage, use cases, and users build, pressure will be put upon the administrators.
Conclusion
While we do not need to worry about computers taking over the world anytime soon, we all need to take cybersecurity for AI very seriously. We are at a once-a-decade inflection point for cybersecurity, and AI is at the center of it. Protecting your AI and keeping your information confidential will quickly go from nice to necessary to legally required for most companies, even small ones.
AIR-GPT was used in the production and editing of this article.
Copyright © 2024 by Artificial Intelligence Risk, Inc. All rights reserved
This paper may not be copied or redistributed without the express written permission of the authors.
Comments