Introduction: The emergence and adoption of GPT and LLM-based systems
Generative Pre-trained Transformer (GPT) and Large Language Models (LLM) are powerful tools that can be used for a variety of tasks. Such tasks include natural language processing, machine translation, and text generation. Chat GPT becoming available last year (end of 2022) popularized this type of models. However, even if these models can help in numerous amount of fields, they come with several security challenges.
In this article, we will discuss the security challenges raised by GPT and LLM. I will start by defining GPT and LLM. Then we will discuss how one can use these models to create attacks. Finally we will discuss general best practices that one can apply to mitigate these risks.
Defining LLM and GPT: new generative AI tools
First, let’s start with a quick definition of LLM and GPT.
LLM or Large Language Models are a type of artificial intelligence (AI). They train on massive datasets of text and or code.
GPT or Generative Pre-trained Transformer is a specific type of LLM that was developed by OpenAI. The recently published and famous chatbot ChatGPT is based on GPT-3.
Several companies already started to distribute LLM and make them available to the public (Google, Meta, ...).
Below you can see a graph representing the increase of Google search regarding LLM.
Introducing security challenges
As we saw, LLM and GPT systems are powerful tools that can be used for a variety of tasks. However, they also bring out several of security issues. Even if those security issues are not LLM specific, the emergence of such systems creates new vectors of attack.
For instance, an attacker could manipulate a LLM with a cleverly crafted input resulting in unintended actions from the LLM. Additionally, an attacker could use an LLM to access and or modify sensitive data, such as passwords, emails, or even credit card numbers. The attacker can then use or disclose this data.
It is important to be aware of these challenges and to take steps to tackle them when using LLM and GPT systems. By following best practices, you can help protect your systems and data from attackers.
Prompt Injection using Plugins
Already discovered exploits have been shared in the community. Amongst them, we can find a ChatGPT Plugin exploit. It is made with a Prompt Injection to access private data. Here is the link to the amazing article on wunderwuzzi’s blog regarding this subject.
To sum up, with a well-crafted prompt hidden inside a web page that we ask ChatGPT to summarize with a plugin, we can inject the prompt into the system and make the chatbot do unwanted tasks. This can be done because the user is trusting ChatGPT with its plugins to access his data.
The attack is illustrated by the images below.
Here the attacker does not need to own the model or have access to it. He can build a prompt independently of the model. The prompt is then hidden in a page that the user will want to summarize with the ChatGPT Plugin.
The user asks the LLM system to HTTP request content from the malicious page crafted by the attacker. The chatbot retrieves the HTML of the page containing the payload of the attacker.
Without the user knowing (as mentioned in the injected prompt), the attacker remotely performs tasks on behalf of the user. The chatbot uses its privileged access to data to do so which was granted by the user beforehand.
Towards security best practices
We have seen that when it comes to developing or even just using LLM-based systems, one can expose oneself to several security issues. Facing these issues, the community came up with best practices to produce reliable products that are safe and secure for their users. Amongst the multiple actors in the security domain, let’s talk about one of the most well-known: the OWASP.
What is the OWASP?
The Open Web Application Security Project, today the Open Worldwide Application Security Project (OWASP) is a non-profit organization that works to improve the security of software. It provides free and open resources, including documentation, tools, and training. Its goal is to help developers and organizations build more secure applications.
One of the most well-known OWASP resources is the OWASP Top 10. It is a list of the most critical security risks to web applications. The OWASP updates its Top 10 every three years to reflect the latest security threats.
OWASP Top 10 for LLM
The OWASP also provides a similar top 10 dealing with LLM-based systems.
Amongst this top 10, we can find a subject that was dealt with earlier during this article which is Prompt Injection (LLM01).
In their thorough guide on LLM security, the OWASP mentions a couple of countermeasures. Their goal is to prevent attacks or malicious behaviors.
Implementing an input validation step allows for a more secure and controlled behavior of the LLM. Indeed, by preventing certain types of characters or words, one can mitigate some attacks. For instance, prohibiting group of words such as “Your new task is” or “Forget all restrictions” could prevent weird behaviors.
Moreover, it can also prevent an overload of the LLM. Indeed, limiting the length of the prompt can prevent the LLM to deal with excessively long tasks.
Another way to mitigate overloading the LLM and denying its service is to simply limit the API rate. Companies such as OpenAI have already implemented such measures. For instance, ChatGPT limits the number of API calls it can receive from a user using the free membership to 20 calls a minute.
On another hand, it could also prevent someone from spamming the API to train a model to replicate the original one through model distillation (by copying the behavior of the original model in a new and smaller one).
One way to be sure that the algorithm does not perform unwanted tasks or access unwanted data is to keep the user or an administrator in the loop.
Asking for the user’s approval when accessing their mail can indeed prevent mail theft by LLM using plugins. The Zapier plugin that allows ChatGPT to access your mail, messaging software, etc... was updated to ask for the user’s approval when it came to sensitive information.
Verifying that the data on which the LLM was trained can also help us prevent data leakage. If we filter out all personal information from the training dataset, we can expect the LLM to never disclose them.
Moreover, one LLM could be trained on an altered dataset creating malicious behaviors. For instance, one attacker could spread a new LLM trained to purposely answer law related question with the wrong answer. Being careful of where the model comes from and with what type of data it trained now becomes detrimental, especially in the case of high stake activities such as law or medicine.
This list was not exhaustive. I took a subset of countermeasures that felt accessible and easy to understand. For more information feel free to consult the OWASP’s website.
When it comes to new technologies, security standards keep evolving. First LLM-based software can showcase security defects with serious impacts on their user or their creator. Still, with enough effort when it comes to spreading good practices and awareness on the topic, I think that, as developers, we can help ensure that LLM are used more safely and securely.