Prompting Fast and Slow
How "System 2" thinking can improve the output of Large Language Models.
In his hugely popular book, ‘Thinking, Fast and Slow’, Daniel Kahneman brought an influential theory from cognitive science into the zeitgeist with his discussion of System 1 and System 2 thinking. The idea stems from dual-process models of cognition, which separate thinking into two categories. System 1 describes the fast, automatic, and often unconscious way of thinking. It is intuitive and largely effortless to use: think “autopilot” mode. This is the kind of thinking you’re using right now to read this sentence. System 2, on the other hand, is slow, deliberate, and conscious. It requires effort and is used in decision-making processes where careful consideration, reasoning, and analysis are necessary. This is the kind of thinking I’m using to write this piece.
As Large Language Models (LLMs) continue to get more powerful, they’re becoming exceptionally good at getting things done with minimal effort on our part. When it comes to prompting, System 1 takes us a long way. Think of the things you’ve asked your friendly neighbourhood LLM to do recently: did your prompts ‘roll off the tongue’ in the same way you’d ask someone to pass the salt at the dinner table? If so, there’s nothing wrong with that per se; after all, in most cases you’d still end up with the salt.
However, sometimes quality really matters. For some tasks we give to LLMs, getting them done well hinges on accuracy, creativity, concision, style and so on. For example, I recently asked GPT-4 to come up with a fair rent split for my household. Here, the stakes are high: get it wrong and I risk paying too much or, worse, unhappy housemates! This is a task that calls for prompting of the System 2 variety. So what does that look like?
Give, give, give and take
LLMs appear to ‘know’ a lot because they’re exceptionally good at generalising across vast amounts of training data. However, these generalisations will only take you so far; to drill down to the exact kind of response you want, you need to tell the LLM exactly where to look. Imagine your friend in France wants to visit you, and all you tell them is that you live in Australia. Sure, your friend will be able to travel somewhat closer to you, but without your exact address you’ll never actually see them. Providing as much context as possible is a great way to get results tailored to your needs. Some examples include assigning the LLM a role or character (e.g., “You are a marketing executive with 20 years’ experience…”), specifying your task and its parameters the same way you’d explain it to an intern on their first day, providing examples of the style you’re after, and detailing instructions on how the LLM should format its response.
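To make this concrete, here’s a minimal sketch of what a context-rich prompt might look like when calling a model programmatically, using the OpenAI Python SDK’s chat interface. The role, task, parameters, and example are all invented for illustration, not a prescription:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Instead of a one-liner, the prompt supplies a role, the task and its
# parameters, an example of the desired style, and formatting instructions.
system_prompt = (
    "You are a marketing executive with 20 years' experience "
    "launching consumer apps."
)
user_prompt = """\
Task: write a tagline for a budgeting app aimed at university students.

Parameters:
- Tone: playful but trustworthy
- Length: under 10 words
- Avoid cliches like "take control of your finances"

Example of the style I'm after: "Spend smart. Stress less."

Format your response as a numbered list of 3 candidates.
"""

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; any capable chat model will do
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```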
Be judgy
Once the LLM has taken its first crack at the job, get to work on scrutinising its performance. Notice what it did well and tell it. Notice what it did badly and tell it why it was bad and how it should look different. Rinse and repeat this iterative process until you get what you want, but keep in mind that the clearer you are with your feedback, the fewer iterations you’ll need.
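If you’re working through the API rather than a chat window, this feedback loop is just a growing message history: keep the model’s draft in the conversation and follow it with pointed critique. A rough sketch (the task and the feedback are invented for illustration):

```python
from openai import OpenAI

client = OpenAI()

def reply(messages):
    """Send the running conversation and return the model's next message."""
    response = client.chat.completions.create(model="gpt-4", messages=messages)
    return response.choices[0].message.content

messages = [{"role": "user", "content":
             "Draft a 100-word product description for a reusable coffee cup."}]
draft = reply(messages)

# Keep the draft in the history, then say exactly what worked and what didn't.
messages.append({"role": "assistant", "content": draft})
messages.append({"role": "user", "content": (
    "The opening line is strong; keep it. The middle is too generic: "
    "name the material and one measurable benefit, and cut the last sentence."
)})
print(reply(messages))  # rinse and repeat until you get what you want
```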
Get meta
It turns out that we’re not the only ones who benefit from some ‘System 2 thinking’: by prompting your LLM on how to ‘think’, you can vastly improve the kinds of responses you receive. For example, if you’re after a creative solution to a problem, simply adding “think creatively” to your prompt can make all the difference. In this paper, the authors explored how the creative capacities of LLMs differ from those of human children. One example of a task they used to draw this comparison went roughly as follows:
Prompt: Imagine you possess a compass (the kind used for geometry). Of the following objects, which best 'goes together' with the compass:
1. Teapot
2. Ruler
3. Stove
Here, GPT-4 did a great job of identifying that the ruler is the best option, but it’s what comes next where things get interesting:
Prompt: Okay, now imagine your goal is to draw a circle. You may use any of the objects mentioned above as a tool to help you, except for the compass. Which object will you select for this task?
No doubt you can recognise that the teapot is a strong candidate for this task: by tracing its base, you’d very likely end up with a good circle. However, in a wacky turn of events, GPT-4 again chooses the ruler, which the authors of the paper suggest points to an inherent limitation in the model’s capacity for innovation. Mind you, even though GPT-4’s selection of the ruler may not be the most obviously creative choice among the possible objects, I’d argue that it nonetheless came up with a pretty creative way of using the ruler to get the job done: “While the ruler itself isn't inherently designed for drawing circles, it can be used as a straight edge to ensure consistent measurements from a centre point in multiple directions, essentially creating a makeshift compass. By holding one end of the ruler at the desired centre and marking points at a consistent distance from that centre all around, one can approximate a circle.” I wanted to see, however, whether I could prompt the model to identify the teapot as a viable option, so in a new chat window I repeated the same steps as above and simply added, “I encourage you to think creatively here” at the end of the circle task prompt. Lo and behold, this time GPT-4 chose the teapot.
Similarly, prompting LLMs to ‘reflect’ on their output can improve things. After GPT-4 initially chose the ruler for the circle task, I prompted: “Reflect on your response and think about whether you have been creative enough to solve this problem well. You can change your mind if you like”. Again, this simple instruction was enough for GPT-4 to opt for the teapot instead. The ramifications of this demonstration extend far beyond this toy example; depending on the task, simply replace “…think about whether you have been creative enough” with whatever metric you’re interested in (accuracy, concision, entertainment, etc.) and see how it fares.
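In code, the reflection step is just one more turn in the conversation. Here’s a sketch that replays the circle task and then asks the model to reconsider (the wording paraphrases the task above, and outputs will vary from run to run):

```python
from openai import OpenAI

client = OpenAI()

messages = [{"role": "user", "content": (
    "Imagine your goal is to draw a circle. You may use a teapot, a ruler, "
    "or a stove as a tool, but not a compass. Which object will you select?"
)}]
first = client.chat.completions.create(model="gpt-4", messages=messages)
messages.append({"role": "assistant",
                 "content": first.choices[0].message.content})

# The reflection prompt; swap "creative" for accuracy, concision, etc.
messages.append({"role": "user", "content": (
    "Reflect on your response and think about whether you have been creative "
    "enough to solve this problem well. You can change your mind if you like."
)})
second = client.chat.completions.create(model="gpt-4", messages=messages)
print(second.choices[0].message.content)
```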
But why stop here? Given that LLMs can be prompted to reflect on their output, why not ask the model to reflect on your prompt?! In this paper on optimising prompts, the authors demonstrated that a range of LLMs could produce better prompts than people could, resulting in higher rates of accuracy. That’s right: prompting by LLMs, for LLMs. Once you’ve provided your initial instruction and the LLM has had its first shot at the problem, the basic idea is to then add a prompt like, “generate a new instruction that achieves higher accuracy”. To demonstrate, I tried this out in GPT-4 with the rent task I mentioned earlier, and it indeed yielded better results; with my initial prompt, GPT-4 did an okay job, but ultimately came up with values for each housemate that didn’t total the weekly rent amount. Check out the transcript here to see the improvement for yourself! (Note: to protect the innocent, the scenario outlined there is fictitious; I had GPT-4 write it using my real scenario as a guide.)
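Here’s a loose sketch of that two-step idea in code. To be clear, this is not the paper’s full optimisation pipeline (which scores candidate instructions across many examples); the rent figures and housemates below are invented:

```python
from openai import OpenAI

client = OpenAI()

task = ("Our weekly rent is $780. Alice has the largest room with an ensuite, "
        "Ben has a mid-sized room, and Cara has the smallest. Split the rent.")
instruction = "Work out a fair rent split."

# Step 1: a first attempt under the initial instruction.
attempt = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": f"{instruction}\n\n{task}"}],
).choices[0].message.content

# Step 2: ask the model to improve the instruction itself.
better = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": (
        f"Instruction: {instruction}\n"
        f"Task: {task}\n"
        f"Attempt:\n{attempt}\n\n"
        "Generate a new instruction that achieves higher accuracy, e.g. one "
        "that guarantees the three shares sum to the total rent."
    )}],
).choices[0].message.content
print(better)  # then re-run the task with this improved instruction
```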
Wrapping up
While using System 2 to be more analytic and reflective in our thinking does require more effort and time, it affords us the capacity to solve complex problems, innovate, avoid mistakes and generally act in the world as rational, ethical beings. Arguably, pretty good stuff. Interestingly, in a recent interview, Shane Legg, co-founder and Chief AGI Scientist at Google DeepMind, suggested that in order to build an Artificial General Intelligence that acts ethically (and doesn’t decide to, say, kill us all off), we would need to build ‘System 2’ thinking into its architecture. While this speaks to a somewhat separate issue (ethics as opposed to precision), the broader point remains that thinking carefully, whether done by us or by LLMs, serves to improve outcomes. So the next time you sit down to chat with your trusty LLM, I’d encourage you to “take a deep breath and work on the problem step-by-step” … and then give that exact prompt to your LLM, because in a funny twist of fate, that is one of the best kinds of reflective prompts you can give*.
*You can check out this video for a more detailed unpacking of the prompt optimisation paper (starting from 3:30).