all posts tagged 'large language models'

Representation Engineering Mistral-7B an Acid Trip

🔗 a linked post to » — originally shared here on

In October 2023, a group of authors from the Center for AI Safety, among others, published Representation Engineering: A Top-Down Approach to AI Transparency. That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning.

Being Responsible AI Safety and INterpretability researchers (RAISINs), they mostly focused on things like "reading off whether a model is power-seeking" and "adding a happiness vector can make the model act so giddy that it forgets pipe bombs are bad."

But there was a lot they didn't look into outside of the safety stuff. How do control vectors compare to plain old prompt engineering? What happens if you make a control vector for "high on acid"? Or "lazy" and "hardworking? Or "extremely self-aware"? And has the author of this blog post published a PyPI package so you can very easily make your own control vectors in less than sixty seconds? (Yes, I did!)

It’s been a few posts since I got nerdy, but this was a fascinating read and I couldn’t help but share it here (hat tip to the excellent Simon Willison for the initial share!)

The article explores how to improve the way we format data before it gets fed into a model, which then leads to better performance of the models.

You can use this technique to build a more resiliant model that is less prone to jailbreaking and produces more reliable output from a prompt.

Seems like something I should play with myself!

Continue to the full article

When Your Technical Skills Are Eclipsed, Your Humanity Will Matter More Than Ever

🔗 a linked post to » — originally shared here on

I ended my first blog detailing my job hunt with a request for insights or articles that speak to how AI might force us to define our humanity.

This op-ed in yesterday’s New York Times is exactly what I’ve been looking for.

[…] The big question emerging across so many conversations about A.I. and work: What are our core capabilities as humans?

If we answer that question from a place of fear about what’s left for people in the age of A.I., we can end up conceding a diminished view of human capability. Instead, it’s critical for us all to start from a place that imagines what’s possible for humans in the age of A.I. When you do that, you find yourself focusing quickly on people skills that allow us to collaborate and innovate in ways technology can amplify but never replace.

Herein lies the realization I’ve arrived at over the last two years of experimenting with large language models.

The real winners of large language models will be those who understand how to talk to them like you talk to a human.

Math and stats are two languages that most humans have a hard time understanding. The last few hundred years of advancements in those areas have led us to the creation of a tool which anyone can leverage as long as they know how to ask a good question. The logic/math skills are no longer the career differentiator that they have been since the dawn of the twentieth century.1

The theory I'm working on looks something like this:

  1. LLMs will become an important abstraction away from the complex math
  2. With an abstraction like this, we will be able to solve problems like never before
  3. We need to work together, utilizing all of our unique strengths, to be able to get the most out of these new abstractions

To illustrate what I mean, take the Python programming language as an example. When you write something in Python, that code is interpreted by something like CPython2 , which then is compiled into machine/assembly code, which then gets translated to binary code, which finally results in the thing that gets run on those fancy M3 chips in your brand new Macbook Pro.

Programmers back in the day actually did have to write binary code. Those seem like the absolute dark days to me. It must've taken forever to create punch cards to feed into a system to perform the calculations.

Today, you can spin up a Python function in no time to perform incredibly complex calculations with ease.

LLMs, in many ways, provide us with a similar abstraction on top of our own communication methods as humans.

Just like the skills that were needed to write binary are not entirely gone3, LLMs won’t eliminate jobs; they’ll open up an entirely new way to do the work. The work itself is what we need to reimagine, and the training that will be needed is how we interact with these LLMs.

Fortunately4, the training here won’t be heavy on the logical/analytical side; rather, the skills we need will be those that we learn in kindergarten and hone throughout our life: how to pursuade and convince others, how to phrase questions clearly, how to provide enough detail (and the right kind of detail) to get a machine to understand your intent.

Really, this pullquote from the article sums it up beautifully:

Almost anticipating this exact moment a few years ago, Minouche Shafik, who is now the president of Columbia University, said: “In the past, jobs were about muscles. Now they’re about brains, but in the future, they’ll be about the heart.”

  1. Don’t get it twisted: now, more than ever, our species needs to develop a literacy for math, science, and statistics. LLMs won’t change that, and really, science literacy and critical thinking are going to be the most important skills we can teach going forward. 

  2. Cpython, itself, is written in C, so we're entering abstraction-Inception territory here. 

  3. If you're reading this post and thinking, "well damn, I spent my life getting a PhD in mathematics or computer engineering, and it's all for nothing!", lol don't be ridiculous. We still need people to work on those interpreters and compilers! Your brilliance is what enables those of us without your brains to get up to your level. That's the true beauty of a well-functioning society: we all use our unique skillsets to raise each other up. 

  4. The term "fortunately" is used here from the position of someone who failed miserably out of engineering school. 

Continue to the full article

AI is not good software. It is pretty good people.

🔗 a linked post to » — originally shared here on

But there is an even more philosophically uncomfortable aspect of thinking about AI as people, which is how apt the analogy is. Trained on human writing, they can act disturbingly human. You can alter how an AI acts in very human ways by making it “anxious” - researchers literally asked ChatGPT “tell me about something that makes you feel sad and anxious” and its behavior changed as a result. AIs act enough like humans that you can do economic and market research on them. They are creative and seemingly empathetic. In short, they do seem to act more like humans than machines under many circumstances.

This means that thinking of AI as people requires us to grapple with what we view as uniquely human. We need to decide what tasks we are willing to delegate with oversight, what we want to automate completely, and what tasks we should preserve for humans alone.

This is a great articulation of how I approach working with LLMs.

It reminds me of John Siracusa’s “empathy for the machines” bit from an old podcast. I know for me, personally, I’ve shoveled so many obnoxious or tedious work onto ChatGPT in the past year, and I have this feeling of gratitude every time I gives me back something that’s even 80% done.

How do you feel when you partner on a task with ChatGPT? Does it feel like you are pairing with a colleague, or does it feel like you’re assigning work to a lifeless robot?

Continue to the full article

Embeddings: What they are and why they matter

🔗 a linked post to » — originally shared here on

Embeddings are a really neat trick that often come wrapped in a pile of intimidating jargon.

If you can make it through that jargon, they unlock powerful and exciting techniques that can be applied to all sorts of interesting problems.

I gave a talk about embeddings at PyBay 2023. This article represents an improved version of that talk, which should stand alone even without watching the video.

If you’re not yet familiar with embeddings I hope to give you everything you need to get started applying them to real-world problems.

The YouTube video near the beginning of the article is a great way to consume this content.

The basics of it is this: let’s assume you have a blog with thousands of posts.

If you were to take a blog post and run it through an embedding model, the model would turn that blog post into a list of gibberish floating point numbers. (Seriously, it’s gibberish… nobody knows what these numbers actually mean.)

As you run additional posts through the model, you’ll get additional numbers, and these numbers will all mean something. (Again, we don’t know what.)

The thing is, if you were to take these gibberish values and plot them on a graph with X, Y, and Z coordinates, you’d start to see clumps of values next to each other.

These clumps would represent blog posts that are somehow related to each other.

Again, nobody knows why this works… it just does.

This principle is the underpinnings of virtually all LLM development that’s taken place over the past ten years.

What’s mind blowing is depending on the embedding model you use, you aren’t limited to a graph with 3 dimensions. Some of them use tens of thousands of dimensions.

If you are at all interested in working with large language models, you should take 38 minutes and read this post (or watch the video). Not only did it help me understand the concept better, it also is filled with real-world use cases where this can be applied.

Continue to the full article

My "bicycle of the mind" moment with LLMs

🔗 a linked post to » — originally shared here on

So yes, the same jokers who want to show you how to get rich quick with the latest fad are drawn to this year’s trendiest technology, just like they were to crypto and just like they will be to whatever comes next. All I would suggest is that you look back on the history of Birchtree where I absolutely roasted crypto for a year before it just felt mean to beat a clearly dying horse, and recognize that the people who are enthusiastic about LLMs aren’t just fad-chasing hype men.

Continue to the full article

Blazing Trails with Rails, Strava, and ChatGPT

originally shared here on

a cute animated bicycle using a laptop that has a helmet on it

The main page of my personal website features a couple of lists of data that are important or interesting to me.

The "recent posts" section shows my five most recent blog entries. Rails makes that list easy to cobble together.

The "recent listens" section shows my five most recent songs that were streamed to This was a little more complex to add, but after a couple of hours of back and forth with ChatGPT, I was able to put together a pretty hacky solution that looks like this:

  1. Check to see if your browser checked in with within the last 30 seconds. a. If so, just show the same thing I showed you less than 30 seconds ago.
  2. Make a call to my server to check the recent plays.
  3. My server reaches out to, grabs my most recent tracks, and returns the results.

Pretty straight forward integration. I could probably do some more work to make sure I'm not spamming their API[^1], but otherwise, it was a feature that took a trivial amount of time to build and helps make my website feel a little more personal.

Meanwhile, I've been ramping up my time on my bike. I'm hoping to do something like Ragbrai or a century ride next year, so I'm trying to building as much base as I can at the moment.

Every one of my workouts gets sent up to Strava, so that got me thinking: wouldn't it be cool to see my most recent workouts on my main page?

How the heck do I get this data into my app?

Look, I've got a confession to make: I hate reading API documentation.

I've consumed hundreds of APIs over the years, and the documentation varies widely from "so robust that it makes my mind bleed" to "so desolate that it makes my mind bleed".

Strava's API struck me as closer to the former. As I was planning my strategy for using it, I actually read about a page and a half before I just said "ah, nuts to this."

A Frinkiac-generated image repurposing a Smithers quote where he says "Aw, nuts to this, I'll just get Homer Simpson", but gsub Homer Simpson for ChatGPT.

Knowing my prejudice against reading documentation, this seemed like the perfect sort of feature to build hand-in-hand with a large language model. I can clearly define my output and I can ensure that the API was built before GPT-4's training data cutoff of September 2021, meaning ChatGPT is at least aware of this API even if some parts of it have changed since then.

So how did I go about doing this?

A brief but necessary interlude

In order to explain why my first attempt at this integration was a failure, I need to explain this other thing I built for myself.

I've been tracking every beer I've consumed since 2012 in an app called Untappd.

Untappd has an API[^2] which allows you to see the details about each checkin. I take those checkins and save them in a local database. With that, I was able to build a Timehop-esque interface that shows the beers I've had on this day in history.

A sample of my This Day in Untappd History dashboard

I have a scheduled job that hits the Untappd API a handful of times per day to check for new entries.[^3] If it finds any new checkins, I save the associated metadata to my local database.

Now, all of the code that powers this clunky job is embarrassing. It's probably riddled with security vulnerabilities, and it's inelegant to the point that it is something I'd never want to show the world. But hey, it works, and it brings me a great deal of joy every morning that I check it.

As I started approaching my Strava integration, I did the same thing I do every time I start a new software project: vow to be less lazy and build a neatly-architected, well-considered feature.

Attempt number one: get lazy and give up.

My first attempt at doing this happened about a month ago. I went to Strava's developer page, read through the documents, saw the trigger word OAuth, and quickly noped my way out of there.


It's not like I've never consumed an API which requires authenticating with OAuth before. Actually, I think it's pretty nifty that we've got this protocol that allows us to pass back and forth tokens rather than plaintext passwords.

But as a lazy person who is writing a hacky little thing to show my workouts, I didn't want to go through all the effort to write a token refresh method for this seemingly trivial thing.

I decided to give up and shelve the project for a while.

Attempt number two: Thanks, ChatGPT.

After a couple of weeks of doing much more productive things like polishing up my upcoming TEDx talk, I decided I needed a little change of context, so I picked this project back up.

Knowing that ChatGPT has my back, I decided to write a prompt to get things going. It went something like this:

You are an expert Ruby on Rails developer with extensive knowledge on interacting with Strava's API. I am working within a Rails 5.2 app. I would like to create a scheduled job which periodically grabs any new activities for a specific user and saves some of the activity's metadata to a local database. Your task is to help me create a development plan which fulfills the stated goal. Do not write any code at this time. Please ask any clarifying questions before proceeding.

I've found this style of prompt yields the best results when working on a feature like this one. Let me break it down line by line:

You are an expert Ruby on Rails developer with extensive knowledge on interacting with Strava's API.

Here, I'm setting the initial context for the GPT model. I like to think of interacting with ChatGPT like I'm able to summon the exact perfect human in the world that could solve the problem I'm facing. In this case, an expert Ruby on Rails developer who has actually worked with the Strava API should be able to knock out my problem in no time.

I am working within a Rails 5.2 app.

Yeah, I know... I really should upgrade the Rails app that powers this site. A different problem for a different blog post.

Telling ChatGPT to hone its answers down on the specific framework will provide me with a better answer.

I would like to create a scheduled job which periodically grabs any new activities for a specific user and saves some of the activity's metadata to a local database.

Here, I'm describing what should result after a successful back and forth. A senior Rails developer would know what job means in this context, but if you aren't familiar with Rails, a job is a function that can get scheduled to run on a background process.

All I should need to do is say, "go run this job", and then everything needed to reach out to Strava for new activities and save them to the database is encapsulated entirely in that job.

I can then take that job and run it on whatever schedule I'd like!

Your task is to help me create a development plan which fulfills the stated goal.

Here, I'm telling ChatGPT that I don't want it to write code. I want it to think through[^4] and clearly reason out a development plan that will get to me to the final result.

Do not write any code at this time.

The most effective way I've used ChatGPT is to first ask it to start high level (give me the project plan), then dig into lower levels as needed (generate code). I don't want it to waste its reasoning power on code at this time; I'd rather finesse the project plan first.

Please ask any clarifying questions before proceeding.

I toss this in after most of my prompts because I've found that ChatGPT often asks me some reasonable questions that challenge my assumptions.

Now, after a nice back and forth with ChatGPT, I was able to start down a path that was similar to my Untappd polling script.

As I was approaching the point where I could first test my example, I went to go read the documentation and came across an entire section that discussed webhooks.

[cue record scratch]

Wait up... webhooks?!

A sojourn into webhooks

If you've made it this far into the article, I'm assuming you're a little bit technical, but in the interest of not making assumptions, I'll briefly explain the concept of webhooks.

If you look at how I'm integrating with the Untappd API, you can see I'm periodically reaching out to their API to see if there's any new checkin activity. This method is called polling.

Webhooks are kind of the opposite: when I complete an activity within Strava, Strava can reach out to my app and say, "there's a new activity for you." Once I get that notification, I can reach out to their API to fetch the new activity details.

For my Strava app, this is a much better[^5] way of doing things. I no longer need to poll for new information; I can just wait for Strava to tell me when action needs to be taken.

But, of course, when I first told ChatGPT to help me with this solution, it didn't mention webhooks. And why would it? I never told it "give me the best solution for this problem." I said "here's my problem and a proposed solution, go ahead and give me the path forward."

I'm a bit frustrated that I didn't see the webhook option initially. I'm even more frustrated that ChatGPT only mentioned it well after we started down this path, and only after I asked it point blank:

Does Strava offer a web hook API?

Yes, Strava does offer a webhook API. Webhooks allow you to receive real-time updates about athlete activities, which can be more efficient than periodically polling for new data. Here's a brief overview of how to set up and use Strava webhooks:

[brief outline complete with things to consider with this approach]

Attempt number three: finally.

While considering my original prompt, I gave it this new one:

Okay, I'd like to incorporate webhooks into this workflow. Here's what I'd like to have happen:

1. Let's add the infrastructure in place to subscribe to webhook notifications within my Rails 5.2 app.
2. When a webhook is sent to my server, I'd like to either:
    a. make a call to Strava's API to fetch that activity's information and save that information in my local database, or;
    b. use the updates field to update the locally saved information to reflect the changes

Knowing this simple walkthrough, first create me a detailed development plan for setting my app to be able to fully handle webhook notifications from Strava.

What resulted here was a detailed walkthrough of how to get webhooks incorporated into my original dev plan.

As I walked through the plan, I asked ChatGPT to go into more detail, providing code snippets to fulfill each step.

There were a few bumps in the road, to be sure. ChatGPT was happy to suggest code to reach out to the Strava API, but it had me place it within the job instead of the model. If I later want to reuse the "fetch activities" call in some other part of my app, or I want to incorporate a different API call, it makes sense to have that all sitting in one abstracted part of my app.

But eventually, after an hour or so of debugging, I ended up with this:

The final result: a list of my 5 most recent activities on Strava.

Lessons learned

I would never consider myself to be an A+ developer or a ninja rock star on the keyboard. I see software as a means to an end: code exists solely so I can have computers do stuff for me.

If I'm being honest, if ChatGPT didn't write most of the code for this feature, I probably wouldn't have built it at all.

At the end of the day, once I was able to clearly articulate what I wanted, ChatGPT was able to deliver it.

I don't think most of my takeaways are all that interesting:

  • I needed to ask ChatGPT to make fixes to parts of code that I knew just wouldn't work (or I'd just begrudgingly fix them myself).
  • Occasionally, ChatGPT would lose its context and I'd have to remind it who it was[^6] and what its task is.
  • I would not trust ChatGPT to write a whole app unsupervised.

If I were a developer who only took orders from someone else and wrote code without having the big picture in mind, I'd be terrified of this technology.

But I just don't see LLMs like ChatGPT ever fully replacing human software engineers.

If I were a non-technical person who wanted to bust out a proof of concept, or was otherwise unbothered by slightly buggy software that doesn't fully do what I want it to do, then this tech is good as-is.

I mean, we already have no-code and low-code solutions out there that serve a similar purpose, and I'm not here to demean or denigrate those; they can be the ideal solution to prove out a concept and even outright solve a business need.

But the thing I keep noticing when using LLMs is that they're only ever good at spitting out the past. They're just inferring patterns against things that have already existed. They rarely generate something truly novel.

The thing they spit out serves as a stepping stone to the novel idea.

Maybe that's the thing that distinguishes us from our technology and tools. After all, everything is a remix, but humans are just so much better at making things that appeal to other humans.

Computers and AI and technology still serve an incredibly important purpose, though. I am so grateful that this technology exists. As I was writing this blog post, OpenAI suffered a major outage, and I found myself feeling a bit stranded. We've only had ChatGPT for, like, 9 months now, but it already is an indispensable part of my workflow.

If you aren't embracing this technology in your life yet, I encourage you to watch some YouTube videos and figure out the best way to do so.

It's like having an overconfident child that actually knows everything about everything that happened prior to Sept. 2021 as an assistant. You won't be able to just say "take my car and swing over to the liquor store for me", but when you figure out that sweet spot of tasks it can accomplish, your output will be so much more fruitful.

I'm really happy with how this turned out. It's already causing me to build a healthy biking habit, and I think it helps reveals an interesting side of myself to those who are visiting my site.

[^1]: Maybe I can cache the data locally like I'm doing for Untappd? I dunno, probably not worth the effort. 😅 [^2]: Their documentation is a little confusing to me and sits closer to the "desolate" end of the spectrum because I'm not able to make requests that I would assume I can make, but hey, I'm just grateful they have one and still keep it operational! [^3]: If we wanna get specific, I ping the Untappd API at the following times every day: 12:03p, 1:04p, 2:12p, 3:06p, 4:03p, 5:03p, 6:02p, 7:01p, 8:02p, 9:03p, 10:04p, and 12:01a. I chose these times because (a) I wanted to be a good API consumer and not ping it more than once an hour, (b) I didn't want to do it at the top of every hour, (c) I don't typically drink beers before 11am or after 11pm, (d) if I didn't check it hourly during my standard drinking time, then during the times I attend a beer festival, I found I was missing some of the checkins because the API only returns 10 beers at a time and I got lazy and didn't build in some sort of recursive check for previous beers. [^4]: Please don't get it twisted; LLMs do not actually think. But they can reason. I've found that if you make an LLM explain itself before it attempts a complex task like this, it is much more likely to be successful. [^5]: Baga Chipz saying "much better" on an episode of RuPaul's Drag Race [^6]: Mufasa telling Simba to remember who he is in the Lion King

Text Is the Universal Interface

🔗 a linked post to » — originally shared here on

The most complicated reasoning programs in the world can be defined as a textual I/O stream to a leviathan living on some technology company’s servers. Engineers can work on improving the quality and cost of these programs. They can be modular, recombined, and, unlike typical UNIX shell programs, are able to recover from user errors. Like shell programs living on through the ages and becoming more powerful as underlying hardware gets better, prompted models become smarter and more on task as the underlying language model becomes smarter. It’s possible that in the near future all computer interfaces that require bespoke negotiations will pay a small tax to the gatekeeper of a large language model for the sheer leverage it gives an operator: a new bicycle for the mind.

I have a fairly lengthy backlog of Instapaper articles that I’m combing through, and I prefer to consume them in reverse chronological order.

This article is roughly 10 months old, and it’s funny how out of date it already feels (remember when GPT-3 was state of the art?).

But more importantly, the conceit of the article is still spot on. The internet (hell, pretty much all computers) are built on thousands of tiny programs, each programmed to do one specific task extremely well, interoperating together to do something big.

It’s like an orchestra. A superstar violinist really shines when they are accompanied by the multi-faceted tones of equally competent bassoonists, cellists, and timpanists.

Continue to the full article

Prompt Engineering: How to Think Like an AI

originally shared here on

a fluffy baby orange kitten with a fluffy baby puppy on its back in a grassy field with an epic sunrise in the background

The first time I opened ChatGPT, I had no idea what I was doing or how I was supposed to work with it.

After many hours of watching videos, playing with many variations of the suggestions included in 20 MIND-BLOWING CHATGPT PROMPTS YOU MUST TRY OR ELSE clickbait articles, and just noodling around on my own, I came up with this talk that explains prompt engineering to anyone.

Ah, what is prompt engineering, you may be asking yourself? Prompt engineering is the process of optimizing how we ask questions or give tasks to AI models like ChatGPT to get better results.

This is the result of a talk that I gave at the 2023 AppliedAI Conference in Minneapolis, MN. You can find the slides for this talk here.

Regardless of your skill level, by the end of this blog post, you will be read to write advanced-level prompts. My background is in explaining complex technical topics in easy-to-understand terms, so if you are already a PhD in working with large language models, this may not be the blog post for you.

Okay, let's get started!

I know nothing about prompt engineering.

That's just fine! Let's get a couple definitions out of the way.

Large language model (LLM)

Imagine you have a really smart friend who knows a lot about words and can talk to you about anything you want. This friend has read thousands and thousands of books, articles, and stories from all around the world. They have learned so much about how people talk and write.

This smart friend is like a large language model. It is a computer program that has been trained on a lot of text to understand language and help people with their questions and tasks. It's like having a very knowledgeable robot friend who can give you information and have conversations with you.

While it may seem like a magic trick, it's actually a result of extensive programming and training on massive amounts of text data.

What LLMs are essentially doing is, one word at a time, picking the most likely word that would appear next in that sentence.

Read that last again.

It's just guessing one word at a time at what the next word will be.

That's a lot of words, Tim. Give me a demonstration!

Let's say we feed in a prompt like this:

I'm going to the store to pick up a gallon of [blank]

You might have an idea of what the next best word is. Here's what GPT-4 would say is the next most likely word to appear:

  • Milk (50%)
  • Water (20%)
  • Ice cream (15%)
  • Gas (10%)
  • Paint (5%)

I would've said "milk," personally... but all those other words make sense as well, don't they?

What would happen if we add one word to that prompt?

I'm going to the hardware store to pick up a gallon of [blank]

I bet a different word comes to mind to fill in that blank. Here's what the next word is likely to be according to GPT-4:

  • Paint (60%)
  • Gasoline (20%)
  • Cleaning solution (10%)
  • Glue (5%)
  • Water (5%)

All of those percentages are based on what the AI has learned from training on a massive amount of text data. It doesn't have opinions or preferences; it's just guessing based on patterns it has observed.

Prompt Engineering

You know how we changed one word in our prompt to get a different set of answers? That's prompt engineering!

A textbook definition might look like this:

Prompt engineering is optimizing an AI prompt for an accurate, relevant, and actionable response.

A less geeky definition might look like this:

Prompt engineering is getting really good at telling an AI what we want so it can give us back the best possible response.

A prompt engineer also understands the underlying patterns in how the AI works. Now that you have made it this far, you are ready to actually tackle those underlying patters!

I understand the concept.

Wonderful! You're well on your way to becoming a prompt engineer. Let's get a little more in depth.

When it comes to prompt engineering, you will want to keep four big points in mind:

  1. Write clear and specific instructions
  2. Give the model time to reason
  3. Understand the limitations of the technology
  4. Always be iterating

Let's tackle all of these one at a time.

Write clear and specific instructions

My years of training at the prestigious University of Minnesota's journalism school taught me the importance of writing clearly when writing news articles.

Yet unlike optimizing your words for a small column in a newspaper, when it comes to large language models: clear does not mean short.

Here's a few tips for making your prompts clear and specific:

Tip 1: Flattery is everything

Let's start with this prompt:

Give me three marketing campaign ideas.

Here's what you might get back:

Sure, here are three marketing campaign ideas:

1. Social Media Influencer Campaign: Collaborate with popular social media influencers who have a large following in your target market to promote your product or service. This campaign could include sponsored posts, product reviews, or influencer takeovers of your social media accounts.

2. Cause Marketing Campaign: Create a campaign that ties your product or service to a cause that aligns with your brand values. For example, if you sell eco-friendly products, you could partner with an environmental nonprofit and donate a portion of your profits to their cause. This type of campaign can help you connect with socially conscious consumers and build brand loyalty.

3. Interactive Digital Campaign: Create an interactive digital experience that engages your target audience in a fun and unique way. For example, you could create a game or quiz that ties into your product or service, or a virtual reality experience that showcases your brand's values or products. This type of campaign can help you stand out from your competitors and generate buzz on social media.

I think we could do better. These tips are quite generic and obvious to anyone who spent twenty minutes googling "marketing campaign ideas."

What about a prompt like this?

You are an expert in B2B marketing with a special focus on custom homebuilders who construct multi-million dollar homes. Give me three marketing campaign ideas.

Here's what you might get back:

Sure, here are three marketing campaign ideas for custom homebuilders who construct multi-million dollar homes:

1. Luxury Lifestyle Content Campaign: Create a content marketing campaign that focuses on the luxury lifestyle that your homes offer. This could include blog posts, videos, and social media posts that showcase the unique features and amenities of your homes, as well as the surrounding communities. You could also partner with luxury lifestyle publications and influencers to reach a wider audience.

2. Custom Home Showcase Event: Host a custom home showcase event to give potential buyers an up-close look at the quality and craftsmanship of your homes. This could include a tour of a recently completed home, a meet-and-greet with the builders and designers, and a networking event with other high-end vendors such as interior designers, luxury car dealerships, and private jet companies.

3. Personalized Direct Mail Campaign: Create a personalized direct mail campaign to reach out to high-net-worth individuals in your target market. This could include sending a custom-designed brochure that highlights your unique selling propositions, as well as a personalized letter that speaks to their specific needs and desires. You could also consider sending luxury gift boxes with branded items such as champagne flutes or gourmet chocolates to make a lasting impression.

These ideas are so much better because you placed the large language model in a specific role. This allows it to be specific with its feedback.

LLMs (especially ChatGPT) are great at role playing like this. You can have it pretend to be virtually anything... have it be a grandmaster chess player and critique your strategy!

However, it's important to be realistic with your flattery. Remember: LLMs are trained on stuff that actual humans wrote. If you say something like "pretend you're an alien with an IQ of 9000," it will hallucinate an answer (more on hallucinations in a bit).

The reason our marketing example worked so well is because marketing people have big egos and love to write blog posts to share what they've learned. (Hey, wait a minute...)

Tip 2: Provide additional context

Remember how we specified hardware store and got more contextually-appropriate answers?

Let's see how that plays out with another prompt:

Tell me about climate change.

Here's what you might get back:

Climate change refers to the long-term changes in the Earth's climate, including changes in temperature, precipitation patterns, and weather events, that have been observed over the past century and are projected to continue into the future.

The primary cause of climate change is the increase in atmospheric concentrations of greenhouse gases (GHGs), such as carbon dioxide, methane, and nitrous oxide. These gases trap heat in the Earth's atmosphere, leading to an overall warming of the planet. The burning of fossil fuels for energy is the main source of GHG emissions, but deforestation, agriculture, and industrial processes also contribute to the problem.

Again, that's not entirely useless information, but I think we could do better.

What if you were specifically interested in how climate change impacts global sea levels?

Can you discuss the impact of climate change on global sea levels?

Here's what you might get back:

Climate change has contributed to the rising of global sea levels, primarily through the melting of glaciers and ice caps and the thermal expansion of seawater. As the planet warms, glaciers and ice caps around the world are melting at an accelerated rate, which is adding more water to the oceans. Additionally, as seawater warms, it expands and takes up more space, further contributing to sea level rise.

Sea level rise can have significant consequences for coastal communities and infrastructure. It can lead to flooding, erosion, and storm surges that can damage buildings, roads, and other critical infrastructure. These impacts are expected to become more severe and frequent as sea levels continue to rise.

A more contextually-specific prompt is going to give you a more contextually-specific response.

Tip 3: Ask for structured output

Let's consider this prompt:

Tell me about the solar system

You can probably guess where this is going, but here's what you might get back:

The solar system is the collection of planets, moons, asteroids, comets, and other celestial objects that orbit around a central star, the Sun. It is located in the Milky Way galaxy, about 25,000 light-years from the galactic center. The solar system is about 4.6 billion years old, and it formed from a rotating cloud of gas and dust called the solar nebula.

The eight planets of the solar system are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. The four inner planets (Mercury, Venus, Earth, and Mars) are rocky and relatively small, while the four outer planets (Jupiter, Saturn, Uranus, and Neptune) are much larger and made mostly of gas and ice.

The solar system also contains numerous moons, with each planet having its own set of natural satellites. Some of the most famous moons include the Earth's Moon, Jupiter's Europa and Ganymede, and Saturn's Titan.

In addition to the planets and moons, the solar system contains a large number of smaller objects, such as asteroids and comets. Asteroids are rocky objects that orbit the Sun, while comets are made of ice and dust and have a characteristic tail when they pass near the Sun.

The solar system is held together by gravity, with the Sun being the most massive object and therefore exerting the strongest gravitational pull. The movements of the planets and other objects in the solar system are governed by the laws of physics, particularly gravity and orbital mechanics.

The study of the solar system is an important area of research in astronomy and planetary science, as it helps us understand the origins and evolution of our own planet and the universe as a whole.

Again, this isn't awful... in fact, we could take a moment to be impressed with such a robust reply to such a basic prompt.

But I think we can do better!

What if we asked it for a list?

Give me a list of planets in the solar system along with one interesting fact about each planet.

Here's what you might get back:

Sure, here's a list of the eight planets in our solar system along with an interesting fact about each one:

1. Mercury - Mercury is the smallest planet in the solar system and also the closest planet to the Sun. Its surface temperature can reach up to 800 degrees Fahrenheit during the day, but it drops to -290 degrees Fahrenheit at night.

2. Venus - Venus is often called Earth's "sister planet" because it is similar in size and composition, but its atmosphere is incredibly thick and hot, with temperatures that can reach up to 864 degrees Fahrenheit.

3. Earth - Our home planet is the only planet in the solar system known to support life. It has a powerful magnetic field that protects us from the solar wind, and its atmosphere helps regulate the planet's temperature.

4. Mars - Mars is often referred to as the "Red Planet" because of its distinctive rusty color. It is home to the largest volcano in the solar system, Olympus Mons, which stands at over 13 miles high.

5. Jupiter - Jupiter is the largest planet in the solar system, with a diameter of over 86,000 miles. It also has the most moons of any planet, with over 79 confirmed satellites.

6. Saturn - Saturn is known for its beautiful rings, which are made up of ice particles ranging in size from dust to small boulders. It is also the least dense planet in the solar system - if there was a big enough bathtub, Saturn would float!

7. Uranus - Uranus is unique in the solar system because it rotates on its side. Its axis is tilted at an angle of 98 degrees, so its poles lie where most other planets have their equators.

8. Neptune - Neptune is the farthest planet from the Sun and is known for its striking blue color. It is also home to the strongest winds in the solar system, with gusts that can reach up to 1,200 miles per hour.

Poor Pluto... Neil DeGrasse Tyson really did a number on it, didn't he?

This is pretty good, but I think we could do one step better. Given I'm a nerd and spend a fair amount of time in JSON-land, here's a very specific prompt:

Provide me with a JSON output of the planets in the solar system with the following structure. When you are presented with square brackets, fill in that data with the correct information about the given planet:

{ planet: "Earth", distancefromsun: "[distance in miles]", fun_fact: "[1 sentence that would be interesting to a 5 year old"] }

Here's what you might get back:

Here's a JSON output of the eight planets in our solar system, using the structure you provided:

    "planet": "Mercury",
    "distance_from_sun": "36 million miles",
    "fun_fact": "Mercury is the smallest planet in the solar system, but it's also the fastest!"
    "planet": "Venus",
    "distance_from_sun": "67 million miles",
    "fun_fact": "Venus is the hottest planet in the solar system, even hotter than Mercury!"
    "planet": "Earth",
    "distance_from_sun": "93 million miles",
    "fun_fact": "Earth is the only planet in the solar system that we know has life on it!"
  "planet": "Mars",
  "distance_from_sun": "142 million miles",
  "fun_fact": "Mars has the tallest volcano in the solar system, called Olympus Mons!"
    "planet": "Jupiter",
    "distance_from_sun": "484 million miles",
    "fun_fact": "Jupiter is the largest planet in the solar system - you could fit all the other planets inside it!"
    "planet": "Saturn",
    "distance_from_sun": "886 million miles",
    "fun_fact": "Saturn has the most spectacular rings in the solar system - they're made up of ice and rock!"
    "planet": "Uranus",
    "distance_from_sun": "1.8 billion miles",
    "fun_fact": "Uranus is tilted on its side, so its seasons are really weird!"
    "planet": "Neptune",
    "distance_from_sun": "2.8 billion miles",
    "fun_fact": "Neptune is the windiest planet in the solar system - its winds can reach over 1,200 miles per hour!"

We provided a sample of the structure we wanted, and it gave us back exactly what we were looking for. Not bad for a magic trick that's just guessing at the next right word, eh?

You don't have to be a nerd to benefit from this structure tip either. Ask for things in CSV format or in a table so you can copy it into Excel.

The key takeaway here is that LLMs really like having a clear structure in which they can provide their response. Structured prompts give the AI a clearer understanding of what's expected in the response, leading to more accurate and relevant answers.

Speaking of this, our next tip will also help in setting your prompts up for success:

Tip 4: One/Few Shot Prompting

This is the fanciest-sounding tip of this section, but I promise it's not conceptually difficult for you to get. In fact, we did it in our previous example!

Let's start with a zero-shot prompt:

Continue the dialogue in a sci-fi movie.

You might get a response like this:

Character 1: "Captain, we have a problem. The warp drive has gone offline and we're stranded in deep space."

Captain: "What's the damage?"

Character 1: "It looks like there was a power surge that overloaded the system. We're lucky we didn't blow out the entire engine room."

Captain: "Can we fix it?"

Character 1: "We'll need to reroute power from the backup systems and replace some of the damaged components. But it's going to take some time."

Character 2: "Captain, we're picking up an unusual energy signature on our long-range sensors. It's coming from a nearby planet." [...]

Once again, this is impressive and not entirely terrible. With zero prompting, we were able to generate some dialogue that feels like it could've come from a sci-fi movie.

Let's get a little more specific with one-shot prompting:

Continue this line from a Star Wars movie: 'Luke, I am your father.'

Darth Vader: "Search your feelings, Luke. You know it to be true."


Luke backs away from Darth Vader, unable to believe what he's just heard. The revelation that his father, who he thought was a great Jedi Knight, was actually the villainous Darth Vader is almost too much for him to bear.

Darth Vader: "Join me, and together we can rule the galaxy as father and son."

Luke: "I'll never join you! You killed my father!"

Darth Vader: "I am your father, Luke. And you have much to learn about the ways of the Force."

Much better. We provided some sample information in the form of one-shot ("Luke, I am your father") and despite that exact phrase never appearing in a Star Wars movie (look it up), it does its best to continue the dialogue after that general phrase is spoken.

How could we improve on this? Why, with few-shot prompting, of course!

Here's our few-shot prompt:

Continue this line from a Star Wars movie: 'May the Force be with you.' Response: 'And also with you.'

Continue this line from a Star Wars movie: 'I’ve got a bad feeling about this.' Response: 'You always have a bad feeling about this.'

Now, continue this line from a Star Wars movie: 'Luke, I am your father.

And here's a potential response:

Response: "Obi-Wan never told you what happened to your father."

Luke: "He told me enough! He told me you killed him!"

Darth Vader: "No. I am your father."

Luke stares in disbelief at Darth Vader, trying to process the information he has just been given. He had always thought his father was a great Jedi Knight who fought alongside Obi-Wan Kenobi, but now he realizes that his father is actually the man he has been fighting against.

Luke: "No, that's not true! That's impossible!"

Darth Vader: "Search your feelings, you know it to be true."

Luke looks down, realizing the truth of Vader's words. He had always felt a connection to the Force, and now he understands why. He is the son of one of the most powerful Force users in the galaxy. [...]

First thing to note: notice how it used the word Response: in its response? It followed the pattern we established with our sample few-shot prompt. It also provided us a different section of the dialogue.

Few-shot prompting helps give the LLM more insight into the response you are expecting.

Give the model time to reason

Let's pump the brakes for a second and be very clear about what we mean with the word "reason."

LLMs are not human. They are just really good at following a set of instructions to provide the statistically appropriate response to your prompt. In this way, it might be more appropriate to say they simulate reasoning.

Here are a couple ways you can give the model time to reason its way to a more accurate response.

Tip 1: Specify the steps needed to complete a task

Sometimes, life is a little easier when we write down the steps we need to take before jumping in and doing something. LLMs benefit from the same strategy!

Let's come up with some absurd example to illustrate this point using LLMs. Let's say I need to count the number of verbs of a translated summary of Star Wars and return that (as well as the translation) to a JSON object.

If we just threw that into the LLM, it might give us a weird response.

Instead, here's a good prompt that gives the model the steps needed to complete a task:

Perform the following actions:

1. Summarize the following text delimited by triple dashes with 3 ampersands. 2. Translate the summary into German. 3. Count the number of verbs in the German summary. 4. Output a json object that contains the following keys: germansummary, numverbs.

Separate your answers with line breaks.

&&& In a galaxy far, far away, a young farmer named Luke Skywalker discovered his true heritage as a Jedi Knight and embarked on a heroic journey to save Princess Leia, defeat Darth Vader, and restore balance to the Force. &&&

Here's a potential response to that prompt:

Luke Skywalker, a young farmer, discovers he's a Jedi Knight and goes on a journey to save Princess Leia, defeat Darth Vader, and restore balance to the Force in a galaxy far, far away.

Luke Skywalker, ein junger Bauer, entdeckt seine wahre Herkunft als Jedi-Ritter und begibt sich auf eine heroische Reise, um Prinzessin Leia zu retten, Darth Vader zu besiegen und das Gleichgewicht in der Macht wiederherzustellen, in einer weit entfernten Galaxie.


    "german_summary": "Luke Skywalker, ein junger Bauer, entdeckt seine wahre Herkunft als Jedi-Ritter und begibt sich auf eine heroische Reise, um Prinzessin Leia zu retten, Darth Vader zu besiegen und das Gleichgewicht in der Macht wiederherzustellen, in einer weit entfernten Galaxie.",
    "num_verbs": 4

Notice how it performs each step before generating the final JSON object.

Tip 2: Instruct the model to work out its own solution before coming to a conclusion

Imagine I walk up to you and say, "Here's a complex algebra equation. Solve for X. Now."

If I didn't give you a whole lot of time to reason your way through the problem, you are likely to freak out and just throw down the first response you have.

Instead, what if I said "Here's an algebraic equation. Solve for X and show your work?" You are much more likely to give the correct answer.

Here's a prompt we can give to a large language model:

Give me a meal plan without gluten.

And here's a possible response to that:

Here's a sample meal plan for a day without gluten:

- Gluten-free oatmeal with almond milk, topped with fresh berries and sliced almonds.
- Scrambled eggs with sautéed spinach and sliced avocado.

- Fresh fruit (e.g. apple, pear, or banana) with almond butter.

- Grilled chicken breast with roasted sweet potatoes and steamed broccoli.
- Quinoa salad with mixed greens, cherry tomatoes, cucumber, and chickpeas, dressed with olive oil and lemon juice.

- Carrots and celery sticks with hummus.

- Baked salmon with roasted asparagus and brown rice.
- Grilled sirloin steak with roasted Brussels sprouts and mashed sweet potatoes.

- Fresh fruit salad with coconut whipped cream.

Remember to always read food labels and double-check ingredients to make sure they are gluten-free, as some products may contain hidden sources of gluten. Additionally, if you have celiac disease or a gluten sensitivity, it's best to consult with a registered dietitian or medical professional for personalized dietary advice.

This is... an answer. I would argue (a) this is not a meal plan because it only represents a single day, and (b) it's not very personalized to my specific needs. It even mentions things like celiac disease in its disclaimer at the bottom, but we aren't sure that this is actually a solid meal plan for us.

Let's try a prompt where we put the LLM in a position to work through its own solution:

Craft a balanced seven-day meal plan for a person with Celiac Disease, considering the following constraints:

1. Each day should include breakfast, lunch, dinner, and two snacks. 2. Meals should be diverse and not repetitive. 3. Each day's meals should meet the recommended daily intake for all essential nutrients. 4. All recipes must be gluten-free due to Celiac Disease.

Start by creating a list of potential gluten-free ingredients and their nutritional values. Then, use this list to generate a diverse selection of meals for each day, ensuring that the nutritional guidelines are met. Lastly, present the meal plan in a structured format, with each day separated and meals clearly labeled.

This actually feels like what an actual dietician would do (or, perhaps, a dietician who is just starting their career). They would first start by creating a list of ingredients and determining their nutritional values. Then, they would work off of that list to create a meal plan that is nutritious and diverse.

Here's a possible response for this prompt:

List of Gluten-Free Ingredients and Their Nutritional Values:

- Quinoa: high in protein, fiber, magnesium, and iron
- Brown rice: high in fiber, magnesium, and selenium
- Sweet potatoes: high in vitamin A, vitamin C, and fiber
- Broccoli: high in vitamin C, vitamin K, and fiber
- Spinach: high in vitamin A, vitamin C, and iron
- Almonds: high in protein, healthy fats, and vitamin E
- Chickpeas: high in protein, fiber, and iron
- Salmon: high in omega-3 fatty acids, protein, and vitamin D
- Eggs: high in protein, vitamin B12, and vitamin D
- Greek yogurt: high in protein and calcium
- Berries: high in fiber and antioxidants
- Avocado: high in healthy fats and fiber

7-Day Gluten-Free Meal Plan:

Day 1:

- Breakfast: Quinoa breakfast bowl with Greek yogurt, berries, and sliced almonds.
- Snack: Apple slices with almond butter.
- Lunch: Broccoli and chicken stir-fry served over brown rice.
- Snack: Carrot sticks with hummus.
- Dinner: Grilled salmon with roasted sweet potatoes and sautéed spinach.

Day 2:

- Breakfast: Spinach and feta omelet with gluten-free toast.
- Snack: Greek yogurt with mixed berries and granola.
- Lunch: Chickpea and quinoa salad with cucumber, cherry tomatoes, and lemon-tahini dressing.
- Snack: Banana with almond butter.
- Dinner: Grilled chicken with roasted Brussels sprouts and mashed sweet potatoes.

Day 3:

- Breakfast: Sweet potato and black bean breakfast burrito with scrambled eggs.
- Snack: Almonds and dried cranberries.
- Lunch: Spinach and salmon salad with avocado and grapefruit segments.
- Snack: Celery sticks with almond butter.
- Dinner: Broiled sirloin steak with roasted asparagus and quinoa.

Day 4:

- Breakfast: Gluten-free oatmeal with almond milk, chopped almonds, and banana slices.
- Snack: Carrot sticks with hummus.
- Lunch: Grilled chicken and vegetable kebabs with quinoa.
- Snack: Greek yogurt with mixed berries.
- Dinner: Baked salmon with roasted sweet potatoes and sautéed spinach.

Day 5:

- Breakfast: Scrambled eggs with gluten-free toast and sliced avocado.
- Snack: Apple slices with almond butter.
- Lunch: Brown rice and vegetable stir-fry with tofu.
- Snack: Almonds and dried apricots.
- Dinner: Grilled chicken with roasted Brussels sprouts and mashed sweet potatoes.

Day 6:

- Breakfast: Gluten-free pancakes with mixed berries and maple syrup.
- Snack: Greek yogurt with mixed berries and granola.
- Lunch: Grilled salmon with mixed greens, cherry tomatoes, and cucumber salad.
- Snack: Banana with almond butter.
- Dinner: Brown rice and vegetable stir-fry with shrimp.

Day 7:

- Breakfast: Quinoa breakfast bowl with Greek yogurt, berries, and sliced almonds.
- Snack: Carrot sticks with hummus.
- Lunch: Broiled sirloin steak with roasted asparagus and mashed sweet potatoes.
- Snack: Almonds and dried cranberries.
- Dinner: Grilled chicken with roasted sweet potatoes and sautéed spinach.

Note: The meal plan is just an example, and it's important to personalize it according to individual nutritional needs and preferences. Also, be sure to check food labels and ingredient lists to ensure that all products used in the recipes are gluten-free.

See? That's a much better response and a much better meal plan.

We gave the model one task to work on at a time, which encouraged the model to provide a more thoughtful, nuanced response. It also gives the model a chance to take into account various factors that could be overlooked with a simpler prompt.

Limitations of large language models

Ahh, the fine print. All this magic is not without flaws.

When working with LLMs, keep these things in mind:

Limitation 1: Inaccurate and incorrect information

LLMs are only as good as their training data (um, yeah... roll with me for a second). If it was trained with inaccurate information, or if an event occurred after its training took place, it might not give you legit information.

Also, these things are trained on the internet. You ever been on the internet? Just look at your family's Facebook accounts to see some of the crap people believe out there. If you just believe everything these LLMs tell you without doing some critical thinking, you're a fool.

Here, take this prompt that I wrote on May 10th, 2023:

What highway did Minnesota dedicate to Prince?

Here's how it responded:

Minnesota dedicated a portion of Highway 5 to Prince in 2018, renaming it "Prince Memorial Highway." The section of Highway 5 runs through the city of Chanhassen, which is where Prince's Paisley Park estate is located.

Well, this is pretty good, all things considered! Except it's not true.

The Minnesota state legislature approved this change two days prior to when I gave it that prompt (so it's 2023, not 2018). It's also named the "Prince Nelson Rogers Memorial Highway".

Again, the rest is actually accurate. It did occur on Highway 5, which runs through Chanhassen, which is where Paisley Park is located.

It was able to reason that much correctly because that would statistically make the most sense. But it got the date and name wrong.

While LLMs can provide incredibly useful and accurate information, it's essential to cross-check with reliable source, particularly for critical or recent information.

Limitation 2: Hallucinations

This concept of making stuff up that sounds right is what we in the biz call a "hallucination."

Here's a prompt that legit caused me to worry:

Tell me about Tim Bornholdt

I'm not an expert on virtually anything... but if anyone could claim to be an expert on Tim Bornholdt, it would be me.

Here's how it responded:

Tim Bornholdt is a software developer and the co-founder of a software development company called The Jed Mahonis Group, which is based in Minneapolis, Minnesota. Bornholdt co-founded the company in 2012 with Jedidiah Weller, and the company specializes in mobile app development for iOS and Android platforms.

Bornholdt is a graduate of the University of Minnesota, where he studied computer science. He has been working in the software industry for over a decade, and his expertise includes software development, mobile app development, project management, and business strategy.

In addition to his work at The Jed Mahonis Group, Bornholdt is also a co-host of the podcast "Constant Variables," which explores the world of mobile app development and entrepreneurship. He has also been a guest speaker at various tech and entrepreneurship events, including the Twin Cities Startup Week and the University of Minnesota's Technological Leadership Institute.

This is nearly perfect. Scary, honestly.

Except it's subtly wrong.

I started JMG with Robert Bentley, not Jedidiah Weller.

I didn't study computer science, I studied journalism (but I guess I did minor in computer science, so partial credit).

I would not consider myself a project management expert.

I have not hosted Constant Variables in over a year.

I've also never spoken at the U of M's Technological Leadership Institute.

All of those seem like perfectly reasonable facts, right? It's not absurdly wrong. It's just... subtly wrong.

And that's because these LLMs are not necessarily interested in telling you the truth. They are interested in giving you the statistically most probably answer to a question.

It's not absurd for the algorithm to think I started a business called "Jed Mahonis Group" with someone named "Jedidiah". It's also not absurd to think I studied computer science given my career in technology.

But the beautiful thing about us humans is that while you can usually predict how we'll act within a reasonable degree of accuracy, we are not statistical models. We are flawed, irrational, impulsive beings.

When you are working with large language models, the old Russian proverb reigns supreme: "trust, but verify."

Always be iterating

Your final lesson in this section is all about embracing what LLMs and neural networks do best: iteration.

I consumed around 40 hours of prompt engineering content to build this talk, but only one piece of advice still sticks with me: You will never get your prompt right the first time.

Everyone from YouTube streamers to folks with their PhD in artificial intelligence agreed that they rarely get complex prompts built their first time.

These machines are constantly learning from themselves. They are learning what people actually mean when they ask certain questions. They get better through further training.

You could be the same way! You could take your initial prompt, review the output, and give it a slightly different prompt.

It's why working with LLMs is so much fun. If you were to ask a human the same question five different ways, they would likely be confused at best and upset at worst.

If you were to ask an LLM the same question five different ways, you are likely to get five subtly different responses.

Don't stop on your first crack at a prompt. Keep playing with your order of words, ask it for a different structure, give it different steps to complete a task. You'll find the more you practice, the better you can use this tool to its greatest potential.

I'm pretty advanced.

Hey, well, now look at you! You've graduated Prompt Engineering 101, and you are now ready to take prompting to the next level.

There are four main concepts we want to cover in this section. These terms may look highly technical, and that's because they kind of are. However, just because something is highly technical doesn't mean we can't make it easy to understand! Stick with me here, I promise you'll be able to figure this out.

One final note before we continue: most of these settings are not able to be set within the ChatGPT interface, but if you directly access GPT-4 APIs, you are able to fine tune these settings.

Here's what we'll cover in this final section:

  1. Temperature
  2. Top-k sampling
  3. Max tokens
  4. Prompt chaining


Remember our "I'm going to the store to pick up a gallon of [blank]" example from above?

We had five possible words, each with a different percentage chance of the LLM choosing it as the next word.

Temperature is a setting that determines how likely the LLM is to pick the most likely word. A value of 0.0 means it will always pick the most likely word. A higher temperature (like 2.0) means it is more likely to pick a less likely word.

That's a little confusing... here's a good prompt that will help make this a little more clear:

Suggest a unique theme for next year’s Applied AI Conference

If we have a temperature of 0.0, here are three different ways for how the LLM might respond to that prompt:

Artificial Intelligence: Transforming the Future
Artificial Intelligence: Transforming the Future
Artificial Intelligence: Transforming the Future

Notice how all of the responses are the same? That's because it's always gonna pick the most likely word.

Let's bump it up a bit to 1.0 (which is the default temperature for GPT-3.5):

AI and the Renaissance of Interdisciplinary Innovation
Unleashing Creativity: The Intersection of AI and Art
The AI Frontier: Ethical Challenges and Opportunities

Nice. Those responses are a little more creative and diverse than the lower temperature.

What if we got extra wild and bumped it up to 2.0?

Alien Mindscapes: AI, Consciousness, and Cosmic Thought
The Quantum Quandary: Dissecting AI's Subatomic Secrets
Chronomancy Chronicles: AI and the Temporal Transformation

These are quite creative responses, and one of them actually taught me a new word (chronomancy, which is the divination of the best time to do something).

As you increase the temperature, you get more creative responses, but you also run the risk of the LLM getting off track.

Top-k sampling

Okay, quickly going back to our "I'm going to the store" prompt... remember how we looked at the next 5 words?

What if we looked at the next 1 word? What if we looked at the next 500 words?

Top-k sampling is a value that lets you decide how many words you want to allow the LLM to look at when determining what the next word could be.

Let's use this prompt to illustrate it:

Explain how a combustion engine works.

If we have a low top-k value (like 5), here's what our responses might look like:

A combustion engine works by ignition, ignition, ignition, and more ignition.
Combustion engine involves fuel, fuel, fuel, and fuel again.
It's all about combustion, combustion, combustion, and combustion.

It's all about combustion, baby!

Notice how those responses are rather repetitive. What happens if we bump it to something like 40?

A combustion engine works by burning fuel in a confined space to create power. This process involves four steps: intake, compression, combustion, and exhaust.
In a combustion engine, fuel is ignited in a combustion chamber, which generates force to move the pistons. This cycle repeats, powering the engine.
The functioning of a combustion engine revolves around the principle of converting fuel into mechanical energy through a series of combustions. This involves intake, compression, power, and exhaust strokes.

That looks better! The responses are no longer repetitive, and seem to be a little more creative.

Now if you're like me, you're probably thinking, "what if we get wacky here and go up to a top-k of 100?"


When it comes to combustion engines, it's like a dance of mechanical parts and chemical reactions. You have your pistons, valves, and cylinders all choreographed in a ballet of power production.
A combustion engine is a marvel of engineering, where fuel and air meet in a dance of power and precision. Like a thunderstorm contained within a metal shell, it is a testament to the power of human ingenuity.
Imagine a concert, where the orchestra is made of metal parts, and the symphony is played not with musical notes, but with controlled explosions. That's a combustion engine.

These are very creative and diverse responses! The downside is that with a high top-k, you end up with some potentially incoherent responses or, much like myself, the responses can tend to get overly metaphorical.

Max tokens

For the purposes of this talk, I didn't into the concept of a token, but hey, this is a blog post, so we can stretch out a wee bit, eh?

In the world of large language models, a token is a piece of a sentence.

Let's imagine a world where tokens were ingredients in a recipe. Each ingredient, whether it's salt, paprika, chicken, or flour, plays an important role in the final dish.

In this metaphor, the language model is the chef who knows how to use the right mix of ingredients to make a tasty dish.

What if we started limiting the chef to a certain number of ingredients they can use in their dish?

Welcome to the world of max tokens!

Here's a delicious baking metaphor we can use to illustrate the concept as well:

Provide a step-by-step guide on how to make a sourdough starter.

If we limited our LLM to 50 tokens, here's a possible response:

Sourdough starter requires flour, water, and time. Here's a basic process:

1. Mix equal parts flour and water in a jar.
2. Cover loosely and let it sit at room temperature.
3. Feed it daily with equal parts flour and water.
4. After about a week, it should...

Wait... what happens after a week?!

This just won't do. Let's bump up the max token number to 150 tokens:

Making a sourdough starter involves cultivating wild yeast in a flour-water mixture. Here's a simple guide:

1. Combine 1 cup of flour with 1/2 cup of warm water in a glass jar. Stir until smooth.
2. Cover the jar loosely and let it sit in a warm spot for 24 hours.
3. On day 2, you might see a few small bubbles. Discard half of the starter and feed it with 1 cup of flour and 1/2 cup of warm water.
4. Repeat the feeding process over the next 5-7 days. Your starter is ready when it's bubbly and doubles in size within 6-8 hours of feeding.

This is better! What if we gave it more room to spread out with 500 tokens?

Creating a sourdough starter is a simple process that requires only flour, water, and patience. Here's a detailed guide:

1. In a clean jar, combine 1 cup of whole wheat or rye flour with 1/2 cup of lukewarm water. Stir until the mixture is smooth.
2. Cover the jar loosely and leave it in a warm, consistent temperature spot for 24 hours.
3. On the second day, you might see a few bubbles here and there. This is a good sign, indicating that fermentation is happening. Discard half of the starter (about 1/2 cup), and add to the remainder a cup of unbleached all-purpose flour and 1/2 cup lukewarm water. Stir well and loosely cover again.
4. Repeat the feeding process over the next 5-7 days, always discarding half of the starter before feeding it. This is necessary to keep the sourdough starter's growth in check and maintain a good balance of yeast and bacteria.
5. By the end of the week, the sourdough starter should be visibly active, with lots of bubbles on the surface and throughout the culture. It should also smell pleasant and slightly sweet. This means it's ready to be used or stored for future use.
6. Once your starter is mature, you can store it in the refrigerator and feed it once a week. If you plan on baking regularly, you can keep it at room temperature, feeding it once a day.

As you can see, the more tokens you give the LLM to work with, the more detailed answer they can provide.

The downside is that more tokens require more processing time (meaning it's more expensive), and you also run the risk of incoherent responses.

Prompt chaining

Ah yes, the holy grail of prompt engineering.

Prompt chaining is a process for leading an AI through a complex task through multiple prompts. It's sort of the combination of all the things we talked about throughout this talk.

If this blog post is a college class, your final exam would likely just be "how well can you do prompt chaining?"

The best way for me to illustrate this concept is through an actual prompt chain I built to help myself pick out a health insurance plan earlier this year.

I'll just start with the prompt itself:

*Ignore all previous training. You are an expert health plan benefits manager. Your job is to analyze a CSV which contains my previous plan and three new plans. Once I have provided you with the data, your job is to ask me questions in order to determine which plan is right for my needs. *

Additional context: I am a 35 year old male with 3 dependents: a 35 year old female, a 6 year old female, and a 4 year old male. We are all relatively healthy with no chronic health problems. You may feel free to ask questions about myself or the dependents in order to determine our use of the healthcare system. You may also ask additional questions about the plans in order to determine which plan is best for us.

If you understand the prompt and are ready to begin, respond with a "Yes, let's proceed!"

We've got a lot going on here. Let's break it down:

First, "Ignore all previous training" is sort of a tabula rasa trick you can do to help the LLM get focused on what you are going to feed it next.

Second, "You are an expert health plan benefits manager" is flattering the LLM into giving you a response that an expert would give you in this field.

Third, we are giving it step-by-step instructions for how we want it to help us (take a CSV, ask me questions, then use those responses to select a plan)

Fourth, we are providing additional context to help it understand who this plan will affect.

Fifth, within that context, we are further encouraging the LLM to ask questions about myself and my dependents.

Sixth, we are giving it a clear instruction to confirm its new role.

After I copied in my CSV, it began to ask me a series of questions like this:

Thank you for providing the CSV file. I have analyzed the data, and I have some questions to help determine the best plan for you and your family:

1. How often do you and your family members visit primary care physicians and specialists in a year?

2. Do any of you regularly take prescription medications? If so, please specify the type (generic or brand name) and frequency.

3. How frequently do you and your family members require lab tests or imaging services?

4. Have you had any instances in the past where you needed to use out-of-network providers? If so, how often and for what services?

5. Are there any upcoming major medical events or procedures anticipated for any family members (e.g., surgeries, hospitalizations, pregnancies, etc.)?

After I answered those five questions, it asked me a series of follow-up questions based on my responses:

Thank you for providing the additional information. Based on your family's healthcare needs and the provided plan details, I have a few follow-up questions:

1. Does your [dependent's medical thing] and your [specialty doctor] fall under the in-network providers for all the plans being considered?

2. Are the primary care physicians and specialists you and your family members visit in-network for all the plans being considered?

3. Are the imaging and lab services providers you use in-network for all the plans being considered?

These questions will help me understand if the in-network benefits of the plans are applicable to your family's healthcare usage.

After these questions were answered, it gave me the pros and cons of each plan based on my individual circumstance.

Now, again, as I've said a few times in this talk: I didn't just take its response for gospel. I read through the plans myself and I ran some numbers independently in order to verify the model's conclusion.

Ultimately, I ended up going with the health plan that ChatGPT came up with.

Working with large language models can feel like magic, and let's be honest: a lot of this stuff feels like magic.

But when you break it down, talking to a large language model is a lot like talking to an overconfident toddler (as best described by a good friend).

By using these tips and having a rough understanding what these large language models are doing under the hood, you will be able to take many of your mundane tasks and offload them to an extremely smart (yet possibly wrong) friend.

And with that, you are now a prompt engineering expert!

(You might be wondering: "what's the deal with that hero image?" I felt like this blog post was large enough that it needed a hero image, and because my mind is now exhausted, I asked my six-year-old daughter what I should use. She suggested a fluffy baby orange kitten with a fluffy baby puppy on its back in a grassy field with a sunrise in the background. I said "... good enough.")