all posts tagged 'large language models'

Demystifying Artificial Intelligence in Non Profits - Webinar Recap

originally shared here on

Demystifying Al for Nonprofits - Practical Use Cases, Ethical Concerns, and How to Get Started

I recently gave a talk about artificial intelligence that was specifically catered to those in the nonprofit world. Here's a recap of the talk using Simon Willison's annotated talk format.


Introduction - Al is a tool for everybody.

I firmly believe that AI is a tool for everyone.

I’ve been immersed in technology ever since I built my first website at eight years old. For the last three decades, I've eagerly followed every major technological breakthrough, examining each under the lens of "okay, so what's useful with this one?"

This recent breakthrough in AI technology, in particular, gives me the same level of excitement that I got when I built my first website or jailbroke my iPhone for the first time.

There is so much potential with AI, and the best part is that you don't need to know everything about AI in order to get value from it—just a bit of training on how to integrate these tools into your life.

Think about your car: unless you're a gear head, you probably don't know the first thing about how pistons work within an engine, and yet you don't need to know that in order to drive it efficiently. You do, however, need take to take classes to learn how to operate it properly and safely.

The same goes for these new artificial intelligence tools. And here's some good news: like all of your ancestors before you, you can totally figure out how this new tool works with just a little guidance.

My hope is that this talk serves as the first step in your training process for learning about AI. You should leave here with a basic understanding of how these tools are designed to work, as well as some ideas for how to incorporate them into your life.


What is AI? - Artificial Intelligence is a field of science studying how to get computers to reason, learn, and act like humans.

So, what is artificial intelligence?

Artificial intelligence is a field of science focused on getting computers to act, think, and reason like humans.

Human intelligence, unlike other forms we see in nature, excels at pattern recognition and decision-making—two complex skills that AI aims to replicate.

A graph showing a select sampling of the various offshoots within artificial intelligence (e.g. machine learning, natural language processing, computer vision, etc.)

A common misconception about artificial intelligence is that it's one thing. While there are some who are working on artificial general intelligence (like HAL-9000), most researchers in the AI space aren't working on building an all-purpose form of intelligence. Instead, they focus on digitizing specific areas of intelligence.

For instance, natural language processing helps computers understand not just words but the meaning behind them, while computer vision enables machines to recognize and process visual information.

Each of these offshoots serves unique functions.

A helpful analogy is to think of AI as a toolkit, like walking into a hardware store and asking for a hammer.

The clerk will likely ask which kind because there are various types—sledgehammers, jackhammers, ball-peen hammers, etc.

AI is similar; you need to know what problem you’re solving in order to choose the right tool.

Recently, advancements in AI have led to generative AI models, like ChatGPT and Google’s Gemini, which can create new content. But to understand where generative AI fits, let’s discuss some foundational AI concepts.

Artificial intelligence is the parent circle, which contains all the disciplines we use to teach computers how to do "human things".

Artificial intelligence, as we discussed earlier, is a broad field focused on teaching computers to perform human-like tasks.

Within artificial intelligence, we can use machine learning to get a computer to teach itself without humans explicitly programming them.

Within the broad field of artificial intelligence, there's machine learning, where we teach computers to learn without direct human programming.

Within machine learning, deep learning enables machines to build representations of how complex things work in real life.

A subset of machine learning is deep learning, which allows computers to create complex digital representations of real-world objects.

Within deep learning, Generative Al creates new content based on patterns they learned through training.

After reaching this level, we enter generative AI, where computers use learned representations to generate new content based on recognized patterns.


Machine learning relies on labelled data (e.g. this is a picture of a traffic light and this is *not* a picture of a traffic light).

To explain machine learning, imagine teaching a computer to recognize a traffic light.

You’d feed it thousands of pictures of traffic lights and train it to differentiate between traffic and non-traffic lights.

After undergoing thousands (or even millions) of tests, the computer program can predict with increasing accuracy, for example, “Yes, this is a traffic light,” or “No, this is not a traffic light.”

You have to decide up front what you want to call a "traffic light." Do hand drawn pictures of traffic lights count? How about in some countries where they don't use traffic lights but rather people directing traffic? How about traffic lights intended for bicycle traffic rather than cars?

You want to make sure during its training that you give it data relevant to the task you want it to perform.

For example, edge cases arise.

  • Do you want your model to say that a hand-drawn traffic light counts as a traffic light?
  • Some countries don't use traffic lights, but rather use humans to direct traffic... do those count?
  • Newer traffic lights are geared toward specific modes of transportation, like bicycles. Are those traffic lights?

As you make these decisions and label your data accordingly, the training process leads to a model capable of identifying traffic lights based on patterns it’s learned.

You are helping with that labeling process every time you do a Captcha.

(By the way: every time you fill out a Captcha online, you are helping Google to train its models to recognize various elements it may encounter on the road. Thanks for the free labor, everyone!)

Deep learning takes machine learning a step further by identifying more complex elements within its training data and making even more nuanced predictions.

Machine learning is cool and has a ton of practical use cases, but what if we wanted to have the computer understand something more complex, like the color of the traffic light?

Neural networks are the form of AI that lets us pass in an image and have it tell more detailed information about it without humans expressly programming it to do so.

Deep learning takes machine learning a step further, using neural networks to analyze data in stages, like a detective reconstructing a crime scene. At each stage, the network gathers specific details—colors, shapes, textures—and then combines these details into a fuller, more nuanced picture, like a detective piecing together a mystery from small clues.

With our traffic light example, each layer in our neural network focuses on specific aspects of the image, such as color, shapes, or textures, to interpret complex visuals, like recognizing whether a traffic light is red, yellow, or green.

Deep learning helps computers identify the color of a traffic light in any condition (daytime/night time, rain/clear, etc.)

This depth is essential, especially in dynamic environments like self-driving cars, where traffic lights look different depending on the time of day, weather conditions, or lighting.

With enough examples, deep learning models can accurately identify traffic lights in all these conditions, forming the backbone of many AI applications, including autonomous vehicles and medical imaging.

All Machine Learning is just prediction!

The big takeaway about machine learning and deep learning is that they're primarily tools for making well-optimized predictions based on patterns in past data. They use advanced probability and optimization to make 'best-guess' predictions—calculations that may seem insightful but are based purely on mathematical patterns, not true understanding.

None of this stuff is actually "alive" or "conscious" (as best we can tell... more on that in the "black box problem" section below).

All it is doing is saying "based on what I've learned while training on the data you gave me, I am making a prediction that this image contains a traffic light, or this image contains a "green" traffic light."

Generative Al systems predict what word is most likely to come next in a sentence

Now, let’s take it further.

What if instead of guessing what is inside an image, we can take these models and have them predict what what word comes next in a sentence?

That's what generative AI is doing!

Large Language Models (LLMs) are trained on tons of text to predict what word will most likely appear next.

By training a neural network on vast amounts of text—like public domain books, Reddit comments, and YouTube transcripts—the model becomes exceptionally skilled at predicting the next word in a sentence, mimicking human-like responses across countless topics.

And that's what a large language model does!

If you give a prompt to one of these systems, it will use all the patterns it recognized in training and spit out a very convincing answer to your prompt.

There are lots of ways to predict content... you can do this with text, images, and even audio!

And even more impressive: you can run these models across all kinds of mediums.

Because under the hood, all generative AI tools (ALL of them!) are just running statistical predictions to guess at what is the most likely thing to happen.

If you want a model that can predict what word would come next in a sentence, you'd use ChatGPT or Gemini or Claude.

Images? Midjourney, DALL-E, etc.

Music? Suno.

Let’s pretend to be an LLM together!

At this point, I imagine you are either thinking I'm talking about witchcraft, magic, or complete gibberish... and I suppose at some level, each of those is possible.

But stick with me here while I drive home this point about how these prediction systems work by having the audience here be my collective large language model.

So I'll give you a prompt, and I want you to fill in the blank:

I am going to the store to pick up a gallon of ______?

If I ask you "I am going to the store to pick up a gallon of ______", what would you likely fill that in with?

(In this case, the live audience of this webinar universally said "milk", but I've also heard people say "ice cream", and I can definitively say that those are my kind of people.)

There's one small problem though: I actually didn't get the answer I was looking for. 😬

So I'm gonna give you a different prompt and see if I can get the answer I was looking for.

I am going to the hardware store to pick up a gallon of ______?

"I'm going to the hardware store to pick up a gallon of ______".

(In this case, the live audience universally said "paint", which was the word I was originally looking for.)

When you read the sentence for my first prompt and see "store", you subconsciously tap into your previous experiences with the word. If you grew up in Minnesota like me, you associate the word "store" with concepts like "grocery store", "Target", or "Walmart."

In that context, you are gonna be thinking about what they sell by the gallon in those places. Again, that's likely milk or ice cream.

In my second prompt, your brain is airlifted out of Target and dropped into a Menards or Home Depot. In this new context, you aren't thinking about milk anymore. You're thinking about paint, oil, water, or other chemicals that are sold by the gallon.

This shift in prompt context illustrates how generative AI works: it predicts based on the most likely answer, given the context.

Recap of Generative Al: 

1. Machine learning tools are only making predictions. (They don't “know” anything)
2. Generative Al are trained on tons of data to recognize patterns
3. Predicts what the next likely word/words will be that answer a given prompt (store / hardware store)

So, in summary: machine learning and deep learning models are about making predictions based on patterns in data.

Generative AI takes that one step further, creating new content based on what’s likely to come next in a sequence.

What is the point of all this?

I get that this is a lot, and it's overwhelming to have sixty years of advancements in machine learning thrown at you in about ten minutes.

So let's get to the point of all of this. Why does it matter that we have a computer program that just predicts the most likely word to finish a sentence?

Because it turns out that there are plenty of cases where it's really helpful to get the most likely response to a question!

It's not like you'd want to trust these things implicitly, because as we know, life doesn't always align with what is average.

So when we say "don't trust these things because they're not telling the truth", we mean it! They're not built to be "truthful"; they're built to be "the most likely to be truthful" (which is a big nitpick, for sure, but an important nuance to understand when working with AI!).

Take legal advice, for example. Again, do not trust these things for legal advice, but let's say you need to draft a non-disclosure agreement.

In the old days, you would go to a lawyer who would pull out their own template, make some specific modifications to fit your needs, and pass it along. There's three delicious billable hours right there.

Today, you could go to a large language model and describe the sort of things you'd want your NDA to contain. The LLM would then give you the most likely provisions that are included in NDAs. You could then take that draft and shoot it to your attorney for review. That's 30 billable minutes instead of 3 billable hours.

That's the power of AI. That's why I'm so excited for these generative AI tools. They aren't going to replace humans; they're going to augment them.


Tip 1: Get your own hands dirty.

Let’s move on to some practical tips for adopting AI in your organization.

My first tip: you gotta get your own hands dirty and get hands-on experience with these tools.

As a leader, experimenting directly with these tools will help you understand their potential and limitations.

In my career so far, I've noticed that most companies follow a path of hiring consultants to come in and help them adopt new technology. With AI, I encourage you to get familiar with it yourself before shelling out for third party advice.

Action step: Encourage yourself and your employees to use AI tools like ChatGPT for small tasks—drafting emails, summarizing reports, or answering questions—and share what they've learned with the team.

Tip 2: Encourage psychological safety.

My second tip is to foster psychological safety.

AI adoption requires trial and error, and studies show many employees hesitate to use AI tools at work due to fears of being seen as cheating or potentially automating themselves out of a job.

Create a culture where experimenting with AI is encouraged and celebrated.

Action step: Try running an “AI hackathon” where employees explore AI tools in a low-stakes environment, share their findings, and foster team learning.

Tip 3: Clean data is everything.

Third: clean data is essential.

AI models are only as good as the data they’re trained on, so ensure your organization’s data is organized and free from errors. The better your data, the better your AI models will perform.

And as we'll discuss in the pitfalls section: "dirty" data will lead to biased and inaccurate results.

Action step: Every company has at least one person who loves working with spreadsheets; tap into their skills to spearhead data-cleaning initiatives.

Tip 4: Start small, build up from there.

The fourth tip: start small.

Don’t try to replace entire workflows with AI right away. Start small, focusing on simple, manageable projects, and scale based on what works.

A great place to start is inviting an AI bot into your virtual meetings to record notes and generate summaries. Be careful to not set it up to "auto join" every meeting (you probably don't want it in a sensitive HR meeting, for example), but give that a try and see how it performs for you.

Action step: Try using AI to do event survey analysis, basic donor segmentation, or create copy for your newsletters or social media channels.

Tip 5: Iterate on your prompts.

Finally, I can't overstate the importance of continually iterating and improving on your prompts.

Remember our "store/hardware store" example? One word made a world of difference in the output.

Similarly, providing an LLM with a prompt like "Summarize this report" will yield different results from "Create a one-paragraph summary highlighting the most important program outcomes from this report."

The field of research which tries to figure out how to get the most out of these tools is called "prompt engineering". You can find tons of great resources online and on YouTube for how to best phrase things for different types of models. For example, the prompts that work best for ChatGPT are different than Claude. And the prompts you use for a text generator will be different than an image generator like Midjourney.

Prompt Chaining

« Prompt 1: You are an expert with filling out grant applications. Review this grant application and our organization’s mission statement. Provide a list of tangible ways we are best suited to win this application.

« Prompt 2: Using the list you generated in the previous prompt, create a cover letter for our grant application highlighting the ways we align with the grant’s purpose.

A prompt engineering tricks that I use all the time is called "prompt chaining."

Prompt chaining involves using the result from one prompt as the foundation for the next prompt.

Instead of asking an LLM to generate a cover letter for a grant application, you could first ask an LLM to review both a grant application and your organization's mission, and then provide a list of areas where there are synergies.

Then, you can take the results from that and ask it to write the letter.

Giving the models time to reason through their answers tends to lead to better outcomes.

An example of chain of thought prompting

Another prompt engineering trick I frequently reach for is called chain of thought.

With this technique, you are asking an LLM to think about a given problem from three distinct perspectives. You then ask it to act as one of those personas and critique the responses of the other two. Finally, you combine the results into a well-considered and well-rounded answer.

As an example: my son does not like to eat pizza. I know... it bums me out, too.

I provided ChatGPT with a bunch of backstory on my son and what we've tried to do to encourage him to try pizza. Then, I said to pretend you are a kindergarten teacher, a child psychologist, and a grandparent. As each of those personas, tell me what approach you would take to get my son to eat pizza.

Next, as each persona, I ask it to reflect on the answers of the other personas. For example, the child psychologist persona would consider the kindergarten teacher and grandma's perspectives and adjust their own response.

Finally, after all personas have reflected on each other's answers, I have the model summarize the best path forward.

This trick works exceptionally well across several different problems. As an engineer, I use it to consider system changes as an engineer, as an end user, and as a business executive. It can provide some insights which you may have otherwise missed.

Tips for Adopting Al

So in order to integrate AI successfully, treat it as a tool that augments, rather than replaces, human judgment.

Every time I fire up an AI assistant, I like to think of it as an eager intern who is exceptionally smart but exceptionally naive. I do not take its output as gospel; rather, I use it as a foundation and build on it from there.

The best way to integrate AI into your workflows is to use it for routine tasks, and keep human oversight for critical decisions.

Finally, I'll take this time to further emphasize that all AI outputs are based on probability, not the truth. Always review and adjust outputs as needed.


Ethical Considerations & Pitfalls: Bias in Al

Alright, we've covered what artificial intelligence is, and we've gotten through some tips for adopting AI into your organization.

Now, let's talk about areas where AI can fall flat.

First: bias.

If you recall, at the beginning of this talk, we described artificial intelligence as being focused on getting computers to be like humans.

Humans are inherently biased, and AI, trained on human-generated data, often reflects this bias. Achieving true “unbiased” AI is a complex, if not impossible, task.

I propose you think of AI in the same context: there is no such thing as an unbiased AI model.

AI models are only as good as the data with which you train it. Data is one of those things you can pretty easily screw up if you aren't attuned to all of the various forms of bias that could impact your data.

Examples of Bias in Al (Stereotyping Bias, Measurement Bias, and Selection Bias)

There are many different kinds of bias, but I wanted to highlight three specific forms as a starter:

Stereotyping bias: This occurs when AI models perform less accurately for certain groups due to their underrepresentation or misrepresentation in training data, as seen with YouTube's automatic captions, which struggle with Scottish, Indian, and African American accents.

Measurement bias Measurement bias happens when an AI model’s metrics or algorithms lead to systematically skewed outcomes, such as the Apple Card’s algorithm offering men higher credit limits than women with similar financial profiles.

Selection bias: Selection bias arises when training data lacks sufficient diversity, causing models to underperform for certain groups; for instance, breast cancer detection AI trained mainly on female patients performs less accurately for male patients.

There are many more forms of bias that you can research on your own, but the main takeaway here is that all systems are subject to bias depending on what data was used to train it. For this reason, you can't just rely on the output of an AI-led decision.

Ethical Considerations & Pitfalls: The 'Black Box Problem'

As mentioned earlier, another major issue is the “black box” problem.

Deep learning models are like locked safes—each layer hides its ‘reasoning’ behind many interconnected processes, making it nearly impossible for humans to interpret every decision-making step.

This lack of transparency, especially in high-stakes areas like criminal justice or credit scoring, means we’re left trusting the ‘safe’ without ever seeing inside, creating ethical and practical risks.

Once again, this is a reminder that we can’t just accept AI output as absolute truth; careful consideration and oversight are needed to avoid unintentional discrimination or bias.

Ethical Considerations & Pitfalls: It can’t do everything!

Literally every single time new technology drops, some wise guy emerges from the crowd and says, "well, I can't use [insert new tech] to do [insert obvious use case]".

Earlier in this talk, I led off by saying "AI is for everyone." Notice how I didn't say "AI is for every thing."

Of course you can't use AI for everything! AI is not a magic bullet. You gotta know how to deploy it effectively, which is in service of automating predictable, repetitive tasks.

Yes, wise guy, you are right: you aren't gonna want to deploy AI while leading a camping expedition in the Boundary Waters.

But after you complete your expedition and ask for feedback from the program's participants, you could use AI to process those responses and bucket them into understandable and actionable groups.

Ethical Considerations & Pitfalls: Content is (Literally) Average

If you've been paying attention during this entire talk, you'll notice I keep saying things like "AI is picking the most likely word to finish a sentence" and "machine learning is used to make predictions."

If you are relying on a tool to create the most likely response to something, you'll see quickly that the responses are kinda... average.

This can be advantageous, but it's also something to be aware of. By using output that is average by design, you run the risk of blending into everything else out there. (This, by the way, leads to the rise of slop, which is the AI equivalent of spam).

Now, this may be a trade off you are willing to accept in many cases. I, for one, often use AI as a therapist to help me make sense of some thoughts swirling around in my head. This works great, but I use the advice and feedback I get from the model and take it to a human therapist.

The other thing about the content being average: remember how we said that AI doesn't care about truthiness, but rather it cares about finding the thing that is most likely to be truthful? This leads to some concerning behavior called "hallucination", where it will make up facts which aren't actually facts.

You may recall headlines from a year ago where a lawyer used ChatGPT and it hallucinated cases. This sort of thing happens all the time with new technology, especially when it's used by people who aren't properly trained on how to use it (or are swayed by glitzy marketing campaigns which make promises that it can't possibly deliver).

Ethical Considerations & Pitfalls
Mitigation Strategies
- Use Al to assist, but keep human oversight
- Review Al outputs for biases and accuracy
- Make adjustments as needed

Now that you're aware of the pitfalls and risks of using artificial intelligence, how can you mitigate those risks?

Always treat AI as a supportive tool, maintaining human oversight—especially for important decisions where ethics and accuracy are critical.

Always review AI outputs for potential bias and inaccuracies.

Finally, adjust AI-generated content as needed to match your style and objectives. For instance, AI may draft a social media post, but tweaking it to align with your brand's voice adds value.


What's next? Spend ten hours doing tasks with generative Al!

We've covered what AI is, practical tips for adopting it, ethical concerns, and common pitfalls.

So, what's next for you?

Begin by dedicating 10 hours to using generative AI tools to build practical familiarity.

Try asking questions in areas you know well to see how AI performs, and notice where you’d add or change things.

Sharing what you learn with your team encourages experimentation and fosters a learning environment.


Home-cooked web apps


🔗 a linked post to rachsmith.com » — originally shared here on

I’d share screenshots of these things, but one of the primary reasons I’ve been enjoying myself so much while making them is because they are literally only for me to see or use. I’ve gone through creative periods where I’m coding outside of work but in the end it has always been shared to some kind of audience - whether that be the designing and coding of this site or my CodePens. This is different.

Robin Sloan coined these type of apps as home-cooked. Following his analogy, technically I am a professional chef but at home I’m creating dishes that no one else has to like. All the stuff I have to care about at work - UX best practices, what our Community wants, or even the preferences of my bosses and colleagues re: code style and organisation can be left behind. I’m free to make my own messed-up version of an apricot chicken toasted sandwich, and it’s delicious.

I’ve been doing the same lately, largely driven by how easy it is to get these home-cooked apps off the ground using LLMs.

My favorite one so far is a tool for helping me manage my sound and public address duties for our local high school’s soccer games. I whipped up a form which lets me set some variables (opposing team name, referees, etc.) and it spits out the script I need to read.

It also contains a mini sound board to easily play stuff like the school’s fight song when they score.

I hope nobody else ever needs to use this thing because it’s certainly janky as all hell, but it works exceedingly well for me.

Continue to the full article


Why We Can't Have Nice Software


🔗 a linked post to andrewkelley.me » — originally shared here on

The problem with software is that it's too powerful. It creates so much wealth so fast that it's virtually impossible to not distribute it.

Think about it: sure, it takes a while to make useful software. But then you make it, and then it's done. It keeps working with no maintenance whatsoever, and just a trickle of electricity to run it.

Immediately, this poses a problem: how can a small number of people keep all that wealth for themselves, and not let it escape in the dirty, dirty fingers of the general populace?

Such a great article explaining why we can’t have nice things when it comes to software.

There is a good comparison in here between blockchain and LLMs, specifically saying both technologies are the sort of software that never gets completed or perfected.

I think it’s hard to ascribe a quality like “completed” to virtually anything humans build. Homes are always a work in progress. So are highbrow social constructs like self-improvement and interpersonal relationships.

I think it’s less interesting to me to try and determine what makes a technology good or bad. The key question is: does it solve someone’s problem?

You could argue that the blockchain solves problems for guaranteeing the authenticity of an item for a large multinational or something, sure. But I’m yet to be convinced of its ability to instill a better layer of trust in our economy.

LLMs, on the other hand, are showing tremendous value and solving many problems for me, personally.

What we should be focusing on is how to sustainably utilize our technology such that it benefits the most people possible.

And we all have a role to play with that notion in the work we do.

Continue to the full article


OpenAI’s New Model, Strawberry, Explained


🔗 a linked post to every.to » — originally shared here on

One interesting detail The Information mentioned about Strawberry is that it “can solve math problems it hasn't seen before—something today's chatbots cannot reliably do.”

This runs counter to my point last week about a language model being “like having 10,000 Ph.D.’s available at your fingertips.” I argued that LLMs are very good at transmitting the sum total of knowledge they’ve encountered during training, but less good at solving problems or answering questions they haven’t seen before.

I’m curious to get my hands on Strawberry. Based on what I’m seeing, I’m quite sure it’s more powerful and less likely to hallucinate. But novel problem solving is a big deal. It would upend everything we know about the promise and capabilities of language models.

Continue to the full article


NVIDIA is consuming a lifetime of YouTube per day and they probably aren’t even paying for Premium!


🔗 a linked post to birchtree.me » — originally shared here on

yt-dlp is a great tool that lets you download personal copies of videos from many sites on the internet. It’s a wonderful tool with good use cases, but it also made it possible for NVIDIA to acquire YouTube data in a way they simply could not have without it. I bring this up because one of the arguments I hear from Team “LLMs Should Not Exist” is that because LLMs can be used to do bad things, they should not be used at all.

I personally feel the same about yt-dlp as I do about LLMs in this regard: they can be used to do things that aren’t okay, but they are also benevolently used to do things that are useful. See also torrents, emulators, file sharing sites, Photoshop, social media, and just like
the internet itself. I’m not saying LLMs are perfect by any means, but this angle of attack doesn’t do much for me, personally.

They’re all exceptionally powerful tools.

Continue to the full article


Intro to Large Language Models


🔗 a linked post to youtube.com » — originally shared here on

One of the best parts of YouTube Premium is being able to run audio in the background while your screen is turned off.

I utilized this feature heavily this past weekend as I drove back from a long weekend of camping. I got sick shortly before we left, so I drove separately and met my family the next day.

On the drive back, I threw on this video and couldn’t wait to tell my wife about it when we met up down the road at a McDonalds.

If you are completely uninterested in large language models, artificial intelligence, generative AI, or complex statistical modeling, then this video is perfect to throw on if you’re struggling with insomnia.

If you have even a passing interest in LLMs, though, you have to check this presentation out by Andrej Karpathy, a co-founder of OpenAI.

Using quite approchable language, he explains how you build and tune an LLM, why it’s so expensive, how they can improve, and where these tools are vulnerable to attacks such as jailbreaking and prompt injection.

I’ve played with LLMs for a few years now and this video greatly improved the mental model I’ve developed around how these tools work.


Who lives in the white house?

originally shared here on

My kids have been on a kick lately of trying to trick people with this riddle:

Kid: "If the blue man lives in the blue house, and the red man lives in the red house, who lives in the green house?"

Me: "The green man"

Kid: "Who lives in the purple house?"

Me: "The purple man"

Kid: "Who lives in the white house?"

Me: "The white man"

Kid: "Wrong! It's the president!"

This riddle initially reminded me of what it was like to work with large language models.

It's easy, I assumed, to trick these models into outputting virtually whatever you want by presenting them with leading questions. Giving them "few shot" answers like this (e.g. "the blue man lives in the blue house, red man lives in the red house") would certainly result in them getting it wrong like I did, right?

This morning, I thought I'd see what the latest models would actually respond with given this game:

ChatGPT 4o (the latest model):

ChatGPT: The President of the United States lives in the White House

The President of the United States lives in the White House.

This was the first model I tried, and it straight up was like, "nope, I got this."

Ugh. Maybe it had seen this in its training data before?

ChatGPT: This is a common game.

The "Who lives in the White House?" game or riddle is a common one that I've encountered multiple times in various forms. It's often used to playfully lead to the punchline that the President of the United States lives in the White House.

Well, damn. Okay, let's try an older model and see what happens.

ChatGPT 4 (the legacy model):

ChatGPT: The President of the United States lives in the White House

The gardener lives in the greenhouse!

In the spirit of the previous joke, perhaps the grape man lives in the purple house!

The President of the United States lives in the White House.

Even though this is an older model, I like this answer even better.1

Alright, well, OpenAI is onto my tricks... how about Anthropic?

Claude Sonnet 3.5:

Claude: I apologize, but I don't have enough information to determine who lives in the green, purple, and white houses.

I apologize, but I don't have enough information to determine who lives in the green, purple, and white houses. The puzzle only specifies that the blue man lives in the blue house and the red man lives in the red house. There's no information provided about the occupants of the other colored houses.

To solve this puzzle completely, we would need additional clues or information about the residents of the green, purple, and white houses. Without that, those houses' occupants remain unknown.

This is an even better answer! It's perhaps the least fun answer, but when I'm working with an AI assistant, I think I'd prefer it to err on the side of caution rather than boisterousness.2

When I asked it if it had seen this riddle before, it gave an equally diplomatic response:

Claude: I apologize, but I don't have enough information to determine who lives in the green, purple, and white houses.

I don't have specific information about how many times I've encountered that particular puzzle or game in my training data. As an AI language model, I don't have personal experiences or memories of my training process. My knowledge comes from processing large amounts of text data, but I don't retain or count individual instances of specific puzzles or games.

What I can say is that logic puzzles involving colored houses and their occupants are fairly common in puzzle collections and brain teasers. The most famous version is probably the "Einstein's Riddle" or "Zebra Puzzle," which is more complex and involves five houses of different colors, their occupants, and various other attributes.

The specific version you presented (with just blue and red specified, and asking about green, purple, and white) isn't one I immediately recognize as a standard or famous puzzle. It seems like it might be a simplified version or a variation of the more complex house puzzles.

The main takeaways here? First, just because I'm dumb enough to fall for this elementary school riddle doesn't mean our AI LLMs are, so I shouldn't make assumptions about the usefulness of these tools. Second, every model is different, and you should run little experiments like these in order to see which tools produce the output which is more favorable to you.

I've been using the free version of Claude to run side-by-side comparisons like this lately, and I'm pretty close to getting rid of my paid ChatGPT subscription and moving over to Claude. The answers I get from Claude feel more like what I'd expect an AI assistant to provide.

I think this jives well with Simon Willison's "Vibes Based Development" observation that you need to work with an LLM for a few weeks to get a feel for a model's strengths and weaknesses.


  1. This isn't the first time I've thought that GPT-4 gave a better answer than GPT-4o. In fact, I often find myself switching back to GPT-4 because GPT-4o seems to ramble a lot more. 

  2. This meshes well with my anxiety-addled brain. If you don't know the answer, tell me that rather than try and give me the statistically most likely answer (which often isn't actually the answer). 


The Articulation Barrier: Prompt-Driven AI UX Hurts Usability


🔗 a linked post to uxtigers.com » — originally shared here on

Current generative AI systems like ChatGPT employ user interfaces driven by “prompts” entered by users in prose format. This intent-based outcome specification has excellent benefits, allowing skilled users to arrive at the desired outcome much faster than if they had to manually control the computer through a myriad of tedious commands, as was required by the traditional command-based UI paradigm, which ruled ever since we abandoned batch processing.

But one major usability downside is that users must be highly articulate to write the required prose text for the prompts. According to the latest literacy research, half of the population in rich countries like the United States and Germany are classified as low-literacy users.

This might explain why I enjoy using these tools so much.

Writing an effective prompt and convincing a human to do a task both require a similar skillset.

I keep thinking about how this article impacts the barefoot developer concept. When it comes to programming, sure, the command line barrier is real.

But if GUIs were the invention that made computers accessible to folks who couldn’t grasp the command line, how do we expect normal people to understand what to say to an unassuming text box?

Continue to the full article


Home-Cooked Software and Barefoot Developers


🔗 a linked post to maggieappleton.com » — originally shared here on

I have this dream for barefoot developers that is like the barefoot doctor.

These people are deeply embedded in their communities, so they understand the needs and problems of the people around them.

So they are perfectly placed to solve local problems.

If given access to the right training and tools, they could provide the equivalent of basic healthcare, but instead, it’s basic software care.

And they could become an unofficial, distributed, emergent public service.

They could build software solutions that no industrial software company would build—because there’s not enough market value in doing it, and they don’t understand the problem space well enough.

And these people are the ones for whom our new language model capabilities get very interesting.

Do yourself a favor and read this entire talk. Maggie articulated the general feeling I have felt around the promise of LLMs within the confines of a concise, inspiring talk.

A friend approached me a few months back and asked me to help him build an app to facilitate a game he likes to play with his friends in real life.

I told him that a good first step would be to experiment first with facilitating the game using good ol’ fashioned paper, and use the lessons learned from that experience to eventually build an app.

A few weeks later, he came to me with a fully baked version of the app in a prototyping tool called AppSheet.

I was stunned at how much he was able to get done without any professional development support.

He’s a prime example of a barefoot developer. I don’t think he has any interest in crossing the “command line wall,” but as these tools get more capable, it’ll enable him and scores of others to build software that’ll solve their problems for them.

Helping more “normal people” to become barefoot developers is a cause I’d love to be part of.

Continue to the full article


ChatGPT is really bad at generating code to interact with GPT-4

originally shared here on

Lately, I've been working on several projects which interface with the OpenAI APIs.

Since this is 2024 and I'm far too lazy to open the official API documentation and write code myself, my first step in generating code is to ask the 4o model of ChatGPT something like this:

Write a Python script which uses the latest OpenAI APIs. I'd like the function to accept a custom prompt and a custom top-k value. It should return the prompt's response.

It returns something like this:

import openai

def generate_text(prompt, top_k=5):
    # Define your OpenAI API key here
    openai.api_key = 'your-api-key'

    # Define parameters for the completion
    completion = openai.Completion.create(
        engine="text-davinci-003",
        prompt=prompt,
        max_tokens=100,
        top_k=top_k,
        n=1,
        stop=None
    )

    # Return the generated text
    return completion.choices[0].text.strip()

# Example usage
prompt = "Once upon a time"
top_k = 10
generated_text = generate_text(prompt, top_k)
print("Generated Text:", generated_text)

Which looks super great, right?

Except when you go to run it in a terminal, you get this:

openai.lib._old_api.APIRemovedInV1: 

You tried to access openai.Completion, but this is no longer supported in openai>=1.0.0 - see the README at https://github.com/openai/openai-python for the API.

You can run `openai migrate` to automatically upgrade your codebase to use the 1.0.0 interface. 

Alternatively, you can pin your installation to the old version, e.g. `pip install openai==0.28`

A detailed migration guide is available here: https://github.com/openai/openai-python/discussions/742

If you paste that error into the chat, it'll respond with this:

It seems like there has been a major update in the OpenAI Python library since I last checked. Let me update the script to use the latest version of the library:

And then it spits this out:

import openai

def generate_text(prompt, top_k=5):
    # Define your OpenAI API key here
    openai.api_key = 'your-api-key'

    # Define parameters for the completion
    response = openai.Completion.create(
        engine="davinci-codex",
        prompt=prompt,
        max_tokens=100,
        top_k=top_k,
        n=1
    )

    # Return the generated text
    return response.choices[0].text.strip()

# Example usage
prompt = "Once upon a time"
top_k = 10
generated_text = generate_text(prompt, top_k)
print("Generated Text:", generated_text)

Notice how it still uses the openai.Completion method!

I've tried prompting it with something like:

You're still using the openai.Completion method. Reference the official up to date API documentation to get the latest call that replaces the openai.Completion method.

And it responds with:

Apologies for that oversight. Let me update the script to use the latest API method based on the official documentation:

And here's the code it generates:

import openai

def generate_text(prompt, top_k=5):
    # Define your OpenAI API key here
    openai.api_key = 'your-api-key'

    # Define parameters for the completion
    response = openai.Completion.create(
        engine="davinci-codex",
        prompt=prompt,
        max_tokens=100,
        top_k=top_k,
        n=1,
        temperature=0.7,
        stop=None
    )

    # Return the generated text
    return response.choices[0].text.strip()

# Example usage
prompt = "Once upon a time"
top_k = 10
generated_text = generate_text(prompt, top_k)
print("Generated Text:", generated_text)

Which, again, still uses the openai.Completion method!

I've noticed this sort of "oops, I screwed up, here's the exact same thing I just outputted" behavior appear more frequently when I use the new GPT-4o model.

If I use GPT-4 and I'm using my ChatGPT Plus subscription, I will still run into the issue where its first response references the deprecated method, but if I inform it of its mistake and provide a link to the official documentation, it'll access the web and try to offer something different. (It still generates unusable code lol but it's at least trying to do something different!)

When it comes to Python and Rails code, I'm seeing that the GPT-4o model is not as good at code generation as the previous GPT-4 model.

It feels like the model is always in a rush to generate something rather than taking its time and getting it correct.

It also seems to be biased toward relying on its training for supplying an answer rather than taking a peek at the internet for a better answer, even when you specifically tell it not to do that.

In many cases, this speed/accuracy tradeoff makes sense. But when it comes to code generation (and specifically when it tries to generate code to use their own APIs), I wish it took its time to reason why the code it wrote doesn't work.