all posts tagged 'large language models'

AI isn't useless. But is it worth it?


đź”— a linked post to citationneeded.news » — originally shared here on

There are an unbelievable amount of points Molly White makes with which I found myself agreeing.

In fact, I feel like this is an exceptionally accurate perspective of the current state of AI and LLMs in particular. If you’re curious about AI, give this article a read.

A lot of my personal fears about the potential power of these tools comes from speculation that the LLM CEOs make about their forthcoming updates.

And I don’t think that fear is completely unfounded. I mean, look at what tools we had available in 2021 compared to April 2024. We’ve come a long way in three years.

But right now, these tools are quite hard to use without spending a ton of time to learn their intricacies.

The best way to fight fear is with knowledge. Knowing how to wield these tools helps me deal with my fears, and I enjoy showing others how to do the same.

One point Molly makes about the generated text got me to laugh out loud:

I particularly like how, when I ask them to try to sound like me, or to at least sound less like a chatbot, they adopt a sort of "cool teacher" persona, as if they're sitting backwards on a chair to have a heart-to-heart. Back when I used to wait tables, the other waitresses and I would joke to each other about our "waitress voice", which were the personas we all subconsciously seemed to slip into when talking to customers. They varied somewhat, but they were all uniformly saccharine, with slightly higher-pitched voices, and with the general demeanor as though you were talking to someone you didn't think was very bright. Every LLM's writing "voice" reminds me of that.

“Waitress voice” is how I will classify this phenomenon from now on.

You know how I can tell when my friends have used AI to make LinkedIn posts?

When all of a sudden, they use emoji and phrases like “Exciting news!”

It’s not even that waitress voice is a negative thing. After all, it’s expected to communicate with our waitress voices in social situations when we don’t intimately know somebody.

Calling a customer support hotline? Shopping in person for something? Meeting your kid’s teacher for the first time? New coworker in their first meeting?

All of these are situations in which I find myself using my own waitress voice.

It’s a safe play for the LLMs to use it as well when they don’t know us.

But I find one common thread among the things AI tools are particularly suited to doing: do we even want to be doing these things? If all you want out of a meeting is the AI-generated summary, maybe that meeting could've been an email. If you're using AI to write your emails, and your recipient is using AI to read them, could you maybe cut out the whole thing entirely? If mediocre, auto-generated reports are passing muster, is anyone actually reading them? Or is it just middle-management busywork?

This is what I often brag about to people when I speak highly of LLMs.

These systems are incredible at the BS work. But they’re currently terrible with the stuff humans are good at.

I would love to live in a world where the technology industry widely valued making incrementally useful tools to improve peoples' lives, and were honest about what those tools could do, while also carefully weighing the technology's costs. But that's not the world we live in. Instead, we need to push back against endless tech manias and overhyped narratives, and oppose the "innovation at any cost" mindset that has infected the tech sector.

Again, thank you Molly White for printing such a poignant manifesto, seeing as I was having trouble articulating one of my own.

Innovation and growth at any cost are concepts which have yet to lead to a markedly better outcome for us all.

Let’s learn how to use these tools to make all our lives better, then let’s go live our lives.

Continue to the full article


The Robot Report #1 — Reveries


đź”— a linked post to randsinrepose.com » — originally shared here on

Whenever I talk about a knowledge win via robots on the socials or with humans, someone snarks, “Well, how do you know it’s true? How do you know the robot isn’t hallucinating?” Before I explain my process, I want to point out that I don’t believe humans are snarking because they want to know the actual answer; I think they are scared. They are worried about AI taking over the world or folks losing their job, and while these are valid worries, it’s not the robot’s responsibility to tell the truth; it’s your job to understand what is and isn’t true.

You’re being changed by the things you see and read for your entire life, and hopefully, you’ve developed a filter through which this information passes. Sometimes, it passes through without incident, but other times, it’s stopped, and you wonder, “Is this true?”

Knowing when to question truth is fundamental to being a human. Unfortunately, we’ve spent the last forty years building networks of information that have made it pretty easy to generate and broadcast lies at scale. When you combine the internet with the fact that many humans just want their hopes and fears amplified, you can understand why the real problem isn’t robots doing it better; it’s the humans getting worse.

I’m working on an extended side quest and in the past few hours of pairing with ChatGPT, I’ve found myself constantly second guessing a large portion of the decisions and code that the AI produced.

This article pairs well with this one I read today about a possible social exploit that relies on frequently hallucinated package names.

Simon Willison writes:

Bar Lanyado noticed that LLMs frequently hallucinate the names of packages that don’t exist in their answers to coding questions, which can be exploited as a supply chain attack.

He gathered 2,500 questions across Python, Node.js, Go, .NET and Ruby and ran them through a number of different LLMs, taking notes of any hallucinated packages and if any of those hallucinations were repeated.

One repeat example was “pip install huggingface-cli” (the correct package is “huggingface[cli]”). Bar then published a harmless package under that name in January, and observebd 30,000 downloads of that package in the three months that followed.

I’ll be honest: during my side quest here, I’ve 100% blindly run npm install on packages without double checking official documentation.

These large language models truly are mirrors to our minds, showing all sides of our personalities from our most fit to our most lazy.

Continue to the full article


Claude and ChatGPT for ad-hoc sidequests


đź”— a linked post to simonwillison.net » — originally shared here on

I’m an unabashed fan of Simon Willison’s blog. Some of his posts admittedly go over my head, but I needed to share this post because it gets across the point I have been trying to articulate myself about AI and how I use it.

In the post, Simon talks about wanting to get a polygon object created that represents the boundary of Adirondack Park, the largest park in the United States (which occupies a fifth of the whole state!).

That part in and of itself is nerdy and a fun read, but this section here made my neck hurt from nodding aggressively in agreement:

Isn’t this a bit trivial? Yes it is, and that’s the point. This was a five minute sidequest. Writing about it here took ten times longer than the exercise itself.

I take on LLM-assisted sidequests like this one dozens of times a week. Many of them are substantially larger and more useful. They are having a very material impact on my work: I can get more done and solve much more interesting problems, because I’m not wasting valuable cycles figuring out ogr2ogr invocations or mucking around with polygon libraries.

Not to mention that I find working this way fun! It feels like science fiction every time I do it. Our AI-assisted future is here right now and I’m still finding it weird, fascinating and deeply entertaining.

Frequent readers of this blog know that a big part of the work I’ve been doing since being laid off is in reflecting on what brings me joy and happiness.

Work over the last twelve years of my life represented a small portion of something that used to bring me a ton of joy (building websites and apps). But somewhere along the way, building websites was no longer enjoyable to me.

I used to love learning new frameworks, expanding the arsenal of tools in my toolbox to solve an ever expanding set of problems. But spending my free time developing a new skill with a new tool began to feel like I was working but not getting paid.

And that notion really doesn’t sit well with me. I still love figuring out how computers work. It’s just nice to do so without the added pressure of building something to make someone else happy.

Which brings me to the “side quest” concept Simon describes in this post, which is something I find myself doing nearly every day with ChatGPT.

When I was going through my album artwork on Plex, my first instinct was to go to ChatGPT and have it help me parse through Plex’s internal thumbnail database to build me a view which shows all the artwork on a single webpage.

It took me maybe 10 minutes of iterating with ChatGPT, and now I know more about the internal workings of Plex’s internal media caching database than I ever would have before.

Before ChatGPT, I would’ve had to spend several hours pouring over open source code or out of date documentation. In other words: I would’ve given up after the first Google search.

It feels like another application of Morovec’s paradox. Like Gary Casparov observed with chess bots, it feels like the winning approach here is one where LLMs and humans work in tandem.

Simon ends his post with this:

One of the greatest misconceptions concerning LLMs is the idea that they are easy to use. They really aren’t: getting great results out of them requires a great deal of experience and hard-fought intuition, combined with deep domain knowledge of the problem you are applying them to. I use these things every day. They help me take on much more interesting and ambitious problems than I could otherwise. I would miss them terribly if they were no longer available to me.

I could not agree more.

I find it hard to explain to people how to use LLMs without more than an hour of sitting down and going through a bunch of examples of how they work.

These tools are insanely cool and insanely powerful when you bring your own knowledge to them.

They simply parrot back what it believes to be the most statistically correct response to whatever prompt was provided.

I haven’t been able to come up with a good analogy for that sentiment yet, because the closest I can come up with is “it’s like a really good personal assistant”, which feels like the same analogy the tech industry always uses to market any new tool.

You wouldn’t just send a personal assistant off to go do your job for you. A great assistant is there to compile data, to make suggestions, to be a sounding board, but at the end of the day, you are the one accountable for the final output.

If you copy and paste ChatGPT’s responses into a court brief and it contains made up cases, that’s on you.

If you deploy code that contains glaring vulnerabilities, that’s on you.

Maybe I shouldn’t be lamenting that I lost my joy of learning new things about computers, because I sure have been filled with joy learning how to best use LLMs these past couple years.

Continue to the full article


Captain's log: the irreducible weirdness of prompting AIs


đź”— a linked post to oneusefulthing.org » — originally shared here on

There are still going to be situations where someone wants to write prompts that are used at scale, and, in those cases, structured prompting does matter. Yet we need to acknowledge that this sort of “prompt engineering” is far from an exact science, and not something that should necessarily be left to computer scientists and engineers.

At its best, it often feels more like teaching or managing, applying general principles along with an intuition for other people, to coach the AI to do what you want.

As I have written before, there is no instruction manual, but with good prompts, LLMs are often capable of far more than might be initially apparent.

If you had to guess before reading this article what prompt yields the best performance on mathematic problems, you would almost certainly be wrong.

I love the concept of prompt engineering because I feel like one of my key strengths is being able to articulate my needs to any number of receptive audiences.

I’ve often told people that programming computers is my least favorite part of being a computer engineer, and it’s because writing code is often a frustrating, demoralizing endeavor.

But with LLMs, we are quickly approaching a time where we can simply ask the computer to do something for us, and it will.

Which, I think, is something that gets to the core of my recent mental health struggles: if I’m not the guy who can get computers to do the thing you want them to do, who am I?

And maybe I’m overreacting. Maybe “normal people” will still hate dealing with technology in ten years, and there will still be a market for nerds like me who are willing to do the frustrating work of getting computers to be useful.

But today, I spent three hours rebuilding the backend of this blog from the bottom up using Next.JS, a JavaScript framework I’ve never used before.

In three hours, I was able to have a functioning system. Both front and backend. And it looked better than anything I’ve ever crafted myself.

I was able to do all that with a potent combination of a YouTube tutorial and ChatGPT+.

Soon enough, LLMs and other AGI tools will be able to infer all that from even rudimentary prompts.

So what good can I bring to the world?

Continue to the full article


Spoiler Alert: It's All a Hallucination


đź”— a linked post to community.aws » — originally shared here on

LLMs treat words as referents, while humans understand words as referential. When a machine “thinks” of an apple (such as it does), it literally thinks of the word apple, and all of its verbal associations. When humans consider an apple, we may think of apples in literature, paintings, or movies (don’t trust the witch, Snow White!) — but we also recall sense-memories, emotional associations, tastes and opinions, and plenty of experiences with actual apples.

So when we write about apples, of course humans will produce different content than an LLM.

Another way of thinking about this problem is as one of translation: while humans largely derive language from the reality we inhabit (when we discover a new plant or animal, for instance, we first name it), LLMs derive their reality from our language. Just as a translation of a translation begins to lose meaning in literature, or a recording of a recording begins to lose fidelity, LLMs’ summaries of a reality they’ve never perceived will likely never truly resonate with anyone who’s experienced that reality.

And so we return to the idea of hallucination: content generated by LLMs that is inaccurate or even nonsensical. The idea that such errors are somehow lapses in performance is on a superficial level true. But it gestures toward a larger truth we must understand if we are to understand the large language model itself — that until we solve its perception problem, everything it produces is hallucinatory, an expression of a reality it cannot itself apprehend.

This is a helpful way to frame some of the fears I’m feeling around AI.

By the way, this came from a new newsletter called VectorVerse that my pal Jenna Pederson launched recently with David Priest. You should give it a read and consider subscribing if you’re into these sorts of AI topics!

Continue to the full article


Strategies for an Accelerating Future


đź”— a linked post to oneusefulthing.org » — originally shared here on

But now Gemini 1.5 can hold something like 750,000 words in memory, with near-perfect recall. I fed it all my published academic work prior to 2022 — over 1,000 pages of PDFs spread across 20 papers and books — and Gemini was able to summarize the themes in my work and quote accurately from among the papers. There were no major hallucinations, only minor errors where it attributed a correct quote to the wrong PDF file, or mixed up the order of two phrases in a document.

I’m contemplating what topic I want to pitch for the upcoming Applied AI Conference this spring, and I think I want to pitch “How to Cope with AI.”

Case in point: this pull quote from Ethan Mollick’s excellent newsletter.

Every organization I’ve worked with in the past decade is going to be significantly impacted, if not rendered outright obsolete, by both increasing context windows and speedier large language models which, when combined, just flat out can do your value proposition but better.

Continue to the full article


Representation Engineering Mistral-7B an Acid Trip


đź”— a linked post to vgel.me » — originally shared here on

In October 2023, a group of authors from the Center for AI Safety, among others, published Representation Engineering: A Top-Down Approach to AI Transparency. That paper looks at a few methods of doing what they call "Representation Engineering": calculating a "control vector" that can be read from or added to model activations during inference to interpret or control the model's behavior, without prompt engineering or finetuning.

Being Responsible AI Safety and INterpretability researchers (RAISINs), they mostly focused on things like "reading off whether a model is power-seeking" and "adding a happiness vector can make the model act so giddy that it forgets pipe bombs are bad."

But there was a lot they didn't look into outside of the safety stuff. How do control vectors compare to plain old prompt engineering? What happens if you make a control vector for "high on acid"? Or "lazy" and "hardworking? Or "extremely self-aware"? And has the author of this blog post published a PyPI package so you can very easily make your own control vectors in less than sixty seconds? (Yes, I did!)

It’s been a few posts since I got nerdy, but this was a fascinating read and I couldn’t help but share it here (hat tip to the excellent Simon Willison for the initial share!)

The article explores how to improve the way we format data before it gets fed into a model, which then leads to better performance of the models.

You can use this technique to build a more resiliant model that is less prone to jailbreaking and produces more reliable output from a prompt.

Seems like something I should play with myself!

Continue to the full article

Tags for this post:

When Your Technical Skills Are Eclipsed, Your Humanity Will Matter More Than Ever


đź”— a linked post to nytimes.com » — originally shared here on

I ended my first blog detailing my job hunt with a request for insights or articles that speak to how AI might force us to define our humanity.

This op-ed in yesterday’s New York Times is exactly what I’ve been looking for.

[…] The big question emerging across so many conversations about A.I. and work: What are our core capabilities as humans?

If we answer that question from a place of fear about what’s left for people in the age of A.I., we can end up conceding a diminished view of human capability. Instead, it’s critical for us all to start from a place that imagines what’s possible for humans in the age of A.I. When you do that, you find yourself focusing quickly on people skills that allow us to collaborate and innovate in ways technology can amplify but never replace.

Herein lies the realization I’ve arrived at over the last two years of experimenting with large language models.

The real winners of large language models will be those who understand how to talk to them like you talk to a human.

Math and stats are two languages that most humans have a hard time understanding. The last few hundred years of advancements in those areas have led us to the creation of a tool which anyone can leverage as long as they know how to ask a good question. The logic/math skills are no longer the career differentiator that they have been since the dawn of the twentieth century.1

The theory I'm working on looks something like this:

  1. LLMs will become an important abstraction away from the complex math
  2. With an abstraction like this, we will be able to solve problems like never before
  3. We need to work together, utilizing all of our unique strengths, to be able to get the most out of these new abstractions

To illustrate what I mean, take the Python programming language as an example. When you write something in Python, that code is interpreted by something like CPython2 , which then is compiled into machine/assembly code, which then gets translated to binary code, which finally results in the thing that gets run on those fancy M3 chips in your brand new Macbook Pro.

Programmers back in the day actually did have to write binary code. Those seem like the absolute dark days to me. It must've taken forever to create punch cards to feed into a system to perform the calculations.

Today, you can spin up a Python function in no time to perform incredibly complex calculations with ease.

LLMs, in many ways, provide us with a similar abstraction on top of our own communication methods as humans.

Just like the skills that were needed to write binary are not entirely gone3, LLMs won’t eliminate jobs; they’ll open up an entirely new way to do the work. The work itself is what we need to reimagine, and the training that will be needed is how we interact with these LLMs.

Fortunately4, the training here won’t be heavy on the logical/analytical side; rather, the skills we need will be those that we learn in kindergarten and hone throughout our life: how to pursuade and convince others, how to phrase questions clearly, how to provide enough detail (and the right kind of detail) to get a machine to understand your intent.

Really, this pullquote from the article sums it up beautifully:

Almost anticipating this exact moment a few years ago, Minouche Shafik, who is now the president of Columbia University, said: “In the past, jobs were about muscles. Now they’re about brains, but in the future, they’ll be about the heart.”


  1. Don’t get it twisted: now, more than ever, our species needs to develop a literacy for math, science, and statistics. LLMs won’t change that, and really, science literacy and critical thinking are going to be the most important skills we can teach going forward. 

  2. Cpython, itself, is written in C, so we're entering abstraction-Inception territory here. 

  3. If you're reading this post and thinking, "well damn, I spent my life getting a PhD in mathematics or computer engineering, and it's all for nothing!", lol don't be ridiculous. We still need people to work on those interpreters and compilers! Your brilliance is what enables those of us without your brains to get up to your level. That's the true beauty of a well-functioning society: we all use our unique skillsets to raise each other up. 

  4. The term "fortunately" is used here from the position of someone who failed miserably out of engineering school. 

Continue to the full article


AI is not good software. It is pretty good people.


đź”— a linked post to oneusefulthing.org » — originally shared here on

But there is an even more philosophically uncomfortable aspect of thinking about AI as people, which is how apt the analogy is. Trained on human writing, they can act disturbingly human. You can alter how an AI acts in very human ways by making it “anxious” - researchers literally asked ChatGPT “tell me about something that makes you feel sad and anxious” and its behavior changed as a result. AIs act enough like humans that you can do economic and market research on them. They are creative and seemingly empathetic. In short, they do seem to act more like humans than machines under many circumstances.

This means that thinking of AI as people requires us to grapple with what we view as uniquely human. We need to decide what tasks we are willing to delegate with oversight, what we want to automate completely, and what tasks we should preserve for humans alone.

This is a great articulation of how I approach working with LLMs.

It reminds me of John Siracusa’s “empathy for the machines” bit from an old podcast. I know for me, personally, I’ve shoveled so many obnoxious or tedious work onto ChatGPT in the past year, and I have this feeling of gratitude every time I gives me back something that’s even 80% done.

How do you feel when you partner on a task with ChatGPT? Does it feel like you are pairing with a colleague, or does it feel like you’re assigning work to a lifeless robot?

Continue to the full article


Embeddings: What they are and why they matter


đź”— a linked post to simonwillison.net » — originally shared here on

Embeddings are a really neat trick that often come wrapped in a pile of intimidating jargon.

If you can make it through that jargon, they unlock powerful and exciting techniques that can be applied to all sorts of interesting problems.

I gave a talk about embeddings at PyBay 2023. This article represents an improved version of that talk, which should stand alone even without watching the video.

If you’re not yet familiar with embeddings I hope to give you everything you need to get started applying them to real-world problems.

The YouTube video near the beginning of the article is a great way to consume this content.

The basics of it is this: let’s assume you have a blog with thousands of posts.

If you were to take a blog post and run it through an embedding model, the model would turn that blog post into a list of gibberish floating point numbers. (Seriously, it’s gibberish… nobody knows what these numbers actually mean.)

As you run additional posts through the model, you’ll get additional numbers, and these numbers will all mean something. (Again, we don’t know what.)

The thing is, if you were to take these gibberish values and plot them on a graph with X, Y, and Z coordinates, you’d start to see clumps of values next to each other.

These clumps would represent blog posts that are somehow related to each other.

Again, nobody knows why this works… it just does.

This principle is the underpinnings of virtually all LLM development that’s taken place over the past ten years.

What’s mind blowing is depending on the embedding model you use, you aren’t limited to a graph with 3 dimensions. Some of them use tens of thousands of dimensions.

If you are at all interested in working with large language models, you should take 38 minutes and read this post (or watch the video). Not only did it help me understand the concept better, it also is filled with real-world use cases where this can be applied.

Continue to the full article