Welcome to the Friday wrap-up for May 15th, 2026. This short episode is where I talk about three things: what's on my mind, recommended reading, and recommended media.
Streamline Solopreneur is the show to help you build more reliable solopreneur systems so you can take time off worry-free. I hope this roundup and reflection will help you think more about your own systems. I'm Joe Casabona and here's what's on my mind.
Okay, so earlier this week, I found myself fighting with Claude about something I felt was pretty basic.
A problem that I've actually used Claude to solve before.
I kept going back and forth with the LLM: I would ask it questions, and then it would do things that I didn't even remotely ask it to do.
And so I started to form this weird theory that Opus 4.7 is designed to waste tokens, which I know sounds like a weird conspiracy theory, but maybe it's not that weird. I don't know.
Sound off in the comments.
But I'm actually worried that it's worse than that.
And, first of all, this is wild, unsubstantiated speculation.
But we've seen it before.
So in his book, Hooked, Nir Eyal talks about how social media sites and addictive products employ variable rewards.
The general idea is that our craving for the reward is stronger than the reward itself.
So we invest time and money into pursuit of the reward.
And that's what actually satiates the craving.
So if you are scrolling on a social media site like TikTok, maybe 75% of the videos you don't care about.
But 25% of the time, you're going to get a video that you really like.
And it gives you that endorphin hit.
And so you keep scrolling on TikTok.
This is why social media sites, gambling sites, prediction markets, and so many other things are so addicting. As I had my argument with Claude, convinced that I could get a computer program to act logically, which I don't think is a wild request, I started wondering if large language models offer a sort of variable reward system. After all, it actually does perform certain tasks really, really well.
And what if that variable reward is enough to convince most people, as well as momentarily
trick others like me, that the LLMs are actually good at a lot more than they are good at?
We are pleasantly surprised by the results of a single task, or a category of task, like vibe coding something, or pulling stuff from our calendar, or sending things from our email to our to-do list, all very computery things. So we start to crave that feeling, the "I can't believe a robot actually did this so I don't have to do this anymore" feeling. And we pursue that craving for efficiency.
We pursue it with time and tokens. And again, this is totally unsubstantiated. It's probably a weird theory. Whereas with social media sites and gambling sites and prediction markets, it's in the best interest of those sites to employ variable rewards, because they want people to stay on those sites, I actually don't think it is in the best interest of large language models to employ variable rewards. They want a higher hit rate, because the productivity or efficiency angle is the thing that keeps people coming back.
But it's just something I've been thinking about. It was so weird: at the beginning of the week, I had successfully vibe coded something and gotten it running.
And then I tried to do it again with a different project.
And it just like went off the rails.
And so as I was wasting time trying to bend the LLM to my will, I thought: is this a variable rewards thing, where it pleases me a certain amount of the time, and so I keep coming back to it?
Again, I don't know.
I don't think that large language models would actually benefit from that.
But it's something I was thinking about.
What do you think?
Is it plausible? Likely?
Way off base?
Let me know either in the comments below or over at streamlinedfeedback.com. I'd love to hear your thoughts on variable rewards in large language
models. Now, moving on to recommended reading. Usually I like to make the recommended reading a heavier, more interesting think piece and the recommended media something fun and light to bring you into your weekend. But it's reversed this week. So the recommended reading is an article called "The Colorful Impact of Spike Lee's Red Yankee Hat Request 30 Years Ago."
I am a chronic Yankees hat collector.
And I suspect that my collection of about a dozen or so Yankee hats pales in comparison to some.
But still, I have those dozen or so Yankee hats emblazoned with the classic interlocking
NY that has persisted for over a hundred years.
In other words, I love a dope hat.
And arguably, we would not have the vibrant dope hat market that we have today without Spike Lee.
30 years ago, 1996, Spike Lee, a diehard Yankees fan, wanted a red Yankees hat to match his
red jacket.
And when he tried to get it made, he was unable to due to licensing.
New Era, I believe it is, the company that makes the on-field hats, only had a license to make hats the teams actually wore on the field.
So Spike Lee, doing something that only Spike Lee and a handful of other people could do, went to the boss, Yankees owner George Steinbrenner, and got George to approve a red Yankees hat for him to wear.
And now, of course, we have all sorts of hats, all sorts of on-field hats, different colors.
I have that red Yankee hat, and I wore it on Red Yankee Hat Day, or whatever that was called, a couple of weeks ago.
And I have blue hats, and a yellow Yankees hat to match my brand. It's just a fun thing to have.
And I love this article because it provides such an interesting bit of history and context
to something that wouldn't necessarily seem to have an interesting backstory, right?
You would think, oh, MLB realized they could make a lot more money if they printed different color hats.
But Spike Lee kind of pioneered this because he also wanted to wear a dope hat.
So love that story.
I'll link it in the description and the show notes.
I think it's a fun read.
And now for the recommended media. I'm not following the news cycle formula where they hit you with the bad story and then end with a nice one; that's the recency effect, or recency bias, where you remember the last thing they talked about. But this video, I think, is too important not to mention. It's by More Perfect Union; I linked something else from them recently. It's called "I Tracked Down the Hidden Workers Secretly Powering ChatGPT," and this is something totally different, right?
The video talks about companies that recruit people to train large language models. A lot of large language models are trained with, I believe it's called reinforcement learning from human feedback, or RLHF.
This training used to happen in countries that were struggling economically, which is why, for example, ChatGPT would use "delve" so much: the humans reinforcing the training, I think in South Africa, used the word "delve" a lot. So there's a little tidbit as to why "delve" is used so much in large language models.
But now the companies are trying to make their models more "PhD level," right? You've heard Sam Altman say this. You've heard Dario Amodei from Anthropic say this: that older models could do these things, but now they have PhD-level knowledge.
And that is not true.
They are hype men trying to make billions of dollars.
And so they need to paint their products in the best way possible.
But they are hiring PhDs to do human-reinforced training on these large language models. And so the problem this video highlights is twofold. One is the predatory nature of recruiting experts in a way that is dehumanizing: selling work to the lowest bidder, or offering higher-paying contracts but requiring you to be available at all hours. No price negotiation; you get what we give you.
It's terrible work, and the people who are doing it are possibly in dire economic straits.
So that's one side of it.
But the other side of it is this chilling mindset behind AI companies who want to own knowledge
and sell it back to us.
So, you know, I think Sam Altman is quoted in this video as saying, yeah, we want to make knowledge available, and you purchase tokens to get a little bit of that knowledge.
And that's awful.
It cuts against what got us here, right?
The reason the internet was created
was so that researchers could quickly share knowledge with each other.
And large language models appear to be trying to paywall that knowledge.
So this video is really interesting. I like More Perfect Union; I've been watching a few of their videos, and I want to dig a little deeper into them, where they came from and so on. But at the very least, it seems like they do deeply researched work, and it's well produced. It's an interesting thing that one might not think about when it comes to large language models.
Now, there is a hopeful call to action at the end, but ultimately I'm sharing this because I think it's an important message for anybody who uses large language models to hear.
All right, that is it for the Friday wrap-up for May 15th, 2026.
If you want to get a written version of this delivered directly to your inbox, as well as an exclusive automation of the week,
join my newsletter over at streamlined.fm/wrap.
This week, maybe somewhat hypocritically,
given what I just talked about,
I am sharing an automation I've set up in Claude.
But that is it for this episode of the streamlined solopreneur
and the Friday wrap-up.
I hope you enjoyed it.
If you did, again, sign up for my newsletter.
Thanks so much for listening.
And until next time, I hope you find some space in your weekend.