
Hello Full Stack PMs!
Welcome to the third Weekly Stack, where we concentrate the AI firehose into tactical insights for PM builders. We've got 127 new subscribers this week – welcome to the stack! 🥞
There wasn’t a ton of news this week, but PM builders still got two incredible new models: an almost unbelievable leap in AI image generation and a production-ready speech-to-speech model. It’s like we’re 10 years in the future.
So, today’s piece is short and sweet – use the extra time to try out this crazy new tech!
Let's do this.
🔧 AI tools & tactics for builders
Resource I Made – How I AI Database

I'm a big fan of Claire Vo's podcast How I AI, a newish show in the Lenny universe built around live demos of interesting AI use cases for PMs.
There are already 20 episodes, so to help you decide which ones to watch, I put together this database with:
Quick-scan summaries + details
Tags
My ranking of relevance to PM builders
100% free, of course. Here's the link.
Quick note: If you’re building something cool, either at work or for a personal project, let me know! I’m going to start sharing more things from the community to inspire others on things to build. Just reply to this email.
Google’s new image model… is almost unbelievably good
ICYMI, a mysterious new model amazingly named “nano banana” popped up in LLM arenas last week and blew people away with never-before-seen capabilities.
As widely suspected, this week it was revealed to be Google’s Gemini 2.5 Flash Image model.
It’s basically cracked the code on keeping characters looking exactly the same across different scenes - you can put the same person in a desert, then underwater, then in medieval armor and they actually look like the same person. It’s also just smart. It can combine objects into one image, generate different perspectives, and make many edits at once.
AND it's ridiculously cheap at ~4 cents per image compared to OpenAI’s 19 cents.
You can play with this model right now in Google’s AI Studio. Find it in the model picker and look for this one:

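If you'd rather hit it from code than click around in AI Studio, here's a minimal sketch using Google's google-genai Python SDK. The model ID, file names, and prompt below are my assumptions – check the model picker for the exact preview string before running it.

# pip install google-genai pillow
# Minimal sketch: edit an existing photo with Gemini 2.5 Flash Image (assumed model ID below).
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # placeholder key

pet_photo = Image.open("piper.jpg")  # hypothetical input image

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # assumed ID – confirm in the AI Studio model picker
    contents=[pet_photo, "Put this cat in medieval armor, same pose, same lighting."],
)

# The response mixes text parts and inline image parts; save any images it returns.
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data is not None:
        with open(f"edit_{i}.png", "wb") as f:
            f.write(part.inline_data.data)
    elif part.text is not None:
        print(part.text)

The same call works text-only if you want to generate from scratch instead of editing an existing image.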
Here are some examples of what it can do. It’s really, really fun:


My cat Piper. She’d wear this if she could.

Prompt: What would the photo of the photographer taking this photo look like?
Incredible spatial understanding.

Prompt: What would it look like from the perspective of that arrow?

Give it multiple perspectives and it can generate many other angles.

"Put these characters in these positions”
Here’s my version


And my cat Winter. I’d bet on Winter.
Google also put together this thread to show what developers with early access built.
OpenAI finally gives us access to a speech-to-speech model
It’s here: gpt-realtime, a powerful speech-to-speech model from OpenAI.
You talk in, it talks out. It can follow complex instructions, call tools with precision, and produce speech that sounds more natural and expressive.
This isn’t completely new functionality – speech-to-text → LLM creates new text → text-to-speech was manually possible before. But now we have a single, low-latency, production-ready pipeline, coupled with better instruction following, tool/function-calling, image input, and even SIP phone calling.
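To make the "single pipeline" point concrete, here's a minimal text-in sketch of the Realtime wire protocol over a raw WebSocket. I'm assuming the event names from the earlier gpt-4o-realtime preview (session.update, conversation.item.create, response.create) still apply to gpt-realtime; a real voice app would stream mic audio with input_audio_buffer.append events and play back audio deltas instead of just printing event types.

# pip install websockets
# Minimal sketch: open a Realtime session and ask for one response.
import asyncio, json, os
import websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-realtime"
    headers = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
    # Older preview endpoints also required an "OpenAI-Beta: realtime=v1" header.
    # On websockets < 14, pass extra_headers= instead of additional_headers=.
    async with websockets.connect(url, additional_headers=headers) as ws:
        # Configure the session: instructions, voice, tools, etc.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"instructions": "You are a concise, friendly support agent."},
        }))
        # Text input for a quick test; voice apps stream audio via input_audio_buffer.append.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {"type": "message", "role": "user",
                     "content": [{"type": "input_text", "text": "What can you help me with?"}]},
        }))
        await ws.send(json.dumps({"type": "response.create"}))

        # Events stream back (text deltas, audio deltas, tool calls); stop at response.done.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())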
This just came out yesterday, so we’re still waiting for more demo threads to see what people do with it, but I stitched together OpenAI’s examples here.
Zillow: Ask complex search questions and get a guided overview of options, with actions taken in-app as it explains them.
AT&T: Real-time product comparison while shopping, user can ask complex questions comparing options to current phone.
StubHub: Integrated customer support that understands what's on the screen and can guide the user.
Oscar Health: Agent calling on behalf of the patient to schedule an appointment, checking the calendar and handling follow-up questions.
Lemonade Insurance: Real-time phone call with AI support to give instant quote and enroll user all in <1 minute.
How does this compare to other AI voice tech, like ElevenLabs?
If you’re building a voice agent that needs to listen, think, call tools, and speak, OpenAI’s Realtime is now the cleanest single-API path.
If you mainly need ultra-fast, high-quality TTS (and you’ll handle the LLM separately), ElevenLabs gives you better latency-per-dollar for pure synthesis.
You can play with it here, right now.
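And for the pure-TTS path above, here's what a bare-bones ElevenLabs call looks like via its REST API – the voice ID and model ID are placeholders you'd grab from your own ElevenLabs account.

# pip install requests
# Minimal sketch: plain text-to-speech with ElevenLabs (you bring the LLM text yourself).
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"  # placeholder – pick a voice from your ElevenLabs library
url = f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"

resp = requests.post(
    url,
    headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
    json={
        "text": "Your order shipped this morning and should arrive Thursday.",
        "model_id": "eleven_turbo_v2_5",  # assumed low-latency model – check your dashboard
    },
)
resp.raise_for_status()

with open("reply.mp3", "wb") as f:  # response body is audio bytes (MP3 by default)
    f.write(resp.content)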
📈 Industry trends
A new Sensor Tower report confirms: After the initial wave of AI slop wrappers, AI-driven apps are reaccelerating as big players enter the fray. For PMs, this means the mobile app market is standardizing AI as a core feature, from AI companions to content generators. And it points to new monetization opportunities (and user willingness to pay) for AI features.
This won’t be a surprise to any Full Stack PMs, but there are more and more signs – AI is not a fad, and we’re officially starting to figure out how to navigate the wild west. Of course, new wilder frontiers await.

😄 Memes of the week
With the release of gpt-realtime, this South Park bit on ChatGPT voice mode is quite aptly timed…
South Park’s latest episode (“Sickofancy”) has some funny bits on AI sycophancy, with ChatGPT glazing Randy on all his awful business ideas.
— Trung Phan (@TrungTPhan), Aug 23, 2025
And this one is mine:

This guy hit #1 on r/engineering and #2 on r/ProgrammerHumor
📚 Other good reads & listens
Wikipedia’s guide to spotting AI writing: Wikipedia is in an all-out war against AI slop. Their team just put together a master class in the clichés, tropes, tones of voice, and other oddities of AI writing. If you use AI for writing anything, or if you want to be able to spot AI writing, it’s definitely worth a read.
Anthropic’s data policy shift for Claude: Did you see that terms-of-service pop-up this week? Anthropic announced an update to its consumer ToS: by default it will now use customer chats from free/pro users to train future Claude models, and they’re extending data retention from 30 days to 5 years for those who allow their chat data to be used for training. Even “AI-safety-first” companies feel pressure to leverage user data to keep up.
Most students are using AI to enhance learning, not outsource it, research shows: Some good news! This research found that students were using AI pretty responsibly – having it explain technical terms and proofread, but not do their work for them. I think this is the natural order of things, if for no other reason than that this is what the models are actually capable of. As people get more experienced with them, they learn to use them in the ways that are actually beneficial.
🥞 The Last Pancake
If you only have 30 minutes this week: we got two amazing new capabilities from Google and OpenAI. Check out the examples from today’s email and test one!
Try Gemini 2.5 Flash Image for image generation
(If you AI your pet you must send me a pic.)
The best use cases for this tech are waiting to be discovered by builders like you.
Keep building,
Carl
How did you like today's newsletter?
Piper had me working overtime for you all this week. She’s (obviously) the boss.
