Pokédrones

Posted on June 12, 2026

Back in December 2024, Niantic posted about a large geospatial model they were building using data from Pokémon Go.

That led to a lot of online talk, with takes ranging from:

Well yeah, that’s what we figured they were doing with the “Scan a Pokéstop to build better AR models” feature.
to
Watch out! Niantic is building a global AR model using every image that passes in front of your phone’s camera!

I was in the “of course that’s what it’s for” camp. When the game rolled out the Pokéstop scanning feature a few years earlier, it seemed obvious that it was training 3D machine vision, like how all the “pick the squares with bicycles” CAPTCHAs are obviously training for self-driving cars. I figured there was a good chance someone would use it for some harmful purpose or another, probably surveillance, so I skipped those tasks.

Anyway, after a week or so, Niantic updated the article to clarify* that it was using the deliberate Pokéstop scans in public places for Pokémon Playground, not any of the other AR features like taking a photo of your buddy in the kitchen.

This made sense, because if they were using that data, it would have eventually gotten better at placing a Pokémon in my kitchen. (The floor’s a grid. You’d think that would help, but noooo….)

Drones (And not just Beedrills or Combees)

Those scans are back in the news, because as DroneXL reports, that geospatial model is being used for camera-based drone navigation.

Including military drones.

Because of course everything has to be weaponized. Allegedly even Pixar’s RenderMan.

Admittedly, GPS itself started as a military technology long before it became civilian infrastructure. Military and civilian tech really do just have a revolving door between them, don’t they?

Training Data

Among other sources, DroneXL cites a Dutch-language article at Trouw, which asked the defense contractor (Vantor) directly whether it uses Pokémon Go data: Vantor initially said no, but later walked back any guarantee. Niantic Spatial, however, has stated that the Pokéstop scans were used to train an “early version” of their model. That means the data (or weights produced from it) is still in there, just blended so thoroughly by the training process that it can’t be identified anymore.

Kind of like you probably couldn’t confirm my old blog posts are in the training data for an LLM by looking at the LLM weights, but you can find pages from hyperborea.org in Common Crawl data, and assume any model trained on Common Crawl still has it in there somewhere.

Maybe scans made since Scopely (US-based, Saudi-owned) bought Niantic’s gaming division last year haven’t gone into the map built by Niantic Spatial (still independent), so Vantor technically isn’t using current player data. Or maybe Niantic Games continued passing scans along to Niantic Spatial for a while, under the separate TOS, and Vantor’s spokesperson just hadn’t made the connection.

Quietly Dropped

Curiously, the Pokéstop scanning task I’d left in my list for years just disappeared a few days ago.

At first I deleted the tasks as I got them, but every time I scanned an eligible stop it would add a new one if I didn’t have one in my list. So after a while I just left one there and ignored it like an ad banner.

It turns out Pokémon Go discontinued the feature on June 2, just three days before the Trouw article was published. (New tasks stopped appearing that day, and it took a few days for old tasks to disappear.)

Coincidence? Maybe. But the timing’s certainly suspicious.

Notes

* Before Niantic published their update, I e-mailed them asking for clarification. It took them over a month, but they did eventually reply:

Hi Trainer, we appreciate your patience. Thanks for your questions about AR Mode and our Privacy Policy. I’ve shared some additional information below:

For Pokémon GO, only AR scans from the PokéStop Scanning feature will contribute to the development of the Large Geospatial Model. As noted in the PokéStop Scanning Help Article (https://niantic.helpshift.com/hc/en/6-pokemon-go/faq/2519-scanning-a-pokestop/): information gathered during PokéStop Scanning allows Niantic to generate accurate, dynamic 3-D maps of real-world objects and their relative locations, and help devices understand the surroundings in AR real-time. As noted in the Editor’s note to the blog post, merely playing the game does not train an AI model.

When using AR or AR+ mode, we do not store your photos on our servers. For PokéStop Scanning, once a PokéStop scan is voluntarily uploaded, the video recording and associated camera data is retained on our servers in accordance with our data retention policies. For more information please see our Privacy Policy (https://nianticlabs.com/privacy).

Wall! Of! Text!

Posted on May 28, 2026

Kelson

A story’s been making the rounds about a software project that enforced a no-LLM-use policy by using prompt injection to delete itself. A coder using an “AI” agent filed a bug report (understandable), but filled it with a bunch of long-winded, clearly LLM-generated comments.

I looked at those comments. I can’t say I read them, because my eyes started glazing over a couple of paragraphs in. The contrast with the posts by the maintainer and other commenters is…stark.

Though I did notice the bit about how nobody reads the docs, which seems rather telling.

One of the problems with letting an “AI” write for you: If you aren’t reading it, and you assume the person at the other end is just going to summarize it anyway, there’s no motivation to make it readable. And no motivation to think about it and narrow down what’s important. And if you’re rewriting the prompt to focus on what matters most, consider that the prompt would get the idea across more effectively.

New agents.txt file found on DreamHost

Posted on May 13, 2026

Kelson

I host most of my websites on a DreamHost VPS*. This morning I discovered that a new file, agents.txt, had been added to the root of each site on May 7.

It was easy to confirm that this is a new default file similar to the default robots.txt and favicon.ico DreamHost puts in every new site to get you started. Apparently they retroactively added it to sites that don’t already have one. So it’s a host action, not a hack. That’s good at least.

The contents are simple, and sensible for a new website: Discourage LLM training and actions, allow on-the-fly “AI”-generated summaries, disallow access to some common folders that shouldn’t be used for any of the above.

Though I am annoyed that they added it retroactively, particularly since it includes what looks like an explicit opt-in to retrieval-augmented generation, even if it’s something that’s happening already and less of a problem than a model vacuuming up your entire website for regurgitation. (Guess who’s already in Common Crawl!)

# Data use policy
Allow-Training: no
Allow-RAG: yes
Allow-Actions: no

# Default rules for all agents
[Agent: *]
Allow: /
Disallow: /admin/
Disallow: /config/
Disallow: /tmp/
Disallow: /logs/
Disallow: /backup/
Disallow: /.env
Disallow: /wp-admin/
Disallow: /wp-includes/
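
To make the structure concrete, here is a minimal sketch in Python of how an agent might read a file like the one above. The parsing rules are assumptions inferred only from this sample, not taken from the proposal itself: top-level "Name: yes/no" lines are treated as policy flags, "[Agent: name]" starts a section, and Allow/Disallow are resolved robots.txt-style (longest matching prefix wins, ties go to Allow). The names parse_agents_txt and is_allowed are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentRules:
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)


def parse_agents_txt(text):
    """Parse agents.txt-style text into (policy, rules).

    policy: lowercased directive name -> bool ("yes" means True).
    rules:  agent name -> AgentRules with path prefixes.
    ASSUMPTION: this grammar is guessed from the sample file only.
    """
    policy = {}
    rules = {}
    current = None  # section being filled, or None before any section
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section header such as "[Agent: *]"
            name = line[1:-1].partition(":")[2].strip()
            current = rules.setdefault(name, AgentRules())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line; ignore it
        key, value = key.strip().lower(), value.strip()
        if current is None:
            policy[key] = value.lower() == "yes"
        elif key == "allow":
            current.allow.append(value)
        elif key == "disallow":
            current.disallow.append(value)
    return policy, rules


def is_allowed(rules, agent, path):
    """Longest-prefix match, ties favor Allow (borrowed from robots.txt)."""
    section = rules.get(agent)
    if section is None:
        section = rules.get("*")
    if section is None:
        return True  # no applicable rules: assume allowed
    best_allow = max((len(p) for p in section.allow if path.startswith(p)), default=-1)
    best_deny = max((len(p) for p in section.disallow if path.startswith(p)), default=-1)
    return best_allow >= best_deny


SAMPLE = """\
# Data use policy
Allow-Training: no
Allow-RAG: yes
Allow-Actions: no

# Default rules for all agents
[Agent: *]
Allow: /
Disallow: /wp-admin/
Disallow: /.env
"""

policy, rules = parse_agents_txt(SAMPLE)
print(policy)
print(is_allowed(rules, "SomeBot", "/blog/"), is_allowed(rules, "SomeBot", "/wp-admin/"))
```

Whether real agents resolve Allow against Disallow this way is exactly the kind of detail the sample file doesn’t answer.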

Harder to find was what else goes in this file. The first agents.txt spec I found used a completely different syntax and served a completely different purpose. I had to search for the policy directives (in quotation marks) to find the proposal it’s implementing, which turns out to have been renamed to agent-manifest.txt shortly after it was proposed in March. Apparently DreamHost didn’t get the memo before rolling it out. Update: As Patryk points out below, it’s changed again to agents-brief.txt, just one day after the blog post was updated with the second name.

Good: sensible defaults for new sites.
Bad: rolled out to existing sites without notice, half-baked implementation.

*Update: To clarify, this is on DreamHost’s managed VPS service, where they handle the OS and the webserver, but you have a flexible userspace all to yourself. It’s a middle ground between shared hosting (where other sites are on the same virtual machine and webserver) and fully run-your-own-OS cloud hosting, and the balance generally works for me (YMMV).

The Firehose and the Jetpack

Posted on March 1, 2024

Kelson

I’ve been meaning to disconnect from Jetpack for a while now. This seems like a good time to do it, and to finally clear out the older Tumblr and WordPress.com blogs I don’t use anymore.

Tumblr and WordPress to Sell Users’ Data to Train AI Tools — 404 Media

It’s the kind of thing that you expect from Google or Facebook, or from any number of start-ups, but there’s been this sense that Automattic should know better — and with Tumblr being login-walled and ad-saturated, and the push to upsell in their WordPress plugins, and now this…it’s looking like they don’t.

I don’t think they’ve hit the “trust thermocline” yet, but selling user data is a pretty clear line.

As for AI access to the Firehose: My previous understanding of the firehose is that it’s basically an aggregation of what you’d see in a bunch of blogs’ public RSS feeds. Which, OK, fine. Analyze your heart out. Display my posts in your RSS reader. Just make sure private posts and comments don’t leak.

But LLM training isn’t the same as analytics, or showing a properly attributed post in a reader. And quietly changing the terms to allow more kinds of re-use on something most people using the service don’t know about? Not cool.

And not making it clear what is and isn’t included for which purposes? That breaks down trust.

Before this, I wasn’t worried about the Firehose. But now I’m not sure I can trust Akismet, never mind Jetpack, and I’m looking for a new spam filter.

Originally posted across several threads through my GoToSocial test site.

Update: Automattic did clarify that self-hosted blogs with Jetpack are not included in the training data. Only company-hosted blogs on Tumblr and WordPress.com. But I still uninstalled Jetpack from this site, just to be sure. Like I said, I’d been meaning to for a while.

Tired of Eventbrite Spam

Posted on June 15, 2023

Kelson

Eventbrite has worked well for buying tickets to events I’ve attended…

But over the last few months I keep getting spam for events that are not only not remotely interesting, they aren’t anywhere NEAR me. Sorry, but I’m not hopping on a plane for a pub crawl on the other side of the continent or a 2-hour “gong bath experience” on the other side of the planet.

At first I thought they were bogus. But everything pointed to Eventbrite’s servers. I’ve been blocking the campaigns in Eventbrite as I get them, but at this point my account settings show 10 organizations I’ve blocked, even though I’ve theoretically unsubscribed from “all Eventbrite newsletters and updates for attendees.”

Of course searching online is useless, because (1) everything’s about how organizers can keep their messages from landing in spam folders, and (2) searching online in 2023 is more or less useless anyway. It’s the end result of years of SEO trying to get into the first page (now with generative AI to flood the zone with even more bullshit!) combined with Google and Bing giving up on trying to give relevant results when what they really care about is ad impressions — and no, DuckDuckGo results aren’t much better.

I haven’t bought tickets to an event that uses Eventbrite since 2019 (for obvious reasons). I’m thinking at this point I should just cancel my account [Update: I did], and the next time I want to go somewhere that uses them for tickets, I can open a new one. With a different address.

Have You Confused Your iNat Lately?

Posted on December 24, 2021

Kelson

I confused the iNaturalist identification AI with some random snapshots from a trip up into the mountains a few years back.

Normally it’s pretty good at narrowing things down to a family or genus. In this case, I was aiming for scenery and family snapshots at the time, so they weren’t exactly ideal for plant IDs even cropped. Still…

Thumbnail of a pine tree in snow, with a dropdown menu for species name: "We're not confident enough to make a recommendation, but here are our top suggestions: American Black Bear, Mountain Chickadee, Lodgepole Pine, Bobcat, Mule Deer, Wild Turkey, Coyote, Mountain Lion"

This is on the level of “A flock of sheep on a hill” for an empty landscape. I wanted to ask it how many giraffes were in the picture!

The Color out of Cyberspace

Posted on November 4, 2017

Kelson

The Verge ponders: Has the internet been overtaken by the eldritch horror of Yog-Sothoth?

We’ve got this dimension right next to ours, that extends across the entire planet, and it is just brimming with nightmares. We have spambots, viruses, ransomware, this endless legion of malevolent entities that are blindly probing us for weaknesses, seeking only to corrupt, to thieve, to destroy.
—Astercrash

It’s a joke, of course. And it would make for an interesting story. But it’s scarier that we’ve created the awfulness ourselves.

Update Feb 2023: With some of the AI-generated art and writing going around these days, the cosmic horror comparison seems even more apt.


K-Squared Ramblings

Sci-fi, comics, humor, photos…it's all fair game.

Tag: AI
