Back in December 2024, Niantic posted about a large geospatial model they were building using data from Pokémon Go.

That led to a lot of online talk, with takes ranging from:

  • Well yeah, that’s what we figured they were doing with the “Scan a Pokéstop to build better AR models” feature.
    to
  • Watch out! Niantic is building a global AR model using every image that passes in front of your phone’s camera!

I was in the “of course that’s what it’s for” camp. When the game rolled out the Pokéstop scanning feature a few years earlier, it seemed obvious that it was training 3D machine vision, like how all the “pick the squares with bicycles” CAPTCHAs are obviously training for self-driving cars. I figured there was a good chance someone would use it for some harmful purpose or another, probably surveillance, so I skipped those tasks.

Anyway, after a week or so, Niantic updated the article to clarify* that it was using the deliberate Pokéstop scans in public places for Pokémon Playground, not any of the other AR features like taking a photo of your buddy in the kitchen.

This made sense, because if they were using that data, it would have eventually gotten better at placing a Pokémon in my kitchen. (The floor’s a grid. You’d think that would help, but noooo….)

Drones (And not just Beedrills or Combees)

Those scans are back in the news, because as DroneXL reports, that geospatial model is being used for camera-based drone navigation.

Including military drones.

Because of course everything has to be weaponized. Allegedly even Pixar’s RenderMan.

Admittedly, GPS itself started as a military technology long before it became civilian infrastructure. Military and civilian tech really do just have a revolving door between them, don’t they?

Training Data

Among other sources, DroneXL cites a Dutch-language article at Trouw, who asked the defense contractor (Vantor) directly whether it uses Pokémon Go data: Vantor initially said no, but later walked back any guarantee. Niantic Spatial, however, has stated that the Pokéstop scans were used to train an “early version” of their model. That means the data (or weights produced from it) is still in there, just blended so much by the training process that it can’t be identified anymore.

Kind of like you probably couldn’t confirm my old blog posts are in the training data for an LLM by looking at the LLM weights, but you can find pages from hyperborea.org in Common Crawl data, and assume any model trained on Common Crawl still has it in there somewhere.

Maybe scans made since Scopely (US-based, Saudi-owned) bought Niantic’s gaming division last year haven’t gone into the map built by Niantic Spatial (still independent), so Vantor technically isn’t using current player data. Or maybe Niantic Games continued passing scans along to Niantic Spatial for a while, under the separate TOS, and Vantor’s spokesperson just hadn’t made the connection.

Quietly Dropped

Curiously, the Pokéstop scanning task I’d left in my list for years just disappeared a few days ago.

At first I deleted the tasks as I got them, but every time I scanned an eligible stop it would add a new one if I didn’t have one in my list. So after a while I just left one there and ignored it like an ad banner.

It turns out Pokémon Go discontinued the feature on June 2, just three days before the Trouw article was published. (New tasks stopped appearing that day, and it took a few days for old tasks to disappear.)

Coincidence? Maybe. But the timing’s certainly suspicious.

Notes

* Before Niantic published their update, I e-mailed them asking for clarification. It took them over a month, but they did eventually reply:

Hi Trainer, we appreciate your patience. Thanks for your questions about AR Mode and our Privacy Policy. I’ve shared some additional information below:

For Pokémon GO, only AR scans from the PokéStop Scanning feature will contribute to the development of the Large Geospatial Model. As noted in the PokéStop Scanning Help Article (https://niantic.helpshift.com/hc/en/6-pokemon-go/faq/2519-scanning-a-pokestop/): information gathered during PokéStop Scanning allows Niantic to generate accurate, dynamic 3-D maps of real-world objects and their relative locations, and help devices understand the surroundings in AR real-time. As noted in the Editor’s note to the blog post, merely playing the game does not train an AI model.

When using AR or AR+ mode, we do not store your photos on our servers. For PokéStop Scanning, once a PokéStop scan is voluntarily uploaded, the video recording and associated camera data is retained on our servers in accordance with our data retention policies. For more information please see our Privacy Policy (https://nianticlabs.com/privacy).

I’ve been meaning to disconnect from Jetpack for a while now. This seems like a good time to do it, and to finally clear out the older Tumblr and WordPress.com blogs I don’t use anymore.

Tumblr and WordPress to Sell Users’ Data to Train AI Tools (404 Media)

It’s the kind of thing that you expect from Google or Facebook, or from any number of start-ups, but there’s been this sense that Automattic should know better — and with Tumblr being login-walled and ad-saturated, and the push to upsell in their WordPress plugins, and now this…it’s looking like they don’t.

I don’t think they’ve hit the “trust thermocline” yet, but selling user data is a pretty clear line.

As for AI access to the Firehose: My previous understanding of the firehose is that it’s basically an aggregation of what you’d see in a bunch of blogs’ public RSS feeds. Which, OK, fine. Analyze your heart out. Display my posts in your RSS reader. Just make sure private posts and comments don’t leak.

But LLM training isn’t the same as analytics, or showing a properly attributed post in a reader. And quietly changing the terms to allow more kinds of re-use on something most people using the service don’t know about? Not cool.

And not making it clear what is and isn’t included for which purposes? That breaks down trust.

Before this, I wasn’t worried about the Firehose. But now I’m not sure I can trust Akismet, never mind Jetpack, and I’m looking for a new spam filter.

Originally posted across several threads through my GoToSocial test site.

Update: Automattic did clarify that self-hosted blogs with Jetpack are not included in the training data. Only company-hosted blogs on Tumblr and WordPress.com. But I still uninstalled Jetpack from this site, just to be sure. Like I said, I’d been meaning to for a while.

I think there’s been a lot of talking past each other on privacy lately because there are so many layers to it.

Google or Dropbox keeping your cloud files from showing up on someone else’s drive or a public share is one layer. Keeping your data from leaking in a data breach is another. Protecting messages in transit from your device to their service is yet another. Google and Meta (Facebook, Instagram, and now Threads) are good at those.

But then there’s ensuring that Google or Meta doesn’t misuse it themselves, or sell it to someone who will.

And, well, to put it mildly, they’re not so big on that aspect!

This looks cool: Mozilla has released a translation tool as an add-on for Firefox that can do web page translation locally instead of sending data to the cloud! It’s based on Project Bergamot and implemented in WebAssembly.

IMO translation is one of those things like speech recognition that ideally should always have been local (for obvious privacy reasons), but the processing and data just weren’t there yet when Google Translate and similar services launched.

CNET has a report on how police departments are being inundated with false alarms from Amazon Ring alerts because people have freaked out over the camera footage of innocent activities. In one case someone called to report footage of themselves walking into the door!

I’m reminded of a case that happened nearby just a month ago. In Manhattan Beach (near Los Angeles), police from five cities — and an LA Sheriff’s helicopter — descended on a neighborhood because someone panicked over Ring footage of a food delivery sent to the wrong address. It took them an hour and a half to confirm that there was no crime in progress.

The story basically filled a bingo card:

  • IoT doorbell camera (and of course it was Ring)
  • Gig/app delivery service
  • Upscale neighborhood
  • Paranoid reaction to, you know, people
  • NextDoor posts quoted in article (because of course they are)
  • Massive police over-response
  • SMS alerts sent to neighboring cities

It was absurd. Fortunately no one was hurt or arrested, so it remains an absurdity, but between the waste of resources, the increase in fear, and the risk that something could have gone wrong, it fits right in with these other cautionary tales. As Fight for the Future puts it:

Ubiquitous, privately owned surveillance camera networks are NOT going to make our neighborhoods safer. They just make us all paranoid. Soon we’ll be snitching on our neighbors Red Scare style. Enough

Here’s a fascinating look back at the spam wars by former Gmail spamfighter Mike Hearn.

I was involved for most of the previous decade as (among other things) the email admin for a small ISP. We used a mix of public blacklists, a private blacklist, virus filtering, SpamAssassin with both shared rules and local custom rules, and various other tools all tied together, some at the Sendmail level and the rest through MIMEDefang. It worked tolerably well, though of course it wasn’t perfect. I find it amusing that Gmail declared victory on spam in 2010, the same year that I changed jobs to a position that was more software developer and less sysadmin.

Privacy is a growing concern these days, so he also talks about the impact that widespread end-to-end email encryption would have on spam fighting. If you’re the mail handler, you can’t filter on, say, links found in the message, or characteristics of the writing or formatting, or anything else in the content. You can’t even run statistical analysis on all known spam and non-spam to see which the new message fits better. All you can do is look at where it came from and where it’s going.
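To make the "statistical analysis" part concrete, here is a minimal sketch of that kind of comparison, using a bare-bones naive Bayes word count in Python. It is not how Gmail, SpamAssassin, or any real filter is implemented. The function names and the sample messages are made up for illustration, and it assumes spam and non-spam are equally likely to begin with. The point is only that every step needs the message text, which is exactly what end-to-end encryption hides from the mail handler.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences across a list of message strings."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def log_likelihood(text, counts, vocab_size):
    """Log-probability of the words in text under one class (add-one smoothing)."""
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in text.lower().split()
    )


def looks_like_spam(text, spam_counts, ham_counts):
    """True if the message fits the known spam better than the known non-spam."""
    vocab_size = len(set(spam_counts) | set(ham_counts))
    return (log_likelihood(text, spam_counts, vocab_size)
            > log_likelihood(text, ham_counts, vocab_size))


# Made-up sample data, for illustration only.
spam_counts = train(["cheap pills buy now", "buy cheap watches now"])
ham_counts = train(["lunch meeting moved to noon", "see you at the meeting"])

print(looks_like_spam("buy cheap pills", spam_counts, ham_counts))
print(looks_like_spam("meeting at noon", spam_counts, ham_counts))
```

With only two messages per class this is a toy, which is sort of the point: the more known spam and non-spam you can count, the better the comparison gets.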

Moving the spam filter to the client lets you do content filtering on your own mail, but you can’t take advantage of the larger volume of data that an ISP can, which means your filtering isn’t going to be as effective. And if your main email client is your phone, that’s really going to slow it down — and chew up battery.

Encrypting more of our communication is probably the way to go, but we’ll have to come up with new approaches to some previously-solved problems like this.

It got me thinking: Most of us not only accept that our email providers will look inside our mail to filter spam and viruses, we expect it. That’s weird. The idea of the post office looking inside our letters is so abhorrent that even tracking programs raise concerns. The idea of an actual person reading our email in transit creeps us out. Many people have problems with the idea of automated systems (like Gmail) reading our email for purposes of targeted advertising. But spam filtering? We get upset if it’s not happening!

That says something interesting about our priorities, and about how big an impact unfiltered spam has on our email.

Via ma.tt.

Every time I listen to Vienna Teng’s song, “The Hymn of Acxiom,” it gets creepier. It’s beautiful, it’s haunting…and it’s all about how big data is keeping track of every trace we leave, piecing together a more and more detailed picture of each of us in order to feed us back the perfect, tailored life, and isn’t that what we wanted?

Tracking. Privacy. Social media. Filter bubbles.

And I always think, “I need to post something about this on Facebook…”

And that just creeps me out more.
