I’ve been meaning to disconnect from Jetpack for a while now. This seems like a good time to do it, and to finally clear out the older Tumblr and WordPress.com blogs I don’t use anymore.

Tumblr and WordPress to Sell Users’ Data to Train AI Tools (404 Media)

It’s the kind of thing that you expect from Google or Facebook, or from any number of start-ups, but there’s been this sense that Automattic should know better — and with Tumblr being login-walled and ad-saturated, and the push to upsell in their WordPress plugins, and now this…it’s looking like they don’t.

I don’t think they’ve hit the “trust thermocline” yet, but selling user data is a pretty clear line.

As for AI access to the Firehose: My previous understanding of the Firehose was that it’s basically an aggregation of what you’d see in a bunch of blogs’ public RSS feeds. Which, OK, fine. Analyze your heart out. Display my posts in your RSS reader. Just make sure private posts and comments don’t leak.

But LLM training isn’t the same as analytics, or showing a properly attributed post in a reader. And quietly changing the terms to allow more kinds of re-use on something most people using the service don’t know about? Not cool.

And not making it clear what is and isn’t included for which purposes? That breaks down trust.

Before this, I wasn’t worried about the Firehose. But now I’m not sure I can trust Akismet, never mind Jetpack, and I’m looking for a new spam filter.

Originally posted across several threads through my GoToSocial test site.

Update: Automattic did clarify that self-hosted blogs with Jetpack are not included in the training data. Only company-hosted blogs on Tumblr and WordPress.com. But I still uninstalled Jetpack from this site, just to be sure. Like I said, I’d been meaning to for a while.

The year is 2006. I’m complaining on my blog about businesses training their customers to fall for phishing attacks.

The year is 2011. I’m complaining on my blog about businesses training their customers to fall for phishing attacks.

The year is 2022. I’m complaining on my blog about businesses training their customers to fall for phishing attacks.

Corporations haven’t learned. Unfortunately, their customers have learned from all this training. And so has the fraud industry. Even if you’re usually savvy about this sort of thing, you can get caught up if the circumstances put you just off-balance enough to line up the holes in each overlapping layer of security.

I trusted this fraudster specifically because I knew that the outsource, out-of-hours contractors my bank uses have crummy headsets, don’t know how to pronounce my bank’s name, and have long-ass, tedious, and pointless standardized questionnaires they run through when taking fraud reports. All of this created cover for the fraudster, whose plausibility was enhanced by the rough edges in his pitch – they didn’t raise red flags.

Cory Doctorow on “Swiss-cheese security.”

And here I am, in 2024, complaining on my blog about…well…you know.

Since I started converting parts of my website to use 11ty as a static site generator, I’ve been able to automatically generate tag and category pages that are *just there* as plain HTML files. And since they’re plain HTML, the old local site search engine I have on there still finds all the Eleventy-generated pages. And again since it’s all static, it doesn’t go down when the database does (which has been happening on an annoyingly frequent basis lately).

And this would be perfect if I were using a single Eleventy instance to build the entire site, but I’m not. I’ve got separate instances building the Les Misérables blog, the reviews, the tech tips, the creative writing collection, and so on, plus I have this WordPress blog and a bunch of hand-coded HTML from the old days.

Which leads to a few problems:

  1. Tags are per-section, not universal.
  2. The site search, which indexes HTML files on the server, sees everything except the WordPress posts, and the WordPress search *only* sees the WordPress posts.

Some ideas I’ve had to combine the tag pages:

  • Rebuild everything in a single Eleventy instance with a deeper hierarchy. Upside: Still static pages for everything except WordPress. Downside: Time-consuming, still leaves the main blog separate.
  • Write a post-build script that combines all the tag pages from each subsite. Upside: Same. Downside: Need to either run on the server or make sure my local copies of the *other* subsites are current.
  • Write a server-side page that combines the backend HTML pages into a dynamic frontend for only the tag being viewed. Upside: simple. Downside: tag pages now depend on PHP. Update: I went with this one (see below)
  • Write some client-side JavaScript for the tag pages that will check whether other subsites have tag pages, and add those to the end of the list in a “See also…” section. Upside: simple, and the “local” tag pages are still usable as long as I make sure the script doesn’t block anything. I could even have it check the other static subsites first and then check the blog, so if the blog times out I still display everything else. Downside: requires JavaScript and additional network requests. But as long as I stick to vanilla JS, I can make it pretty small.
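
To make the post-build option a bit more concrete, here’s a rough sketch in Python. It is not what I ended up running: the `tags/<slug>/index.html` layout, the `/<subsite>/tags/<slug>/` URLs, and the function names are all assumptions made up for illustration. It just finds which subsites have a page for each tag and renders a small combined list.

```python
from html import escape
from pathlib import Path


def collect_tag_pages(subsites: dict[str, Path]) -> dict[str, list[str]]:
    """Map each tag slug to the names of the subsites that have a page for it.

    Assumption: every subsite writes its tag pages to
    <output dir>/tags/<tag-slug>/index.html. Adjust the glob if yours differ.
    """
    found: dict[str, list[str]] = {}
    for name, root in sorted(subsites.items()):
        for page in sorted(root.glob("tags/*/index.html")):
            found.setdefault(page.parent.name, []).append(name)
    return found


def render_combined(tag: str, sections: list[str]) -> str:
    """Render a static HTML fragment linking to each subsite's page for a tag.

    Assumption: subsites are served at /<subsite name>/ on the same host.
    """
    items = "\n".join(
        f'  <li><a href="/{escape(name)}/tags/{escape(tag)}/">{escape(name)}</a></li>'
        for name in sections
    )
    return f"<h1>Tagged: {escape(tag)}</h1>\n<ul>\n{items}\n</ul>\n"
```

Running `collect_tag_pages` over each subsite’s output directory and writing one `render_combined` result per tag would give a static “all sections” page for every tag. The downside from the list still applies: every subsite’s build output has to be current wherever the script runs.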

And for unifying the search:

  • Write a post-site-indexing script that adds the WordPress posts to the index. Could be done with direct DB access.
  • Write a pre-site-indexing script that generates a bunch of files for it to index. Seems like overkill.
  • Update the search code to send the same search terms to WordPress and combine the results.
  • Use a new search engine that indexes the served pages instead of the files on the server.
  • Point the search box at a remote search engine like Googl…yeah, never mind.
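
For the “send the same search terms to both and combine the results” idea, the combining step could look something like this Python sketch. The `Hit` record and `merge_results` function are hypothetical names, and interleaving is my assumption: relevance scores from two different search engines aren’t comparable, so this takes turns between the ranked lists instead of sorting by score, and drops repeat URLs.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Hit:
    url: str
    title: str
    source: str  # illustrative label, e.g. "static" or "blog"


def merge_results(*ranked_lists: list[Hit]) -> list[Hit]:
    """Interleave ranked result lists, keeping only the first hit per URL."""
    merged: list[Hit] = []
    seen: set[str] = set()
    # zip_longest pads the shorter lists with None, so every list is drained.
    for tier in zip_longest(*ranked_lists):
        for hit in tier:
            if hit is not None and hit.url not in seen:
                seen.add(hit.url)
                merged.append(hit)
    return merged
```

Each engine’s top result lands near the top of the merged list, and if one source comes back empty (say, the blog timed out), the other’s results pass through unchanged.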

I haven’t settled on anything. I’m just kind of writing down ideas in public. If you have any suggestions, please let me know!

Update January 2026: I finally got around to actually implementing part of this for tags. I went with the PHP front-end that pulls in all the pre-generated HTML tag pages as sections, plus a custom tag search on the blog that returns the same format. Each website segment’s tag pages now include a link to the collected tag page, so if you click on a tag on a review or a tech article it keeps that context. So far it only includes the 11ty and ClassicPress sections (plus a couple of individual pages). I’m contemplating a back burner project to tag pages in other parts of the site and pull them in too.

Looks like IEEE has finally renamed their sustainable tech conference. Now it’s “IEEE SustainTech Expo.” Not only is it a bit clearer than the old name, but ever since Among Us came out, “SusTech” always made me giggle a bit. I doubt I was the only one.

Update: apparently I was mistaken, and SustainTech is entirely separate from SusTech, which is still going on. Looking at it a bit more, it seems that SustainTech is more of a marketing/trade show, while SusTech continues to be a technical conference.

Popped over to Twitter to delete the last handful of posts I left there when I deleted most of them back in December. Decided to leave two for now, though I might still delete them before the new TOS takes effect.

Oct 2008:

If only the super high-tech jet fighters had identified, clarified & classified, they’d have seen the attack for what it really was.

Nov 2022:

Weird, it’s almost like the needs of a “town square” for people to communicate and exchange ideas aren’t compatible with the incentives for a single for-profit entity to maintain it.

This is fascinating: Researchers looked at variations in the human leukocyte antigen (HLA) genes of people who had confirmed cases of Covid before the vaccine rollout and also had genetic records on file.

Those with a particular variation were twice as likely to have been asymptomatic.

Having that same variation from both parents made them 8.5 times as likely to have been asymptomatic!

They looked at two more cohorts and found the same results.

And then they looked at T-cells collected before the pandemic, and found that the ones with this allele responded more actively to SARS-CoV-2, despite never having been exposed to it before. That lends weight to the hypothesis that some people’s immune systems were able to recognize it as similar to more run-of-the-mill coronaviruses.

Next they want to broaden the study more to include people with a wider range of ancestry.

It doesn’t come close to explaining all asymptomatic cases, and they didn’t look at how it might stack with immune responses that are actually targeted at Covid (vaccines, prior infections), or whether it also reduces the chances of long-term damage from Covid.

But wouldn’t it be great if someone could come up with a supplement based on what this HLA variant produces that’ll cause your immune system to generalize better? Even if it’s just within coronaviruses?

I think there’s been a lot of talking past each other on privacy lately because there are so many layers to it.

Google or Dropbox keeping your cloud files from showing up on someone else’s drive or a public share is one layer. Keeping your data from leaking in a data breach is another. Protecting messages in transit from your device to their service is a third. Google and Meta (Facebook, Instagram, and now Threads) are good at those.

But then there’s ensuring that Google or Meta doesn’t misuse it themselves, or sell it to someone who will.

And, well, to put it mildly, they’re not so big on that aspect!
