I host most of my websites on a DreamHost VPS*. This morning I discovered that on May 7 a new file, agents.txt, had been added to the root of each site.

It was easy to confirm that this is a new default file similar to the default robots.txt and favicon.ico DreamHost puts in every new site to get you started. Apparently they retroactively added it to sites that don’t already have one. So it’s a host action, not a hack. That’s good at least.

The contents are simple, and sensible for a new website: Discourage LLM training and actions, allow on-the-fly “AI”-generated summaries, disallow access to some common folders that shouldn’t be used for any of the above.

I am annoyed, though, that they added it retroactively, particularly since it includes what looks like an explicit opt-in to retrieval-augmented generation, even though that's happening already and is less of a problem than a model vacuuming up your entire website for regurgitation. (Guess who's already in Common Crawl!)

# Data use policy
Allow-Training: no
Allow-RAG: yes
Allow-Actions: no

# Default rules for all agents
[Agent: *]
Allow: /
Disallow: /admin/
Disallow: /config/
Disallow: /tmp/
Disallow: /logs/
Disallow: /backup/
Disallow: /.env
Disallow: /wp-admin/
Disallow: /wp-includes/
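The file mixes two kinds of lines: file-wide data-use directives (Allow-Training, Allow-RAG, Allow-Actions) and robots.txt-style path rules grouped under an [Agent: …] section. As a rough illustration of how a crawler might read it, here is a minimal Python parser. It assumes the syntax is exactly what DreamHost's file shows, and it borrows the robots.txt convention that the longest matching path prefix wins; neither assumption comes from the proposal itself, and the names parse_agents_txt, is_path_allowed, and ExampleBot are hypothetical.

```python
from dataclasses import dataclass, field

# Sample copied from the DreamHost default file above.
SAMPLE = """\
# Data use policy
Allow-Training: no
Allow-RAG: yes
Allow-Actions: no

# Default rules for all agents
[Agent: *]
Allow: /
Disallow: /admin/
Disallow: /config/
Disallow: /tmp/
Disallow: /logs/
Disallow: /backup/
Disallow: /.env
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


@dataclass
class AgentsPolicy:
    # File-wide data-use directives, e.g. {"Allow-Training": False}
    policy: dict[str, bool] = field(default_factory=dict)
    # Per-agent path rules: agent name -> list of (is_allow, path_prefix)
    rules: dict[str, list[tuple[bool, str]]] = field(default_factory=dict)


def parse_agents_txt(text: str) -> AgentsPolicy:
    """Hypothetical parser for the syntax shown in DreamHost's file."""
    result = AgentsPolicy()
    current_agent = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # Section header such as "[Agent: *]"
            key, _, value = line[1:-1].partition(":")
            if key.strip().lower() == "agent":
                current_agent = value.strip()
                result.rules.setdefault(current_agent, [])
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("Allow", "Disallow") and current_agent is not None:
            result.rules[current_agent].append((key == "Allow", value))
        elif key.startswith("Allow-"):
            # Data-use directive: "yes" means permitted
            result.policy[key] = value.lower() == "yes"
    return result


def is_path_allowed(policy: AgentsPolicy, agent: str, path: str) -> bool:
    """Assumed robots.txt-style matching: longest matching prefix wins."""
    rules = policy.rules.get(agent, policy.rules.get("*", []))
    best_len, allowed = -1, True  # no matching rule means allowed
    for is_allow, prefix in rules:
        if path.startswith(prefix) and len(prefix) > best_len:
            best_len, allowed = len(prefix), is_allow
    return allowed


if __name__ == "__main__":
    parsed = parse_agents_txt(SAMPLE)
    print(parsed.policy)
    print(is_path_allowed(parsed, "ExampleBot", "/wp-admin/options.php"))
    print(is_path_allowed(parsed, "ExampleBot", "/2024/05/some-post/"))
```

Because the file only defines rules for `*`, any agent falls back to that section: /wp-admin/options.php matches the longer Disallow prefix and is blocked, while an ordinary post URL only matches `Allow: /`.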

Harder to find was what else goes in this file. The first agents.txt spec I found had a completely different syntax and a completely different purpose. I had to search for the policy directives (in quotation marks) to find the proposal DreamHost is implementing, which turns out to have been renamed agent-manifest.txt shortly after it was proposed in March. Apparently DreamHost didn't get the memo before it rolled out. Update: As Patryk points out below, it's changed again, to agents-brief.txt, just one day after the blog post was updated with the second name.

Good: sensible defaults for new sites.
Bad: rolled out to existing sites without notice, half-baked implementation.

*Update: To clarify, this is on DreamHost’s managed VPS service, where they handle the OS and the webserver, but you have a flexible userspace all to yourself. It’s a middle ground between shared hosting (where other sites are on the same virtual machine and webserver) and fully run-your-own-OS cloud hosting, and the balance generally works for me (YMMV).

After a list of companies supporting SOPA (the censor-the-internet-in-the-name-of-stopping-piracy bill) was published last week, the complaints started rolling in…but the biggest target, at least in the circles that I frequent, was GoDaddy. People organized a boycott and transferred their business elsewhere, and GoDaddy eventually reversed course, but it was too late to stop a massive outflow of customers.

But why was GoDaddy such a target? And for that matter, why did so many people follow through, rather than just rant about it on the internet?

I think there are several reasons.

  1. The tech industry is mostly opposed to the bill on technical grounds. Pick a random hosting provider and chances are they're officially against it. That made GoDaddy stand out in a way that a random movie studio doesn't.
  2. They provide a service, not content, and there are many competitors who provide the same kind of service. (And it seems like they all came out with discount codes to encourage people to switch to their company.) With content, you can choose to read a book from another publisher, or watch a movie from another studio, but if you want to watch a particular movie, you can’t get it somewhere else. There are lots of comics publishers out there, but if you want to read Spider-Man, you can only get it from Marvel.
  3. Public opinion of GoDaddy was already low. For some it was their sexist ad campaigns. For some it was the CEO bragging about shooting elephants. For some it was their incessant email marketing, or focus on upselling unneeded services to people who didn’t understand what they were, or the fact that their website is such a %^$^@#%& pain to use. They’re cheap, and they’re well-known, which means a lot of people used them…but they weren’t that well-liked. Supporting SOPA ended up being the last straw.

As a result, you had a company that was tolerated at best painting a target on themselves, and a relatively easy way for people to vote with their wallets and not actually give anything up other than the time and money needed to make the transfer.

Full disclosure: I used to have about 10 domain names registered through GoDaddy, plus a few at DreamHost and one at Network Solutions. (Yes, Network Solutions.) GoDaddy was annoying, but cheap, and it was easier to renew than move. This week I consolidated them all at DreamHost, where I’ve had my websites hosted for the past year. DreamHost is offering a discount code for new customers who want to switch: SOPAROPA. I don’t get anything for telling you that, but if you sign up and list me (kelson – at – pobox – dot – com) as the person who referred you to DreamHost, I’ll get credits that I can apply to my hosting bill.

After years of piggybacking on employers’ web servers (with permission, of course!), I’ve moved my personal websites to a third-party web host. It’s kind of weird to be dealing with a web server that I don’t fully control, but DreamHost is really flexible and (most importantly) specifically supports WordPress.

The only thing I've really missed so far is Apache's mod_speling [sic], which automatically corrects a single typo or capitalization error in a requested filename. It's nice to have, but far from critical.
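The matching mod_speling does is simple to approximate. Apache's documentation describes it as comparing each file name in the requested directory with the requested name, ignoring case and allowing at most one inserted, omitted, transposed, or wrong character. The Python sketch below approximates that rule; it is not Apache's implementation, and within_one_typo and suggest are hypothetical names.

```python
def within_one_typo(requested: str, candidate: str) -> bool:
    """Approximation of mod_speling's rule: case-insensitive match allowing
    at most one inserted, omitted, substituted, or transposed character."""
    a, b = requested.lower(), candidate.lower()
    if a == b:
        return True  # exact or capitalization-only match
    if abs(len(a) - len(b)) > 1:
        return False  # more than one insertion/omission needed
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return True  # one wrong character
        # Two adjacent characters swapped, e.g. "abuot" vs "about"
        return (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]])
    # Lengths differ by one: try deleting each character of the longer name
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def suggest(requested: str, directory_listing: list[str]) -> list[str]:
    """Return every file in the directory within one typo of the request."""
    return [name for name in directory_listing if within_one_typo(requested, name)]


if __name__ == "__main__":
    print(suggest("ABUOT.html", ["about.html", "contact.html"]))
```

Returning a list rather than a single name mirrors how mod_speling is documented to behave: it redirects when exactly one file matches and offers the visitor a choice when several do.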
