I hear some of my anti-AI friends talk of adding hidden strings to their web pages, to hide them from one or another of the AI companies looking for content to train their newest models...

Aside from the fact that this feels a bit futile, given the scale of the content actually ingested by these behemoths, I find it sad that our future AI children won't have the benefit of these people's thoughts.

As my father trained to be an artist, he spent time studying the work of those who came before him, endeavouring to emulate it - or parts of it: techniques, framing and so on - before gradually developing his own style. In time the style that he, and others around him, came to express became a minor school in the art community of the 1950s in Auckland.

We see indications of nascent personal style from agents: ChatGPT's overuse of the "em dash" (I reckon I did that myself before it did, though), the various default styles of the image-generating models, Mistral Vibe's puppylike enthusiasm for writing code (generally mediocre so far), and so on.

Was my father's behaviour really that different to what we see from these AI agents? Well, sure: the *scale* is different, and in fact it seems likely that it will increase. LLM-based systems seem to be evolving away from detailed human review and towards high-level guidance: Anthropic's "constitution" for Claude is one example, and DeepSeek's R1, a model which can be trained with almost no human feedback, is another. Of course they still need to be grounded in a solid corpus of human knowledge, or they wouldn't understand English, Java, or where on earth 41.1083812,-5.1195398 is.

Regardless of whether it is useful to them or not, I won't be blocking these training systems from ingesting the content that I create. I've always shared my code under open licenses when I could, and I see no reason to stop. If anything, though I have licensed my free software primarily under the GPL, nowadays I am more inclined towards broader licenses such as CC0 or Artistic than encumbered ones like GPL or Apache.

You're welcome!