Today, I Have To Defend Bluesky | cmdr-nova@internet:~$

I’m writing this partly because I need something to take my mind off the pain of potentially having broken my toe at work, and partly because I keep seeing people share this as a way to say, “Hey, Bluesky is about to start scraping its users’ data for profit in relation to AI companies.” And I really do think it needs to be highlighted that that’s not what any of this says.

In the below article …

It reads:

The demand for AI training data means the new social network has to think about its AI policy, even though it doesn’t plan to train its own AI systems on users’ posts.

However, the public nature of Bluesky’s social network has already allowed others to train their AI systems on users’ content, as was discovered last year when 404 Media came across a dataset built from 1 million Bluesky posts hosted on Hugging Face.

Bluesky competitor X, meanwhile, is feeding users’ posts into sister company xAI to help train its AI chatbot Grok. Last fall, it changed its privacy policy to allow third parties to train their AI on users’ X posts, as well. The move, followed by the U.S. elections that elevated X owner Elon Musk’s status within the Trump administration, helped fuel another exodus of users from X to Bluesky.

As a result, Bluesky’s open source, decentralized X alternative has grown to over 32 million users in just two years’ time.

Speaking at SXSW, Graber explained that the company has engaged with partners to develop a framework for user consent over how they would want their data to be used — or not used — for generative AI.

“We really believe in user choice,” Graber said, saying that users would be able to specify how they want their Bluesky content to be used.

“It could be something similar to how websites specify whether they want to be scraped by search engines or not,” she continued.

“Search engines can still scrape websites, whether or not you have this, because websites are open on the public internet. But in general, this robots.txt file gets respected by a lot of search engines,” she said. “So you need something to be widely adopted and to have users and companies and regulators to go with this framework. But I think it’s something that could work here.”

And while I don’t agree that Bluesky is actually decentralized (I feel like, at the current moment, they’re really just using this as a buzzword), and I also don’t think it’s a great idea to put any faith in AI companies actually respecting a robots.txt, the part you should pay attention to is …

However, the public nature of Bluesky’s social network has already allowed others to train their AI systems on users’ content, as was discovered last year when 404 Media came across a dataset built from 1 million Bluesky posts hosted on Hugging Face.

The point is that a third party has already scraped Bluesky, and that’s what this is about.

Bluesky wants to offer its users the ability to say whether or not they want this to happen. It doesn’t mean these companies will respect that, but not even Mastodon has a toggle in preferences that embeds a robots.txt file into your profile.
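
To make the “it doesn’t mean these companies will respect that” part concrete, here’s a minimal sketch of how a polite crawler checks a robots.txt before fetching anything. This is plain robots.txt using Python’s standard library, not Bluesky’s proposal. The bot name `ExampleAIBot`, the example.com URL, and the `may_fetch` helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one made-up AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules permit this user agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/profile/alice"  # placeholder URL
    for bot in ("ExampleAIBot", "SomeSearchBot"):
        print(bot, "allowed" if may_fetch(bot, url) else "blocked")
```

The thing to notice is that the check lives entirely inside the crawler. Nothing on the server side enforces it, so a scraper that never calls anything like `may_fetch` just walks right past the file. That’s why a consent signal like this only works if the companies doing the scraping decide to honor it.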

In fact, there are quite a few instances across the Fediverse whose rules completely ignore the existence of AI and don’t mention it at all, and some even actively protect users who post, or even sell, generated slop.

So, unfortunately, I think I do have to say that Bluesky might actually be ahead on this, in that they’ve somehow managed to think of it before anyone working on Mastodon has.

But, back to my main point here, you also have to pay attention to that first line. Aha! You thought I forgot about that, didn’t you?

The demand for AI training data means the new social network has to think about its AI policy, even though it doesn’t plan to train its own AI systems on users’ posts.

This is kind of interesting wording. One might assume that the people behind Bluesky have an AI system that they’re running. Or you could take it to mean that, if they ever develop a system, they don’t plan to train it on your posts.

I guess you could take it either way, honestly.

All of that said, and with this article sliced into bite-sized pieces, absolutely don’t get me wrong: people definitely have reason to be skeptical and to ask questions, as I do all of the time. Social media has come crashing down all around us over the past five years. Many people have lost the basis of their livelihoods because of it. Some just picked a new place and set up shop.

Heck, I used to have 3000+ followers on Twitter, and some of that (aside from Mastodon) was the basis for my ability to actually sell music. And I gave that up. I don’t mean that as an “I sacrificed more than you,” but more as an “I understand your skepticism and fears.”

But I feel it’s important to get the facts straight, and even if you vehemently hate everything that Bluesky stands for, you should make sure that everything you say accurately represents and depicts who they are, and what they’re doing. It’s all about credibility. At least, if you want people to believe you.

For further clarification on all that’s going on here, you can read the actual proposal on GitHub to dive deeper into what Bluesky is putting forth.

And I definitely want to reiterate, Mastodon has a lot of work to do in making itself less friendly to scrapers and AI scammers.


mkultra.monster is independent, in that it is written, developed, and maintained by one person. Written, developed, and maintained, not for scrapers, bots, scammers, algorithms, or grifters: But for people to follow and read, just like the way it used to be, back in the golden age of the internet.

