Przejdź do głównej zawartości


How Decentralized Is Bluesky Really? https://dustycloud.org/blog/how-decentralized-is-bluesky/

A technical deep-dive, since people have been asking me for my thoughts. I'll expand a bit on some of the key points here in a thread. 🧵

2 użytkowników udostępniło to dalej

"In July 2024, running a Relay on ATProto already required 1 terabyte of storage. But more alarmingly, just a four months later in November 2024, running a relay now requires approximately 5 terabytes of storage. That is a nearly 5x increase in just four months"

wtfh?!

Are they hiding a blockchain or some other idiotic data "structure" in there!? I know warezlords who had hidden directories for IRC DCC bots on compromised servers which weren't such disk hogs.
people post a lot of shit. I'm pretty sure my instance is using like 100 GiBs despite not keeping a copy of every image/video everyone posts. It adds up. The bsky model requires a copy of the entire universe, so it makes sense
/me tries to imagine DNS requiring everyone who operates a DNS server, or UUCP/SMTP server, or IRC server or SILC server to maintain the "entire universe" and still claim that it is somehow "distributed" or "federated" or "decentralized" in the same sentence.

Distributed/federated/decentralized systems aren't new. They were pretty well established by the end of the 20th century. p2p implementations to improve upon some of the known weaknesses/disadvantages of such things were starting to making inroads by the late 20th century too.

Bluesky, apparently hasn't caught up to them? That isn't surprising to me even slightly.

What is surprising to me, is how easily people are still misled when there is a web of information to help them know better.

Regardless, thanks to Christine for doing a deeper dive into Bluesky than I would care to do personally. I had a lot of reasons to avoid it already. It never "moved the needle forward" from my vantage and this world is already littered with too many NIH Syndrome/"Reinventing the flat tire" (as Alan Kay would phrase it) tech dumpster fires.

Bonus: also gave me some reading references into Golem that I had somehow missed.

Meanwhile: in the less technical world as Last Week Tonight with John Oliver, no one is talking about Bluesky anyway, my favorite excerpt is John Oliver highlighting a cowboy (literally, a guy who sells cows) talking about how "Facebook is archaic it's dead, Instagram's prime was more than seven years ago " and Tik Tok is apparently the go to place for the technically inept (which means the bulk of users), in spades: https://youtu.be/5CZNlaeZAtw?t=1221 ;)

CC: @cwebber@social.coop
First of all, before I say anything else, my goal here is NOT to be mean to Bluesky's devs. I know there's a lot of fediverse-Bluesky rivalry, but I have enormous respect for Jay Graber and her team and I know they believe in their vision!

This started because I got some very kind encouragement by @bnewbold to write something. I'm trying to be technical in my analysis, not unkind. I hope that can be recognized, really and truly.
That said, let's get to the summary: Bluesky / ATProto are not decentralized or federated, according to my analysis.

However, the "credible exit" goal is worth perusing, and does use decentralization techniques! But it is not decentralization/federation without moving the goalposts on those terms.
Furthermore, I think Bluesky is providing something valuable: a lot of people are trying to leave X-Twitter *right now* because it has become a completely toxic place.

The fact that Bluesky's team has managed to scale to receive such users is incredible, nearly feeling miraculous.
On the fediverse we also see a lot of accusations of Bluesky being owned by Jack Dorsey, and this isn't true. My understanding is that Jay performed an impressive amount of negotiation to allow Bluesky to receive funding independently.

These days Jack Dorsey is instead focusing on Nostr, which I can only describe as "a sequel to Secure Scuttlebutt with extremely bad vibes where bitcoin people talk about bitcoin"
I participated a bit in the process of when Bluesky was Jack Dorsey and Parag Agrawal's personal project. I also believe Jack and Parag were sincere about Bluesky as a decentralized social network protocol that Twitter would adopt, which is the directive that Bluesky was given as an organization.

When Jay Graber was awarded the position to lead Bluesky, I was not surprised. To me, Jay was the obvious choice to deliver what Bluesky was being directed, and I do think Jay is an excellent leader
There is also something which Bluesky gets right which the fediverse does not. I mentioned that Bluesky uses decentralization *techniques*, and the most important of those is content-addressing. This allows content to exist even when a server goes down.

This is a great decision and I have advocated that the fediverse do so as well. In fact several years ago I wrote a demo in @spritely's early days showing off how one could build a content-addressed ActivityPub in a spec-compatible way.
So I have opened here with the things that Bluesky does well. As you may guess, we are about to move into critiques territory, and it's a lot of critiques from a *decentralization*/*federation* perspective. It doesn't erase the "credible exit" goals, which I think are good still.

Let's dive in...
A frequent way of describing Bluesky's decentralization, including by Bluesky's team, is "it's like a bunch of blogs (Personal Data Stores), and then the relay/appview/etc pieces are like search engines"

This is a reasonable starting point for thinking about things, so let's run with it.
In fact ATProto's own tutorial even says "Think of our app like a Google": https://atproto.com/guides/applications

And indeed this is a good way to think about things. But it doesn't seem so bad, because we have Personal Data Stores like blogs, so probably things are fine, right?
While most people would argue that blogs and websites are open, few would argue that *Google* is open. So this is a curious place to begin thinking, and yet structually, it is actually quite apt.

PDS'es are like blogs, the rest is like Google. But relays/appviews/etc do a lot *more* than Google.
Relays, AppViews, etc don't just index information. Blogs and their interactions are generally slow-moving, but social media is direct and responsive. Notifications and fast interactions are key. So search engines, yes, but we should also think of these components of doing much more.
But let's stay on this blog/search engine analogy for a while before we unpack what it means on a *technical* level, which is interesting. Let's analyze for the moment from a power dynamics level.

Building a web search engine is actually pretty easy these days, you can do so with off-the-shelf tools. And yet there are only a couple of search engines *really*, Google and Bing (DDG mostly uses Bing). And yet the information is right there. *Anyone* could run their own engine. Why don't they?
Furthermore there is an interesting connection between blogs and social media: the death of blogs + feed aggregation directly aligns with the death of social media.

How many of you were around for the birth and awkward death of blog engine feeds? Because I was! Oh, remember Google Reader?
Feed readers are also simple, and in fact they were even easy to self host, even on the desktop! But Google Reader came in and was such a good design that everyone used it.

When it went away, blogs were still *there*. But blogging as a *syndication medium* died. One big player left, and it's gone.
This was sad for me especially; my favorite medium on the internet ever was webcomics. Webcomics still exist, sort of, but the loss of independent publishing and aggregation meant that they had to change to survive.

The shape of webcomics started to get shaped to the shape of Twitter's image box.
This may seem like an enormous aside, but it isn't. The big sell currently is that "you don't need to run a relay because you can run your own PDS!" but as I have illustrated here, the distribution and syndication power dynamics matter a lot.
So. It isn't enough to self-host your own PDS. Whether or not people can run their own relays/appviews/etc actually matters *a lot* if we want this stuff to survive.

So, can we? How hard is it to run your own AppView/Relay/etc?
Today, there is only one real organization running a Relay that really matters or an AppView that people use for anything other than fun aggregation of statistics. Nothing that resembles meaningful decentralization of the network. It's all run by one company: Bluesky.

But could we change that?
People are trying; most notably alice has done some great work recently: https://alice.bsky.sh/post/3laega7icmi2q

So now someone *can* run their own Relay (not the AppView yet, but maybe soon), and we're getting a sense of the cost and scale. This is good news; we didn't know before.
In fact we also have an idea of the rate of growth. Approximately 4 months prior, @bnewbold.net posted an article detailing how to run a Bluesky relay: https://whtwnd.com/bnewbold.net/entries/Notes%20on%20Running%20a%20Full-Network%20atproto%20Relay%20(July%202024)

This is great. We need more people trying to do so to get a sense of how decentralized things can be.
Just focusing on storage, in July @bnewbold.net estimated the amount of storage expected to run a Bluesky relay is approx 1 terabyte. In just 4 months at start of this month (November), alice estimates nearly 5 terabytes.

This is a fast growth rate and this is *before* the big post-election influx.
I tried estimating how much this would cost; as a lazy approximation I dumped a 5 terabyte machine into seeing what Linode would cost to self-host, and it was approximately 55k a year: https://bsky.app/profile/dustyweb.bsky.social/post/3lah5n3kld42q

That's a lazy estimate, but that's also what many people make in the US every year
However @bnewbold pointed out, correctly!, that there were cheaper options available. If we used even Linode's block storage, it would be cheaper (but still expensive) for the storage component, and this is true https://bsky.app/profile/dustyweb.bsky.social/post/3lah5n3kld42q
Ten wpis został zedytowany (4 dni temu)
In fact @bnewbold and alice had gotten the server down to just close to $200/month in their estimate, much much cheaper than I had, by choosing a dedicated server plan. Much cheaper!

But there's a problem though; that's cheap because you've got a server that has a dedicated disk...
Even if we look at the dedicated hosting provider that @bnewbold provided in June and scale the cost to the pre-election storage requirements, we are adding on a massive amount of cost every month, over $400/month more.
4x 7.68tb SD  is +$414.20/month on the original dedicated storage example
But worse, we have reached the limits of what is possible to do with a dedicated server. We *have to* move to abstracted storage from this point forward because we're starting to hit the limits of what's offered for cheap dedicated storage on one machine. And this number will only grow, and as said previously, is growing at an enormous rate.
I have spent a lot of time focusing on the cost of storage, but storage is only one cost required. These estimates have been done so far against servers that *nobody is actually using*. The cost of servers that people are using will be much higher, because more needs to happen than just store things.

And that is not even to mention the challenges with administrating, dealing with takedown requests, illegal content, etc, which are probably much more serious.
Let's take a break, the analysis of server costs is boring and I don't like doing it, and I'm sure people will throw numbers at me of the absolute race-to-the-bottom hosting numbers they can find to store and run all this stuff, but really that's not interesting to me.

Let's do a comparison.
Remember that the idea of "fully self-hosting" on Bluesky/ATProto at this point is primarily abstract; nobody is really doing it. But of course there's a place where tens of thousands of people are running their own servers for millions of users, and that's the fediverse/ActivityPub.
As said, tens of thousands of people are self-hosting *today*. Fediverse software doesn't just scale up, it scales *down*.

GotoSocial is cheap enough on resources where you can run it for family and friends on a raspberry pi or spare laptop you have sitting around.
Now you're hitting the point in this thread where some of you may be thinking "aha! this is where Christine is saying that the fediverse/activitypub are awesome and atproto is terrible!"

you have NO IDEA HOW MUCH I CRITICIZE THE FEDIVERSE ALL THE TIME, I do it all the time, and will later here
The fediverse has a lot of flaws. Oh trust me, we're gonna get to that.

But comparison-wise: what I mean to say is that architectural decisions matter, and scaling up isn't the only thing that's important, *scaling down matters too*.

If you care about decentralization, anyway.
Now look, we're about 1/3 of the way done here, there's a lot more to say, and a lot more said in my article, it's about 24 pages long if you print it out.

This is because in the age of TikTok I somehow have decided to model myself after David Foster Wallace, sorry

"Consider the Fediverse" I guess
But now, I will break for lunch. Enjoy your intermission because I will be back. We still have to get through the remaining 2/3 of the analysis, after all.

======= LUNCH BREAK HERE =======
Okay I am back from lunch, time to resume my analysis thread for "How decentralized is bluesky really?" https://dustycloud.org/blog/how-decentralized-is-bluesky/

I have been receiving a lot of notifications, I am not reading any of them until I finish with this so bear with me, BEAR WITH ME, we're gonna make it through
And before we make it any further can I say that I watched a nice medley of David Bowie and Cher singing, and it was so lovely https://www.youtube.com/watch?v=KPlN8RBP-Ws

@mlemweb said "of course it's very heteronormative despite having two queer coded icons on the stage and ISN'T THAT THE WAY I guess
But where was I? Oh yes. We had talked about why PDS'es aren't enough (blog/google analogy), relative costs of hosting things on ATProto vs ActivityPub, etc etc

But we haven't gotten into the really interesting parts which are the structural analysis stuff, so let's move onto that
Now you may be saying, "Christine, this is really unfair, because you're looking at ActivityPub servers which are only dealing with a small amount of the network, what if it were an ActivityPub mega-node? What are the costs THEN huh?" and "What if we hosted just PART of ATProto?"

What then INDEED
ATProto is not designed for the Relay and AppViews to only hold part of the network, not *really*, and ActivityPub is. We'll get to this in a moment.

But Bluesky actually has good justification for this! I will defend it insofar as Bluesky was making a serious *design decision*
Remember the directive that Bluesky was given: develop a decentralized protocol which Twitter can adopt. That informs a lot of things, and has meant that Bluesky was really very ready for this moment!

If you're an ex-X-Twitter user then by god, you're going to be amazed! It's just like Twitter!
This informs some other things:
- Bluesky's gotta scale BIG and do so FAST (scaling down: not a priority at all)
- It has to be something Twitter can adopt (of course, not anymore, but initially)
- Everything on ATProto is public (yes, everything, including your blocks btw, we'll get to that)
But here's the other thing. People have trouble with the fediverse! All those decentralization decisions get in the way, my god, you've got to choose a server, search doesn't work well (actually it could but it's a cultural thing, different topic), and worst of all:

Sometimes you DON'T SEE REPLIES!
Actually all these critiques of the fediverse are TRUE, these are known challenges, and actually it's not really so bad, but it could be better, and at any rate, Bluesky made a major decision to simplify a lot for new users, and they have. Things seem to just work for people! Incredible!
The thing you often get seen thrown around is "it's amazing, I had no idea a decentralized protocol could just work like that! How on earth did they solve that in a decentralized system and so FAST too!"

It's simple: all those things "just work" because Bluesky is centralized.
Now yes, they are using decentralized techniques. Remember when I said content-addressed storage is a good idea and the fediverse should do it too? IT IS! (And as I also said, it's actually fully possible for the fediverse to do, more on that later.)

But the reality is, it's still *centralized*
In every meaningful way from a power dynamics perspective *EXCEPT* the category of "credible exit" (which I am saying and agreeing is a good idea!) Bluesky is centralized.

MAYBE another big corporation could come along and host all this stuff but that's adding a Bing to our Google
Yes, you can host your own PDS. You can also host your own blog. But try hosting your own PDS and NOT hosting a relay or AppView and you can't do much.

Blogs are decentralized, Google is not.
PDS'es are decentralized, Bluesky is not.
We're getting to the point where we get to why I'm so damn frustrated about this and have been biting my tongue until it nearly comes detached from my mouth: users THINK Bluesky is decentralized because they're TOLD Bluesky is decentralized

AUGH! *That's* what drives me nuts.
Here's an example of this problem in action

fry69: "The working search box was the second thing that impressed me on Bluesky, I thought that was not possible with a decentralized model"

Sorry fry sixty-nine I regret to inform you the reason search works so well is that it's centralized! THAT'S WHY
So hold on, let me set some terms for "decentralization" and "federation" that I think are reasonable.

> Decentralization: the result of a system that diffuses power throughout its structure, so that no node holds particular power at the center.

Pretty reasonable. Do you agree? I hope so!
Okay how about "federation" now because this is a *technical term* that the *fediverse has established* and I'm kinda PO'ed about the goalposts being moved on this one.

A lot of people coming to Bluesky have never heard of "federation" before in a social network so listen up this is important!
Here's my definition of federation:

> Federation: a technical approach to communication architecture which achieves decentralization by many independent nodes cooperating and communicating to be a unified whole, with no node holding more power than the responsibility or communication of its parts.
Now historically, federation has been achieved on the fediverse via "message passing". Actually, this is to the degree where I just always associated message passing with federation, but really, federation is about the distribution of power, creating an abstract whole in a sea of autonomy.
Maybe there is another way to achieve federation, but it's about the power dynamics. It's a technical immersion of power dynamics, the flow and interchange of cooperation between many parts.

So you may say, well, doesn't ATProto have that? After all, messages flow through the different parts!
ActivityPub, as it turns out, follows the actor model of computation. Okay, many people implementing the fediverse don't know about the actor model aspect of ActivityPub but I am here to tell YOU, dear reader, that it is an important thing, not a detail
I'll take one more note about federation which is that often time the message passing mechanism of the fediverse is often called "federation", but theoretically another mechanism could exist, but I'm actually not so sure of that.

There's a reason the actor model and the lambda calculus are undying
Oh god Christine said "the lambda calculus" did you know she's into lisp and functional programming, what's she going to talk about next monads?!?!

I am not going to talk about monads. Not TODAY

But we do need to get a better architectural idea of how these systems work because it matters a lot!

myrmepropagandistudostępnił to.

So let me introduce two models of communication which we can use to analyze these two systems. It's important!

- Fediverse/ActivityPub: "message passing"
- Bluesky/ATProto: "shared heap"

Okay, cool, terms established, let's talk about them and why they matter because they matter A LOT
"Message passing" is what ActivityPub uses. It's "like email", people say, and that's true.

Actually it's even a lot like physical mail. You write a letter, you say where it should go, it gets delivered to your house.

Message passing. The world runs on it.
Now I can use message passing to send a message to you *directly* and indeed, that's "like email". For one-to-one correspondence, that's enough.

But it's not enough for a followers/following type mechanism. But we can build it on top! Thank *you* computational abstractions!
On top of "message passing" we will build "publish-subscribe" as a second-layer abstraction

"Your ideas are interesting and I'd like to subscribe to your newsletter."

You send me a letter saying you'd like to hear the things I have to say, okay, you're part of the reader list. That's how it works.
On top of that we can build even more abstractions and the net result is that this is how federation works in pretty much every "federated" system I know.

ActivityPub does some extra work to help you see replies on a thread, think "letters to the editor". This is a bit lossy sometimes though
It's true that sometimes users click over to a thread and see some replies but not all on their instance's UI. There's things that could be done to improve it, but it's sometimes mildly confusing, but not so bad, and you can click over typically to see whatever else is happening, and people learn to
I actually think this is improvable but I mostly don't care because this isn't as big a complaint as people tend to think it is on the fediverse, the other concerns like "what instance do I pick" tend to be bigger and "oh no my server went down"

That can be improved, we'll talk about that later
So okay, the federation is "message passing" and like email, or physical mail. You have an idea how it works.

Now we need to get to that other thing, a "shared heap" architecture. What on earth does that mean?
If "message passing" is like "mail comes to your house", a "shared heap" system works differently

In a "shared heap" system, all the mail gets dumped at the post office, and in the most naive version, you go over there and read through every single piece of mail to see which one is relevant to you
There is no "directed delivery" in a "shared heap" system, which means you are stuck with two things: either a "god's eye view" (Bluesky) or "even lossier about replies than ActivityPub" (Secure Scuttlebutt/Nostr)
The Bluesky approach to the "shared heap" is that *everything* goes into the big, centralized shared heap. Bluesky takes a "god's eye" view: it knows everything, and so knows what all your replies are, and can give you perfect search.

Secure Scuttlebutt / Nostr... well long story. Lossier, I'll say
You can imagine the physical world version of "message passing" already because you already live in this world. Messages come to your house or apartment building or whatever

For Bluesky's "shared heap" architecture, you'd have to build a whole addition to your house for everyone's mail
That's exactly why running a Relay or AppView is expensive: you're building an addition to your house for all the world's mail.

Eeep! That ain't cheap. That's why I'm saying: decentralization also means the ability to *scale down*.
Look, I know that I've been hitting this nail on the head for a while but: the web is open, blogs are open, but Google isn't open

But you could run your own Google, in theory. You could index the web. So why aren't you?

Ah yeah. Same thing here. That's what I mean, that's why it's centralized
Now as I have said, this is a *design decision*. And remember: most users of Bluesky really *don't care*. Decentralization is not their focus, they're trying to get the hell off the nazi hellscape that Musk's toxic reign of Twitter has become.

Bluesky's architecture, actually, is great for them.
If what your *goal* is to get off Twitter, then Bluesky has solved it. They solved it by building another Twitter, and this time it's open source, which is cool! And it might have this "credible exit" thing.

But god damnit it's not decentralized and it's not federated stop TELLING people that
"Oh Christine you're being sensitive"

Maybe, but there are real consequences to this. What if Bluesky/ATProto fails? "Oh well we tried decentralization and that didn't work." If people think something is something that it isn't, then that's a real problem.
Users, clearly, think a lot more of Bluesky is decentralized than it is, and realize less of the consequences than they should. This really worries me. Blocks and DMs are both great examples of this.
Blocking first. Bluesky's decision to have *everything* public means that it is expected that every participating node knows *everything* about who's blocking *everyone*.

"This is consistent with how blocking works on Twitter/X" their paper says

But wait, I'm pretty sure that one's not true though
There are ways Twitter exposes that a person blocks another person, which might actually be bugs. If a user quotes a user that blocked them (before or after the block happened), the quoted post is not displayed to you, just as if you were blocked by its author.

This likely got _fixed_ by the recent changes.
It is ONE thing to be able to block JK Rowling and for you to see that JK Rowling is blocking you.

It is an ENTIRELY DIFFERENT THING for ANYONE to see who is blocking JK Rowling and who JK Rowling is blocking

This one is shocking to me: this seems like a vector for abusive actors
Now to be completely fair this is something that Bluesky's devs are interested in potentially changing: there is an open issue to discuss the possibility of private blocks https://github.com/bluesky-social/atproto/discussions/1131

What I am saying is there are architectural consequences to fundamental design abstractions
Yes, I may sometimes seem silly over here, SICP-hugging fangirl, come on we're just trying to build things that *work* over here

Look I'm a lisp lady, I know the realities of "Worse Is Better" more than most, I now the right CS designs don't win

But Conway's Law flows in two directions!
You know what, we'll come back to "bidirectional Conway's Law", let's talk about Direct Messages for a minute because I think those are telling

Direct Messages in Bluesky, wait how do they work if ATProto is public?

Did you guess?

DMs are centralized! All DMs flow through Bluesky
Now to be completely fair Bluesky is clear about this *in their blogpost announcing DMs*, but just like this thread, I doubt nearly anyone has read that far (am I talking to the void? I don't know, if you actually have gotten to this message reply with "I found the easter egg" or something)
The thing that is telling to me about DMs is that we *have* federated direct message protocols like XMPP which have been around for ages; if Bluesky wanted to they could have tacked that on pretty quickly, E2EE or not. It still would have been decentralized at least
The point is that I have *seen in the wild* people saying "Oh yeah Bluesky added DMs to their decentralized protocol" and augh

I know they aren't claiming this but it's very clear to me that people are reading things as being completely different architecture than it is
But to Bluesky's credit, Twitter's DMs aren't decentralized either! And getting and shipping something that works, now for the influx of Twitter users, again... I am sympathetic

Bluesky's team is doing an INCREDIBLE JOB in that way of scaling to meet the incoming stream of Twitter refugees
On that note, again, I am not reading the replies right now because I am (a) afraid to and (b) I'm never gonna finish this and we are a bit over HALFWAY THROUGH the analysis but I have this fear that EVERYONE is mad at me, Bluesky fans, fediverse fans

I am trying to be analytical. I am trying!!!
I said we are about halfway through and criminy we're halfway through the afternoon, I need a break to get some tea

We have a few big topics left:

- Decentralized identity, how does it work (magnets too, yes)
- The Org is a Future Adversary
- Christine critiques the fediverse
- Wrap up
And so, it is TEA TIME

Go get yourself a hot beverage. Put honey or agave in it, if you like. Dairy, or perhaps, non-dairy, if you prefer.

=== BREAK TIME! Time for tea! ===
Okay, I am back and I am back with tea! I made "black tea with ginger" and I put some whipped honey in it. I also made tea for my spouse

I am drinking out of an oversized mug from @baconandcoconut that says "I'm that person who likes to serve on open source program committees", which is not actually accurate but I do anyway
I am also sad about the US House of Representatives being shitty to trans people who work there and are just trying to make it through the day

I used to do data modeling contracting for the US HoR on our legal system, true story, which sends me back to a time when I did a lot of data modeling
A lot of data modeling I did in that time was in the W3C Verifiable Credentials group that was working on Verifiable Credentials, zcap-ld (my spec), and, oh hey, Decentralized Identifiers (DIDs, the name is not my fault)

So actually I was pretty excited when I heard that Bluesky was gonna use DIDs!
Back in 2017 I wrote a whitepaper: "ActivityPub: from decentralized to distributed social networks" and it also suggested using DIDs https://github.com/WebOfTrustInfo/rwot5-boston/blob/master/final-documents/activitypub-decentralized-distributed.md

I no longer think DIDs are necessary to solve this, but then and now I think *decentralized identity is important*
In that sense, I am really glad Bluesky is taking on decentralized identity, as a concept! And DIDs, in a way, are a good signal.

But there are several problems, the first of which is: Bluesky supports two kinds of Decentralized Identifiers and they're both -- you guessed it -- centralized!
Before we get there, let's talk about what the DID spec was and what DIDs are. The core DID spec is an *abstract interface* for key management which provides a way of representing keys (and some other metadata) which can be created, retrieved, and updated/rotated.

So far so good...
The other requirement you would expect, based on the name, is that Decentralized Identifiers are *actually decentralized*.

When I got involved in DID work, that was actually the expectation of everyone. Then it was loosened. What? Why on earth?!
The reason actually stems from the first centralized DID method that Bluesky supports: did:web.

did:web is centralized, and kinda useless. It just works by a regex rewrite of the DID's name to an https URI and then it's retrieved. Anywhere you use did:web, you could have just used an https: URI
"Now wait Christine, didn't you say earlier that the web is decentralized and open? So therefore, did:web is decentralized and open"

Yeah but the naming system of the web is CENTRALIZED

We use DNS and ICANN (and then we add another centralization layer with TLS/SSL CAs)!
Everyone in the DID standards space KNEW that did:web was centralized, so why on earth was a centralized identifier permitted for something named "Decentralized Identifiers"?

The answer is easy. did:web is easy to implement, many DID methods were not.

did:web existed for test suites.
I was kind of exiting that particular area of standards when this happened but colleagues will tell you that I, and some others, were deeply upset and troubled by this

"Sure having a nearly no-op DID to pass the test suite is helpful but it shouldn't be labeled as a DID, people will get confused!"
Confusion, on its own, is one thing. But the problem is when confusion turns into decentralization-washing.

"This is going to turn into decentralization-washing!"

"It's just to pass the test suite!"

[... time passes ...]

"Actually we like did:web now, it's a DID method everyone can implement!"
And of course once the door was open to did:web, the door was open to everything! Decentralization is now no longer a requirement for DIDs. You can make a centralized DID method and call it a "Decentralized Identifier" and you're right because it implements a spec named "Decentralized identifiers"
But it's ONLY EXPERTS IN DIDs WHO UNDERSTOOD THIS

Most users hear "Decentralized Identifiers" and they think they know what's being delivered, the distinction between the *spec* being called that and the *mechanism used* being centralized... you have to go digging to find that out
So did:web is not only useless, it misleads people about the problem domain entirely, but hey it's now the most broadly deployed DID method in the world, congrats everyone!

Speaking of centralized Decentralized Identifiers, did I mention that did:plc is centralized?
For that matter, where did the term did:plc come from? Early versions of "did:plc" documentation called it the "Placeholder" DID method, that's what it stands for, to motivate changing it later

Well the docs no longer say that, it now says "Public Ledger of Credentials"

Good backronymn, but...
did:plc is centralized, and that bothers me because once again, users think something is more decentralized than it is, because they're being *told* it's decentralized

The particular way in which did:plc is centralized doesn't bug me too much but once again, few users have read into this
If you read the documentation of did:plc, they're actually quite upfront about did:plc's centralization being non-ideal. That's good, I appreciate that. Again, you gotta dig though, and the name misleads (which is, to be fair, the original sin of the DID Working Group)
(aside: wow my eyes are getting tired from staring at my monitor while I recap of what was a 24 page blogpost, why do I do this to myself)
Aside from being irritated about the name misleading, I don't mind the centralization of did:plc too much (other things, I am more concerned about, we'll get there)

There's one organization that can be queried via their API that keeps a definitive list of certificate and their updates
In theory, once a DID is registered with Bluesky, it cannot be altered by Bluesky, because a cryptographic update from the original key is necessary; it's a certificate chain, a good design

Bluesky can refuse to share did:plc documents or their updates, but it can't manufacture updates
This is pretty good tbh, it lowers the stakes a lot to have certificate chains

I love certificate chains, certificate chains are great

Honestly, having a centralized registry for them, it's not the best but it's not the worst (aside from that damn naming thing)

However...
There are some strange, strange things about did:plc that heightens the centralization concerns and, well

I'm not a cryptographer, but some of my good friends are cryptographers, etc etc. I got some... reactions to what is to follow
The first strange thing to me is that did:plc uses sha256 and, AFAICT, not sha256d (which is really just running sha256 again over the hash). Unless I am missing something? Am I wrong?

Maybe it's not a concern because of doc parsing but it's best practice to protect against length extension attacks
The next concerning thing is that did:plc truncates the hash to just *15 bytes* of entropy.

I'm... again I'm not a cryptographer, but why throw away all that delicious entropy? So the did fits in 32 characters? Weird choice, and it means collisions are cheaper
This is public information, I don't need to file a CVE to tell you about the truncation of entropy. I am, again, not a cryptographer. Maybe it's fine?

I do remember the Debian short IDs fiasco tho https://gwolf.org/2016/06/stop-it-with-those-short-pgp-key-ids.html

Why not hold onto all the entropy you can get?
DIDs weren't meant to be seen by the user; cryptographic identifiers in general *shouldn't be*, they should be encapsulated in the UI.

We'll get to UI stuff in a bit.

I just don't understand this decision though, it just seems weird to me but maybe a cryptographer will tell me it's fine, actually
At any rate, I continue to not understand it, maybe it's fine, but it did play a part in that "Hijacking Bluesky Identities with a Malleable Deputy" blogpost, which is fascinating and, unlike me, is written by a Real Cryptographer (TM) https://www.da.vidbuchanan.co.uk/blog/hacking-bluesky.html

Good post btw
One way in which the truncation shows up in that blogpost which I thought was curious is that the attack involved generating a *longer* truncated hash

The fix ended up resulting in codifying the hash length: 24 characters, and no longer https://github.com/did-method-plc/did-method-plc/pull/31
There's another thing about that blogpost that caught my attention. I will just quote it:

> However, there's one other factor that raises this from "a curiosity" to "a big problem": bsky.social uses the same rotationKeys for every account.
> This is an eyebrow-raising decision on its own; apparently the cloud HSM product they use does billing per key, so it would be prohibitively expensive to give each user their own. (I hear they're planning on transitioning from "cloud" to on-premise hosting, so maybe they'll get the chance to give each user their own keypair then?)
Anyway that's the quote and presumably this must be changed. I haven't looked, but I can't imagine they're still doing this today (are they?) but the fact that only one key was ever used in production for expense purposes is a strange decision
At any rate, that decision was used to create a kinda confused deputy-ish attack, which is why it came up in the blogpost, and anyway, hi, I'm not a cryptographer, momentary reminder that I am not a cryptographer, but I have designed cryptographic certificate chains and I was pretty shocked by that
At any rate, one way or another, you can presumably use did:plc to move yourself from one server to another so in the interest of "credible exit" this is a good choice

Though, one might take a moment to ask: who controls the keys if you *do* want to move?
Bluesky has identified, I'd say correctly even, that key management for users is an *incredibly* hard thing to do.

But the solution, once again, ends up pretty centralized: for all users on Bluesky's main servers at least, Bluesky generates and manages the keys for them.
I am, once again, kinda sympathetic and kinda unsettled simultaneously.

- Sympathetic: key management *is* hard and we just don't have the UX answers to solve that, and Bluesky is once again trying to deliver to Twitter refugees
- Unsettled: it's centralized, but... there's something *more* troubling
The big promise here, the "credible exit" side of things is that for most users, the vision they have is that if Bluesky gets bought by a big evil company, no problem, move somewhere else

But for those same users, Bluesky still *controls their keys* and thus *controls their destiny*
Regardless, Bluesky has this "your domain is your id!" thing, and that's pretty cool, the domain maps to your DID and your DID maps to your domain

Well, I'm not gonna get into this in detail here, I do on the blogpost if you wanna read it but, the cyclic dependency might be an actual cycle
tl;dr on that UX part:

- users only know domains, they don't know the DIDs
- turns out that's a phishing attack when those can change at any time
- if bsky.app ever goes down how do you actually know I *really* mapped to that name
- and a whole lot of "liveness" problems that enter there
in addition to this long-ass thread there is a long-ass article and if you care about things like "zooko's triangle" maybe read that version, the rest of y'all can move on we've got other stuff to cover here
It is time for TEA BREAK 2: THE REHEATENING

I will also go to the bathroom

TMI? If you've read this far into this weird thread I am already giving you too much info

=== TEA BREAK 2 ===
I have returned, with tea

I am still not reading notifications. Well, I have seen a few fly by on the fediverse which is blipping and blooping nonstop in the Mastodon UI so people are clearly reading it there

Bluesky says "30+". How big is the +?? I will resist temptation to look and assume "31"
"Where are we going with this Christine?"

Well you could have just read the blogpost but 3 more sections remain, we are approximately 2/3 there

I know, bear with me, what is left is:

- What should the fediverse do?
- Preparing for the organization as a future adversary
- Conclusions
Yes, I changed the order of the remaining sections, not from the blogpost but from the last time I said what was left on this thread

pray I do not reorder them again
Before we get into the next section, earlier I left an easter egg, which you could reply to and say "I found the easter egg" or something

Now you can put 2 eggs

I 2 was once an egg

(Look I specifically transitioned so I could never be accused of making dad jokes again so that does not qualify)
Ten wpis został zedytowany (4 dni temu)
Alright you've heard enough critiques of Bluesky for a bit and I SAID I was gonna critique the fediverse and I am a WOMAN OF MY WORD

So let's get into it!
I have actually critiqued ActivityPub and the fediverse a lot! I have kind of never stopped critiquing it, ever since the spec was released. There's a lot that can be improved!

I have even gotten criticism from AT LEAST ONE ActivityPub spec author for critiquing AP-as-deployed but I do anyway
Actually something that is funny about ActivityPub is that there's "ActivityPub the spec", which I think is pretty solid for the most part, and "ActivityPub-as-deployed"

Many of the critiques I'm about to lay out we left holes in the spec for which I hoped would be filled with the right answers
One thing we have already discussed so, before I will say anything else, I will repeat: content addressing is really good, and I'd like to see it happen in ActivityPub, and it's *possible to do*, I even wrote a demo of it https://gitlab.com/spritely/golem/blob/master/README.org

Bluesky does the right thing here, AP should too
Content addressing is important. It should not matter where content "lives". It should be able to live anywhere.

A server should be able to go down, and content should survive.

Go content addressing!
Actually with this and several other things I am going to bring up, I actually made sure there was space to do things right: there was a push to make ActivityPub "https-only"

I pushed back on that, I didn't want that requirement, and it was exactly for this reason: enabling content addressing
This isn't the only time I left a critique of ActivityPub-as-Deployed as opposed to ActivityPub-as-it-could-be: see also OCapPub, which critiques the anti-abuse tools of AP as inadequate and leading to "the nation-state'ification of the fediverse" https://gitlab.com/spritely/ocappub/blob/master/README.org

Oh, and ocaps!!!
ActivityPub left giant holes in the spec around two things which sound the same but which are not the same: Authentication and Authorization

Trying to mix these two, you accidentally get ACLs, and then you get confused deputies and ambient authority, plagues of the security world
Anyway, if you know *anything* about me, you know I am a big fan of capability security (ocaps) and that's the foundation of our work over at @spritely

But we will come back to ocaps in a second because it turns out OCapPub is not the only time I proposed AP + ocaps!
The other time I wrote about ActivityPub + ocaps was in a proposal to, yes, Twitter's Bluesky process in 2020 with Jay Graber titled... "ActivityPub + OCaps"! https://gitlab.com/-/snippets/2535398

I think that document laid out all the right ideas for *the fediverse* (not saying bsky, the fediverse)
Ten wpis został zedytowany (4 dni temu)
Now I want to be clear here that I *don't* think that proposal was necessarily the right one for Bluesky, and I *do* think Jay Graber *was* the right person to lead Bluesky

What I wanted to do required a lot more research, and we have done that over at @spritely instead
The reason I bring up the proposal here is that I think it has all the right analysis of *what the fediverse should do*, if it was going to rise to the challenge of fulfilling its true potential

So let me lay out what the things in that proposal were:
Here is your recipe for making the "Correct Fediverse IMO (TM)":

- Integrate ocaps, which is possible because actor model + ocaps compose
- Content addressed storage!
- Decentralized identity (notice the *y*, I did not say DIDs) on top of ~mutable CAS storage
- Petname system UX

(cotd...)
(cotd ...)

- Better anti-spam / anti-harassment using OCapPub ideas
- Improved privacy with E2EE ("encrypted p2p" even a better goal)

Whew! An improved fediverse?

"Uh, Christine, this sounds like a lot, do you think the fediverse can take this on?"
Spec-wise in ActivityPub, I think it's possible. The ecosystem, as deployed? I think the ecosystem can and will only do part of it, if we really get everyone excited, maybe the content addressed storage and decentralized identity parts, in which case the fediverse will also survive nodes going down
The ocap stuff, I tried getting fediverse implementers excited about this and tbh, it's pretty hard to design into a Ruby on Rails or Django style framework and mindset. Backporting the right designs to existing systems is a real challenge.

Especially ocaps need to go bottom-up.
For this reason, @spritely's tech looks like it's very focused on computer science'y low-level BS, but that's actually because it's *too hard to build the systems I want right now on top of current technology*, we need stronger foundations

But people have to build for today too
Let's leave the ocap stuff to the side for now, then. Let's focus on what Bluesky and the fediverse have to learn from each other.

- The fediverse should adopt content-addressed storage and decentralized identity
- Bluesky should adopt real, actual federation and decentralization
For this reason @blaine says of both ActivityPub done right and Bluesky done right, "they're the same picture" (The Office meme goes here, yes)

To a large degree, I think @blaine is right
Of course, adapting an existing system as deployed isn't easy.

I will say though that I think if Bluesky were to become *actually decentralized* it would look a lot like ActivityPub in terms of having directed messaging. This will also introduce similar challenges around eg replies, etc.
To the end of the fediverse, perhaps I sound bitter, "they didn't adopt ActivityPub the way *I* saw it!"

The truth is that Mastodon didn't, but Mastodon also saved ActivityPub. It then painted a vision of the future that wasn't, at least, what Jessica Tallon and I expected of it. But it saved AP.
The fediverse and Bluesky, at great effort, could learn a lot from each other in the immediate term.

In the longer term, neither is implementing the ocap vision I think is critical for the big vision, and in a way, I think maybe neither can be easily rearchitected to achieve it. Well, not yet.
When I laid out the ideas of OCapPub to various fediverse developers, the response was "this sounds cool but I have *no idea* how to retrofit a Rails/Django app for this kind of actor-oriented design".

And they were right.

Remember when I said Conway's Law flows in both directions?
Conway's Law says that a technical architecture reflects the social structure under which it was built. But the reverse is also true. The social structures *we can have* are made possible by the affordances of the tools we have available.

"Tech problems/social problems": false dichotomy.
monads? more like GROANads
if you could convert this thread to TTS narration and attach it to some subway surfers footage I'll be set
I'd go for a vertical video of Luanti parkour gameplay, we're doing FOSS here