Moving my public social media presence from Twitter to Mastodon has also had me revisit some thoughts and notes around recommendation algorithms.
As a newcomer to the Fediverse in general, and Mastodon specifically, I mean this text as "thinking out loud" rather than "this is how it is".
Algorithms are nothing but instructions
A lot of the premise for the public debate around algorithms is that they are bad, per se. I disagree. We need algorithms, because algorithms are nothing but instructions:
In mathematics and computer science, an algorithm is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. – Wikipedia
So what is the problem that needs to be solved? Once the accounts you follow on social media post more than you can read, somehow, someone has to decide which posts to put in front of your eyes, and which posts shouldn't be shown to you.
The "somehow" is the instructions, the rules – the algorithm – that specify the criteria for how that selection process is done.
But to me, "somehow" is not the most important question.
It's the "someone" we should focus our attention on.
But more on that in a bit.
First, let's talk about Mastodon.
Mastodon's feeds are already algorithmic
Many Mastodon users say that the posts there aren't sorted by an algorithm. I've said it myself many times already. But if you are really picky with the details, you could argue that the statement is wrong.
The Mastodon feed is indeed algorithmic.
The problem to solve on Mastodon – and any social media – is: "In which order should the posts from the accounts you follow be sorted?"
Right now, the main answer on Mastodon is "by publishing time". That's an algorithm. And, to be honest, not a bad one. For human conversations, a chronological sort order makes a lot of sense.
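To make the point concrete, here is a minimal sketch of "sort by publishing time" written out as instructions. It is my own illustration, not Mastodon's actual code; the `Post` class and the `chronological_feed` function are made-up names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    # Hypothetical, simplified post; real Mastodon posts carry many more fields.
    author: str
    text: str
    published: datetime


def chronological_feed(posts):
    """Return the posts newest first, as a time-based feed would show them."""
    return sorted(posts, key=lambda post: post.published, reverse=True)


posts = [
    Post("alice", "Good morning", datetime(2023, 1, 10, 7, 0)),
    Post("bob", "Lunch time", datetime(2023, 1, 10, 12, 0)),
    Post("carol", "Late night thoughts", datetime(2023, 1, 9, 23, 30)),
]

for post in chronological_feed(posts):
    print(post.published.isoformat(), post.author, post.text)
```

One sorting rule, one line of logic. It is simple and transparent, but it is still an algorithm.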
But there are occasions when a strictly time-based feed breaks down. My mornings are one recurring example.
Handling the morning flood
Most of the Mastodon accounts I follow are based in the US, while I'm in Sweden. During the daytime, my feed is often manageable. But the mornings are different, especially the very first time I open my Mastodon client each day. There are often a couple of hundred unread posts, and with each new account I follow, the number of posts increases as well.
I very rarely have time to scroll through them all – and as a consequence, I might miss something I really would like to see.
So for my morning check-in, a chronological feed is good. But not enough.
I want algorithms and filters...
So there are occasions when a chronological feed is what I want, and other times I need something else. And Mastodon already has some complementary tools as well:
- With lists, I can group accounts I follow by topic, by importance (to me!), or any other way I see fit. If I don't have time to browse my full feed, I can jump into a subsection of it.
- On Mastodon, I can follow accounts, but I can also follow hashtags. This is a way to keep the number of accounts I follow down, and still be able to keep up with topics that are of interest to me.
- But the reverse is also possible! I can follow accounts, but block hashtags I'm not interested in. For me, #caturday is one such example. The Saturday flood of cat pictures is not my cup of tea, but with the block feature it isn't a nuisance either. Excellent!
- The three different feeds (Home, Local, Federated) are a way to get a broader or narrower post inflow.
All these are fairly simple instructions/algorithms. Good, but I want more!
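Lists and hashtag blocks really are just a few lines of instructions. The sketch below is an illustration under my own assumptions, not how Mastodon implements them: `Post`, `filter_feed`, `list_members`, and `blocked_hashtags` are hypothetical names, and hashtags are assumed to be stored in lowercase without the leading "#".

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    # Hypothetical, simplified post used only for this illustration.
    author: str
    text: str
    hashtags: set = field(default_factory=set)


def filter_feed(posts, list_members=None, blocked_hashtags=frozenset()):
    """Keep posts from accounts on a list (if given) that carry no blocked hashtag."""
    blocked = set(blocked_hashtags)
    kept = []
    for post in posts:
        # A list narrows the feed to a chosen group of accounts.
        if list_members is not None and post.author not in list_members:
            continue
        # A hashtag block drops any post tagged with an unwanted topic.
        if post.hashtags & blocked:
            continue
        kept.append(post)
    return kept


posts = [
    Post("alice", "Look at my cat", {"caturday"}),
    Post("bob", "New blog post about feeds", {"fediverse"}),
    Post("carol", "Good morning"),
]

without_cats = filter_feed(posts, blocked_hashtags={"caturday"})
news_list = filter_feed(posts, list_members={"alice", "bob"},
                        blocked_hashtags={"caturday"})
print([p.author for p in without_cats])
print([p.author for p in news_list])
```

Each rule is easy to read, and I decide myself which accounts are on the list and which hashtags are blocked.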
...where I know what problem they are intended to solve
And finally, we are at the "someone" part. The algorithms I want are algorithms that I control, not ones controlled by someone else. It's not algorithms per se that are the problem; it's black box algorithms that you don't understand, where you don't know what they are really optimized for.
"Popularity" and "relevance" are two common answers. But what do they really mean?
Popularity is possible to quantify. "Sort the feed by the sum of each post's replies, boosts, and likes" is one way to do it. But should replies, boosts, and likes influence the popularity score equally? Or are some of them more important? Would a better algorithm for popularity be "replies * 0.8 + boosts * 2 + likes"? I have no idea.
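The two variants are easy to compare side by side. This is only a sketch with made-up names (`Post`, `popularity`, `rank`) and made-up numbers; the weights are the ones floated above, not values any real service is known to use.

```python
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical post with only the three engagement counters.
    title: str
    replies: int
    boosts: int
    likes: int


def popularity(post, w_replies=1.0, w_boosts=1.0, w_likes=1.0):
    """Weighted sum of replies, boosts, and likes; all weights default to 1."""
    return (w_replies * post.replies
            + w_boosts * post.boosts
            + w_likes * post.likes)


def rank(posts, **weights):
    """Sort posts from most to least popular under the given weights."""
    return sorted(posts, key=lambda p: popularity(p, **weights), reverse=True)


posts = [
    Post("Heated thread", replies=10, boosts=0, likes=0),
    Post("Widely shared link", replies=0, boosts=4, likes=1),
]

equal = rank(posts)
weighted = rank(posts, w_replies=0.8, w_boosts=2.0, w_likes=1.0)
print([p.title for p in equal])
print([p.title for p in weighted])
```

With equal weights the heated thread comes first (10 against 5). With the weighted formula the shared link wins (9 against 8). Same posts, same data, different answer to "what is popular", which is exactly why it matters who picks the weights.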
And moving into relevance, things get even harder. Especially on aggregate. What's relevant for me is not necessarily relevant for you.
But if you talk about relevance on an individual level, I think that there is a fairly large overlap between "popular" and "relevant".
The accounts I have chosen to follow are relevant to me – and as a consequence, so is what they reply to, like, and boost. If many of the accounts I follow share the same link or boost the same post, the chances that I'm interested in that specific link or that specific post increase.
One of the best third-party services I ever used for Twitter was Nuzzel. It worked just like that: it gathered the links from the Twitter accounts I followed and sent me a daily digest with the most popular ones.
The general principle behind Nuzzel's output was easy to understand:
- I follow accounts that are of interest to me.
- Those accounts share links.
- If many of the accounts I've decided to follow (because they post things I want to read) share the same link, that's a "vote" for that link being something I would like to see.
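The principle above can be sketched in a few lines. This is my own reading of the idea, not Nuzzel's actual implementation; `link_digest` and the example accounts and URLs are made up, and each account is assumed to count only once per link.

```python
from collections import defaultdict


def link_digest(shares, top_n=5):
    """Rank links by how many distinct followed accounts shared them.

    `shares` is an iterable of (account, url) pairs.
    """
    voters = defaultdict(set)
    for account, url in shares:
        # A set makes repeated shares by the same account count as one vote.
        voters[url].add(account)
    ranked = sorted(voters.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(url, len(accounts)) for url, accounts in ranked[:top_n]]


shares = [
    ("alice", "https://example.com/a"),
    ("bob", "https://example.com/a"),
    ("alice", "https://example.com/a"),  # same account again, still one vote
    ("carol", "https://example.com/b"),
    ("bob", "https://example.com/b"),
    ("carol", "https://example.com/a"),
    ("dave", "https://example.com/c"),
]

for url, votes in link_digest(shares, top_n=2):
    print(votes, url)
```

Ties are broken alphabetically by URL here, simply to keep the output stable. That is an arbitrary choice of mine.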
Understanding the basic principle is great. But it would be even better if you could also fine-tune the settings.
Sliders, knobs, and server costs
There is sometimes (or even often) an inherent conflict between what I want to see and what others want me to see. For me, Meta does a fairly good job with Facebook's algorithms, while Instagram is slowly becoming useless, with too many ads for products I'm not interested in, and too many sponsored posts from accounts I don't follow. Or as Cory Doctorow recently wrote:
The platforms treat your unambiguous request to receive messages from others as mere suggestions, a "signal" to be mixed into other signals in the content moderation algorithm that orders your feed, mixing in items from strangers whose material you never asked to see.
What I want is better control over the algorithms, both on a general level and through access to some settings to fine-tune how they perform. If we return to the "Nuzzel principle", that could mean that not all accounts I follow get the same importance when calculating which posts are most relevant to me. Perhaps I follow a couple of accounts that I know are better than others at sharing the things I absolutely don't want to miss. If that's the case, I want to increase their weight in the calculation, perhaps for certain link + hashtag combinations.
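A "slider" for account importance could be as simple as a per-account weight. The sketch below is hypothetical: `weighted_link_scores`, `account_weights`, and the numbers are assumptions for illustration, and it leaves out the link + hashtag combinations mentioned above.

```python
from collections import defaultdict


def weighted_link_scores(shares, account_weights, default_weight=1.0):
    """Score each link by the summed weight of the distinct accounts sharing it.

    `shares` is an iterable of (account, url) pairs. Accounts missing from
    `account_weights` get `default_weight`.
    """
    seen = set()
    scores = defaultdict(float)
    for account, url in shares:
        if (account, url) in seen:
            continue  # each account votes at most once per link
        seen.add((account, url))
        scores[url] += account_weights.get(account, default_weight)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


shares = [
    ("alice", "https://example.com/a"),
    ("bob", "https://example.com/a"),
    ("carol", "https://example.com/b"),
]

# With no weights set, the link shared by two accounts ranks first.
print(weighted_link_scores(shares, account_weights={}))

# Turning up the "slider" for carol puts her link on top instead.
print(weighted_link_scores(shares, account_weights={"carol": 3.0}))
```

The important part is not the arithmetic but that the weights live in my settings, where I can see and change them.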
What I find so promising is that I see a lot of development going in this direction on Mastodon:
- Matt Hodges has built a script that creates Mastodon digests à la Nuzzel.
- Popular iOS client Ice Cubes is experimenting with a digest view. So is the upcoming client Mammoth, which has built-in features for link discovery.
- Most interesting is perhaps Fediview, a tool that lets you try out different algorithms and timeframes to find popular posts in your feeds.
But before I wrap up, a little bit about cost: You can use Facebook without your credit card because the company has the final say over the algorithms. Running the servers needed, be it for Facebook, Mastodon, or any other internet service, comes with a cost. I'm on a private Mastodon instance and pay somewhere around 10 Euro per month to do so. And in my Mastodon feed, I see many admins asking for financial support for their servers.
Consider giving it.
If you want to read more on algorithms, filtering, and the related topic of content moderation, here are some articles I've read over the last few weeks that could be of interest:
- Cory Doctorow: Freedom of reach is freedom of speech
- Cory Doctorow: Better failure for social media
- Nilay Patel interviews Matt Mullenweg: How to buy a social network
- Mike Masnick: Hey Elon: Let Me Help You Speed Run The Content Moderation Learning Curve is a hilarious made-up conversation on moderation
- Arvind Narayanan: TikTok’s Secret Sauce