Model collapse may provide the funding for accelerating decentralised collective intelligence. What needs to happen next?
(Notes: This is an early draft. As explained in this newsletter edition, I am publishing these early versions as I develop my thoughts in the hope that constructive comments will help me finish the post. More version control in the footer.)
In my last post I highlighted how the authors of the model collapse paper in Nature pointed out that in the future any data stemming from “genuine human interactions … will be increasingly valuable” for training the next generation of LLMs. Online, there’s nothing more genuinely human than the interactions in a well-managed digital community, so “after greed and short-sightedness floods the commons with low-grade AI content… well-managed online communities of actual human beings [may be] the only place able to provide the sort of data tomorrow’s LLMs will need” (How Model Collapse could revive authentic human communities).
This supports the AI4Communities idea, first outlined in A Minimum Viable Ecosystem for collective intelligence in January 2023, which envisions tomorrow’s online denizens thriving in a diverse landscape of “small is beautiful communities, supported by AIs which the communities themselves own, train and monetise”, rather than putting up with whatever Meta’s or X’s global, “one size fits all” algorithms inflict on them to turn a profit.
If model collapse has made AI4Communities more feasible, what is it exactly, how do we get there, and who’s building it? But first, why care?
According to the Three Legged Stool manifesto, “Very Small Online Platforms (VSOPs)” are “social networks created for a very specific purpose, with rules, norms and affordances appropriate to that community” (A social network taxonomy). While they’re obviously playing with the EU’s term for “Very Large Online Platforms”, they see VSOPs as separate from federated servers, which combine small size and large reach thanks to protocols like ActivityPub (the Fediverse), AT Protocol (Bluesky) and Nostr.
But for the purposes of this post I’ll call them all “cozyweb”, partly to keep things simple, but mainly because:
Which is not to say they’re isolated: while each cozyweb community has its own content moderation policies tuned to its needs, most can be networked together without sacrificing independence.
So while each cozyweb can be a village — “small, most people know each other and you all share a common interest in keeping the sidewalks tidy” — each can build federated roads to many others, including bigger ones “crowded with people, plenty of them sleazy and more than the occasional sidewalk madman… But you’ll always discover something or someone new there. Every second person’s selling something, but one in 20 is selling what you need. Besides, you’re selling too…” (Welcome to the Fediverse, starry-eyed noob).
Just as each Cozyweb community chooses its own “village rules”, with AI4Communities it manages its own AIs to support the community: some help users with content discovery while guarding the village gates against trolls and trash, for example, while others support community moderation and collaboration processes.
So if small is beautiful, why is everyone trapping each other on a few massive, genuinely awful platforms?
Decentralised, federated networks have been around almost as long as Twitter and Facebook (Evan Prodromou’s Identi.ca launched in 2008, while linkbacks, invented to knit blogging conversations together, came much earlier). But although they are growing now, they are tiny compared to the closed gardens of surveillance capitalism, at least partly because — unlike their closed counterparts — they lack a business model, and hence revenue, as I found out the hard way almost a year ago:
“Social infrastructure needs to deliver content as a basic minimum. And running that infrastructure takes time, money and professionalism” — All my toots gone
The Three Legged Stool authors believe “many VSOPs will likely receive most of their revenue from the communities they serve, similar to how a local newspaper or nonprofit is funded”, and I hope they’re right. Today, however, only a small fraction of Mastodon users donate to cover their server’s costs (anecdotal).
People may pay more for a better experience. An AI owned by the community, which reflects their preferences and adds value in many ways (as explored below), could provide that experience.
These communities therefore need access to what the Three Legged Stool manifesto calls a “Friendly Neighborhood Algorithm Store”. Many of the AI services available there will support individuals (cf Bluesky users subscribing to custom feeds and block lists), but other services could support communities, and can be configured and fine-tuned by the village to ensure they reflect its interests and preferences.
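As a concrete (if entirely hypothetical) illustration, a village's settings for an algorithm subscribed from such a store might look something like the sketch below. The service names, fields and values are assumptions, purely to show community-level rather than platform-level configuration:

```python
from dataclasses import dataclass, field

@dataclass
class CommunityAlgorithmConfig:
    """Hypothetical settings a village applies to an algorithm subscribed
    from a 'Friendly Neighborhood Algorithm Store' (illustrative only)."""
    community: str                       # e.g. "gardeners.town.example"
    service: str                         # e.g. "for-you-feed", "gate-guard"
    topics_boosted: list = field(default_factory=list)
    topics_muted: list = field(default_factory=list)
    optimise_for: str = "relevance"      # chosen by the village, not by advertisers
    share_training_data: bool = False    # opt in to data-union monetisation

# One village's choices; another village will configure things differently.
config = CommunityAlgorithmConfig(
    community="gardeners.town.example",
    service="for-you-feed",
    topics_boosted=["permaculture", "seed swaps"],
    topics_muted=["engagement bait", "culture war memes"],
    share_training_data=True,
)
print(config)
```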
Moreover, as I pointed out in 2020 when I launched myhub.ai, each community could act as a data union: rather than just buying or renting an AI to support their community, they could monetise the resulting algorithm to at least help cover the costs of running the community. While I shelved this idea when ChatGPT appeared, the model collapse paper now suggests that the training data created by well-managed communities could be the new currency of collective intelligence.
training data created by well-managed communities could be the new currency of collective intelligence
Ideally, the marketplaces would be able to serve all protocols seamlessly, helping create the scale required.
Because scale is currently missing. The Fediverse has around 12 million active accounts, Bluesky around the same (check), with Nostr a very distant third. Most users in these ecosystems are using Twitter and Facebook clones (check), built around sharing very short messages, albeit often with multimedia and/or URLs.
One key question is therefore: would 20–30 million users, using such superficial applications, create enough high-quality training data?
But before we can address that, we need a framework for understanding what AI services can offer communities on decentralised social networks.
What services could these “cozyweb AIs” provide, and within what sort of apps?
Content discovery is what happens when content you like or need to see is presented to you.
The basic idea is simple: use AI trained by the community to find content of interest to the community, whether the community is connected to the author or not. These “For You” feeds will be driven by relevance and trust, as explored earlier:
(Image caption: Concentric trust circles of collective intelligence, from Building collective intelligence from social knowledge graphs)
So how is this any different to X, Threads et al? After all, these platforms all have content discovery algorithms.
Simple: your village’s algorithm works for you, not the platform’s owners and advertisers.
your village’s algorithm works for you
Don’t want it to optimise your feed for enragement? Tell it not to. Don’t like engagement bait, NSFW comedy or culture war memes? Tell it so. Only interested in a few topics? Let it know.
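As a sketch of how such instructions might translate into a village-tuned “For You” ranking, the toy function below combines relevance to the reader with trust in the author and the community’s mutes. Every field, weight and trust value here is an illustrative assumption, not part of any real protocol:

```python
def score_post(post: dict, member: dict, village: dict) -> float:
    """Toy village-tuned ranking: relevance to the reader, weighted by
    trust in the author, filtered by community preferences (illustrative)."""
    if any(topic in village["muted_topics"] for topic in post["topics"]):
        return 0.0                                     # muted topics never surface
    shared = set(post["topics"]) & set(member["interests"])
    relevance = len(shared) / max(len(post["topics"]), 1)
    trust = village["trust"].get(post["author"], 0.1)  # concentric trust circles
    rage_penalty = 0.5 if post.get("engagement_bait") else 1.0
    return relevance * trust * rage_penalty

village = {"muted_topics": ["culture war memes"],
           "trust": {"alice@town.example": 0.9, "stranger@elsewhere.example": 0.2}}
member = {"interests": ["permaculture", "fediverse"]}
post = {"author": "alice@town.example", "topics": ["permaculture"], "engagement_bait": False}
print(score_post(post, member, village))  # higher score = higher in the feed
```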
This will of course vary from platform to platform:
Enforcing any online space’s rules on content moderation is a thankless and sometimes hideous task, so the major platforms have already invested fortunes into AI-supported moderation. Open-source LLMs now make it possible to develop something similar for cozyweb spaces.
Depending on the protocol, these AIs could intervene in many different ways, from nudging one member to rephrase a specific post to outright banning another for life. They could also intervene at several stages of content development, from before a member clicks the “Post” button through to the moment a conversation starts overheating.
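A minimal sketch of what such graduated interventions could look like, assuming the village rulebook is applied by an open-source LLM sitting behind a simple severity classifier. The toy_checker below just stands in for that model call; none of this reflects a real moderation API:

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    NUDGE = "suggest the author rephrases before posting"
    REMOVE = "remove the post"
    BAN = "ban the account"

def moderate(post_text: str, rulebook: list, rule_checker) -> Action:
    """Graduated interventions: rule_checker (in practice an open-source LLM
    prompted with the village rulebook) returns a severity from 0 to 3."""
    severity = rule_checker(post_text, rulebook)
    return [Action.ALLOW, Action.NUDGE, Action.REMOVE, Action.BAN][severity]

# Toy stand-in for the LLM call, so the sketch runs end to end.
def toy_checker(text, rules):
    return 1 if "you idiot" in text.lower() else 0

rules = ["Be kind", "No spam", "Stay roughly on topic"]
print(moderate("Great point, thanks!", rules, toy_checker))       # Action.ALLOW
print(moderate("Nobody asked, you idiot.", rules, toy_checker))   # Action.NUDGE
```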
Most of all, of course, the AI will constantly learn from the village’s inhabitants what they like and dislike.
The key to success will be effective community co-ownership of the village rulebook which the AI enforces. After all, in any decentralised network, if you no longer agree with your village’s rules you can move to another one easily, so ensuring everyone has a say in defining its rules is essential for any village to thrive.
a community working together to train their village AI
This raises the fascinating possibility of a community working together to train their village AI. Obviously, any member would ideally be able to contest decisions — the conversations around these edge cases could then help fine-tune the rulebook. The AI should take an active role in these conversations, effectively being coached by the community on norms and practices. I suspect the coaching will be bidirectional much of the time.
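One way those contested edge cases could feed back into the rulebook is simply to log each one alongside the community discussion that resolved it, accumulating a fine-tuning dataset over time. The field names and file format below are assumptions, not a worked-out design:

```python
import json
from datetime import date

def record_contested_case(post, ai_decision, community_verdict, discussion,
                          path="rulebook_cases.jsonl"):
    """Append a contested moderation decision, plus the community discussion
    that resolved it, to a file the village AI can later be fine-tuned on."""
    case = {
        "date": date.today().isoformat(),
        "post": post,
        "ai_decision": ai_decision,
        "community_verdict": community_verdict,
        "discussion": discussion,
    }
    with open(path, "a") as f:
        f.write(json.dumps(case) + "\n")

record_contested_case(
    post="Is this satire or a rule 3 violation?",
    ai_decision="remove",
    community_verdict="allow",
    discussion=["It's clearly satire.", "Agreed - but rule 3 needs a satire clause."],
)
```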
Well-run large communities will therefore develop effective content moderator AIs reflecting the community’s values, attracting more members with matching values. The data used to create these AIs will be valuable, as will the AIs themselves.
Again, the details will vary according to the platform:
Chatting with strangers and making friends on a social media platform is only a small part of collective intelligence. How should AI help members of a cozyweb community be more creative and productive, maximising their human potential?
I think everyone is familiar with the concept of "centaurs", but the key role of "centaur services" is to stop us humans becoming reverse centaurs - human beings turned into "OK-button-mashing automatons" by the requirement (often driven by regulation or marketing) to keep a human in the loop.
Research in this field tends to result in prompt libraries and frameworks for using LLMs - for example:
More on my hub: #centaur.
So rather than providing a generic "send this to an LLM" system, as is currently the case on myhub.ai (see How to chat with ChatGPT about your content) and dozens of other apps, AI services offered to cozyweb communities should provide AI-powered process agents which "lift up" our thinking and help us learn, rather than product agents which focus on turning us into reverse centaurs, mindlessly OKing whatever the AI produces.
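To make the distinction concrete, here's a toy contrast between a product agent and a process agent. The prompts and the call_llm placeholder are illustrative assumptions; the point is simply who ends up doing the thinking:

```python
# Illustrative only: call_llm stands in for whatever model a community runs
# (e.g. a locally hosted open-source LLM).

PRODUCT_AGENT_PROMPT = (
    "Write a finished 800-word post on the topic below. "
    "The human only needs to click OK."           # the reverse-centaur trap
)

PROCESS_AGENT_PROMPT = (
    "Do NOT write the post. Instead, ask the author three questions that expose "
    "gaps in their argument, suggest two sources they may have missed, and "
    "summarise the strongest counter-argument."   # lifts the human's thinking
)

def assist(draft: str, call_llm, mode: str = "process") -> str:
    prompt = PROCESS_AGENT_PROMPT if mode == "process" else PRODUCT_AGENT_PROMPT
    return call_llm(f"{prompt}\n\n---\n{draft}")
```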
The content created with the support of these process agents should be a high-quality source of AI training data - after all, the AI knows exactly what was created by AI, what was created by human inspiration, and what the AI did to help.
the AI knows exactly what was created by AI, what was created by human inspiration, and what the AI did to help.
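A minimal sketch of what that provenance trail might look like if the process agent logged each passage's origin as a piece is written; the field names are hypothetical:

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Passage:
    text: str
    origin: str          # "human", "ai_suggested" or "ai_edited"
    ai_assist: str = ""  # what the AI actually did, if anything

@dataclass
class ProvenanceLog:
    """Illustrative provenance trail kept by a process agent, so the resulting
    training data records exactly who - or what - wrote each passage."""
    document: str
    passages: list = field(default_factory=list)

log = ProvenanceLog(document="village-newsletter-2025-03")
log.passages.append(Passage("Our seed swap doubled in size this spring.", "human"))
log.passages.append(Passage("Attendance rose from 40 to 85 members.", "ai_edited",
                            ai_assist="pulled the figures from the sign-up sheet on request"))
print(asdict(log))
```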
How exactly these services would look, of course, would depend on the client application: an AI service supporting users exchanging toots and skeets will look very different from an integrated thinking and longform publishing tool like (tomorrow's) myhub.ai. FWIW, as I launched myhub.ai in mid-2020 I set out some ideas, ranging from the commonplace (knowledge management assistance, auto-translation and summary, etc.) through to content credibility scores and filter bubble analysis.
(Image: from Imagining new MyHub.ai features as the pilot Hubs launch, June 2020)
Things get even more interesting when you create agents to help a community collaborate and produce something together.
The above point about client applications becomes central if you want to take things further and co-create something as a group, because you have a problem: hardly any social media platform currently offers integrated collaboration tools (the exception: Reddit offers community wikis). So you need to agree, as a group, on a collaborative tool everyone is comfortable with (TL;DR: it doesn’t exist for groups larger than 4), and then co-create on one platform while chatting on the other (which, frankly, sucks).
I’m not sure why this is so. Work-oriented collaboration platforms offer reasonably seamless chat and collaboration environments. In the public realm, however, there seems to be a stubborn divide separating wikis and other groupware from blogging and social media.
A social ecosystem where users can seamlessly collaborate
So picturing how AI services could help social groups collaborate is hamstrung by the lack of existing client apps where such collaboration already takes place. I explored one scenario almost 4 years ago in Thinking and writing in a decentralised collective intelligence ecosystem, and followed it with this video, which introduces the idea and then explores massive.wiki, an early example of how the collaborative part of the overall technology stack might look (and the tool I use to manage and publish this content).
However, that content was light on detail. Some framing ideas on AI-supported collaboration can be found in How to Use AI to Build Your Company’s Collective Intelligence, which explores how AI can help “increase the collective intelligence of the entire organization… through boosting collective memory, collective attention, and collective reasoning”. The article also identifies some risks: for example, bringing an AI voice assistant into a collaborative effort created groupthink, reducing “intellectual diversity … through a form of algorithmic monoculture” (see comments).
The AI services marketed to cozyweb communities to support collaboration, in other words, must not only help individual members boost their creativity, but also reinforce the positive dynamics of groups and dampen the negative ones. The most successful agents will probably have as many psychologists, facilitation and negotiation experts behind them as data scientists.
Finally, if any collaborative work is to be supported by any group’s AI, the members would obviously need to give their consent. Which in turn means that these collaborations become another source of high-value AI training data.
For AI4Communities to work we need social apps which people want to use and which generate high-quality training data.
One thing that bothers me is that almost everyone on Bluesky, as well as (I think!) most users on Nostr and the Fediverse, is using Twitter look-alike apps. My gut feeling is that for AI4Communities to work in privacy-first spaces like these, the apps people use will need to be less superficial to generate higher-quality training data. After all, there’s only so much value in short status updates, and no one’s proposing following users around the web, siphoning up their data to sell to marketers as Facebook et al do.
there’s only so much value in short status updates
This is perhaps why the only example of AI4Communities I've come across to date is in the adjacent sector of search - see Brave example. AI4Communities is the social media equivalent of Brave’s approach to search. Both revolve around authenticity-driven data quality:
For one example of how that could look, A Minimum Viable Ecosystem for collective intelligence suggests combining publishing tools (blogs, social posts), social bookmarking and personal knowledge management, and networking them together across federated networks, all supported by your personal AI.
(Image caption: From Social knowledge graphs for collective intelligence, January 2023)
Because then your AI will be able to learn from:
That’s a lot of tightly interconnected knowledge, all reflecting your interests and expertise. Any AI drawing on that sort of knowledge base will perform content discovery superbly well for you. And that AI is also being trained by everyone else in your village, with whom you presumably share some interests and values, and with whom you may also be collaborating (see above). Together, your community will generate extremely valuable AI training data, thanks to the authentically human curation and creation processes involved in making it.
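As a rough sketch of how that knowledge might be flattened into training data, here's a toy member hub combining posts, annotated bookmarks and notes. All the sources, fields and the flattening step are assumptions for illustration:

```python
# Hypothetical snapshot of one member's hub: every item carries authentic
# human judgement (what they wrote, saved, annotated and linked together).
member_hub = {
    "posts": [{"title": "Why federated moderation works", "tags": ["fediverse", "moderation"]}],
    "bookmarks": [{"url": "https://example.org/model-collapse",
                   "annotation": "key argument for community-owned training data"}],
    "notes": [{"title": "Trust circles", "links_to": ["Why federated moderation works"]}],
}

def to_training_corpus(hub: dict) -> list:
    """Flatten one member's hub into snippets a community AI could learn from."""
    corpus = [p["title"] for p in hub["posts"]]
    corpus += [f'{b["url"]} - {b["annotation"]}' for b in hub["bookmarks"]]
    corpus += [n["title"] for n in hub["notes"]]
    return corpus

print(to_training_corpus(member_hub))
```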
(2024-12-08 update: I just updated AI4Communities on the ATmosphere after researching and testing some early non-Bluesky apps built on the ATmosphere, the ecosystem underpinned by Bluesky's AT Protocol. I'm optimistic 2025 will see many more such apps.)
How would this look on each decentralised social network?
Above I've set out a framework for categorising AI services in an AI4Communities ecosystem, but the details of these services will vary on different protocols. This is a feature, not a bug: "protocol competition" might drive innovation and growth in much the same way that platform competition in the telecommunications industry helped accelerate broadband rollout.
Subfiles to be developed:
Summary of the above files to be developed.
This is one of this wiki's pages managed with the permanent versions pattern described in Two wiki authors and a blogger walk into a bar…