This is a subfile of the primary AI4Communities post, and won't make any sense unless you read its parent first. It explores how the AI4communities idea would look within the ActivityPub-powered Fediverse.
In May 2023 Brave Software, creators of the privacy-first eponymous browser and search engine, released Brave Search API. Via this service, AI developers can access the training dataset Brave developed through both Brave Search and their Web Discovery Project, which allows Brave browser users to contribute anonymous data about the pages they’re actually visiting.
As a result, Brave’s training data “is much more representative of the Web people actually care about”, without the huge morass of “junk sites — spam, dead links, duplicates, and more” one finds when crawling the Web at large. And that was written before the Web started filling up with AI-generated content, which is even worse for training AI.