To understand the experiments in this repository, read At a glance, below. To understand why they're set up this way, read Context, which follows it.
Why such a complicated-looking experimental method?
Right now, with the pilot MyHub ChatGPT integration, I can use any Prompt I like to interrogate ChatGPT about any Collection of resources, but in all cases I am sending "S-0 summaries" - summaries of the resources generated by "Summariser 0", which is built into MyHub.ai.
However, S-0 generates very short summaries of the notes I wrote about each resource, so that as many resources as possible can be sent to ChatGPT at once. But it does this even when the Collection is quite small and I could send my notes in their entirety.
This raises some questions - for example, given the same prompt and collection:
Then there are the questions of the Prompts themselves. What's the best Prompt to create a newsletter summary? A knowledge visualisation? To generate ideas for a blog post or a paper? And does each prompt work well for all Collections, or only some?
In fact, to optimise the LLM integration plans I need to test quite a few moving parts, so each experiment involves a number of different variables:
One of my first experiments, for example, is called C-1-S-1-150-P-1: testing Prompt 1 on Collection 1 using Summariser 1 set to a max length of 150 words. Let's take each in turn:
When I started investigating this I thought that designing each Agent would just be a question of finding the right Prompt.
For example, Prompt 1 is written to take a Collection of Hubbed notes and:
"write a 500 word editorial summarising the main themes ... in particular highlighting themes common to several articles. Follow the editorial with the articles listed in the following format:
article title
An 80 word summary of the article.
Provide all content in markdown format, with each article title linked to the URL provided with it." - Prompt 1 - newsletter
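To make that concrete, here is roughly how a Prompt and a Collection's notes might be combined into a single ChatGPT call. This is a minimal sketch, not MyHub's actual code: the helper name, the note fields and the model choice are all assumptions.

```python
# Minimal sketch: combining an Agent Prompt with a Collection's notes
# into one ChatGPT call. Not MyHub's actual code - the note structure,
# helper name and model choice are assumptions.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_1_NEWSLETTER = (
    "Write a 500 word editorial summarising the main themes ..., "
    "in particular highlighting themes common to several articles. "
    "Follow the editorial with the articles listed in the following format: "
    "article title, then an 80 word summary of the article. "
    "Provide all content in markdown format, with each article title "
    "linked to the URL provided with it."
)

def run_agent(prompt: str, collection: list[dict]) -> str:
    """Send the Agent Prompt plus every note in the Collection to ChatGPT."""
    notes = "\n\n".join(
        f"Title: {item['title']}\nURL: {item['url']}\nNotes: {item['notes']}"
        for item in collection
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",  # 16k context window
        messages=[
            {"role": "system", "content": prompt},
            {"role": "user", "content": notes},
        ],
    )
    return response.choices[0].message.content
```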
But if it were that simple, I would have run many experiments by now.
There's a complicating factor: the ChatGPT token limit.
I'm using the ChatGPT 3.5 API, which gives me a 16k context window - i.e., the content I throw at ChatGPT, plus the content it throws back, must not exceed 16,000 tokens, or roughly 12,000 words. The moment I break that limit, ChatGPT starts "forgetting" earlier parts of the conversation.
After analysing the most active Hubs, I calculated that if MyHub Agents sent the full notes of each Hubbed item, they would break the token limit for any Collection of more than 15-20 items.
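As a rough illustration of that arithmetic (the average note length and the reply budget are assumed figures, not measurements):

```python
# Back-of-envelope check of the 16k context limit.
# The average note length and response budget are assumptions.
CONTEXT_TOKENS = 16_000
WORDS_PER_TOKEN = 0.75          # ~12,000 words fit in 16k tokens
RESPONSE_BUDGET_WORDS = 1_000   # assumed: room left for ChatGPT's reply
AVG_NOTE_WORDS = 600            # assumed average length of a full Hubbed note

available_words = CONTEXT_TOKENS * WORDS_PER_TOKEN - RESPONSE_BUDGET_WORDS
max_full_notes = int(available_words // AVG_NOTE_WORDS)
print(f"Full notes that fit in one call: ~{max_full_notes}")  # ~18
```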
So we need a Summariser: once an Editor has Hubbed an item, MyHub checks the length of their note. If it exceeds a certain length (the "Summary Threshold"), MyHub asks ChatGPT to create a Summary of the note no longer than the Summary Threshold.
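A minimal sketch of that check, assuming the Summariser is itself just another ChatGPT call and that the Summary Threshold is counted in words:

```python
# Sketch of the Summariser step described above. The function signature,
# the {max_words} placeholder in the summariser prompt and the model
# choice are assumptions, not MyHub's actual implementation.
from openai import OpenAI

client = OpenAI()

def summarise_if_needed(note_text: str, summary_threshold: int,
                        summariser_prompt: str) -> str:
    """Return the note untouched if it is short enough; otherwise ask
    ChatGPT for a Summary of at most summary_threshold words."""
    if len(note_text.split()) <= summary_threshold:
        return note_text
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system",
             "content": summariser_prompt.format(max_words=summary_threshold)},
            {"role": "user", "content": note_text},
        ],
    )
    return response.choices[0].message.content
```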
We also need a Collection Composer: when an Editor activates an Agent, this algorithm checks the total length of all the notes in the Collection:
In this way ChatGPT gets the Editor's full notes for the Collection where possible, and their Summaries when the Collection's notes are too numerous, too long, or both. The actual algorithm is spelt out a little more explicitly in LLM integration plans, while all Summarisers can be found in summarisers in summary.
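A simplified sketch of that branch, assuming a word budget derived from the 16k context and the same note/summary fields as above; the real algorithm in LLM integration plans is more nuanced:

```python
# Simplified sketch of the Collection Composer described above.
# The word budget and the note/summary field names are assumptions.
WORD_BUDGET = 11_000  # assumed input budget within the 16k-token context

def compose_collection(collection: list[dict]) -> list[str]:
    """Send full notes if the whole Collection fits the budget,
    otherwise fall back to the stored Summaries."""
    total_words = sum(len(item["notes"].split()) for item in collection)
    if total_words <= WORD_BUDGET:
        return [item["notes"] for item in collection]
    return [item["summary"] for item in collection]
```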
So we now have several questions requiring experiments to answer:
Moreover, is the best combination of Summariser Prompt, Summary Threshold and (Agent Prompt or GPT) good for all Collections, or just the one I used in my first round of tests? I therefore need to test all these variables against multiple Collections.
Hence the name of my first experiment (C-1-S-1-150-P-1): testing Prompt 1 on Collection 1 using Summariser 1 set to a max length of 150 words. These files, the resulting responses from ChatGPT and my analyses of them can all be found in the experiment log.
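Since every experiment follows that nomenclature, the names can be decoded mechanically. A hypothetical helper, not part of MyHub, just to make the naming scheme explicit:

```python
# Hypothetical helper for decoding the experiment nomenclature used in
# the experiment log, e.g. "C-1-S-1-150-P-1". Not part of MyHub itself.
import re

def parse_experiment_id(experiment_id: str) -> dict:
    """Split an ID like 'C-1-S-1-150-P-1' into its variables."""
    match = re.fullmatch(r"C-(\d+)-S-(\d+)-(\d+)-P-(\d+)", experiment_id)
    if not match:
        raise ValueError(f"Unrecognised experiment id: {experiment_id}")
    collection, summariser, max_words, prompt = match.groups()
    return {
        "collection": int(collection),        # which Collection of Hubbed items
        "summariser": int(summariser),        # which Summariser
        "summary_max_words": int(max_words),  # Summariser max length in words
        "prompt": int(prompt),                # which Agent Prompt
    }

print(parse_experiment_id("C-1-S-1-150-P-1"))
# {'collection': 1, 'summariser': 1, 'summary_max_words': 150, 'prompt': 1}
```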
experimental nomenclature - archived
experiment round 1 - archived