The first results from my series of experiments to identify the most valuable MyHub Agent prompts are promising and confusing in equal measure.
See experiment 1 - newsletter for all files. Below:
Experiment 1's full title is C-1-S-1-150-P-1, so the three outputs to compare are:
If you check out the original Collection and input files, you'll see that for many Notes there wasn't a huge difference between the AllNotes and S-1-150 versions sent to ChatGPT.
That's probably due to the way I've changed how I Hub resources, creating the notes which are usually summarised before being sent to ChatGPT:
After painstakingly analysing the above outputs across several dimensions, I wrote an executive summary of my findings. Then I asked ChatGPT to write me one. Then I wrote the final one.
My first exec summary, after doing the analysis manually (below):
There was no clear "winner" - evaluated using 8 different parameters, each output won on some and lost on others. If I were looking for a first draft I could most efficiently edit into a newsletter, for example, the S-1-150 version was the closest in length and had all links integrated into its body, while the S-0 version highlighted the common themes a little more effectively, and AllNotes gave the least-worst title and article summaries of the least-wrong length.
But most, possibly all, of any efficiency gains from using ChatGPT to write this newsletter could be lost in the necessary accuracy check. While checking the accuracy of each version thoroughly would take more time than I have, every time I looked I found hallucinations misrepresenting the articles being summarised.
Those hallucinations, however, could be valuable if I wasn't trying to write a newsletter summarising 14 articles.
Then I asked ChatGPT's opinion: "I will provide you an analysis of an experiment into the best way of using ChatGPT to create a newsletter summarising the content of 14 different articles. You will write an executive summary of the analysis of no longer than 400 words. Here is the original analysis, in markdown format:"
Its full response is experiment 1 analysis chatgpt execsummary (425 words), but the final "Lessons and Recommendations" section is worth embedding here:
"The analysis offered several valuable lessons for optimizing ChatGPT's use in newsletter creation:
In conclusion, the experiment demonstrated the complexity of using ChatGPT for newsletter content generation. While certain versions excelled in specific aspects, no single variant was flawless. As suggested, iterative refinement and exploration of different inputs and instructions could yield improved results in future experiments."
What I wrote initially still stands, but bringing in ChatGPT to help write this executive summary helped me see better what I should be doing in the next experiment(s).
Rather than evaluating different versions and word lengths of S-1 (i.e., S-1-250, S-2-100...), I think I'll focus first on varying the prompt: rather than asking for a nice summary of Hubbed resources in newsletter format, I'll explore Prompts which, as ChatGPT puts it, "Use ChatGPT outputs as idea generators rather than sources of literal truth".
Update: see experiment 5 - ideator, where I do precisely that.
And now for some analysis:
When sent the same prompt (Prompt 1 - newsletter), ChatGPT produced three strikingly different newsletters.
In none of the cases did ChatGPT respect the 500-word limit for the editorial:
WTF, ChatGPT?
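A mechanical check would catch this kind of drift faster than re-reading each editorial. A minimal sketch - the function name and the naive whitespace-split counting are my own, not part of the experiment:

```python
WORD_LIMIT = 500  # the limit Prompt 1 set for the editorial

def check_word_limit(text: str, limit: int = WORD_LIMIT) -> tuple[int, bool]:
    """Return (word count, within limit?) using a naive whitespace split,
    roughly how a human would count words in a draft."""
    n = len(text.split())
    return n, n <= limit

# e.g. check_word_limit("one two three", limit=2) returns (3, False)
```

Pasting each of the three editorials through something like this would have flagged all three failures in seconds.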
The AllNotes response is easy to "Fail": it provides 14 (admittedly good) summaries of 14 articles, but draws no links between them, despite ChatGPT being specifically instructed to do so. The other two responses did better, providing paragraphs drawing on multiple articles.
In S-1-150, for example, we find 5 paragraphs like this:
Central to nurturing creativity is the ability to reframe problems. The article "Three Ways To Reframe A Problem To Find An Innovative Solution" highlights how adopting different perspectives can lead to breakthrough ideas. This concept is further reinforced in the article "Ideation: List and Paint your Ideas", which encourages the playful and creative generation of ideas. Building on this, "What I Wish I Knew About Creativity When I Was 20" underscores the significance of collaboration and the confluence of ideas in fostering innovation. Embracing this confluence is the article "Build a Culture of Innovation: Kill Mediocrity", which advocates for dismantling complacency and hierarchy to create a culture of open idea-sharing.
The S-0 version does something similar, but covers the ground in 7 sections (and 80 more words). As it gives each section a subtitle, it highlights the common themes identified, as requested - for example:
Reframing Problems for Innovation: Problem-solving lies at the core of innovation. "Three Ways To Reframe A Problem To Find An Innovative Solution" from Fast Company emphasizes the importance of tackling problems from novel angles. Techniques like imagination, creativity, and entrepreneurship are highlighted to transform challenges into opportunities. Similarly, "The Discipline of Creativity" underscores the need to link creative ideas with actionable steps. The proposed integrative process involves understanding problems deeply, generating tangible ideas, and translating them into action.
Here we are just evaluating the form, so S-0's subtitles win this round.
This aspect takes the most effort to judge but is, of course, the most important, as when I ask ChatGPT to "highlight themes common to several articles" I can't stop it from simply making stuff up.
For example, consider the above paragraph from the S-1-150 response, starting with "Central to nurturing creativity":
So while ChatGPT's editorial, based on ChatGPT's summary of an article, says that the article "underscores the significance of collaboration and the confluence of ideas in fostering innovation", that idea simply isn't present in the article. ChatGPT's editorial is false.
But is it not also true in a larger sense? While the article doesn't say that collaboration helps foster innovation, maybe it could have. After all, ChatGPT's logic(1) is as follows:
(1) Note that I used the word "logic" loosely. ChatGPT did not "think this through", as I just did in the above bullet points. Instead, it connected the concepts "collaborate" with "confluence" because (I assume) these words are well-connected in its training corpus.
ChatGPT's hallucination, in other words, was a connection that the author did not make. While that makes it a false summary of the author's work, that doesn't mean that it's wrong as a standalone statement. But it also doesn't mean that it's right.
Lessons:
Finally, how did the different inputs affect the quality of the article list, following the Editorial?
Prompt-1 asked for 80-word summaries of each article:
AllNotes' consistency reduces the editorial polishing time, and so wins this particular comparison. But I asked for 80-word summaries, not ~20-word summaries.
This obviously subjective comparison is made more difficult by the above length differences.
For example, the summary of "The downside of diversity" based on the S-0 input provides more information, while the S-1-150 and AllNotes versions benefit from their pithiness. Which is "best"?
Think about that for a moment: sent Prompt 1 and asked to work with the shortest summary (S-0) of my notes about an article, ChatGPT created a richer and longer summary than when it worked from a longer summary (S-1-150). Moreover, both were more accurate than when ChatGPT was working from my raw, unsummarised notes.
Another example: when given the S-0 summary of my notes about "The idea that creative people are different from everyone else is a myth", ChatGPT gave an accurate summary in the newsletter's article list:
"The article challenges the myth that creative people are different from others and highlights the organizational structure of the company that invented Gore-Tex, which operates in small teams and has minimal layers of management despite its large size."
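That summary's length is easy to verify mechanically; a throwaway sketch using the same naive whitespace split:

```python
# The quoted article-list summary from the S-0 version, verbatim
summary = (
    "The article challenges the myth that creative people are different "
    "from others and highlights the organizational structure of the company "
    "that invented Gore-Tex, which operates in small teams and has minimal "
    "layers of management despite its large size."
)
print(len(summary.split()))  # → 38, well short of the 80 words requested
```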
However, when given S-1-150 and my (very brief) notes in full, ChatGPT invented: neither the article nor my notes mention "diverse perspectives", "collaboration" or "diverse teams", but according to these versions of the newsletter, the article does.
What do we learn from this? ChatGPT doesn't always summarise when asked. The above quote from the article list is 38 words long, and was generated when I asked ChatGPT to summarise the S-0 summary of my notes. But: