LLMs are popping up everywhere - calling them the new hotness is an understatement. Damn near every new product you’re going to hear about in the next 6 months is going to have an LLM angle or feature. This experimentation is great, it’s what drives us towards the future, but the future is a monkey’s paw and brings both new rewards and challenges.
Near-human Quality
LLMs are becoming a chief creator of spam on the internet. It's possible to generate near-human levels of writing on any topic very quickly. Near-human is the catch here, as there is still a quality difference between what GPT is spitting out and what a human can craft.
Consider the case of Clarkesworld, a sci-fi magazine that takes submissions for stories, selects the ones they want to publish, and pays the authors if they run their stories. Seems pretty great, yeah? Sci-fi enthusiasts subscribing to a magazine that pays authors who are trying to get published.
Well, lots of people decided to be lazy and see if they could have ChatGPT generate a story for them, toss it in the submissions pile, and hope that it got selected and earned a payout.
The catch is that all the ChatGPT stories were of obviously low quality. This is where “near-human” matters a lot. You can generate stories like this, but the results are not of a quality that you’d be proud to attach your name to.
The result of this flood of submissions was that Clarkesworld stopped accepting submissions for the month of February until they could figure out a better way to vet stories for quality.
In a world where anyone can casually generate near-human levels of writing, how are we to sift through the deluge of content and distinguish between generated and human-crafted material?
Hard to verify
With Clarkesworld, the editors were able to find generated content because they were looking for it - it was the sort of low-quality crap that didn't pass their standards.
Planet Money explored the problem from a different angle - would it be possible to have an AI write and voice an entire episode? The three-episode series is worth a listen, but the final episode consists of both the results of the experiment and the hosts/producers of the show reacting to what was created.
The hosts reacted the same way Clarkesworld did - the resulting show was not at the quality level that they would want to attach their name to. Phrases like "first draft", "repetitive", and "I still have a job" get tossed around. Generally, I agree with this assessment. Planet Money is a great show and its creators put a ton of effort into it, effort that shows in the final product.
However, I listened to this show knowing that it was generated. Had it been playing in the background and I wasn't paying close attention, I probably would have thought it was a weird and kinda crappy episode, put together by some newcomers to the industry.
The same thing goes for generated text and images - you have to put in some effort to verify whether something is legit or generated, more than you used to.
Take the Balenciaga photo of the pope from a few months ago:
This 100% got me the first time I saw it. I didn't think "ha, creative use of Midjourney", I thought "sure, the pope deserves a puffy coat".
It takes more-than-a-glance levels of effort to spot the fake here. From questioning the premise itself (Why does the pope have drip?) to looking closely at physical issues (Why does the necklace not have a chain on the right side?) or unnatural poise and posture (Why is the pope's right thigh bulging out?).
And this is the world that we're in - generated content is near-human in terms of quality. It takes a non-trivial amount of effort to figure out if something is real or generated, and if you're not ready for it, it's very easy to get fooled.
Untrustworthy
Low-quality, hard-to-verify content that's cheap to generate. That's not a great combination for building trust in these tools. Moreover, when trust is blindly put into these tools, it's possible to really get burned.
The "ChatGPT Lawyer" did just that. In this case, the lawyer asked ChatGPT to write a legal filing for him, and it did so, making an argument based entirely on made-up case law, complete with citations. It turns out that inventing case law for an argument before the court is a good way to make judges particularly mad. From the reporting of the linked article:
There was no silver lining in courtroom 11-D on Thursday. At one point, Judge Castel questioned Mr. Schwartz about one of the fake opinions, reading a few lines aloud.
“Can we agree that’s legal gibberish?” Judge Castel said.
“Did you read any of the cases cited?” Judge Castel asked.
“No,” Mr. LoDuca replied.
“Did you do anything to ensure that those cases existed?”
No again.
Here again we see the issue of near-human levels of content generation that fall apart once a keener eye is turned to examining what was produced - "Can we agree that's legal gibberish?" is probably one of the last things a lawyer wants to hear from a judge.
This is the trust issue of generated content. It's easy to dismiss as low-quality nonsense that someone pushed out to complete a task, but upon further inspection you see that the content was not only generated, it was completely wrong in the first place.
The lawyer here should have put in the legwork to verify the output of ChatGPT, but what happens when he does find errors, like legal arguments built on case law that doesn't exist? Is the entire resulting text suspect? Can you trust any of the output if the core of it is a hallucination?
The Monkey’s Paw closes
LLMs are content-generating machines. Their output is near-human in quality and requires more-than-a-glance levels of inspection to determine whether it's generated. It's content that's hard to trust. It's also cheap to generate and push into spaces that humans generally occupy.
The places where humans exist on the internet - Reddit, Twitter, Facebook - are primed to become flooded with low-quality, hard-to-verify, and hard-to-trust content. The amount of effort it's going to take the average person to sort through this material is much, much higher than it is today.
In some ways, nothing has changed. "You shouldn't trust anything on the internet" was a common refrain growing up, because people could always lie, make stuff up, and otherwise troll. That's still around, but now it's possible to be a troll armed with generated legal documents, images, and other evidence to be even more convincing.