I love computers because they are a tool whose expanding potential we have not fully appreciated. Among the latest of their newly discovered capabilities (though old-timers would say “rediscovered,” having seen the idea before in the 1970s) is the promise of artificial intelligence (AI), with the advent of OpenAI’s ChatGPT. Released in 2022, the program represents the current zenith of AI and machine learning (ML) with its novel generative approach.
Since starting a legal tech software company in 2015, I have had several opportunities to work with machine learning, a hot sector attracting both media attention and venture capital. I personally worked on natural language processing projects in the Python programming language, but after several months I did not commit to the approach, because it did not solve the problem any better than existing methods. In short, its accuracy never got high enough to be compelling.
WHAT IS MACHINE LEARNING?
The basis of ML is deferring to the computer the task of programming itself. In a way, it is a form of biomimicry, in which we borrow from biology the concept of evolution: We inject a bit of randomness to force changes in the next generation, and if a change produces a better result, we adopt the newly evolved method and iterate on it further. In practice, we provide inputs and outputs and ask the computer to learn the patterns, so that when we give it similar inputs, it gives us similarly corresponding outputs.
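To make that concrete, here is a minimal sketch in Python (the language I used in my own projects) of the input/output framing, using the scikit-learn library. The clauses, labels, and tooling are illustrative assumptions, not a description of any production system:

```python
# A minimal sketch of the "inputs and outputs" framing: we hand the computer
# example pairs and ask it to learn the pattern, rather than writing rules.
# Assumes scikit-learn is installed; the clause texts and labels are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Inputs: snippets of contract language. Outputs: the label we want back.
inputs = [
    "Borrower shall maintain a leverage ratio not to exceed 3.5 to 1.0",
    "This Agreement shall be governed by the laws of the State of New York",
    "Interest shall accrue at a rate equal to SOFR plus 2.25% per annum",
    "Any dispute shall be resolved by binding arbitration in Delaware",
]
outputs = ["covenant", "governing_law", "interest", "dispute_resolution"]

# The model infers statistical patterns linking the inputs to the outputs.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(inputs, outputs)

# Given a similar input, it returns the output it believes corresponds.
print(model.predict(["Interest on the loan accrues at SOFR plus 3.00%"]))
```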
For example, an input could be a credit agreement, and the output could be a summary of key terms of that agreement. An accurate distillation of a lengthy contract in an automated form — a CliffsNotes on demand — would be a useful tool, and there’s no denying there is demand for it in multiple industries, including insurance, compliance, and capital markets. But as I found out the hard way, the devil is in the details. And here, the primary relevant lessons involve the ever-elusive accuracy of AI and the law of diminishing returns.
WHY MACHINE LEARNING SEEMS AWESOME AT FIRST
The aphrodisiac of ML is that with minimal initial effort, you can get tantalizing and promising results. For instance, if you feed in two dozen sample credit agreements and a matching set of term sheets, it will generate a believable summary term sheet with, say, 60% accuracy.
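For readers curious how a figure like 60% is even computed, the scoring in a project like mine boils down to comparing the fields the model extracted against the fields a human prepared. Below is a toy Python sketch with hypothetical field names and values:

```python
# Illustrative scoring: compare the term-sheet fields a model extracted
# against the fields a human prepared, and count the exact matches.
def field_accuracy(predicted: dict, reference: dict) -> float:
    """Fraction of reference fields the model got exactly right."""
    correct = sum(1 for key, value in reference.items()
                  if predicted.get(key) == value)
    return correct / len(reference)

reference = {"borrower": "Acme Corp", "facility": "$50,000,000",
             "maturity": "2029-06-30", "rate": "SOFR + 2.25%",
             "governing_law": "New York"}
predicted = {"borrower": "Acme Corp", "facility": "$50,000,000",
             "maturity": "2029-06-30", "rate": "SOFR + 2.50%",   # wrong
             "governing_law": "Delaware"}                        # wrong

print(f"{field_accuracy(predicted, reference):.0%}")  # prints 60%
```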
The natural expectation is that if we continue to work on this, it will generate amazing results. One thinks, “This only took us a week to do. Let’s see what happens if we work on it for a few months.” After several months of adding more inputs and outputs, the accuracy may increase to 70%. That’s when you naturally commit and pour resources into it; but alas, that’s also when you realize the progression is not linear. Two years pass at 75% accuracy, five years at 80%, and the curve ultimately plateaus.
DEALING WITH LIMITATIONS
Even though 80% seems pretty good — it’s a B-minus after all, a passing grade — when you are paid to do work, an error rate of one in four or one in five results in a reputation hit for your company. So naturally, AI companies responsible for generating deliverables hired a legion of quality assurance (QA) personnel to take the 80% accuracy rate and make it 100% via human intervention. At first, this was acceptable because accuracy was expected to improve over time, just as Uber planned to use drivers as a stopgap until it built a fully self-driving taxi. Unfortunately, that 80% never flirted with 90%, let alone 99%.
Others took a different approach: let you, the customer, build out the ML model. The premise is: We will provide the platform and technical assistance; if the model never reaches 100%, that’s not our fault. Over time, though, customers figured out that the technology never delivered on its promise, and when it did near 100% accuracy, its scope had to be narrowed substantially, which converted the issue into another problem: selecting the right model.
What that means is that model A only worked on documents created from form A (say, the institution’s own form), and model B only worked on documents created from form B (perhaps the form from the latest market deal). In this context, the genealogy of the documents becomes important, and the system fails when a document was created by merging the two, which happens often in real life. Accuracy dropped to unacceptable levels until a model C could be custom-built for that scenario. Soon there were multiple competing models, and resources had to be spent just keeping track of them all.
I believe the dream of AI will be realized when it actually reduces the headcount of those developing or using it. The whole purpose of AI is to increase efficiency so that less human involvement is required for a computer to program itself well. What we have instead is a system that merely replaces programmers with so-called “AI trainers,” who review the integrity of inputs and desired outputs, and QA folks, who intervene to correct the data. Generative platforms like Copilot do indeed claim that they will reduce programmer headcount, but let’s not forget that Elon Musk cut 80% of Twitter’s staff prior to any AI implementation, and it turned out OK.1
PROBLEMS WITH AI-GENERATED CONTENT
What are some common problems with AI-generated output? The first is what people call hallucinations. In my projects, hallucinations appeared as random numbers, words, or streams of thought injected into the output, errors a human would never make. It was so bizarre that I was initially taken aback, but I have since gotten used to it.
Newer systems are better at guarding against hallucinations, but, unfortunately, progress is tied to randomness, meaning mutants and deviations are the driving force behind advancement. If perfection in a given model had been reached, we might never know, since it may train itself out of perfection in the name of progress. I believe this inherent design of following in the footsteps of evolution to create “accurate” (low error rate) systems may be flawed — perhaps because evolution is not expected to end, and there’s no correct solution to the problem.
There are practical problems, too. AI can’t backstop a lawyer’s or any other professional’s responsibilities. Anyone getting paid to pass off AI’s work as their own will have a rude awakening by failing to meet their professional obligations, as people have already found out.2 If you have to reread everything AI generates, maybe it is nothing more than a tool to help you out with writer’s block in the first draft. I remember using ChatGPT to draft a contract that I was unfamiliar with. The end result looked nothing like the beginning, but I admit that the program was helpful at the start. It’s just not what it is advertised to be.
PROFESSIONAL TAKE
When Linus Torvalds, the creator of the Linux operating system, was asked about AI, he scoffed at the thought of being replaced by AI anytime soon.3 Dirk Hohndel, head of Verizon’s open-source program office, summarized the current iteration of generative AI as “autocorrect on steroids.”4
I agree. Only with human ingenuity — like ChatGPT typing out an answer even though it fully knows what it will say — can the computer continue to expand its bag of tricks to impress us. Professional programmers are not impressed, because it is their job to create the illusion of competence.
This is the ultimate critique of the current generation of AI. It doesn’t read or understand; it just looks for patterns and generates a string of characters that matches a pattern in response. Themes, morals, and insights, all of which require understanding and the ability to feel them, cannot register because they are drowned out by noise that is far more plentiful. In this respect, if the current iteration of ChatGPT were asked to generate a book-length text, I believe it would have trouble making a coherent story, let alone an interesting one with character development or plot twists.
As attorneys, we certainly remember that our law professors and mentors drilled into us that a comma can change the meaning of a text. A comma. Of course, the words “not” and “and/or” can have tremendous consequences. I would argue that even using “the” versus “a” can change the meaning in certain contexts, and I know of no computer language model that assigns the words “the” or “a” more than negligible weight. Such nuances require understanding, and that’s simply not what AI does.
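To illustrate the point about “the” and “a”: classic natural language processing pipelines routinely discard such words as “stop words” or give them negligible weight. The snippet below uses scikit-learn’s TfidfVectorizer as a stand-in (an assumption about tooling, not a claim about any particular commercial model); the articles simply vanish from the learned vocabulary, making the two sentences indistinguishable:

```python
# Classic NLP pipelines treat "the" and "a" as stop words and discard them,
# even though in a legal sentence they can change the meaning.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the borrower repays a lender",
    "a borrower repays the lender",
]

vec = TfidfVectorizer(stop_words="english")
vec.fit(docs)

# "the" and "a" never make it into the vocabulary, so the two sentences
# above become indistinguishable to the model.
print(sorted(vec.vocabulary_))
```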
AI’S STRENGTHS
So, is AI useless? Absolutely not. I have long pondered where AI will be most useful and concluded that it will be useful in places where 80% accuracy is good enough, in contexts where the error itself is drowned out by the surrounding “good enough” data. Take image processing and generation, for example, where an error shows up as a wrongly colored pixel. In a high-resolution image, a single pixel is essentially invisible without zooming in. Likewise, in audio processing and generation, an error shows up as an imperceptible blip so short in duration that it is dwarfed by the intended sounds. The MP3 format showed that most sounds don’t matter, only the loudest,5 allowing its magical compression of audio files to take root in the 1990s and become the bedrock of one of the first viral apps, Napster.6
In a domain where the collage matters and the individual units don’t, I think AI will flourish. This may be related to the degree to which each unit of data is independent of the others and how far an error in one unit can damage those around it. A pixel by definition is confined to a rectangle and does not naturally pollute the next pixel, and neither does a sound at frequency X at time Y. But in text, a word affects the words that come before and after it. And in law, every word matters; legal language is rife with examples where certain language trumps other language, such as “notwithstanding the foregoing,” or with ambiguous situations, like citing an overruled case for its dicta.
If I could summarize my thoughts in a single statement about AI’s applicability, it would be that AI will excel in domains where lossy (rather than lossless) compression is permissible, as with JPEG, MP3, and MP4 for images, audio, and video. If, on the other hand, only lossless compression is permissible, as with text, then I would argue the probabilistic nature of AI will limit its applicability in those domains.
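A toy comparison makes the distinction vivid: in an image, being one unit off out of 255 is invisible, while in a contract, being one letter off can reverse the meaning. The values below are made up purely for illustration:

```python
# Toy illustration of the lossy vs. lossless point: a tiny numeric error in an
# image is imperceptible, but a comparable "one unit off" error in legal text
# flips the meaning.
pixel = 187
approx_pixel = 186            # off by 1 out of 255: invisible in a photo

clause = "The Borrower shall not prepay the Loan."
approx_clause = "The Borrower shall now prepay the Loan."  # off by one letter

print(abs(pixel - approx_pixel) / 255)   # roughly 0.4% error, imperceptible
print(clause == approx_clause)           # False, and the meaning is reversed
```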
SHORT TERM VS. LONG TERM
In the short term, AI will be an unbelievable tool for animation7 and music8 studios. It might also be able to generate short fiction or summaries that carry few consequences.9 For the public, it may mean lower-cost design services, whether for logos or websites, and improved translation services. On the other hand, it will also be a boon for scam artists and others not bound by ethics. Phishing emails will look ever more like real ones, fake landing pages will look like real web pages, and people will have a harder time discerning the difference.10 All sorts of member-created communities, like Facebook and LinkedIn, and especially the less prominent ones, will be polluted with fake accounts and increasingly sophisticated scams (such as “pig butchering” cryptocurrency scams) preying on the unwary.11
Generative AI will help by giving us new inductive tools, but it will not help solve the problems it creates. What we’ll need as a counterbalance is deductive AI that takes information and narrows it down, checking its veracity against the vast body of knowledge humanity has collected and digitized, so that truth and insight can be gleaned from it. That is, roughly, what lawyers do for their clients: simplifying complex concepts and detecting and correcting errors along multiple dimensions — not just spelling and grammar but also regulations, market conventions, and social norms. And this has to be more than 95% accurate.12 Given this and current limitations, the best short-term use case is the hybrid model, in which AI assists and augments humans rather than replacing them, as others have predicted it will.13
In the long run, I think AI will likely play an important role in programming robots. Videos of how people (or animals) perform certain acts (or signals from electrode-embedded wearables) can provide the input, and the output is the corresponding robotic movement, judged by whether the task at hand was performed successfully. In that respect, I think we will end up preferring humanoid robots (think C-3PO) over mechanoids (like R2-D2), because having congruent body parts will translate better into efficient self-programming, invariably leading to the creation of objects in our image rather than our imagination.
CONCLUSION
It is a peachy gimmick to ask AI to write an article like this. But you can tell that the style and content of this writing don’t feel like ChatGPT output, because the content is deeply personal — something I have been thinking about and refining over a period of several years. Of course, publishing this article will allow AI to consume it and mimic it, but in my view, it will never truly replace the purposeful self-expression of organized thought that is writing.