GPT-4 upgrade improves results, expands application potential

The ChatGPT chat screen on a smartphone arranged in the Brooklyn borough of New York on March 9, 2023.
Photographer: Gabby Jones/Bloomberg

The much-anticipated latest version of ChatGPT came online earlier this week, opening a window into the new capabilities of the artificial intelligence (AI)-based chatbot.

Developed by OpenAI, GPT-4 is a large language model (LLM) that significantly improves ChatGPT's capabilities compared with GPT-3, which was introduced less than two months ago. GPT-4 features stronger safety and privacy guardrails, handles longer input and output text, and delivers more accurate, detailed, and concise responses to nuanced questions. While GPT-4's output remains textual, a yet-to-be-publicly-released multimodal capability will support inputs of both text and images.
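
For readers curious how such a model might be queried in practice, the minimal sketch below (not from the article) poses a single underwriting-style question to GPT-4 through OpenAI's Python client. The prompt text, token limit, and temperature are illustrative assumptions, and a valid API key is required.

```python
# Minimal sketch: posing one insurance question to GPT-4 via the OpenAI
# Python client. Prompt text and parameters are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You answer life insurance underwriting questions."},
        {"role": "user",
         "content": "Can adoptive parents pass a genetic condition to a "
                    "biologically unrelated child?"},
    ],
    max_tokens=300,   # GPT-4 accepts and returns longer text than GPT-3
    temperature=0.2,  # a lower temperature favors more consistent answers
)

print(response.choices[0].message.content)
```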

The potential implications for insurers are profound and should only become more pronounced as the technology improves. OpenAI will continue to release future versions, enabling insurers to more easily implement and customize applications across the insurance value chain – from customer acquisition through claims processing.

Meaningful output

The GPT-4 upgrade is currently available only to ChatGPT Plus subscribers. Compared to GPT-3, the new version better answers questions that depend on reasoning and creativity. According to OpenAI, GPT-4 achieves human-level performance on many standardized tests, such as a simulated Law School Admission Test, Scholastic Aptitude Test, and Graduate Record Examination. On a simulated Uniform Bar Exam, GPT-4 scored in the 80th-90th percentile range, while GPT-3 landed in the bottom 10%.

Last month, RGA posed three insurance questions to GPT-3 with mixed results. While GPT-3 provided good answers to questions about the long-term mortality effects of COVID-19 and the future of digital distribution, it stumbled on a more nuanced query, incorrectly surmising that adoptive parents could pass on a genetic condition to their biologically unrelated children. GPT-4 answered all three questions correctly and provided more detail on the two questions GPT-3 had also gotten right, without adding substantially to the response length.

On a set of 50 underwriting-related questions prepared by RGA, GPT-3 performed well on those that dealt strictly with anatomy, physiology, life insurance practices, or underwriting. However, it was often unable to answer cross-discipline questions correctly, and assessing the underwriting risks of certain avocations and comorbidities proved difficult.

GPT-4 proved more accurate overall than GPT-3. Whereas GPT-3 answered 38 of the 50 questions correctly, GPT-4 answered 47 correctly. The updated model delivered more accurate, detailed, and concise answers, tightening or even eliminating some of the preamble and redundancies that GPT-3 generated.
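
Put in relative terms, those counts translate into the accuracy figures computed in the short sketch below; the snippet is merely arithmetic on the numbers reported above, not part of RGA's study.

```python
# Convert the reported question counts into accuracy percentages.
results = {"GPT-3": 38, "GPT-4": 47}  # correct answers out of 50
total_questions = 50

for model, correct in results.items():
    accuracy = correct / total_questions * 100
    print(f"{model}: {correct}/{total_questions} correct ({accuracy:.0f}%)")

# Output:
# GPT-3: 38/50 correct (76%)
# GPT-4: 47/50 correct (94%)
```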

Of course, GPT-4 is not error-free. Generally, the further the questions ventured from mainstream knowledge into insurance industry-specific territory, the more the answers degraded. OpenAI is the first to concede that humans must check the model's work. For example, when asked which parties must have an insurable interest in a policy and whether agents can conduct specific medical tests, GPT-4 answered incorrectly.

Caution ahead

Although GPT-4 offers exciting opportunities for insurance and countless other industries, its potential also provides reason for caution. Consider this: Amid the race to incorporate LLMs such as GPT-4 into search engines, it is possible that queries to Google, Bing, and others will no longer return a list of pages to read. Instead, the engines might present a single answer that synthesizes source material. Such a presentation could discourage users from reading multiple articles that cover a topic from different viewpoints and could drive a substantial shift away from the websites that provide original source material. Synthesized results could also lack credibility by being biased, misleading, or simply incorrect.

Opinions differ on what effect LLMs might have on the future of society. AI luminaries continue to debate whether LLMs can truly create, plan, or reason. Nearly all experts agree, however, that LLMs work from existing information and therefore cannot, on their own, expand the frontiers of human understanding.

It is also certain that this technology will continue to advance and that insurers will keep exploring and identifying new use cases. GPT-5 development is already underway at OpenAI, though an official release date has not been announced.

This blog entry has been reposted with permission from RGA.
