Guest speaker luncheon: Predictive analytics: Delivering on the promise of big data

The excitement over "big data" has grown dramatically. But what is the value, the function, the purpose? The most actionable win to be gained from data is prediction. This is achieved by analytically learning from data how to render predictions for each individual. Such predictions more effectively drive the millions of operational decisions that organizations make every day. In this keynote, Predictive Analytics World founder and Predictive Analytics author Eric Siegel reveals how predictive analytics works, and the ways in which it delivers value to organizations across industry sectors.

Transcript:

Eric Siegel (00:10):
Thanks Nate. Thanks everybody for coming. I'm going to be speaking about predictive analytics today, which is for the most part the same thing as machine learning. So in 20 seconds, here's why machine learning's important: business needs prediction, prediction requires machine learning, and machine learning depends on data. So putting that in reverse, we have data, we give it to machine learning, and it generates models that predict, and those predictions improve all the main things we do. They render more effective all the large-scale operations that organizations conduct, which is why we also sometimes call it predictive analytics. In that way, we boost sales, cut costs, combat risk, prevent fraud, fortify healthcare, streamline manufacturing, conquer spam, and optimize underwriting. So that's it. Thank you. Have a great rest of the conference.

(01:12)
Wait, no, actually I just realized something: isn't prediction supposed to be impossible? The future is the ultimate unknown. The weather can only be predicted so well, so far ahead. Human behavior, which is what we're predicting in many of these cases, is no less challenging to predict. So how do we put credibility on a field that has the audacity to call itself predictive analytics? That's a credibility question, and I'll address it partway through the presentation. So here's what I'm going to be doing in the next hour, and then we'll open it up to questions as well. I'll describe what predictive analytics slash machine learning is, we'll define it, we'll describe how it delivers value across business sectors and for insurance in particular, and then why it's effective, that is, why it works to actually deliver business value, in terms of two effects: the prediction effect and the data effect. And along the way we'll cover case studies, not exactly linearly in this order. I'm going to kick off with the question: is anybody here not going to eat their dessert? Okay, great, just save it for me. That was actually a joke, but now that I've gone through the joke, I'm thinking maybe he will save it for me. So a couple quick slides about my work. I'm a consultant, a former Columbia University professor, and have been running a conference series myself since 2009 called Machine Learning Week, which consists of Predictive Analytics World events for different industry sectors. There's an umbrella event for business, and the second one here, PAW Financial, includes insurance applications. It's actually taking place in two weeks in Las Vegas. I am the instructor for this online course, Machine Learning Leadership and Practice: End-to-End Mastery. And I wrote this book, Predictive Analytics, and the subtitle of the book actually serves as an informal definition of the field: the power to predict who will click, buy, lie, or die. So as you'll see, that reflects the cross-industry applicability of the thing, and also that ultimately it's going to be about making predictions for each individual. That'll be a main theme. I've also got a second book coming out next February, 2024, through MIT Press: The AI Playbook: Mastering the Rare Art of Machine Learning Deployment. And I'm the executive editor of MachineLearningTimes.com, a monthly newsletter and news website. You can tweet me @predictanalytic. Okay, so machine learning in general. As I just mentioned, we're going to define it, and we'll come back to a little more complete definition: analytical methods to render predictions for each individual. But we can also define it, by virtue of doing that, as a way to conduct data-driven risk management in general, across sectors. So for example, this is me skiing, and I'm an individual micro risk. So we're going to do micro risk management with these predictive scores. Am I a good skier? Maybe. Am I a risky skier? Provably yes, because this was the after picture, where I ruptured my anterior cruciate ligament, which is totally fine now, see, because I did all the rehab and all that stuff. You do so much rehab that your knee ends up being better than before the surgery. So I don't put this up here to garner your sympathy. I mean, you could if you wanted to, you could give me a little sympathy. That's it. Okay, thank you so much. But maybe you should feel bad for my insurance company: it was expensive. Clinical outcome wise, it was great.
I can do this knee walking without any discomfort, which is great. I am now the father of a baby and a toddler. Now, what insurance companies do, well, what they neglected to do, is predict how risky a skier I might be. But in general, what insurance companies do is exactly the same core competency that all industries can take advantage of, which is to render a predictive score, a risk level, for each individual. Office workers, right? These are people cleaning the windows of a high-rise. Eric Webster said that insurance is nothing but the management of information. It's the pooling of risk, and whoever can manipulate information the best has a significant competitive advantage. Well, this applies across all the industry sectors, because business is always a numbers game, and you do it better by scoring, by assigning probabilities to the likelihood of negative outcomes for each individual. What's the chance that this individual in insurance is going to file a high claim, that this one's going to default on their loan and never pay back the bank, that this customer's going to cancel or defect and go to a competitor, that this customer's going to commit an act of fraud, that a customer will fail to buy your product after you invested $2 sending them a marketing brochure? A very micro risk, but a risk that adds up.

(06:41)
Nonetheless, by performing data-driven micro risk management, we can render all these kinds of operations more effective. Douglas Hubbard said that no certified, regulated profession like the actuarial practice exists outside what's strictly considered insurance. Fine, be that as it may. But when it comes to embracing that same core value proposition that is central to insurance and applying it across all industry sectors in general, machine learning, or predictive analytics, is it. That's the field, by definition, that learns from data to assign these risk levels. And with this wide applicability, it has very much taken off, and all industries very much are embracing it, with an estimated market of over $200 billion by 2027, surveys showing it as the number one priority of chief information officers, and LinkedIn's top emerging jobs usually having it in the first or second slot in recent years. So we could say that predictive analytics is like Moneyball for money. I made that joke up personally, not today, but thanks, man. So with that in mind, it's gotten so hot that people are actually calling data scientists sexy. This is a weird future that I never expected to happen. This famous, now infamous, Harvard Business Review article, "Data Scientist: The Sexiest Job of the 21st Century," was written by Thomas Davenport, who wrote the foreword to my book, and DJ Patil, who subsequently became the first chief data scientist of the United States. Now, those two guys are really handsome, but I always thought it was firefighters who were supposed to be the sexiest. So this is a picture of me from Halloween dressed up as a data miner; I thought maybe the hard hat could have been part of the formula. Firefighters actually use predictive analytics in order to prioritize and triage which buildings are at highest risk of a fire, in order to prioritize the inspections of buildings, and likewise where the next forest fire or wildfire may most likely be. Similarly, Con Edison in New York City predicts which individual manholes are most likely to exhibit a dangerous incident like a fire or an explosion, again in order to prioritize where to do inspections. So I actually tried to explore what predictive analytics does to your social life with a rap music video that's fully educational. This is the best educational rap music video about predictive analytics ever produced. It might also be the only one, but it's definitely the best one. It's only three and a half minutes long: predictthis.org. The lyrics are legitimately educational; the visuals sort of distract from that education, so I recommend listening to it and trying not to watch the video. So let's turn to a more complete definition of machine learning, and we turn to the book, this is a real book, I'm reading it to my son here, Machine Learning for Babies and Toddlers, which is a book that I recommend for babies but not for toddlers, because the definition isn't that great.

(10:37)
Let's turn to this robust, practical, applied definition of machine learning: technology that learns from experience. Now, by experience I mean data. Data is often considered boring. Data is a deal killer at cocktail parties. I know this from personal experience; I have the data. But that's because people forget that data isn't just a bunch of arcane ones and zeros. It's a recording of things that have happened. It's an encoding of prior events, the collective experience of an organization, from which it's possible to learn how to predict the outcome or behavior for each individual customer, policy holder, patient, business, vehicle, image, piece of equipment, individual unit, satellite that might run out of battery, vehicle that might break down, train wheel that might be faulty. That level of granularity, of some kind of organizational element, is the defining characteristic of predictive analytics. So each individual gets assigned a number, typically in the form of a probability, and the higher the number, the higher the expected chance that individual will click, buy, lie, or die, like in the title of my book, commit an act of fraud, file a high claim, cancel their subscription, any and all outcomes or behaviors for which there's value for the organization to predict in order to drive better decisions. And this last part of the definition is absolutely critical, because it alludes to us actually acting on those predictions, integrating them into existing operations to improve those operations. Technically that's called the deployment of the predictive model. There's a big problem in the industry where a great number of models get developed but never actually deployed. Putting it where the rubber hits the road, that's where you actually achieve value; that's where you're improving organizational operations. So, a word here on terminology. Predictive analytics and machine learning are largely synonymous in many contexts. Predictive analytics alludes to certain business applications of machine learning, a great deal of them, maybe not all of them. If you are classifying a medical image for radiology, or asking for self-driving cars, does this image have a traffic light, usually we don't actually call that predictive analytics. But either way it's leveraging the same machine learning methods: decision trees, logistic regression, neural networks. There are lots of different technical methods; I'll come to those a little bit later. And when you're using those for business applications, depending on context, you can choose to call them machine learning or predictive analytics. Now, all of this is within the broader fields of big data and data science. Both of those are actually subjectively defined arenas where it basically just means: let's use data to get value in some way. It can mean a lot of different things. Those terms actually don't have very clear definitions. They don't necessarily allude to any particular method or value proposition. Instead, they allude to a thriving culture of people doing creative things to leverage data for value. And the same can be said for the old-school term data mining. So in a nutshell, the data comes in from the left, and the core machine learning method, also known as a predictive modeling method, churns on that data. That's the number crunching part, that's the learning part. And then what it generates, what it has learned, is in the form of a predictive model, which I depict in these slides as a golden egg.
That's the thing that has the rules or formulas or patterns that have been discovered within and extracted from that training data. And once you have that, now in deployment, in the use of that model, we're going to apply it one individual at a time and take what we know about that individual, any of their demographic and behavioral data, and input it into the model. And the model, in that first step, was designed for this very purpose. It now can predictively score, calculate a probability for, that individual as to whether they're going to click, buy, lie, or whatever the outcome or behavior you're predicting for this particular project. Typically there's one prediction goal at a time. So for marketing, you're looking at a customer today and you're saying, hey, should I apply the marketing treatment? Should I invest $2 to send them a brochure? If I do, what's the probability that the outcome will be positive and they'll buy our product? So, as I opened with, we have this major challenge: prediction. Right? It sounds like an audacious goal. Nobel Prize-winning physicist Niels Bohr said that prediction is very difficult, especially if it's about the future. And Jay Leno asked, how come you never see a headline like "Psychic wins lottery"? It's a good question. So here's the answer to that conundrum. It's very simple. The bottom line is that in order to drive value, you don't necessarily have to predict accurately. We definitely don't have a magic crystal ball. Rather, predicting better than guessing is generally more than sufficient to drive large-scale operations more effectively and deliver a dramatic improvement to the bottom line. Business is a numbers game. Mass marketing is throwing a bunch of darts across a canyon at a small target. What we do in this way, by predicting better than guessing, is tip the odds in our favor, and the effect can be quite dramatic.
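To make the train-then-deploy flow he describes concrete, here is a minimal sketch in Python using scikit-learn; the column names and toy values are hypothetical illustrations, not data from the talk:

```python
# Minimal sketch of the flow described above: learn from historical
# examples, then score one new individual. All data here is hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Training data: one row per individual; "responded" was observed later.
history = pd.DataFrame({
    "age":             [34, 51, 27, 45, 39, 62],
    "prior_purchases": [10,  2,  0,  7,  3,  1],
    "responded":       [ 1,  0,  0,  1,  0,  0],  # the known outcome
})

# The learning step: churn on the data and produce the "golden egg".
model = LogisticRegression().fit(
    history[["age", "prior_purchases"]], history["responded"]
)

# Deployment: score an individual we have not seen before.
new_customer = pd.DataFrame({"age": [41], "prior_purchases": [5]})
p_buy = model.predict_proba(new_customer)[0, 1]
print(f"Probability this individual buys if contacted: {p_buy:.2f}")
```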

(17:06)
So let's do a quick back-of-the-napkin arithmetic example for direct marketing. Let's say we've got a pool of a million prospects, a customer list a million long, and for each individual that we contact, it costs $2. So let's find out how much profit we make by doing mass marketing. Without machine learning, without any particular method to target, we're just going to hit everybody on the list. Well, if the response rate is, let's say, 1%, and everyone who responds gives us a profit of $220, then the math is really straightforward. We spend $2 for a million customers, so we spend $2 million. 1% of them respond; that's 10,000. We make $220 each for those, so that's $2.2 million. We spent two, we get 2.2 back. The profit is 0.2 million, $200,000. So how much better would we do if we targeted that in a more intelligent, meaningful way, by way of a predictive model, by way of machine learning? Let's say we took out of that million a sample, like 40,000, a random sample that's representative of the whole population, and we test this direct marketing on that sample, and then we wait around and find out how it went. Then we get a bunch of positive and negative cases, if we give it enough time, maybe three weeks, it depends on the context, right? So you wait for that waiting period, and some of them are positive, they did respond and they purchased the product; other ones are negative. That's your training data. That's the list of examples from which to learn. So we now apply machine learning, generate the predictive model, that golden egg, from that, apply it over the entire list of 1 million, and order that list from most likely to respond if contacted down to least likely. Remember, that's the point of the model: it's going to provide that predictive score for each individual. What are the chances they'll buy if we contact them? So now we've got from most likely down to least likely, and we just take the top 25% of that list, the ones considered most likely to buy according to the predictive model. Well, if machine learning did its job pretty well, and if the data was sound, we might get what's called a lift of three. So a lift of three means that within that smaller pocket of potentially more valuable customers, three times as many respond as average. So instead of an overall 1% response rate, within the smaller group we have a 3% response rate. Because it's a multiple of three, it's called a lift of three. Well, to find out how profitable that'll be, we just have to do the exact same kind of arithmetic we did a second ago. We're going to save 75% of our costs, because we're only marketing to 25% of the list. We're going to forsake all potential purchases from the other 75% of the list, but most of the responses are concentrated in that top quarter. So if you took a calculator and did the same arithmetic, you'd find that the bottom-line profit of doing this skyrockets by a factor of more than five, to $1.15 million. Such a dramatic improvement, such a great increase in profit, just based on more intelligently targeting. No new marketing creative, no new customer prospects, no new product or anything like that, just more intelligent targeting. It's such an impressive improvement that I paid my designer, who I pay by the hour, to draw this star explosion thing to express my robotic inner feelings about it. But hold on a second. This model totally stinks. It's not highly confident. It's not accurate in the conventional sense of the word.
There's no individual where it says, I think this person's almost definitely going to purchase. It's just a 3% response rate instead of a 1% response rate. But this is feasible, and it's valuable. So there's no questioning the value. You can alternatively define predictive analytics as a skunk with bling. You're not writing that down? So I call this the prediction effect: a little prediction goes a long way. Predicting better than guessing is generally more than sufficient to render large-scale operations more effective. So before we go on: I'm going to talk about another quick marketing example, I'm going to talk about retaining customers instead of acquiring new ones, we're going to talk about some data examples and what a model looks like, and then turn to insurance. But before we continue, I wanted to see if anybody has any questions at this point, at least clarification questions or burning questions. Yes.
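For reference, the back-of-the-napkin arithmetic above, spelled out in Python; all the figures, the $2 contact cost, the $220 profit per responder, the 1% response rate, and the lift of three, come from the example in the talk:

```python
# Direct-marketing arithmetic from the talk, spelled out.
cost_per_contact = 2.00        # dollars per brochure sent
pool             = 1_000_000   # prospects on the list
profit_per_sale  = 220.00      # profit from each responder
base_rate        = 0.01        # 1% overall response rate

# Mass marketing: contact everybody on the list.
mass_profit = pool * base_rate * profit_per_sale - pool * cost_per_contact
print(f"Mass marketing profit: ${mass_profit:,.0f}")      # $200,000

# Model-targeted: contact only the top 25%, where a lift of 3
# means a 3% response rate inside that pocket.
contacted = pool * 0.25
lift = 3
targeted_profit = (contacted * base_rate * lift * profit_per_sale
                   - contacted * cost_per_contact)
print(f"Targeted profit:       ${targeted_profit:,.0f}")   # $1,150,000
```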


Audience Member 1 (22:30):
One area, maybe it's intentional, in addition would be deep learning, where it essentially just teaches itself. You excluded that, and I was just curious if there's a particular reason why, for example.


Eric Siegel (22:52):
I would never exclude deep learning. Deep learning's the hottest kind of machine learning. Yeah, it's the awesomest. It's like mind-blowing, deep learning. So when I listed methods, and I'll come back to some more, although we'll only go into detail with a decision tree today, I did list neural networks, and deep learning is a subset of neural networks. It's sort of the latest and greatest, the biggest neural networks. But in general, all the methods operate the same way. They do take training data, and there's some semantic confusion. Do they train themselves? Well, it depends on how you define train themselves. But they all take a bunch of examples where you already know the answer: a bunch of images that have a traffic light and a bunch of images that don't, a bunch of radiology images that have the positive diagnosis and those that don't, a bunch of customers that canceled and a bunch that didn't. And then they automatically learn from that. The core modeling method that produces the golden egg, whatever kind of golden egg it is, which could be a deep learning model, is a PHD tool: Push Here, Dummy. Once you've set everything up, it's automatic. So it's deriving the model and assigning all its parameters and/or its architecture, depending on the modeling method, automatically, to derive that egg. And the same concept does apply to deep learning. Now, deep learning tends to apply when the input is really big: every pixel in a high-resolution image, or an actual sound sample, rather than a few dozen or even a few hundred elements that describe an individual customer or an individual policy holder. So it tends to be overkill for a lot of these business applications, but the general concepts are the same. Any other questions before we go on? So I have two copies of my book, and he's going to get one of these copies, and whoever asks the next question gets the other copy. I'm sorry, I don't have any other basis for how to do this. So this is it. Sure you don't have a question? Yes.
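A small sketch of his point that all the methods operate the same way, train on labeled examples, then score: two scikit-learn methods, a decision tree and a small neural network, behind the same fit/predict interface. The data is random noise purely to show the pattern:

```python
# Different modeling methods, same "push here" training pattern.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier  # a small neural net

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))      # 200 individuals, 5 input variables
y = rng.integers(0, 2, size=200)   # known outcomes: 1 = positive case

for Model in (DecisionTreeClassifier, MLPClassifier):
    model = Model().fit(X, y)                  # the automatic learning step
    scores = model.predict_proba(X[:3])[:, 1]  # probabilities for 3 people
    print(Model.__name__, scores)
```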


Audience Member 2 (25:22):
Alright, this is all great, but what happens when you're dealing with a company who doesn't know what to do with the predictions and data that you've given them?


Eric Siegel (25:31):
Right. That's a great question. And unfortunately, the book I'm giving you doesn't exactly answer that question, but my book that's coming out in February does; it's the topic of the whole book. That goes back to what I alluded to a minute ago, about how so many models actually fail to get deployed. So they might be sound, they might have a lot of potential. Thank you. But this challenge ends up being not so much technical as organizational. And you need the right business practice to run an end-to-end project in a way where the people on the ground at operations, on the business side rather than the data science side, the super technical side, are prepared for and understand the meaning of all those predictive probabilities coming out of the model, and how they're intended to be used. In fact, those very same people must be involved from the beginning of the project, as far as: this is the proposal for the project, we're going to have you change the way underwriting is done by predicting exactly this type of outcome, particularly high claims or something like that, with certain red flags, where a red flag could be a very high probability that only comes out for a very rare customer or policy holder or applicant. So that whole plan, down to a relatively semi-technical nitty gritty, needs to be planned from the get-go and socialized in order to get proper and informed buy-in, and for those people to also be involved during the development of the project, because their input from the sort of practical, pragmatic business side very much can affect exactly the way the data scientists are optimizing and tweaking the modeling piece of it. So it is an education effort that's needed, and one that's not always, but very often, omitted. So, thank you for letting me plug my next book, which is called The AI Playbook. You have to call it AI. And then by page five, I've kind of said, okay, let's be real, I'm talking about machine learning, a little bit more concrete than the buzzword AI. So I'll continue on. If you have a burning question, feel free to interrupt, and I'll also leave time for questions at the end.

(28:11)
Okay, so one more quick marketing example, which is not exactly marketing: the Obama reelection campaign in 2012 used predictive analytics in order to win his reelection. At the same time, very publicly, we had the most famous prognostic quant in this country, Nate Silver, forecasting the outcome of the presidential election up until the last minute. And this is a perfect example to distinguish between predictive analytics and forecasting. What Nate Silver does is forecasting. So for a given state like Ohio, and this was from 2016, he puts, or rather his model puts, a probability on the overall state: what are the chances that this state is going to be an overall win for Hillary? And the probability of that was, at the end, 35% for that one state. And then he rolls it all up for a national forecast. He got a lot of notoriety because in the 2012 Obama versus Romney election, his model was correct for every single one of the 50 US states, which was partly luck, I mean, the model isn't exactly super confident all the time, but that gave him a lot of notoriety. But that's not what we want to do. If we're trying to win the presidency, there's a little more granularity that would really be a great help. What the Obama campaign did, and this was the first time that it was at least publicly known that a presidential campaign had done this, was assign a predictive score to each individual constituent, to each individual voter. And not, as I'm showing here, whether they would vote for Obama versus Romney, but actually whether they could be influenced, which is technically more actionable. It's a more advanced form of predictive analytics. So instead of predicting simply the outcome, will this person vote for our candidate or the opposing candidate, it's predicting: is this individual influenceable or persuadable? Can we change their mind? Does it make sense to spend our limited marketing resources, the campaign volunteers who knock on doors and make phone calls, on them? And they targeted direct mail with the same type of method. If we do so, what are the chances that it'll increase the probability of voting for our candidate? The same concept applies for driving commercial buying behavior rather than voting behavior. So in a nutshell, Nate Silver was competing very, very publicly to win at forecasting the outcome of the presidential election, whereas at the same time the Obama campaign was very secretly performing analytics to win the election itself, which is more powerful, in that it's so much more actionable when you're at that level of granularity, because each individual prediction informs the action to be taken with that individual. Should I knock on this constituent's door, right? In that sense, predictive analytics, because of that granularity, empowers an organization not just to predict the future, but to influence the future. Alright, so let's turn to how it works. Let's take a look under the hood, look at what the data needs to look like and what the models can look like. So we'll turn to Kung Fu Panda. How many people saw Kung Fu Panda? Okay, yeah. How many people saw it with their kids? Most here. Yeah, I saw it way before I had kids, but I'm proud of it. So Oogway, the turtle who was the mentor, said: yesterday's history, tomorrow's a mystery, but today is a gift; that's why it's called the present. So what do we have today that is a present to us, that's a resource to help us with the ultimate unknown?
This is the one thing that nobody knows: we're looking at what we know about this customer today, and we're wondering what the outcome's going to be tomorrow. Well, we do have two things in pocket that we already do know, which have a very similar relationship. We know that yesterday we applied the marketing treatment, and that earlier today we found out there was a positive outcome. And this training example, learning example, corresponds to one row of data. So everything on the left of the dotted line is what we knew back when we had to make a decision about how to treat that individual, whether to contact them with marketing. In this example, they're male, live in California, made at least 10 purchases so far, et cetera, any and all demographic and behavioral data. And on the right of the dotted line is something that came later, that we didn't know until later in time, but that is now already in the past, which is that earlier today they had a positive response. So there's no prediction necessary. This is an example from which to learn. And you get a whole bunch of examples like this, put into one example per row, and that's typical: one individual gets one row of training data. If you can get your data into that form and format, that's the data requirement for any machine learning software. That is sort of 80% of what you need to know about training data. And although it's relatively simple, it can be really challenging to get it into that form and format, where the meaning of the variables, what you knew at one point in time and what you found out at a later point in time, is critical to get right. So it ends up being a pretty challenging database programming task that's specialized, customized for each individual machine learning project. And in terms of the technical hands-on endeavor for any new machine learning initiative, this is not the rocket science part, but it's the majority of the hours, whereas the actual learning from that data, which maybe is kind of like rocket science, tends to take a lot less time.
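A sketch of the training-data layout he describes, one individual per row, with what was known at decision time on the left and the outcome found out later on the right; the values are hypothetical:

```python
# One row per individual. Everything before "responded" was known when
# the marketing decision was made; "responded" was found out later and
# is the label to learn from. All values are hypothetical.
import pandas as pd

training_data = pd.DataFrame({
    # known at decision time (left of the dotted line)
    "gender":           ["M", "F", "M", "F"],
    "state":            ["CA", "NY", "CA", "TX"],
    "purchases_so_far": [10,   3,    0,   22],
    # found out later (right of the dotted line)
    "responded":        [1,    0,    0,    1],
})
print(training_data)
```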

(34:48)
Once you get the data into that form, you juxtapose what you knew at one point in time with what you found out later. And you get these insights, like: people who go to a bar have a higher credit risk. So Canadian Tire issues credit cards, and they look at how people use their card in relation to how often they repeatedly miss credit card bill payments. And it turns out that if you're observed spending at a drinking establishment, you have a higher credit risk in that regard. Whereas you say, oh, there's lots of exceptions. This is not a crystal ball; there are plenty of exceptions. It's trends, right? This is just what the data tells us. If you go to the dentist, lower credit risk. And if you buy the little felt pads that help protect the floor from furniture: how many people here bought the little felt pads that protect the floor? Most of you, yeah, you're that kind of crowd. You really got it together. Okay, drum roll: better credit risk, lower credit risk. People who like curly fries on Facebook are more intelligent. Now, the reason for this isn't necessarily because you have to be smart to realize how great curly fries are, or that they make you smart, but something happened so that there is a relationship for that group. Maybe some relatively smart cluster of friends, and then there was a social effect. So for whatever reason, kind of birds of a feather, we don't really know. Likewise, we've got a study showing that college males, or sorry, it was males of some age group, and by the way, I should take a moment to say something before I forget: many of these slides, such as this one, have a link or a note in the notes section. So you'll be provided with the PDF of these slides, and for many of them, where you want a little more detail about the example or further learning, there are some more paragraphs or links and that kind of stuff. In any case, males who skip breakfast have a higher risk of coronary heart disease. Again, it's not necessarily because breakfast is helping you avoid coronary heart disease; the more accepted explanation is that people who are really busy, living a very fast and hectic lifestyle, are more likely to skip breakfast, and they're also more likely to develop coronary heart disease. Certain neighborhoods of San Francisco that exhibit higher rates of crime also exhibit higher demand for Uber rides. Now, this was put on the blog by Uber and then taken down, but it's the internet, so I've got the link here to some place that saved it. It's not necessarily because criminals are riding an Uber, but because it's considered a proxy for non-residential population. These are the neighborhoods with more throughput, where there tend to be more people that don't actually live there. So there's a relationship, but it's not quite as direct as you might think. So in general, data speaks. If you get it into that form and format, you will find these kinds of links, these kinds of connections, that help predict, that serve as building blocks for a predictive model. And I call this the data effect. Data, for all intents and purposes, is always predictive. You can sleep well at night; you don't have to worry about whether your data will have value. You're going to find connections like these, and even if they're hard to explain, they are links. They statistically exist, and they help predict. They help improve the odds by way of putting probabilities.
Now, as I've mentioned, it can be hard to understand the connection between these things. So for example, we see that an increase in ice cream consumption is linked to an increase in shark attacks. So then you think, oh, I want to explain why that is. Maybe when I eat ice cream, it makes me taste better. So I consume the ice cream and the shark consumes me. But the more generally accepted explanation is that it's seasonal: when the weather is better, people are eating more ice cream and also going swimming more. So that is to say that these two things, which are correlated, don't have a direct causal link, or even an indirect causal link in either direction; rather, they're both caused by a third factor. So this is a perfect example of why we want to remember the often-heard phrase: correlation does not imply causation. Just because two things are linked in the data doesn't necessarily tell you why. Because when you're trying to understand it, when you're trying to answer the question why, you're formulating some causal explanation, some description of factors and how they're causally linked. Causality is at the heart of any understanding, but we don't necessarily have that understanding. You would need to perform certain specific kinds of scientific experiments. If you want to do that, you need to do what's called experimental design and collect data that's meant to establish causation. But in the big data movement, and I know we don't call it that anymore, well, they left that old title on this talk, big data, now we call it data science, the thing that's exciting about how much data there is, is that we're using what's called found data. That is, data that was collected anyway, as a side effect of conducting business as usual. So we're not conducting these experiments unless absolutely necessary, and that's okay. We don't need the causation; we don't need the explanation or the understanding, necessarily. All we need is a statistically sound model, where we can see, over a large number of examples, how well it predicts, even if we don't have any conclusive findings about exactly why it works.
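Once data is in that one-row-per-individual format, links like the ones he lists show up as simple rate comparisons. A toy sketch with made-up data:

```python
# The "data effect" as a rate comparison. Data here is made up.
import pandas as pd

df = pd.DataFrame({
    "spent_at_bar":   [1, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    "missed_payment": [1, 0, 0, 0, 1, 0, 1, 1, 0, 0],
})
# Missed-payment rate, split by whether bar spending was observed.
print(df.groupby("spent_at_bar")["missed_payment"].mean())
# A gap between the two rates is a correlational, predictive link:
# useful for scoring risk whether or not we can explain the why.
```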


Audience Member 1 (41:22):
Can I ask a question?


Eric Siegel (41:23):
Yes.


Audience Member 1 (41:24):
So going back to the Uber thing: when I saw that, my assumption was that people take Uber because they're afraid to walk around in the street, so they feel safer taking Uber, which wasn't the explanation. So building on that, is there a risk, and maybe I'm right, maybe I'm not right, which is fine, that people come to the wrong conclusion, or more to the point, machines do, and then kind of build that in? Do you know what I mean? So it's like AI kind of working off the wrong proposition, and then it gets further away from the real truth, even though it's arguably logical.


Eric Siegel (42:06):
So I don't think that it's so much that machines come to the wrong conclusion. The whole warning of, hey, correlation does not imply causation, and you hear that all the time, is a warning for people not to over-interpret, not to draw a conclusive conclusion, a causal conclusion, in light of a correlation. But just because we're not sure of what the causal factors are, just because we don't understand the why, doesn't mean the correlation is any less legitimate. That's what statistics is for, right? Statistics is a field that has established: hey, we can see this happening this many times in the data, and therefore we can say there's an extremely small chance it was just due to random chance. I mean, that's the kind of thing you do with statistics, right? There's a much better chance that I'll be hit by lightning tomorrow, or whatever. So we know it's real, or we have very, very high confidence, anyway. And if we don't know the causation, maybe you have a colleague who misinterprets or over-interprets it, but that doesn't change the degree to which the model is sound, and the degree to which it predicts better than guessing is real. So it's okay. Now, there are ethical considerations around wanting to understand the meaning of the model, so you do end up with a bunch of religious debates. But in terms of the scientific merit, I think the jury is in. I think it's pretty clear that these are two separate questions.


Audience Member 3 (43:43):
Shouldn't our goal be how to use data to go from predictive to prescriptive, trying to take an action on it?


Eric Siegel (43:52):
Great question. So, the question of going from predictive to prescriptive: in general, the leap from a predictive score to what you do about it is very often very straightforward. It doesn't require any particular new technology. Some people sometimes use the term prescriptive analytics, but really that's not a separate field. There's no particular technology or methodology. It's like, hey, this customer is three times more likely than average to buy, so what are you going to do about it? You're going to send them a brochure. This transaction is five times more likely than average to be fraudulent, so let's hold the transaction, don't let the credit card go through, make them do an authorization. So it's sort of probability in, response out. Now, there are exceptions where it gets kind of complicated, and I've already sort of alluded to one of them, which is what the Obama campaign did: they predicted whether somebody was persuadable. It is rare to use that advanced form of analytics. It's much more technically challenging, and it does require experimental design and a new data set, but it avoids certain problems that come up with marketing, which is: if you contact somebody and they made a purchase, how do you know they weren't going to buy anyway? How do you know that your marketing actually changed their mind, actually caused the purchase? So causality comes back into it, and I'll come back to that in a little bit. So it can be a little bit hairy, but practically speaking, you typically don't have a long jump from the probability to the action.
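A sketch of that short hop from probability to action; the threshold values are arbitrary illustrations, though the $2 cost and $220 profit echo the earlier marketing example:

```python
# From predictive score to operational decision. Thresholds are
# illustrative assumptions, not figures from the talk.
def fraud_action(p_fraud: float, base_rate: float = 0.001) -> str:
    if p_fraud > 5 * base_rate:  # e.g. five times more likely than average
        return "hold transaction, require authorization"
    return "let the transaction through"

def marketing_action(p_buy: float, cost: float = 2.0,
                     profit_if_buys: float = 220.0) -> str:
    # Contact only when expected profit exceeds the contact cost.
    return "send brochure" if p_buy * profit_if_buys > cost else "skip"

print(fraud_action(0.02))      # hold transaction, require authorization
print(marketing_action(0.03))  # send brochure (0.03 * 220 = 6.60 > 2.00)
```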


Audience Member 3 (45:38):
But you still need to know the cause, right?


Eric Siegel (45:43):
I'm claiming that you don't necessarily need to understand the causation. I mean, even now, this seems pretty obviously true, that it has to do with the weather, and you could use data and new data collection to try to establish that. Let's do one more causation example. It turns out that people in a certain high-typing work environment, where they would get carpal tunnel repetitive motion disorder, those employees who were smokers had a lower incidence of carpal tunnel syndrome. So then you think, oh, well, how do we explain this? Maybe there's some mysterious chemical in the bloodstream of some people that both makes you more prone to smoking and less prone to carpal tunnel syndrome. Well, I made that up. The more widely accepted explanation is that, well, these are the people that take breaks, right? So this doesn't mean that if you take breaks, it's okay to smoke. I don't. But the thing is, when you see the relationship between smoking and lower carpal tunnel syndrome, and then you see an explanation that seems to make a lot of sense, we still don't know for sure what the causal explanation is unless we conduct experiments to address this very particular hypothesis. But we're not in the social sciences or anthropology; we're not trying to understand what makes people tick, unless maybe you're in the medical sciences. For many of these business applications, all we want to do is know the probability of this particular outcome. And even if you don't have a conclusion about the reason why, you still get those correlations, and you still get those probabilities. So we don't need to rely on understanding causation for many of these applications in order to get value from the data. In any case, the whole point of a predictive model is to consider more than one of these factors. I went through a bunch of factors, like whether you like curly fries, and all the demographic and behavioral factors, but the model puts them all together, considers them all together in concert, and then you get more precise probabilities. And here are marketing examples. First Tennessee Bank was able to lower their direct mail cost by 20%, which was an increase in the return on investment of the marketing campaign by a factor of six, because of the move to using predictive analytics to target the marketing more effectively. Target improved their direct mail, by targeting it, by 15 to 20%. Premier Bankcard, a major credit card issuer in the US, reduced their mailing costs by $12 million with a predictive model. There are plenty of other examples. So what we've talked about so far is driving binary yes/no decisions. We're trying to decide whether to contact or not to contact, that is the question. So, between active and passive decisions: should I reach out and contact this person or not? But in many cases, we're trying to drive decisions where there's no passive treatment. What should the APR be for a credit card? What should the pricing be for an insurance policy? Which ad should I show when this website loads for this one customer? What price should I set for any product? So in that regard, I'm going to skip this example.
When you apply predictive analytics, it's kind of like optimizing your own personal dating life, because you're not in the restaurant for the food; it's a sales call. You're the director of marketing, and the product, and it's your goal to optimize your treatment of the prospect in order to increase the chances of a positive outcome. So, how many people saw this movie Groundhog Day, with Bill Murray and Andie MacDowell? Okay, virtually everyone. So as you recall, Bill Murray was stuck in this repetitive loop, and he hated it. He couldn't get out of it, until he realizes that it sets up for him an unprecedented superpower: testing different treatments on the same prospect under exactly the same circumstances to see which leads to a positive outcome. Let's watch a 47-second clip of the movie Groundhog Day.


Speaker 5 (50:26):
(Video Clip)


Eric Siegel (50:52):
So Bill Murray said, you weren't always in broadcast journalism. And she goes, no, I was an art history or, whatever, French literature major. And he goes, what a waste of time. I mean, for someone else that'd be a waste of time. You must be a very strong person. And then she crumbles, right? She's an actress, so it's easy, right? She crumbles, and it's over. But he fortunately gets a do-over, right? Because the day repeats. So can you just play the second half of the audio?


Speaker 5 (51:26):
(Video Clip)


Eric Siegel (52:02):
He says "we" at the end; for some reason we couldn't hear it. So unfortunately, we can't do that in real life. You can go back to the slides, hopefully. And I know, my right hand is up for an Academy Award and my left hand's very jealous. So unfortunately, in real life there are no do-overs, right? So the only recourse is to predict a priori which treatment is most likely to lead to a positive outcome. So let's turn to the other main marketing application of machine learning, which is called churn modeling, where we're trying to retain customers who are at risk of defection, of leaving. So you can consider your customer base as this balloon, where the acquisition of new customers is air flowing in from the left and the attrition, the loss of customers, is air flowing out to the right. If we could just squeeze that nozzle on the right just a little bit, by convincing some of those customers who are on their way out to actually stay, how much more quickly would that balloon be inflating? That's the growth rate of your customer base. But retaining a customer is typically very expensive. You say, hey, if I think that we're going to lose this magazine subscriber tomorrow, I should send them a discount, right? But you can't afford to expend that discount on your entire customer base. Again, prediction is the only recourse. So let's look at one particular example, in this type of business context, of a predictive model, just to make it a little bit more concrete what these models actually look like. This is based on real data from Chase Bank, before they merged with JPMorgan, and it's predicting which mortgage holders are going to defect. So in this sense, we're trying to find out the risk that we're going to lose this customer. Not financial risk in the typical use of the word, not who's at risk of not paying back their loan and defaulting on payments; rather, who's going to actually pay back too quickly, because they went across the street to the competing bank, refinanced their mortgage, and now, you know, you get all the money back, but you're not going to get any of the future interest payments, which is why you were in the loan business in the first place. So to start building a predictive model, it's going to ask one question about one factor, one variable, in this case the interest rate. And if the interest rate is lower than around 8%, then you go left; otherwise you go right. And this breaks us up into two risk levels, two different groups of customers, with a 3.8% and a 19.2% chance of defection, which is a really big difference in risk level just based on one factor alone. So the rocket science that is machine learning is there because its whole point, the challenge that it needs to surmount, is to figure out how to use more than just one of these input variables, and for one individual, to consider in concert all, or as many as possible, of the different profile, demographic, and behavioral factors that are known, to derive as precise a probability for that individual as possible. So in the case of what's called a decision tree, which is an upside-down tree, and this is the top of it, all we're going to do is continue to grow this tree downward. That's literally the algorithm for how decision tree learning works. The decision tree is one of the simpler forms of machine learning methods.
So now we're going to take these two subgroups and say, hey, what's the next variable I should use to divide this group in two? And we're going to keep dividing them in a downward direction and get something like this. Now, once machine learning is done, once the tree has been automatically built over the data, you're going to have something like this. Actually, the optimal tree was about three times this big, so I excerpted it just to make a smaller, a little more legible, example. But once learning is done, now we're going to use the tree to actually predict for an individual. And the way it always works is: you start at the top of the tree, the root, because it's upside down, and you ask these yes/no questions. If the answer is yes, you go left; otherwise you go right. And you get down to an endpoint, and the endpoint is the score, the probability, for whatever you're trying to predict for that one individual, after having considered some number of factors. And that's it, right? So you can think of it as a bunch of if-then-else statements, if you've done programming, or kind of like a taxonomy, or, for marketing, it's like doing sub-sub-sub-segmenting, except it's doing it automatically, and it's doing it to optimize for your particular data set, for your particular prediction goal. And this is one of the most popular forms of predictive modeling methods, also known as machine learning algorithms: a way to learn from that data and derive something that now can be applied one individual at a time to predictively score that individual. Now, going back to the business context: cell phone companies do this a lot. They predict who they're going to lose as a subscriber, and then they try to retain that customer by saying, hey, here's your incentive, we'll give you a free device or a discount or this kind of thing. The problem is that this marketing implementation of the model, this deployment of the model, has the potential to backfire, because in some cases a customer, when they receive this marketing treatment, actually gets triggered to leave. It reminds them that their contractual obligation is almost expired, and then they realize, oh good, I'm free to defect.
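The traversal he just described, rendered as plain if/else code. The root split and its two risk levels (3.8% versus 19.2%) come from the Chase example above; the second split and its numbers are hypothetical stand-ins for the deeper branches:

```python
# An excerpt of the mortgage-defection tree as if/else statements.
# Root split from the talk; the deeper branch is a hypothetical add-on.
def defection_probability(interest_rate: float,
                          years_as_customer: float) -> float:
    if interest_rate < 8.0:            # root question: yes goes left
        if years_as_customer >= 5:     # hypothetical second split
            return 0.021
        return 0.054
    return 0.192                       # higher-rate group: 19.2% risk

print(defection_probability(interest_rate=6.5, years_as_customer=8))
```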

(58:08)
And you may have actually caused a customer to leave by way of your intentions to keep them around. So we want to exercise the adage: let sleeping dogs lie. These are called, in the industry, sleeping dog customers. You want to refrain from waking up the dog. Typically the dog in that aphorism is much angrier looking; that's why you want to let them sleep. And similarly, a customer is going to do the worst thing, they're going to leave, unless you leave well enough alone. And it turns out, as I sort of alluded to, that because the Obama campaign used that more advanced form, they were able to avoid this kind of thing. They did show that there were some voters, sleeping dog voters, where if a campaign volunteer knocked on the door, you actually decreased the chance that you would get a vote for your candidate, Barack Obama, and increased the chance that Romney would get their vote. So these were exactly the people who were left off the contact list. The campaign volunteers were not told to pound the pavement and knock on every door in this neighborhood. They were told: hey, go to exactly the houses on these lists, and don't go to any other house. So they were able to put probabilities on this phenomenon and then avoid it. Okay, so let's wrap up with how this applies within insurance. What I've talked about so far is how the value proposition of insurance, the core competency, can now be leveraged essentially across industries for all different kinds of operational endeavors. But how does it work the other way? What does machine learning do for insurance? Seth Eerily said, insurance has always been about predictive analytics: what are actuarial tables, loss history analysis, and pricing risk algorithms, if not predictive, right? So it's kind of like that Janet Jackson song: what have you done for me lately? The answer is that there are two main changes when you move from standard actuarial methods to predictive modeling methods. One is that you widen the data; there are more data elements. So in that training data you have more columns; it's literally a wider amount of data. You have all different kinds of demographic and behavioral data, whereas in standard actuarial approaches there tends to be a more regimented, restricted set of data elements. So you're going to be able to find maybe unexpected things. What did they do when they came to visit our website? All sorts of other things that could help make predictions more precise. And second, the mathematical models themselves tend to be much more open. It's a lot less restrictive; it's sort of the wild west of anything goes. So instead of a regimented, authorized method, to the degree allowable within these regulations, you're going to try any and all of the fancy mathematical footwork, even up to deep learning neural networks, that is going to do a better job leveraging that data and all those data elements in order to put as precise probabilities as possible. So you get these improvements, which are all from actual presentations at our conference, Machine Learning Week, where insurance carriers saw real improvement by making a move to that sort of wider, more open space of machine learning methods. One leading international commercial lines insurance policy provider was able to decrease the loss ratio by half a point, which was very meaningful at the scale of their operations, contributing to savings of almost $50 million a year.
Allstate used predictive modeling, actually deploying a machine learning contest, in that sense crowdsourcing for the best possible predictive model. And the winner of that contest was worth an estimated $40 million, by way of more accurately predicting bodily injury liability. And finally, Accident Fund ascertains secondary medical conditions from workers' compensation claim notes. These secondary medical conditions are predictive of high-cost injuries, so that particular predictive modeling project was valuable to Accident Fund. Now, this next slide is a little technical; unless it comes up in the questions, I'm going to skip it, but it's another example where I show, hey, if you get these two different risk levels, 17% versus 3%, that makes a huge difference, and you don't necessarily have to get a big lift. We talked about a lift of three, three times more likely than average to buy, with the skunk arithmetic. In this case, we have a lift of only 1.7, but it's more than one, and it ends up getting these two very different risk groups. So there are plenty of other places outside of underwriting but within insurance where machine learning applies; machine learning sort of applies across the enterprise, across all different kinds of functions and operations. Infinity was able to fast-track their claims by a factor of 11, by using machine learning to predict which claims are most likely to be denied or accepted, in order to allocate them and fast-track them when warranted. And then fraud detection, of course, which applies everywhere. It turns out that insurance criminals steal more than $30 billion a year in the US, which results on average in $200 to $300 of additional insurance premiums per US household. So in this way, insurance fraud is the second largest white-collar crime in the United States, after tax evasion. And fraud detection applies everywhere, but within insurance claims you get results like this, where you actually had a lift of 6.5 with a major automobile insurance carrier, who also presented at our conference. A lift of 6.5, which means what? It means that they could identify a pocket of claims that were 6.5 times more likely to be fraudulent than average. That means that if you have a staff auditing those claims and investigating for potential fraud, since they obviously can't audit every single claim extensively, if they spend their time on that pocket rather than some randomly selected group, their time is literally 6.5 times better spent. They're going to find 6.5 times as much fraud per investigation. Now, when you get that kind of a lift, you could increase the amount of fraud that you catch, or alternatively, you can decrease the amount of staff that you need. That was pointed out explicitly by Citizens Bank, who did checking fraud detection. So the predictive model says, hey, what are the chances that this check is fraudulent? And their model had enough of a lift that it could be deployed in two different ways, depending on whether the goal was to cut costs or to increase the amount of fraud that was detected. Loss prevention was up by 20% if they used the same staff and audited more likely frauds, or they could retain the existing amount of fraud detection and decrease their staff by 30%. So the same model, with the same predictive performance, can be deployed in different ways for different purposes.
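For reference, here is a sketch of how a lift figure like the 6.5 he cites is typically measured; the toy scores and fraud labels are invented for illustration:

```python
# Lift: fraud rate in the top-scored pocket vs. the overall rate.
def lift(scores, labels, top_fraction):
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate

# Toy example: 10 claims, the two actual frauds scored near the top.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   0,   0,   0,   0,   0,   0,   0]
print(lift(scores, labels, top_fraction=0.2))  # 5.0 = 100% vs 20% overall
```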
Here's an interesting little anecdote, at least I thought it was interesting, which is that a health insurance provider actually predicted mortality. So this is not a life insurance company; it's a health insurance company. And they swore me to secrecy when they were talking about this, because they're concerned about the cosmetics of it. But a top-five US health insurance company predicts the chances that elderly policy holders are going to pass away within a year and a half. And the reason is to trigger end-of-life counseling with regard to palliative care. Where do individuals need more assistance with this, because they're nearing end of life? They may not have the right family and resources to be championing them, helping them. So just to quickly summarize, we've talked about most of these, but these are all the different areas, or many of the areas, where machine learning can be applied within insurance. Obviously underwriting, for pricing and selection. Fraud detection, which applies across sectors. Marketing applies across sectors, just as much for insurance as anywhere else. Workforce analytics, such as which job applicants to pursue; workforce applications are very much analogous to marketing. It's predicting who's most likely the best individual to try to hire, or to try to get to apply for a job, and which current employee is at risk of defection. And then claims management, such as the fast-tracking of claims I mentioned. So, just taking a step back before I wrap up: there are so many ways that this stuff applies. One is that it's ultimately the antidote to information overload. When you do search, you've got information overload, and it's Google's job to give you, towards the top of the search results, hopefully, results that are as interesting to you as possible. They do that with machine learning. Facebook, same thing. Your default newsfeed is ordered based on all of the thousands of posts by your contacts in relatively recent time: which of those posts are most likely to draw you in, or get you to engage, or be of interest to you? And closely related to that are product recommendations. So, Netflix recommending movies and Amazon recommending everything. Both Airbnb and match.com actually show search results ordered by way of a predictive model, in terms of which of these individuals, for that particular user, is most likely to be the best match. So in modern society, our experience is dictated by how we're treated and served, all these verbs done unto us by organizations, and more and more of these organizations conduct those decisions about how to treat us based on the output of a predictive model, which drives millions of decisions a day as to whom to call, mail, approve, test, diagnose, warn, investigate, incarcerate, set up on a date, and medicate.

(01:09:30)
Yeah, I'm rushing. I didn't get to let the rhyme sink in there a little bit. So, I like to call predictive analytics the latest evolutionary step of the information age, where we've moved from the improvement of engineering, how to warehouse and manage more and more data, to the application of science to learn from that data: its content, what it means. And the most actionable thing to learn from data is how to make these predictive scores. So Harvard Business Review calls machine learning the most important general-purpose technology of our era, and thought leader Andrew Ng calls it the new electricity. So we've talked about its wide applicability, the fact that it learns from data, that the defining characteristic is a prediction for each individual, and that it works, in the sense of delivering value to organizations across sectors and across operational functions, because of these two effects: the data effect, which is that data is always predictive, and the prediction effect, which is that a little prediction goes a long way. So I'm going to make you stay a couple minutes late, because we started late. They wanted me to end at 1:30, but I'm going to just sort of break the rules and defy all of you who might need to get to the next session. Let's just do at least two questions, and then I'll let you go, and I'll stick around for those who want to continue talking.


Audience Member 1 (01:11:06):
You talked about predictive analytics and machine learning in probably the more traditional sense. What about foundation models and large language models, and how do they fit in going forward?


Eric Siegel (01:11:19):
So large language models, generative AI, both for creating images and for writing copy, are applications of machine learning. They're literally predicting what the next word should be. Now, it's not quite that simple, it's the next token, but that's a level of granularity that more or less aligns with each word. So given this paragraph, or these several paragraphs, what should the next single word be? Then let's pick that word and do the same thing with what we've written so far. It just keeps doing that over and over again. And it ends up writing something which is impressively coherent, and sometimes even correct, right? So the value of that is probably profound, but not equal to the hype currently, because it's not reliable. It's not designed to be correct or reliable, at least not yet. I mean, to some degree it is, but that's not been the main thrust. And very few people out there are talking about quantifying exactly how good it is by some objective measure of performance, in terms of, for example, I don't know, correctness, right? So the hype is a little premature. It's somewhere between one and seven years premature, but it's not nothing. And it is amazing. I mean, I was in the field of natural language processing, or computational linguistics, as a graduate student for six years, and I never thought I'd see something like what this thing can do.
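A minimal sketch of the predict-the-next-token loop he describes, using GPT-2 via the Hugging Face transformers library as a stand-in model; this is greedy decoding, the simplest variant, whereas real systems usually sample rather than always taking the single likeliest token:

```python
# Next-token prediction, repeated: the core loop behind LLM text
# generation, shown with GPT-2 as an illustrative stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Predictive analytics is", return_tensors="pt").input_ids
for _ in range(20):                    # generate 20 tokens greedily
    logits = model(ids).logits         # a score for every possible token
    next_id = logits[0, -1].argmax()   # pick the single likeliest one
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))
```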

(01:12:51)
But it's pretty different than what we've been talking about today, where we're very directly using these probabilities to improve individual operational decisions, as opposed to, hey, let's use this technology to get something that can write in English, or any language, and then somehow use that to create a rough draft for a human to edit. It's a very different conversation. So that's sort of my quick take on it. Any other final questions? All right, well, thanks everyone for your time, and I'll stick around a bit if anyone wants to continue this discussion. Have a great rest of the conference. Thank you.