Episode Transcript
00:00
Rhett
On this episode of Pipeline Things, it is action-packed. We get into taxes, Star Wars, API 1163, and medicine in the United States. It's all over the map. I get a new guest co-host, and my old guest co-host is remote for the first time ever. You definitely don't want to miss this episode on 1163.
00:35
All right, welcome to this edition of Pipeline Things. If you're watching the YouTube, you'll probably see some funny stuff; if you're not, some of this might not make sense, but I apologize. We're here filming at REX in 2026, alone, I might add. I am on a search for a new host, and lo and behold, one has been found. Welcome. I will be co-hosting this podcast with Jason Skow, who will also serve as the guest. And the person who will be introduced is my former co-host for this episode, Mr. Christopher DeLeon. So Jason, before I get into the episode: it's tax season in the United States. Is it tax season in Canada?
01:15
Jason
It is tax season. Okay.
Rhett
We're not going political. Don't worry. Tax season in the United States. The tax code is something like 15,000 pages long in the United States. Could be. It is written by lawyers, for lawyers, for the express purpose, I believe, of keeping accountants in work.
01:37
Jason
And secondary purpose to get your money.
Rhett
Yes. They could have gotten the money easier and with fewer pages, though. I want you to know that. They could, yeah. I was told by somebody that the tax code is linguistically written in a way to be complicated and difficult to understand, such that the average individual struggles to make their way through it.
01:55
Jason
You want me to confirm?
Rhett
No, unless you conform to the U.S. tax code, which I don't know if you do.
02:01
Jason
The Canadian tax code is just as crazy. And you know what I heard?
Rhett
You know what it is? They wrote it in the United States and went north and wrote yours.
02:07
Jason
It could be. In Singapore, apparently you get like one page from the government that says, this is everything we know so far; is there anything you've got to fill in? And it takes less than an afternoon for everybody to do.
02:14
Rhett
I bet you they don't have 15-or-however-many thousand accountants.
Jason
I think it's written to just make it quick and easy. Like, yeah, ours is written to be complex.
02:23
Rhett
Amen. Well, that's actually going to sound a little bit like today's episode. That's as much as I know about taxes. We're not talking about taxes on Pipeline Things; we are talking about API Standard 1163. And as I welcome my host back onto the screen: Christopher, Sarah will now bring you in. Christopher is still my host, for those of you who are out there. Don't worry, we are still together. We are still life partners. But Chris cannot be with me.
02:48
Jason
But now we're together. But now we are together. You're right.
02:52
Chris
For those of you that are interested, I am not intimidated at all. I am very comfortable. So you guys enjoy yourselves. Have a good time. And I don't mind being an observer to this party.
Rhett
Okay. If you are listening to this podcast and you have not listened to the PPIM episode with Tom Bubenik, I'm going to tell you right now: I believe you need to stop and go back and listen to that episode. This episode is very much a follow-up to that one in many ways. And in order to introduce this episode, I'm going to go over some of the statistics from the Tom Bubenik episode, and I'm going to start from the bottom up. He had 100 case studies where they wanted to see if the ILI results could meet a Level 2 API 1163 validation based on the field data that they had. Of those 100 results, what they found, I thought, was quite shocking: only about 5% had the data on hand to unequivocally meet an 1163 Level 2 specification. About half of the assessments required additional analysis.
In other words, they needed additional data, and then it could have been said that they passed the Level 2; just based on what they had, they didn't have enough, right? About one third of the studies said that additional NDE measurements were required to validate the inspections, or the inspections would have to be rejected. So there was this possibility that they could pass, but they would have needed more NDE. And 15% outright did not have enough data.
04:28
Jason
Does that add up to 100%, Rhett?
Rhett
I don't know. We'd have to do it right now; there's a bunch of abouts in there. Thanks for questioning it. You're close enough. The shocking thing, I think, about his paper to me was how many could not adequately meet the requirements of a Level 2. Now, that episode we probably received more feedback on than any episode we've ever done, and it largely fell into two camps. People heard one of two stories when they listened to that episode. The first story people might have heard was: the ILI tools aren't doing what they're supposed to do. And I'm like, that is not the message. The other message, the one that should have come out of it, was: we are not doing enough due diligence and enough excavations, and we don't have the data on hand to properly justify the tools for the use cases we're using them for. So, for instance, they say plus or minus 10% in their spec. We're just using plus or minus 10% rather than actually verifying that the tool performed to plus or minus 10%, or 11%, or 12%, or 13%, and then using the appropriate number in validation. What's interesting is you guys followed with a paper, which I thought we were going to be talking about today. For those of y'all who are listening, this is the PPIM 2026 paper by Jason Skow and Jed Ludlow. And the title of the paper, I'm trying to get to it on my screen right here, is Advances in API 1163 Level 3 ILI Validations. I really enjoyed listening to this paper in the plenary. And again, I would encourage the audience: if y'all have the opportunity, read it. I thought both papers reached very similar conclusions, and I'm going to tell you what I think those are before I turn it over. And Jason, as we dive into this, you can agree or disagree with me.
06:13
Jason
So far, yeah, I'm going to disagree.
06:15
Rhett
Great. Great. I want my old host back. Both papers. Come on.
06:20
Jason
Chris disagrees a lot. Trust me. I know. This is true.
Rhett
Both papers took exception to the Level 2 approach. Tom took exception to Level 2 and thought it was inadequate. In fact, he points to the fact that it's counterintuitive that fewer digs benefit you in a Level 2: the more digs you get, the more penalized you feel in a Level 2. Both papers came to the conclusion that we should be digging more than we actually are. Like, we need to be supporting our ILI data with more excavations than what we're currently doing. And both of you proposed fixes for the Level 2 method. Tom's fix was like, hey, we could actually assess the tools against larger tolerances. You guys took a different approach to the Level 2 problem, didn't you?
07:06
Jason
Indeed, we did.
Rhett
All right. This is where I stop talking and you tell me all the things I said wrong. And tell me a little bit: what was y'all's paper about?
07:15
Jason
Yeah, sure. Okay, well, thanks for that, Rhett. Yeah, so this paper... we referenced Tom's paper. I mean, it's not a direct follow-on. We didn't do, you know, some of the work that he was recommending. I would say it's more complementary. Is that a big disagreement? Not a follow-on, but a complementary thing.
Rhett
You know, not The Empire Strikes Back; this might be like Rogue One. But, you know, are you a Star Wars fan?
07:39
Jason
I did not watch Rogue One. Anything before 1984.
07:41
Chris
Dude, he's not looking for the Death Star's blueprints, bro. Like, that's not what he's trying to do. He's not trying to undermine anybody.
07:48
Rhett
The point was the same universe. If you say, "These aren't the droids you're looking for." Nice try.
07:55
Jason
Okay, go. Okay, well, I would say it's complementary, for sure. And Tom did a lot of work exercising the Level 2 and the Level 3 and then showing what that means with real data sets. We were focused on the Level 3, because Jed and I and Ryan Stewart, who's also a co-author on this, did a lot of the work that was the basis for it. We're looking at the Level 3, and, to me anyway, it makes more sense. It's more straightforward. And the key insight of this paper is that a lot of people will look at Level 2 and Level 3, and, I think, you may have referenced this in the intro, say, well, these are quite different, or the Level 3 takes more, it requires more data.
08:36
Rhett
That's the argument most people make. It's too much data, Jason.
08:39
Jason
Yeah, and also the other argument is, well, the bounds get wider; they're wider than the ILI vendor specification. And so, is that a problem? Well, here's the focus of our paper. Our paper says that if you look at Level 3 from a different perspective, you can get the same answer as you get from a Level 2. You just have to understand what the Level 2 is trying to do. You can get the same answer from a Level 3 as you can from a Level 2. But when you get different answers, it's because you're asking a different question. And the question is a little bit around what you were alluding to at the beginning, which is: what is the performance that we achieved, and how confident do I want to be in that performance?
09:18
Chris
So Jason, is it fair to say, though, that in 1163, the reaffirmed 2018 version, we actually describe a Level 3 validation as requiring substantial data and being an exhaustive effort? And so I feel like that kind of created the stigma that Level 3s are complex and require a ton of data, which kind of sets the stage for you to bring this solution, which is: well, it's not that complex, and we could actually do the same thing you want to do with the Level 2 but through a Level 3 method, without having to describe it as requiring substantial amounts of data like in the previous version of 1163.
09:55
Jason
Yeah, and, you know, the answer that we give in this paper about that is: the data requirement to be sure about something is the same in both cases. It's the interpretation of what you get with low data sets in a Level 2 and high data sets in a Level 3 that makes people think, well, those are two different things. But you can make them equal. And to be confident about the same conclusion, you need the same amount of data in both cases. It's just that we're not used to using them that way.
10:23
Chris
It really just boils down to the way it was worded, then, right? Because, again, in a Level 2, the way we define it specifically says, for a Level 2, we don't draw a definitive answer. I'm just saying that's what it says in the 2018 version. And then for a Level 3, it says it requires exhaustive data.
10:44
Jason
Right. And if you follow what we've done in this paper, this PPIM paper, and you do the most optimistic result, you don't need much data either.
10:54
Rhett
You're giving away the baby. We're not there yet. Okay. Delivering the baby too early.
11:00
Jason
Okay. Okay. Okay. You want to build this up. Okay. Yeah. Go ahead.
11:05
Rhett
But I want to step back, because you did something in the paper and you did something on stage that I liked. And you've said it here, you alluded to it, but to define it: the Level 2 and Level 3 are answering different questions, or asking different questions.
11:19
Jason
They're answering different questions that you have to ask to get those answers.
11:25
Rhett
Do you write tax code? What is this?
11:30
Jason
I just try to pay my taxes. That's as far as I get.
Rhett
So explain, elaborate then. Because you keep saying that if you do that, you need the same amount of work for them to tell you the same story. What what fundamentally is a level two doing and how is it different than what a level three is doing?
11:43
Jason
Yeah, okay. Yeah, that's a great question, Rhett. Thank you for asking that question. I think this is where a lot of the confusion comes in, and these statistics are a little bit convoluted. If you've been thinking about it a long time, you wrap your head around it and you can see there's differences between them. But if you're just looking at it from a high level, that's where that misinterpretation, I think, comes from. A Level 2 is based on this really old statistical test called a null hypothesis significance test. And what we're doing with that test is we're assuming we have the vendor specification, and we're going to assume that it's correct. Right out of the box, no data. That's where we start. And then we start to collect some data, and then we're going to see: does that data disagree with that spec? And if it does, does it disagree strongly, so that we're 95% sure that it disagrees with that spec? If it does, then it fails; but if it doesn't, then we say it passes, and then we just use the spec. So that's not a very high bar, because what you could do, for example, is say: I don't have that much data, therefore I'm not sure. Even if it disagrees, it doesn't disagree strongly enough, so now I cannot reject it, so therefore I end up using the vendor spec. But there's a good reason for this. This is a well-known statistical tool. And where it comes from is the idea that the vendor has done their job really well in coming up with a spec: that they've done lots of these flow loops, they've used realistic-type features, they've done thousands of pull tests and so on, and then they come up with a spec. So then we're saying, well, they're the experts, and we expect to get the performance that they're claiming. And unless we get strong evidence that says we didn't get that, we're just going to go with their spec. That's kind of where that comes from.
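For listeners who want to see the shape of that Level 2 logic, here's a minimal, hypothetical sketch in Python. The numbers (an 80% in-tolerance claim, six digs, thirty digs) are illustrative assumptions, not figures from the episode or from the standard; the point is only how a significance test behaves with small samples.

```python
# A rough sketch of the Level 2 logic described above: the vendor claim (say,
# 80% of calls within +/-10% wall thickness) is assumed true, and we abandon
# it only if the dig data make that claim look very unlikely.
# All numbers are illustrative assumptions, not values from API 1163.
import math

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed from first principles."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def level2_reject(in_tol: int, n_digs: int, claimed: float = 0.80,
                  alpha: float = 0.05) -> bool:
    """True only if the observed in-tolerance count is improbably low
    under the vendor's claimed rate (a one-sided significance test)."""
    return binom_cdf(in_tol, n_digs, claimed) < alpha

# 3 of 6 digs in tolerance: too little data to reject, so the spec "passes"
print(level2_reject(3, 6))
# 15 of 30 digs in tolerance: same 50% hit rate, but now the claim is rejected
print(level2_reject(15, 30))
```

Note how the same 50% hit rate survives with six digs and fails with thirty; that is the "fewer digs benefit you" behavior Rhett and Tom describe.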
13:27
Chris
But that's an important part; we need to pause there, right? Yes, you used the key word: that's very optimistic. And so the only plug I'll put in here is that 1163, at least for our U.S. clients, is an incorporated-by-reference requirement, and that standard has a qualification component to it. So I just wanted to make that plug, saying: guys, when we use a Level 2, it's because we're optimistic about the efforts that were put in place to establish that spec. But at the same time, we're required to make sure we understand how that spec was developed. Otherwise, that may push you towards kind of where you were going with this paper, which is maybe we needed to ask a different question, which is: what is the tool's performance?
14:09
Rhett
This is like innocent until proven guilty, in a sense. I'm using criminal terms, right? Because we in the United States, we require a high bar to convict someone, right? It's innocent until proven guilty. In 1163 Level 2 terms, and if you disagree, just tell me and I'll restate it, the spec is correct until proven incorrect.
14:37
Jason
Yeah. And then the question there is, well, how hard do you try to prove it?
14:40
Rhett
Which kind of makes sense, Chris. The reason, and again, maybe the audience is like, duh, Rhett, but for those of you who don't live and breathe 1163 like my two co-hosts, both of them here (you see how I called you a co-host this time? Thank you), I always found it kind of funny how the dig stuff works in the Level 2. Because anecdotally, I observed what Tom and you just described, which is: if you have fewer digs and only, I don't know, like three of them agree out of the six, you're like, oh, it meets. And you're like, does it really? That doesn't make sense. But then you get more digs, and it does feel punitive. But now I think I'm understanding why, because more digs work against you in that sense. To me, it's coming together in my mind. So if a Level 2 is innocent until proven guilty, what's a Level 3?
15:27
Chris
We use the word "pass" there. And the reality is the intent wasn't for it to say, does it pass or fail? The intent of the Level 2 was to say, can I reject it? And so the idea is, if we put a compliance hat on and just say, hey, I'm required to do a Level 2 validation, what's the incentive of getting more data? Because then otherwise I might have to reject it. And that's where you've got to be asking the right question. Which, again, I go back to: we as an industry have consistently worded it this way, right? A Level 2 asks, do I have enough evidence to reject it or not? And so we've just got to be careful about saying "do I accept it" when really my posture is just, do I have enough evidence to reject it?
16:10
Jason
Well, you know, Rhett, you were referencing something that you liked that I did on stage. I'm not sure what that is, but I'm hoping I know what it is.
Rhett
It wasn't the part where you danced, just so we're clear. It was not the part where you danced. It was not the opening joke.
16:20
Jason
Okay, good, good. I'm glad we cleared that up, because I was worried. Okay. But, you know, when I was talking about this at PPIM, the example that I gave for a Level 2 was, and this is kind of along the theme of what we're talking about: take it out of the context of pipelines and apply a Level 2 to something else. So let's say, for example, you're helping somebody in your family. They've got a health issue, they're going to the doctor, they're going to get a treatment. And so you're helping them out and you're quizzing the doctor, asking them: well, is this treatment going to work? If their answer was, we've tested this far and wide, and for 95% of people this solves the problem, okay, that's like a Level 3. That's what that means: we've tested it, and 95% of the time it works. But what if the doctor's answer was, I have no reason to believe it doesn't work? If this is somebody you care about, you're thinking, okay, now wait a minute, now I've got more questions. Like, how hard did you look? How many times did you study this? Because if you didn't study it at all, I'm not confident. I'm not going to let you give this to my loved family member. So that's what a Level 2 does. Now, if the doctor said, well, I don't have any evidence that it doesn't work, but there are other people who have claimed it worked, and there's a really strong system in place, that still might be good enough in some circumstances. But that's the difference between Level 2 and Level 3, if that makes sense.
17:52
Rhett
It does. There are so many absolutely terrible but funny applications I could make. Now medicine? How much more do we have to cover? You have to understand, my whole family's in medicine. My wife's a pharmacist, my brother's a doctor, my sister's a physical therapist, my mom's a nurse. I could go down this path, but somebody in my family, and likely half of the audience, would get angry if I did. That analogy is fantastic, though. In my mind, I was laughing about where I wanted to take that. So I like how you put it: the Level 3 actually tells you, I'm 95% confident, right? So the distinguishing factor of the Level 3 is that it actually measures the performance of the tool. Which makes sense, right?
18:33
Jason
And what evidence do we have? What can we say with the evidence we have? Not that there's another claim and then we're trying to gather evidence to say the claim's wrong.
18:40
Rhett
But it's much harder. It's much harder to do. It requires much more information, right?
18:45
Jason
Not really, because if you look at... now, this is also part of the paper. If you get to the same question, you need the same amount of data to get to the same answer, roughly. Now, the statistical techniques are a little bit different; it's not the exact same answer, but it's very close. And if you read the paper, and this is the work that Jed did, he's got some nice plots in there to show the differences between the two approaches. They line up. They line up really well.
19:10
Rhett
They line up when you phrase it correctly, which was really, again, something that you and Jed did very well on stage.
19:15
Jason
Okay, there's a second piece in here that I want to make sure is captured in this discussion about the paper. The other thing that we did is we included the idea that the field has error in it. Okay, because when you have a unity plot, there's data scattered in that unity plot. Some of that error is because of the field, and some of it is because of the ILI. What we've done in the past is we've often attributed a lot of that to just the ILI, sometimes all of it. In fact, the default Level 3 as it was originally written, in the first IM-106 project for PRCI that I was a part of, was attributing all of it to the ILI. And a lot of operators came back and said, well, you know, that's not fair, because we know, if we look at the field, we can't measure things exactly the way they are. If we compare them to destructive lab tests, they're not exactly right in the field. And in some cases, people make the argument that the field error is on the same order of magnitude as the ILI. Maybe it's a little bit better in some cases. I think it depends a lot on the technique, and it depends on a lot of the circumstances. Sometimes the technology is doing the work; sometimes it's the experience of the people, human factors, et cetera. Lots of things go into that. But if we take all of the error and just say it belongs to the ILI, well, that's giving the ILI a little bit of a hard time. So in the one case, we're doing a Level 2, which is kind of easy on the vendor. But then on the other hand, if we're attributing all the error to the ILI, we're making it too hard on the vendor. So we're not sitting in the middle here. We're not treating the ILI tool fairly in one sense, and we're giving it the benefit of the doubt in the other sense. So why don't we meet in the middle?
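That "meet in the middle" idea can be sketched numerically. Assuming ILI and field errors are independent (so their variances add), the ILI-only scatter is what remains after subtracting an assumed field sigma from the total unity-plot scatter. The residuals and the field sigma below are invented illustrative values, not data from the paper.

```python
import math
import statistics

# Hypothetical unity-plot residuals (ILI depth minus field depth, % wall thickness)
residuals = [3.1, -5.2, 0.4, 7.8, -2.6, 4.9, -6.3, 1.7, -0.8, 5.5]

# Total scatter: ILI error and field error mixed together
sigma_total = statistics.stdev(residuals)

# Assumed standard deviation of the field measurement error (% wall thickness)
sigma_field = 3.0

# With independent errors, variances add: total^2 = ili^2 + field^2,
# so the ILI-only scatter is what remains after removing the field share.
sigma_ili = math.sqrt(max(sigma_total**2 - sigma_field**2, 0.0))

print(f"total sigma: {sigma_total:.2f}, ILI-only sigma: {sigma_ili:.2f}")
```

Attributing everything to the ILI means using sigma_total; crediting the field with its share shrinks the ILI's apparent error, which is the fairness argument Jason is making.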
20:43
Rhett
So that's a good place to actually pause for a moment. Because what I want to do, audience: we're going to come back, I'm going to have him describe the test case that they did, which was one data set, and then we're actually going to talk about the four questions they asked and the results that they got. This, for me, I think, was my favorite part of the paper. So hang on, and we'll be right back with my co-host slash guest slash person of interest, Jason Skow.
21:15
Rhett
All right. We are continuing our conversation on API 1163 Level 2 and the PPIM paper with Jason Skow, my new co-host guest, and my old co-host, life partner Christopher DeLeon, who's with us virtually. You know, when you were describing that whole Level 2 thing, I wanted to just keep going with a bad analogy, because it's so shocking. I felt like the Ewoks were coming on screen, and you're like, whoa, what is this? What is that thing? But, you know, when you were describing this whole null hypothesis...
21:45
Jason
Are you pro-Ewok or anti-Ewok?
21:47
Rhett
Yeah, I think when I was a kid, I thought the Ewoks were really fun, right? But when you go back and watch it in hindsight, you're like, what? Like, what just happened? So, yeah. Did that analogy work for you? Probably not. Chris is probably not happy with that.
22:02
Chris
No, it's so good, because you think of the Ewoks as being furry and cute, but maybe not resourceful, and then surprisingly they are resourceful. So you kind of think of a Level 2 like that, right? It's like, well, it's kind of simple, but I guess I can use it to get the regulators off my back. Hey, I did a Level 2. You know, it's kind of like...
22:22
Rhett
They're riding around a little jet thing. Okay. So Jason, if you don't mind, the case study that y'all presented in the paper, would you describe the details of it for the audience, please?
22:33
Jason
Sure. So we had one data set. This is why, you know, this is one of the reasons I say it's quite different than what Tom did: he had 100 data sets, we had one data set, and we treated it a bunch of different ways to show this equivalence between the Level 2 and the Level 3 if you ask the same question. So there's a table in there that, Rhett, I know you'll like. It's a table summarizing some of the results that we got from this one data set, treating it four different ways. So which four ways did we treat it? One, we just looked at the total uncertainty.
22:59
Rhett
Okay. And if you want to look this paper up, Sarah might throw the graphic on the screen. She's referring to Table 1, and you're referring to what y'all titled the conservative estimate of total uncertainty.
23:12
Jason
Yeah, that's right. The conservative estimate of total uncertainty. The key there is it's the total uncertainty. This means that we're taking all the uncertainty: ILI, field, everything. We're taking what we see in the unity plot, and we're just saying all of that belongs to the ILI. That's why it's conservative: throwing it all on the ILI. That's what we did. No, it's not fair. But this is, you know, the upper and lower bound; that's how we calculated it. Like I said, that's the total uncertainty.
23:36
Rhett
So when you did that, you did it with a 95% confidence interval. Do you need to explain that? Does that need explanation?
23:41
Jason
Well, when we say that it's conservative, we are saying, how conservative? Well, we picked a 95% confidence interval because that's very high. That means that if we repeat this experiment 100 times, 95 of those 100 times, we expect to get the same answer.
23:56
Rhett
If you repeat this experiment, throwing all of the error onto the ILI, this specific case study, 95%...
24:04
Jason
Or, technically, the bounds that we calculate will contain the data 95% of the time. Gotcha. So that's what we're saying, and that's why it's conservative. So that was one case, just the total uncertainty. But then what we did is...
24:13
Rhett
Wait up, wait up. You've got a lot. You can't... You might be co-host, but, Padawan...
24:21
Jason
I'm just an Ewok invading this podcast right now, so you just keep going. Keep going.
24:26
Rhett
The answer y'all got for that, because I want the audience to hear it, is plus or minus 14.6%, meaning that the ILI's performance, according to that scenario, is plus or minus 14.6%, which is larger than plus or minus 10%, but as we stated, is unfair to the ILI tool for sure, correct?
24:45
Jason
And that's why we labeled it as the total uncertainty. Gotcha. Trying to get away from the idea that this is all just the ILI; it's just how it's calculated. You attribute it to the y-axis, and you calculate the upper and lower bounds using the standard tools.
24:57
Rhett
Now, can I ask you a question? Why is there no Level 2 method for that one, right? So for the other ones, you have Level 2 methods. Is that because there's no way to answer that same question with a Level 2?
25:07
Jason
Yeah, why is there no level two?
25:10
Rhett
I asked the question that the host can't answer. That means we skip to the next one. That's what that means.
25:13
Jason
I think we could do it, yeah. That's a good question. Okay. I don't know.
25:19
Rhett
Okay, so then the second case, moving on, audience. The first case: plus or minus 14.6%. If you're familiar with ILI technology, and metal loss specifically, you would expect plus or minus 10, right? So this is larger, but in an unfair, I'm going to say, judgment of the ILI tool, or assessment of the ILI tool. The second case, which you called the conservative estimate of ILI uncertainty. Explain, please.
25:45
Jason
Yeah, so now in this case, we're looking only at the error that we think is attributable to the ILI. Now, remember, I said this is one data set, so we didn't explore all the different field techniques and field errors that there could be. We assumed a fairly favorable field performance for the field error, saying that the field error is quite a bit better than the ILI error, but we're still accounting for some error. So if you draw one of those ovals around the data, those football shapes, it's going to be quite narrow on the field side and quite tall on the ILI side. But still, we're saying that the field is accounting for some of this measurement error that we're seeing. And in that case, it's still conservative, because we're still doing the 95% confidence.
26:26
Rhett
So again, 95% of the time, if we run this scenario where we assume small field errors, which makes them, I'm going to say, closer to the gospel truth, if you will, and larger ILI errors, we came up with plus or minus 13.3% as the tolerance.
26:40
Jason
Yeah, it drops about 2% in this particular case. That'll depend on the data set you have, so that's not going to be a standard rule, but it will drop. Directionally, we'll see that tighten up a little bit. In this case, it tightened.
26:55
Rhett
Gotcha. Which is still, I want to point out, larger than plus or minus 10%. And there's a reason I'll keep saying that; I'm going to circle back on it at the end. So that's Case 2, audience, if you're listening. Again, just to recap: Case 1, total uncertainty ascribed to the ILI tool, unfavorable. Case 2, we gave more favorable uncertainties to the field, but perhaps a realistic uncertainty to the ILI. And then Case 3 you called the best estimate of ILI uncertainty.
27:21
Jason
Yeah. Now, Case 3, this is a different interpretation. Instead of saying 95% confident, we're saying 50% confident. So this is more like: what's the expected performance, or what's the average performance? We're not looking at whether, in 95% of future cases, the bounds that we calculate would contain that uncertainty. We're saying: just on average, what's the performance?
27:45
Chris
On average.
27:46
Jason
Yeah, on average. Just what's the performance on average?
27:49
Rhett
Do you know which errors you used with Case 3? Was it the field ones from Case 2? The same field ones all the way down. So Case 2 and Case 3 are nearly identical, with the exception that all you did is change the confidence interval. That's right. Got you. And when you did that, you went down to 11.4%. Now, I want to restate it in layman's terms: that's a fair and average assessment of the ILI tool's performance. For this data set. Got you. And it's plus or minus 11.4, which again is larger than plus or minus 10. The last case you called the optimistic estimate of ILI uncertainty.
28:24
Jason
Yeah. And now, this case is equivalent to the previous case in every regard, except we use a confidence level of 5%. 5% is not that confident. That means 95% of the time, you're going to get something that is wider or larger than that. So that's why this is considered sort of an optimistic view: what's the best performance that we think we could achieve? Now, the reason we use this is because it's very close to what we used in IM-106A when we calculated the plausible spec. It's very close to that, and so those two become comparable cases. So if you wanted to say Level 2 and Level 3 are different, well, that's just because the way we're calculating them is different. If you do it the way we've outlined in the paper, you get quite close answers. Now, this 5%, this very optimistic view of what the performance could be, there are some cases where you might want to use that. But you wouldn't want to use it in a case where you're doing, let's say, a risk assessment based on performance, to say, well, the best possible performance we could have gotten is this calculated value, let's use that everywhere else; or to assume that it's good enough to do sentencing of features on the pipeline. I don't think that would be a good idea. But it shows you how to make the equivalence between the two, and it also shows what this optimistic performance is, if you wanted to figure that out. I think a more realistic number is the 50th percentile, which is to say: on average, what do we think?
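The four treatments differ mainly in the confidence attached to the bound on the ILI's error. Here is a stdlib-only sketch of that effect, using one common chi-square-based construction; the hardcoded quantile table (df = 24), the sample sigma, and the 80%-containment scaling are all illustrative assumptions, not the paper's method, and the outputs are not the paper's 14.6 / 13.3 / 11.4 / 9.85 values.

```python
import math

# Chi-square lower-tail quantiles for df = 24, hardcoded to stay stdlib-only.
# Keyed by the one-sided confidence we want in the upper bound on sigma.
CHI2_DF24 = {0.95: 13.848, 0.50: 23.337, 0.05: 36.415}

def tolerance_bound(s: float, df: int, conf: float) -> float:
    """Upper confidence bound on sigma, scaled by z = 1.2816 to an
    80%-containment tolerance (the usual '+/-X% at 80% certainty' form)."""
    sigma_upper = s * math.sqrt(df / CHI2_DF24[conf])
    return 1.2816 * sigma_upper

s_hat = 8.0  # assumed sample sigma of ILI-attributed error (% wall thickness)
for conf in (0.95, 0.50, 0.05):  # conservative, best estimate, optimistic
    print(f"{int(conf * 100)}% confidence: +/-{tolerance_bound(s_hat, 24, conf):.1f}%")
```

The conservative 95% bound comes out widest, the 50th-percentile "best estimate" sits near the raw sample sigma, and the optimistic 5% bound is tightest, mirroring the ordering of the four cases in the table.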
29:55
Rhett
So what I love about this table, and what I loved about it at PPIM, is that the plus or minus, that final answer you got, was plus or minus 9.85%, audience, for the optimistic estimate, which is a hair's breadth lower than the stated performance spec for the tools. What I love about it is that it makes sense, because from my perspective, you're very, very close to the specification under optimal conditions, which is usually, in my opinion, what the specs reflect for tools, whereas actual run conditions in many cases may not be optimal. Second thing is the best estimate of ILI uncertainty, which you said was your 50th percentile one. What I love about that is it's marginally higher than the 10%. And in the episode with Tom, one of the things he kept saying was that 1163 currently doesn't allow you to do a level two to something that's slightly larger than the spec. And he's like, so plus or minus 10% doesn't work; you dead-end. But in reality, many operators would be fine if you told them, hey, it's not actually plus or minus 10%, it's plus or minus 11.4%. Because in reality, that's probably not going to make a hair's worth of difference in anything that they're doing. But what I like is that my confidence in that plus or minus 11.4 is much greater than in the null-hypothesis level two that they're trying to prove up. So for me, this is actually amazingly consistent. It helps close the loop with what we were saying, for me, again.
31:20
Jason
Well, let me stop you right there. Go stop. Here come the Ewoks. Yeah, let me bring in my Ewoks. Stop right there, sir. Just to add something more to the conversation about that: obviously in this paper, Jed and I are suggesting let's use level three rather than level two, and here we've shown equivalence between the two. So if they're equivalent, why use one over the other? Well, let me put a couple more points forward, and you alluded to them earlier. First, if you have fewer data points, you're kind of going in the wrong direction: you can't reject something, so it kind of makes you feel like, oh, I passed. But what does that really mean? Of course you can think through what it means, but it's not straightforward. You reduce the data points, you can't reject, so then you default back. That order is not intuitive, and it does not lend itself to good analysis. If you took the 50th percentile on the level three and said, well, let's collect more data points, you actually get a better estimate of what the performance is going to be. That's what you expect. So that's number one: the data operates intuitively, the way you think it should, for the level three. So why not use the level three? Level two is backwards. And then secondly, when you increase the confidence in a level three, it works the way you think it would work. Let's say I'm at 50% confidence. That's the average. Let's say I'm 95% confident. That means I'm very confident, because out of 100 cases, 95 of them should follow this performance. What if I'm 98%? Well, that's even more confident. It just makes sense. What happens with a level two? It actually goes the other way. Because if you say I want to be 95% confident, I have to be 95% confident that I can reject. Now what if I want to be 99% confident I can reject?
Now I've got to be even more sure that I can reject, so I'm less likely to reject. So that's how the level two works. The level two is non-intuitive unless you're really familiar with it and working with it a lot. The level three makes more sense. And because you can make it equivalent to how we use the level two, there isn't a lot of need to keep the level two around. Let's just use a level three. Let's use whatever level of confidence we think makes sense for the job we're doing, recognizing there are different levels of confidence. They don't all have to be 95%; some could be 50%, some could be 5% if you want to say, what's the most optimistic performance? But the level three works intuitively, and that's what this paper is about: trying to encourage people to look more at that level three and use it appropriately. As for the idea that it's more data, it's always wider: well, it's wider if you're putting a higher demand and higher criteria on it, but if you bring the criteria back, it's very similar.
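The "can't reject, so I pass by default" behavior Jason describes can be seen in a toy binomial test. This is an illustrative sketch, not the exact 1163 level-two procedure; the 80% in-spec target and the dig counts are my assumptions. The point is that the same 60% in-spec rate "passes" with few digs only because the test lacks the data to reject:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), computed as an exact sum."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def level2_rejects(in_spec, n, p0=0.80, alpha=0.05):
    """Reject the claim 'the tool puts >= p0 of calls in spec' only when
    seeing this few in-spec calls is very unlikely under p0."""
    return binom_cdf(in_spec, n, p0) < alpha

# Same 60% in-spec rate, two sample sizes: with 5 digs the test cannot
# reject (the tool "passes" by default); with 50 digs it rejects.
print(level2_rejects(3, 5))    # -> False (cannot reject: default pass)
print(level2_rejects(30, 50))  # -> True  (enough data to reject)
```

This is the backwards incentive: shrinking the data set makes a marginal tool look acceptable, whereas in a level-three tolerance interval less data honestly shows up as wider bounds.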
33:54
Rhett
So, Chris, why haven't you been demanding more people do level threes?
33:59
Chris
Because I think there's a big question about who... no, this is real, because I'm still not sure if I personally have a position on who should be driving this boat. So on one hand, you have operators who have been inspecting a specific piggable segment, or a system with multiple piggable segments, call it since the early 2000s, let's just say, with a standard technology. But at the same time, you also have service providers that know their systems and that also have a one-to-many relationship, where they could be gathering data based on essential variables. And so really, while both parties should be working hard to understand the actual performance of these ILI systems, and we want it to be collaborative, the reality is it's hard to know who's the best person to push, right? Is it an operator leveraging an ILI service provider because that provider does a good job of gathering data and has an ILI system unity plot with their level three? Or should it really be the operators saying, hey, on every asset, to understand if I can use the ILI data, I'm going to do a level three? Back to: what question are we asking? I'm not sure yet if it's fair to say which of those two questions is correct. What I would say is this: we've been pigging a long time, and I'm not shy about this, right? I think this paper shows that for a specific ILI system, a service provider should be able to tell you what their tolerances are, based on a tolerance interval, for a level three. So, since you're asking why I don't press, I would say to all of our operators out there: before you award work the next time it comes around, do an experiment. Ask for the level three assessments on the ILI systems that are being bid and see what you get.
35:41
Jason
You know, the other thing I would add to that, Chris, really good points. We want to go into this eyes wide open. We don't want to ignore data. We don't want to ignore history. We don't want to ignore experience that we have. I think there are proper ways to bring that in. So far, what we've been talking about with this paper is just the data that's in front of us, from digs or from other data we get that we can put on a unity plot. But some of the other ideas you're talking about: we've got experience. We've used this technology stack in the past. We have SMEs. We've run this same tool in the same line, not this time, but we ran it last time. And then there's a sister line; it was constructed the same and has the same morphology of threats and so on. And some people are saying, well, hey, what about all this experience? Am I not supposed to bring that into this analysis? Of course. Of course it should be brought in. I think the level two is not quite the right way to bring it in, because if we say, we're going to kind of ignore this, but we can ignore it because this other thing is also there, that's not quite the right way; they can compensate for each other, potentially. What I would suggest in this case is we do something more like a level three, and then we bring in the prior experience to beef that up. So rather than just look at the data we have on hand, we say, well, we've got this data, and sometimes it's very slim data. You might run a tool and get very few data points, very few digs. You don't have very much data. Can you bring some of this other data in to supplement? I think the answer is yes. Do it responsibly, do it judiciously, but I think the answer is yes. I think it's very hard to do that with a level two. I think you can do it with a level three. You can do it with a Bayesian analysis, using priors and prior information.
And, you know, this is actually a paper I'm interested in collaborating with somebody on in the future: how to do that with the priors, how to bring in that previous experience. So that's real. That's real, and we have to consider it. And I'm not suggesting in any way that we should discard it, or that we should be hard-lined about this or that. I think we've got to bring that information in.
37:40
Chris
I'll ask this simple question here, for the audience. In 1163, as I told you, we've defined level two as an indefinite answer, and we say level three is extensive validation. So I'll ask you this question: can you do a level three using six data points?
38:02
Jason
Well, in the IM-106 paper we recommend 10, so I would say six is probably too few to do it. Now, the reason for that, I mean, you can still do the math, the reason for that is that there's something called small sample size uncertainty. Small sample size uncertainty means, basically, if you haven't looked too hard, there's a wide range of what reality could be, because you haven't measured much. The smaller the sample size, the wider the bounds get in a level three, because they say, well, actually, if you've only measured three or four times, reality could be quite a bit different than what you have, and so it will push out those boundaries. And if you have only four or six data points, those boundaries are very wide. They're wide because you don't have enough information. They're not wide because your performance was bad. They're wide because of that small sample size uncertainty. So what you can do in those cases, if you're doing a Bayesian analysis, is bring in a prior, which acts like pseudo data points. It's like, I know something about this tool; the evidence in front of me is not the only thing I know. That's how you can bring in something like subject matter expertise or previous experience, to say, well, that's kind of like data that I have. So instead of six data points, you can start to bring that up. Maybe that's the same as 12 data points or 15 data points, something like that. Six by itself, with no priors, you're going to have wide bounds, and it's probably not usable.
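The "prior as pseudo data points" idea can be sketched with a conjugate beta-binomial update. This is my choice of model for illustration, not necessarily the one the authors would use, and the dig counts and prior strength are assumptions. The prior experience enters as pseudo digs, raising the effective sample size behind the estimate:

```python
def beta_posterior(in_spec, n, prior_in_spec=0.0, prior_n=0.0):
    """Beta-binomial update for the in-spec fraction: prior experience acts
    like prior_n pseudo digs, prior_in_spec of which were in spec.
    Starts from a flat Beta(1, 1) baseline."""
    a = 1.0 + prior_in_spec + in_spec
    b = 1.0 + (prior_n - prior_in_spec) + (n - in_spec)
    mean = a / (a + b)
    ess = a + b - 2.0  # effective number of digs behind the estimate
    return mean, ess

# Six digs alone vs. six digs plus prior experience worth nine pseudo
# digs (eight of them in spec): the prior raises the effective sample
# size, tightening bounds that six digs alone could not support.
print(beta_posterior(5, 6))            # data only: 6 effective digs
print(beta_posterior(5, 6, 8.0, 9.0))  # data + prior: 15 effective digs
```

A weak prior (small `prior_n`) barely moves the answer; a strong one dominates the six real digs, which is why Jason's caveats about doing this responsibly and judiciously matter.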
39:29
Chris
So what I was targeting was two things. One, I remember we had this conversation offline once, and I was curious if you would bring that in. You nailed that one; I think that was great. And the second thing is to kind of draw this line in the sand around what's extensive, right? So if it's not six, then 10, right? And if it's 10, is 10 really extensive? I would say the average operator would not assume that 10 digs is extensive, because they don't have a timeline by which to achieve the 10 digs. Or how they achieve the points. Features. 10 digs, exactly. So my point with this last segment was rather around the idea that a level three doesn't have to be thought of as this extensive validation work, right? And I think that was the message I wanted to bring home, and I think you've done a good job today, Jason, of saying a level three allows you to ask the right question. And if we could wrap it up by basically saying: a level three doesn't have to be considered extensive validation. It's just doing the math.
40:30
Rhett
Do you see what he's doing there? Do you see what he's doing? I want you to witness what happened. That's full circle. What that is is that subconscious jealousy coming in, because he's not here to control the episode. So he just projected and said, "as we try to wrap it up," trying to send me signals to wrap up the episode even though he's not here.
40:55
Chris
Because I'm not close enough to you to grab your thigh. I'm not close enough to give you the pinch.
Rhett
He can't feel the energy in the room. And he can't read it.
41:00
Jason
Apparently I'm supposed to be grabbing your thigh right now.
41:02
Rhett
So on that note, audience, I want to say thank you very much. Jason, I really enjoyed this conversation on 1163. Thanks for the work that you're doing on the statistical front in the pipeline industry. I really think it's great. Love collaborating with a fellow consultant, and love seeing, again, two great PPIM plenary session papers and getting to present on both of them. So thanks for joining us. Really appreciate it.
41:22
Jason
Yeah, it's great to be here. And, you know, I think working together, we're trying to make the industry better. 100%. We're trying to improve analysis. We're trying to make the industry more profitable, more efficient, you know, all those kinds of things. You know, so, you know, I appreciate your work as well. All right. Thanks for inviting me.
41:38
Rhett
Thanks for joining us on this PPIM episode as we're on Pipeline Things with my guest co-host, Jason Skow, and my guest co-host life partner, Christopher DeLeon. We'll see you again. Thanks.
41:48
This episode was executive produced by Sarah Etier. Thank you to our guest, Jason Skow of Integral Engineering. If you'd like to reference the source material, the PPIM 2026 paper "Advances in API 1163 Level 3 Validations" by Jed Ludlow, Jason Skow, and Ryan Stewart is where you should look. Finally, thanks to PRCI REX 2026 for the use of the room in the filming of this podcast.