January 19, 2021

B2B experiments on low traffic & building your own platform


Chad Sanderson




We learn how B2B companies can run experiments too, even when they don't have heaps of traffic at their disposal.

Episode guest

Chad Sanderson

Head of Product, Data Platform

Episode host

Guido X Jansen

Guido has a background as Cognitive psychologist and worked as CRO specialist on both the agency and client-side. In 2020 Guido won the individual Experimentation Culture Award.




Please note that the transcript below is generated automatically and isn't checked on accuracy. As a result, the transcript might not reflect the exact words spoken by the people in the interview.

Guido: [00:00:00] Chad Sanderson is the head of product for the data platform team at Convoy, where he manages Convoy's internal experimentation and machine learning platform, along with data warehousing, streaming and discovery tools. We talk about why they built their own experimentation tool, how they're managing their internal experimentation culture, and how to run offline experiments.

In case you missed it: in the previous episode, we spoke with Tim Ash about his new book, Unleash Your Primal Brain. You can listen to that episode on the CRO.CAFE website or in the podcast app you're listening with right now. Welcome to season three, episode two.

Chad: [00:01:03] Right now I'm working at a company called Convoy. They're a digital freight brokerage; I can talk a little bit more about them later. As for me, I'm running their data platform team. The data platform is a sort of multi-disciplinary, data-related pod, which includes our data warehouse, all of our data infrastructure and tooling, data visualization, and our machine learning platform.

And then also our experimentation team, where we've not only built a center of excellence, but we've also built a tool from the ground up. Before this, I was at Microsoft, where I helped run their experimentation platform along with Ronny Kohavi, sort of the grandfather of A/B testing. And before that I was at Sephora, in some ways doing very similar things.

Guido: [00:01:50] Yeah. And you've now been at Convoy for roughly a year. What was the status of CRO at Convoy before you started?

Chad: [00:01:59] Yeah, Convoy was very... they love testing. In fact, part of our WBR, which is our weekly business review where all the leadership sits down and looks at key metrics and how they're performing, is that most of those metrics actually have to be validated by an A/B test. So experimentation is a pretty deep part of the culture. I think some of the other aspects of CRO that you and I are familiar with haven't gotten quite as deep yet, especially on the marketing side, but on the product side they're doing a pretty good job.

Guido: [00:02:31] Okay, nice. And one of the topics that we want to touch upon today is building your own experimentation platform. So was that already done when you started?

Chad: [00:02:44] We had a pretty rough tool. It did a few things relatively well. It did a few things very poorly.

So over the last year, we've put a lot of time and effort and energy into actually making it serviceable to most of our customers. Yeah.

Guido: [00:03:00] So many people might now be wondering: with so many options on the market, why would you build your own A/B testing platform?

Chad: [00:03:09] Yeah. Yeah, that's a good question.

It's also usually the first one I get when I bring up that we built something. The main reason is that when we were deciding what direction we wanted to take the team, and Convoy as a whole, we sat down and thought about the problems that we have specific to our business, and what our problems were probably going to be in the future. Then we looked at the market of A/B testing tools and asked: do any of these actually satisfy those problems to a degree that we would feel comfortable and happy with?

And our answer to that question was no. There were a few very specific problems that we absolutely needed solved. For example, we needed the ability to tie all of our experiments back to metrics like margin and profit, and sometimes very algorithmically complex metrics that are all calculated offline.

And at the time, the tools that we saw couldn't support that. We also needed a lot more complex statistical test designs than a t-test. Convoy is a super small business. We only have around 50,000 customers, or truckers, that we work with, and a much smaller number of shippers, around a hundred or maybe even less.

So if you don't have a huge sample size, then your test design has to be relatively complex. It has to do a lot of interesting things and we needed a tool that could support a pretty wide degree of complexity. And we just didn't see that out there.

Guido: [00:04:44] Yeah. So in that sense, for a B2B or smaller-business case, it might be helpful to build your own tool. But then again, you still need to have the luxury of being able to build it.

So do you have developers in house? Is it something you outsource? How does that work?

Chad: [00:05:00] Yeah, we have developers in house. There are four engineers right now working on the experimentation platform. I think for the first year or so there were two; I joined as product last year. So now we have four engineers, and one engineering manager who sort of manages everybody across all of data platform,

and then myself, to help out with vision and direction.

Guido: [00:05:28] And I can almost hear our listeners screaming: but how did you build a business case for this? Some people have probably already done the math: hiring four people to build your experimentation platform sounds way more expensive than just using a random A/B testing tool,

maybe even the most expensive one.

Chad: [00:05:51] Yeah, that's a good point. I think it probably is. Like I mentioned, when we did our cost-benefit analysis, the question that we asked is: if we paid... and I don't know off the top of my head how much third-party A/B testing tools cost these days, but we'll just throw out a number and say $300,000.

Okay, $300,000. That's basically, at startup costs, maybe one and a half developers; you could maybe even make the case for one, if it's a pretty senior developer. If we go with a tool that doesn't allow us to do the things that we functionally needed to do, where are we going to lose experimentation?

Like, where are we not going to be able to run tests? We're actually not going to be able to run tests on the vast majority of our machine learning models. And machine learning is the essence of Convoy. Let me take a step back to explain what Convoy is, because this may make more sense then.

Convoy is a digital freight brokerage. That means that when a shipper has freight they want to move between two points on a map, usually they don't go directly to the trucker that's going to be carrying that freight. There are just so many things that could go wrong. The carrier might have a family emergency, and then all of a sudden you have this critical load

that's just sitting there, and now there's no backup. So a broker sits in between the shipper and the trucker, and as freight comes in, they're the ones responsible for making sure it's paired up with the right, most economical person to take that load. Convoy has entered the space and said: we want to try to do as much of that matching as we can with data science and with machine learning and algorithms.

So ML is literally the core of the business. And if we weren't able to run experiments on ML in the way that we wanted, then why even get a testing tool, right? We could take the money to optimize our app a little bit, but the impact that we'd see from doing that, just to throw out a relative number:

we might see a couple hundred thousand dollars in improvement from UX changes. We might see several hundred million dollars in improvement from experimenting on our algorithms. So that was the trade-off we had to make. If paying the money for the developers allows us to do experimentation on this hugely critical functionality, it was actually more than justified.


Guido: [00:08:12] Yeah. So in that sense, it sounds like a relatively easy business case: the existing platforms don't do what you want them to do and don't apply to your business. But was it a completely from-scratch build, or were there some things in the machine learning community, there's a community out there, that you could use or reuse, anything open source?

Chad: [00:08:34] We didn't use anything open source. There are some things that are open source that you could reuse, if you want to take a less expensive route than we did. At least there's some stuff now; I don't know if back when the platform first started, which was actually a couple of years ago, there was anything good out there. Really, the most complicated piece from an engineering perspective that we needed to think about was assignment.

And that's the process of: when a randomization entity is available, it could be a user; in our case, we like to randomize on shipments and lanes, which is like from Seattle to San Francisco, essentially a distance between two points. Whenever that unit, that new entity, is available to have a test run on it, you have to randomize it.

And that seems like something you could do relatively easily by flipping a coin, but it's actually not that easy. There's a lot you have to think about when it comes to scale. If you get a million requests to randomize in the span of a second, which can happen if there's a bug or something like that, you don't want the whole business to fall down.

If you're dialing traffic up and down, you don't want people to get re-randomized every single time. So that was an early investment that we had to make that didn't really have an open source solution. Yeah, that was probably the biggest one.
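The properties Chad describes, stable assignment at scale and no re-randomization when traffic is dialed up or down, are commonly achieved with deterministic hashing rather than a literal coin flip. A minimal sketch in Python; the function name and 50/50 split are illustrative, not Convoy's actual implementation:

```python
import hashlib
from typing import Optional

def assign(unit_id: str, experiment: str, traffic: float = 1.0) -> Optional[str]:
    """Deterministically bucket a unit (user, shipment, lane) for an experiment.

    Two independent hash coordinates: one decides *whether* the unit is
    enrolled at the current traffic level, the other decides *which arm*
    it gets. Dialing `traffic` up only adds units; already-enrolled units
    keep their arm, so nobody is re-randomized.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).hexdigest()
    enroll = int(digest[:8], 16) / 0xFFFFFFFF   # uniform in [0, 1]
    arm = int(digest[8:16], 16) / 0xFFFFFFFF
    if enroll >= traffic:          # outside the current traffic slice
        return None
    return "treatment" if arm < 0.5 else "control"
```

Because the bucket is a pure function of the unit and experiment IDs, a million duplicate requests all return the same answer with no coordination, and raising `traffic` only enrolls new units.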

Guido: [00:09:50] Yeah. Were there big changes this year in all of those algorithms, changes based on all the things happening in our world?

Chad: [00:09:58] Yeah, there were, definitely. The freight market in general is super volatile, and it's especially sensitive to things like COVID, but even to seasonality. The margin that we're making on any particular shipment will really fluctuate wildly throughout the year.

So when COVID hit, we saw some of the biggest swings that we've ever seen, and we had to really think about how we could tune our models more for stability. So during that time, experimentation was actually super important.

Guido: [00:10:31] Yeah, I can imagine. Interesting. And how do you get everyone in the company on board?

I can imagine it's quite a complex matter that you're trying to optimize: it's machine learning, and on top of that there's a lot of experimentation. How do you get buy-in from everyone in the company to embrace

Chad: [00:10:50] this? To embrace experimentation just in general?

Yeah. The way that I've found a lot of success is by focusing on measurement and by asking the question: if you're making some change, whatever that change is, it could be a web change, it could be a change that's happening offline, it could be a change to a model, it could be a strategy or process change,

how do you actually know that the thing that you did worked? Like, how do you know

Guido: [00:11:16] for sure. So basically you're asking questions to make people insecure.

Chad: [00:11:25] Make them insecure, yes. But also, oftentimes most people who run these teams are very smart, right? They know what they're doing, and they know that they probably can't measure it very well.

And the perspective that you come at them with is: experimentation is a solution to a measurement problem that you have. Because their job is to make the company a whole bunch of money, prove it, and then increase their head count. If I can go to you and say "I can help you prove it", meaning I can help your team get funded better,

then that's a much easier way to get adopted than "here are 10 test ideas that you can run that I think might make you money, but you have to give me resources for that".

Guido: [00:12:06] Yeah. And in that sense, do you feel like that's a culture of innovation and a culture of experimentation? Is that

Chad: [00:12:12] Similar? Are a culture of innovation and a culture of experimentation very similar?

I think it's slightly different. I think that a culture of experimentation is almost like saying a culture of science, meaning: when we are making a claim about the world, can we point to some evidence, usually statistical evidence, that backs up our claims? A culture of innovation isn't quite the same as that; it's closer to: are we consistently taking risks?

Are we taking chances? Are we doing things that otherwise people in our position would be scared to do? And then, can we combine the two to measure those risk-taking opportunities with statistics?

Guido: [00:12:59] Okay. And is this also something that your platform tries to enable within the company, or is it purely for any experiment that anyone has run?

Chad: [00:13:09] For our platform, we focus pretty much on... actually, that's not exactly true.

So we do focus pretty strongly on experimentation, but our platform was designed to be super flexible. The idea is: any type of statistical model you want to use, you can use it; any kind of metric you want to use, you can use it; and you can do it all, for the most part, out of the box.

Any type of dimension you have, you can use, and so you can slice by anything. The practical effect that we've seen is that people will use the tool to do some really crazy stuff, and to test, I'll say, innovative things that most people using a third-party platform maybe wouldn't test.

So as an example, we have this concept at Convoy of pilots, where a pilot is just something that you do: you turn it on, and then you have to have some way of monitoring it over time. Say you're on the ops team, for example, and you're trying to build something that maybe makes the ops team slightly more efficient, or maybe automates some of the emails that they get.

That's not really something that you can A/B test. You don't want 50% of your ops team to have to open emails with a standard operating procedure that doesn't match the rest of the ops team's. So they just turn it on. And what we've been seeing is this move to using an experimentation platform to start measuring these types of changes as well.

Guido: [00:15:03] What kind of KPIs are you guys optimizing for? So if you have all these different teams doing the experimentation, are they optimizing for the same thing ultimately,

Chad: [00:15:12] Or... no, most teams have a specific set of metrics that they're accountable to every half year. This is like the OKR type of approach.

Yeah. We do have what we call core metrics. These are things that the entire business really cares about, and, we haven't implemented this yet, but what we're trying to move to is: any of those core metrics that can be computed on an experiment gets tracked. So variable cost per shipment is something that we like to think about:

are we reducing or increasing the amount of variable cost every time a new shipment comes in on our platform? That's a really important metric to the company. And if somebody is running a shipment-level experiment, then our platform will automatically track that metric, whether or not the team actually wanted to include it.

Guido: [00:16:02] Sure. And they pick the statistical model themselves, you're saying. Of course, it's great to have that flexibility, but it also puts a lot of responsibility on the person designing the experiments. So how do you make sure that people pick the right statistical model?

Chad: [00:16:20] That's a good question. For the most part at convoy, our experiments are designed by data scientists. So these are generally not people who are like encountering experimentation for the very first time. They usually have a pretty good concept of this is this statistical technique that I want to use.

That makes the most sense for my use case, and we're just giving them a form to select that. Now, I do think something that we should probably do in the future, because we do want to make experimentation more accessible to everybody, is have a set of predefined templates: okay, if you want to do an offline experiment and you meet these criteria, then you just run this one.

If you want to do an online experiment, and it's a relatively simple A/B test on the internet, then you can just use a t-test, and we bundle all that stuff.
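The statistic bundled into that "simple A/B test" template could be as small as a Welch's t-test, the usual unequal-variance default for comparing two arms. A stdlib-only sketch, not Convoy's code; in practice you'd turn the statistic into a p-value via the t-distribution (e.g. with SciPy):

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-test: t statistic and approximate degrees of freedom for
    two samples with possibly unequal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A template would wrap this with the sample-size criteria and significance threshold, so the experimenter never touches the statistics directly.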

Guido: [00:17:15] Yeah. Are you already applying some machine learning techniques on the experimentation platform itself, to help the experimenters?

Chad: [00:17:24] We're not, but that is probably going to be one of our goals for the platform in 2021. And really, that's less about being smart right now, like being really smart about which test wins and which loses, these types of things, and more about: can we automate some of the work that our analysts and data scientists are doing, and allow a machine to take over

in places where a human being doing all this digging and segmentation to find interesting things is pretty unrealistic.

Guido: [00:17:58] Yeah. A lot of data scientists and a lot of web analysts are often, sadly, used to creating reports, right? That's not something you should be doing, actually; you should automate all those things.

And when automation goes wrong, sure, you step in, but it's not something a data scientist should be doing. Yeah. It's

Chad: [00:18:20] a waste of time, right? The data scientist's best resource is their brain, and it's the critical thinking part that's important and valuable. It's not doing the same thing over and over again, managing the data, putting it into the right format, finding outliers; a machine can do all of that stuff.

Guido: [00:18:38] Yeah. Don't waste that brain on creating PowerPoints.

Chad: [00:18:42] Exactly. Yeah.

Guido: [00:18:43] When are you going to spin off the product and create your own A/B testing company? Probably

Chad: [00:18:50] never.

Honestly, there are some great tools out there, but I would not want to be in the A/B testing space. I think you saw Optimizely got acquired. How long ago? Pretty recently, like a couple of weeks ago now.

Guido: [00:19:08] a couple of weeks ago.

Chad: [00:19:10] Yeah, Optimizely got acquired, and they were the golden boys of this space.

And yeah, I don't know, I just wouldn't want to. There are a lot of competitors that do a lot of really interesting things. I don't know if I would be interested in going head-to-head with any of them.

Guido: [00:19:27] But it does sound like there's a big gap in the market, right? For B2B companies that are open to experimentation and open to learning,

but maybe don't have the traffic on their website to do it. If you'd talk to those companies: what are the things that you can experiment with? How can you be creative when maybe you don't have enough traffic on your website?

Chad: [00:19:51] I'm sure you know this, but there's an inverse correlation between the amount of traffic you have and the size of the impact that you need to see before you can detect something.

Convoy has that same problem, and there's really no statistical algorithm that can ever change that fundamental issue. So that has a really big effect on how we do experimentation. Instead of looking at things like "let's optimize a UX flow, let's change a button color here or there, or maybe try changing some copy",

the only things that we look at are big changes, which kind of flies in the face of typical advice: make an incremental change, and then you'll be able to prove that this is the real thing, this is what caused the change, and then you can increment from there. I actually think that for a lot of companies, that's a really bad way of testing,

because incrementality is not going to deliver very many business results. Instead: really take a long time to think about your customer, think about what are the major new features and offerings you can provide, do some deep user research and pre-testing on that, and then have a really strong opinion: "I believe that this thing is going to be a giant improvement for the company." Then use experimentation as a validation tool for all of that research you've done.

That's the process I would suggest to a B2B. The last thing I'll say about that, which a lot of people actually don't know, is that a lot of these techniques, this sort of incremental approach to experimentation, started at massive companies like Google and Microsoft.

So for them it makes sense, right? When you have billions and billions of users coming to your site every day, the amount of impact that you can detect is extremely small, and you can detect it in a couple of days. So it actually makes total sense for them to make those incremental changes, because they can see them all, and they can see them really clearly.

But for everybody else, it's insane advice, I think.
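The inverse relationship Chad describes falls straight out of the standard power calculation: the sample size you need grows with the square of one over the effect you want to detect. A quick stdlib-only illustration, using the normal approximation for a two-arm test of means:

```python
import math
from statistics import NormalDist

def n_per_arm(mde: float, sd: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Sample size per arm to detect a difference in means of `mde`
    (same units as `sd`) with a two-sided z-test at the given alpha
    and power. Normal approximation; real tools refine this slightly."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * (z * sd / mde) ** 2)
```

With these defaults, a 0.2-standard-deviation effect needs 393 units per arm, while a 0.1-sd effect needs 1,570: halving the detectable effect roughly quadruples the sample, which is why a business with 50,000 customers ends up testing big changes rather than button colors.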

Guido: [00:21:54] Yeah, exactly. And I think, from all the talks that I have with people in CRO, the most creative things come from people that don't have that amount of traffic. If you have a lot of traffic, it basically makes you lazy.

Yeah. It dampens your creativity, it's not great for your innovation. And usually the bigger things are also, for me at least, more fun to work on.

Chad: [00:22:20] Yeah, I agree. We've seen some really cool stuff at Convoy, where I'm constantly amazed at some of the tests that are coming out.

One that I thought was really cool: we have a process called an RFP that we go through with our partners, our shipper partners, where we essentially bid on the shipments that they have available. And we had a team that started doing experimentation on those bids.

So when they would come in from a big shipper, we would randomize the response that we gave. We would have some prediction of what we think this particular shipment is going to cost when we put it on the market; we would apply a premium to some of them, a discount to some of them, an uber-premium to some of them, and send those back, and do an experiment there.

And that was awesome, and actually extremely impactful. And it's just a creative thing that you have to do, exactly as you said, when you don't have a lot of budget to build some amazing, really complex system.

Guido: [00:23:18] Yeah, exactly. Do you guys also do offline testing?

Chad: [00:23:21] We do. Yeah, we do.

Guido: [00:23:22] So do you have some examples?

Chad: [00:23:26] Yeah. Pretty much all of our examples come from... so, we actually have an internal tool that we call Kingpin. And Kingpin is essentially an ops... it's a platform for the operations team, so that they can see details about all the shipments: when they're arriving, when the orders are going out, whether they're on time or not, and things like that.

And one of the issues there is that the team who manages those tools is constantly trying to make improvements to them, so that they can make our operations team more efficient. But like I mentioned before, if you make a change to that platform, then you need to actually make it for the entire operations team, because ops team members not only communicate with each other really closely, but you'll need to know,

for example, if somebody took a particular action on a shipment, and then it went back into the system, and then it came back, and now an ops person has to take another action. You definitely don't want to run an experiment where some things are visible and other things are invisible; it'll just confuse ops people.

If the SOPs are pretty significantly different and you're sitting right next to somebody, you're looking at a screen and you need help with something, and you look over and their screen shows something different because they're getting a different version of the tool. So these are the types of offline experiments where we really have to think more like a lawmaker.

Lawmakers running experiments on laws is actually quite common: somebody will say, we have created a law and we are going to ship it in one part of the country for a certain period of time, see what the effect is, and then roll it out everywhere else. That is an experiment, and that is a type of experimentation that we try to do relatively frequently.

Guido: [00:25:11] Now, what are the main differences when you want to set up an offline experiment, still digital but offline, versus a web experiment?

Chad: [00:25:20] I would say that the big difference is that typically you can't control your randomization as much, right? When you're online, you have very tight control over which user gets to see which thing.

And in the real world, that may not always be the case. So you have to come up with creative ways to bucket your control and treatment. In the example I gave a second ago, what we did was put everybody from before the time we launched the experiment into the control, and everybody after we launched the experiment into the treatment, and then we used the time that the feature launched as the dividing line between those two things.

And then we had our two buckets and we could actually compare. And there are some other mechanisms that you can use to take away bias and this type of stuff, but just that level of thinking is really important. And there actually are times when you can do randomization offline. When I used to work at Subway, we would very frequently roll out new sandwiches and new features to a trial population,

and you just have to be specific and very careful about which stores you roll out to. Often, because Subway is a franchisee restaurant, it's franchisee-owned, there's no sort of ownership on the Subway side. So that means that in order to trial a new sandwich, stores have to opt in.

And the people who are opting in are obviously not necessarily the same stores as the ones who are not opting in. So those are the types of things that you have to think about when you're experimenting offline. Yeah.
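The launch-time bucketing Chad describes is just a time-based assignment rule. A sketch with a made-up launch date; note this is a quasi-experiment, so time-based confounders (seasonality, the COVID swings mentioned earlier) have to be handled by the extra bias-removal mechanisms he alludes to:

```python
from datetime import datetime, timezone

# Hypothetical feature launch time; the dividing line between buckets.
LAUNCH = datetime(2020, 6, 1, tzinfo=timezone.utc)

def bucket(event_time: datetime) -> str:
    """Pre/post design: everything observed before the launch is control,
    everything after is treatment. There is no true randomization, so the
    comparison needs care (trends, seasonality) before being trusted."""
    return "control" if event_time < LAUNCH else "treatment"
```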

Guido: [00:26:54] Interesting. And do you also do experiments that are not even digital?

Chad: [00:27:05] We haven't done too many of those at Convoy, but that is something I'm very familiar with. I think there's a lot of that in, they're not e-commerce, but just regular commerce, yes,

Guido: [00:27:17] it does exist.

Chad: [00:27:18] Those still exist. And in any place where you have an offline store, you can do this; that was definitely the case with Sephora.

And it was also the case with Subway. There was a pretty good amount of totally offline experimentation, where we had almost no insight from the digital perspective into what was going on. And at Subway in particular, they experimented on everything: opening times and closing times, staffing, how they arrange the actual ingredients to build the sandwich, things where they tried to shorten the lines or potentially make the lines longer.

They ran experiments on all types of stuff like that. And really, running the experiment is relatively trivial if you're able to get your randomization right, if you're able to figure out how to bucket control and treatment in a way that's trustworthy. There's a ton of stuff that you can do after that.


Guido: [00:28:13] And for all these experiments, offline and online, where do you guys get your inspiration? That sounds a bit random, but where do the ideas for these experiments come from? Do you do a lot of user research, or surveys? I can imagine that's also a bit harder to do when you don't have the numbers.

Chad: [00:28:32] Yeah, it can certainly be more challenging. I think the good thing about these types of offline experiments is that your customers are definitely a little bit more approachable, in the sense that you have some insight into their day-to-day activities that perhaps extends beyond the web.

The challenge in digital is I have no idea who that person is that just bought something.

Guido: [00:29:03] It could be anyone. But in B2B, by definition, you have a business relationship; you basically know who they are.

Chad: [00:29:09] Yeah, you know who your audience is. Hopefully, if you've built up a pretty good rapport with them, then it's easy to reach out to them and talk to them,

and they're friendly.

Guido: [00:29:19] Hopefully. But that's basically the main way you guys get input for experiments, or...?

Chad: [00:29:26] Yeah, I think user research and a lot of direct talking to customers is a lot of where that comes from. I think there are certainly places that are more analytical.

So for example, something that I know the operations team thinks about relatively frequently is emails. When you're in ops, you have to respond to a lot of emails coming in from shippers, emails coming in from truckers. And if you build instrumentation into your platform, or you're using a third-party service that is able to collect and track all these emails for you, then you can see where people are spending their time.

And if you have some type of categorization mechanism, you can see the specific types of emails that they spend a long time replying to. When are they getting replied to, and what's the delta between when the email is received and when it's responded to? You have all this data, and that's another sort of inspiration point. Obviously not just emails, but any other place where you're able to capture the behavior of whoever your customer is.

That's another place that we've seen experiment inspiration come from.

Guido: [00:31:03] And now, looking back, you've worked for Convoy for a year; what's your second year going to be like? What are you going to focus on?

Chad: [00:31:10] So our first year, my first year at Convoy, was really about making the entire experimentation funnel, I guess you could call it, really simple, really usable, really straightforward.

So when I got there, it was possible to deploy an experiment, but analysis was pretty challenging. We didn't really have this bring-your-own-algorithm component; that's something new that we added. We didn't have the ability to automate a workflow, which is another really big thing. That's basically being able to say: I want to do something really interesting,

so maybe I want to start my experiment at 1% of traffic. I want to create a bunch of monitors that check whether or not some metrics are failing. And if everything looks good, then I want to roll that up to 5%, check the monitors again, then roll it up to 50% and start computing a totally different set of metrics.

Maybe for the first few days I only want to look at those super short-term metrics that are really good indicators of whether or not the experiment is failing, and after that I want to start looking at longer-term metrics. And there is a cost component there. A lot of people don't see that, because it's abstracted away by the AB testing company.

But when you're doing these large-scale computations of data, in essence that's what you're paying for. You're paying for all of these joins across hundreds of thousands or hundreds of millions of data points. So there is a cost component there that we try to affect positively.

So that was 2020, and 2021 is more about analysis. We've solved most of that problem, and now it's going to be: how do we get data scientists out of the business of spending so much time debugging and doing this type of segment discovery, figuring out which segment actually performed well and which didn't, and trying to automate as much of that as we can.

Guido: [00:33:03] And where do you think you'll be in a year?

Chad: [00:33:06] Yeah, I think we're in a pretty good place. I think by the end of the year most of the big analytical-insight automation work will be done. And I think the next big challenge is: how do we start meaningfully reducing the time that it takes to run experiments?

That's going to be my big sort of post-2021 goal: with small sample sizes, how can we cut the runtime before you can make a decision by 50%?

Guido: [00:33:37] Okay. Are there any KPIs that you look at for the experimentation program itself?

Chad: [00:33:42] We look at a few things. The big goal for us right now is templatization.

So how many experiments get run that are templatized? At one point we did have pure numbers of experiments as a goal. It's still something that we track and report on, but it's not our key goal anymore. And the reason we're focusing on templatization is that we can say that when someone runs an experiment that was essentially prebuilt, they're just changing the content.

There's a lot of time savings happening. The scientist doesn't have to select the metrics. They don't have to choose the algorithm. They don't have to set the sample size or build the workflow. So there are several hours to more than a day of savings there, and we're just adding all that up.

It's like a productivity enhancement for the business.

Guido: [00:34:27] Okay, but then almost by definition, those are not the most creative experiments, right? Because there are templates for them?

Chad: [00:34:34] Oh no. So the template really only applies to the design, but the actual experiment itself is usually pretty open, so they can experiment on anything.

But you may want to say, for example: yeah, I'm doing some crazy new change, but there's actually a pattern for testing that type of crazy change, and I just want to use that measurement pattern and that algorithmic pattern.

Guido: [00:34:57] Okay. Fair enough. You've been on podcasts before, you've been to CRO events, you've sat on panels.

Maybe you've touched on it in the past hour already, but what are the insights that you have, based on the CRO work you've done, that you don't see a lot of other CROs have?

Chad: [00:35:20] Oh yeah, good question. So there are a couple of answers here.

One thing, and I probably contributed to this, honestly, is that people get too worked up about the statistics. I think they care way too much. When you're running an experiment, the real question you should ask, from a what-statistics-do-I-use perspective, shouldn't be which one performs the best in simulations or whatever it is.

That's fine. But the question is: would a different algorithm have resulted in a practically different result? And if the answer is no, and in the vast majority of cases it is no, then it doesn't matter. Who cares? It's a waste of time to think about. That's one thing.

Another thing: I already talked about this incrementality thing that I really dislike. But oh, I've got a good one, actually. This is something we haven't really talked about yet. A big thing that CROs do, which is not great, is use the results of an experiment as a forecast for the future. That's to say: this experiment made $50,000 over the course of the test, and I ran it for two weeks.

And so if I continue to run it for an additional 50 weeks at a hundred percent, then I should make 10 million, or whatever the number is. That's not right. And the reason it's not right is because the outcome of a statistical test, the P value, is not predictive. It's actually backwards-looking.

It's showing you: this is what happened at some point in the past. It doesn't say anything about what's going to happen next month, or the month after that, or six months after that.

Guido: [00:37:14] I think, or maybe hope, that a lot of CRO specialists do know this, but they are almost forced by the managers of their team or company to make predictions, to say: okay, what is the value that you've added as a CRO, or especially as the CRO team? So what are better ways for them to do this?

Chad: [00:37:38] I think there are sort of two philosophical perspectives that you could take, right? You could take the perspective that, as an experimentation team or as a CRO team,

it's actually not our job to add a whole bunch of money to the bottom line of the business. That's not what we're here for. You could say: my real job is to go and think about really interesting things that we could be investing in, new products, new features, new whatever it is, and validate those offline.

So validate those with our customers, and validate those through user testing. And then experimentation just becomes a mechanism, basically a guardrail, to say: is anything going wrong here? So that's one way that you could use experimentation. That's a hundred percent valid, and I've actually seen quite a few businesses starting to think about AB testing in this way when they don't have predictive capabilities. And the way that you justify that as a team is to say: listen, we actually caught

20 out of 30 things that were bad, and we know this would have harmed the business in some meaningful way because we caught them. In a parallel universe where we didn't have AB testing, we would have deployed them. So that means you could say: we know for a fact that this thing would have hurt us by $10,000 or $20,000, or whatever it is.

And we don't know exactly what the pain would have been over a long period of time, but we know it probably wouldn't have been less than that 10 or $20,000. So we're actually saving the company money, and the more experiments we do, the more money we can say we're saving. That's one perspective you could take.
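The guardrail accounting here is simple arithmetic; the sketch below spells it out with made-up dollar figures, just to show the shape of the claim.

```python
# Illustrative only: each caught regression is credited with the harm it
# showed during its test, treated as a lower bound on the loss prevented.
caught_harms_usd = [10_000, 20_000, 15_000]  # per caught bad change

# Without AB testing these would have shipped, so the program can claim
# at least the sum of the observed harms as savings.
prevented_loss = sum(caught_harms_usd)
print(f"minimum prevented loss: ${prevented_loss:,}")
```

Because the long-term pain of a shipped regression is usually at least what showed up during the test, this sum understates the program's value rather than overstating it.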

The other thing that you could do, if you really want to show ROI, is invest in a forecast, a forecasting predictive metric. This is actually something that we've done at convoy as well. This does require some data science slash analyst effort, so it's certainly not free, but it could be cheap, I guess; it depends on who's on staff.

But essentially, you build some type of predictive model that takes in a set of short-term KPIs, looks at your data, and looks at how those short-term KPIs influence longer-term KPIs. And you can say: based on how the market is fluctuating, based on the performance of our tests over long periods, and based on how these metrics typically move together, we think that this experiment is going to deliver a 3% revenue uplift over six months.
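One way to read this is as a regression from short-term KPIs to a long-term outcome. The sketch below uses ordinary least squares on invented numbers; convoy's actual model isn't described in enough detail here to reproduce, so treat this only as the shape of the idea.

```python
# Sketch: map short-term experiment KPIs to a long-term outcome.
# All numbers are illustrative, not from any real experiment.
import numpy as np

# Historical experiments: columns are short-term KPIs
# (e.g. 2-week conversion lift, 2-week engagement lift).
X = np.array([
    [0.010, 0.02],
    [0.030, 0.05],
    [0.020, 0.01],
    [0.050, 0.08],
])
# Observed 6-month revenue uplift for those same experiments.
y = np.array([0.005, 0.020, 0.010, 0.035])

# Ordinary least squares with an intercept term.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Project long-term uplift from a new experiment's short-term readout.
new_experiment = np.array([1.0, 0.025, 0.04])
forecast = float(new_experiment @ coef)
print(f"forecast long-term uplift: {forecast:.4f}")
```

A real version would add market covariates and uncertainty intervals, but even this simple form gives a principled forecast instead of extrapolating a P value.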

And then you're still going to be wrong, but at least you're probably going to be a lot less wrong than if you're just using a P value as a forecast.

Guido: [00:40:21] Yeah. That's a great insight, I think. Thanks. So the final thing I wanted to talk to you about: experiments, not expert.

Yes. Yes. That's the website. Yeah.

Chad: [00:40:34] What is it? So this is new. A few months ago I realized that I've talked to a lot of people in the space about experimentation, and I think, in general, a lot of these problems around AB testing are a lot more complex than people sometimes think.

And learning about them at a high level is great, but when you actually go to implement, there are a lot of problems: stuff breaks, it doesn't work the way that we expect. I think Luke Wroblewski from Google made a post, this is a while ago, maybe over a year ago, where

he was taking apart AB testing results. He said: these are the results that CROs say AB testing has delivered, but here's the actual business trend, and they're going in diverging directions. Those are the types of things that happen.

Because there's just a lot more complexity than people realize. So I started thinking: since I've been in this complex space for a while and I've solved a lot of those challenges, it would be cool to give some of that information back and to work with teams, and potentially individuals and consultancies, on really going deep theoretically and scientifically.

So that's what that website is about.

Guido: [00:41:49] So people can hire you as, like, a mentor, or are you actually working with them?

Chad: [00:41:55] I probably don't have the time to do the actual work. I wish, that would be really fun. But yeah, as a mentor, or working through particular use cases or problems, whether it's culture-related or science-related, anything experimentation-related is stuff that I love and am always willing and able to talk about.

Guido: [00:42:14] Awesome. So if people are interested in that, the link is in the show notes of this episode. My final question then: who should I invite for an upcoming CRO.CAFE episode, who has something to tell about a topic that you think is really interesting?

Chad: [00:42:29] Yeah. So I definitely think Jeff Ferris. If you want another person from convoy, Jeff would be somebody great to have.

Jeff is an economist. He was formerly at Amazon, where he worked with the experimentation team for a really long time, and his specialty is this type of long-term prediction, making predictions about the world and also in experiments. So there's some really cool stuff that he's done in the past.

Another person that I think would be really interesting to talk to, I don't know if you've had him on, is Jonas. His last name escapes me, but he's building an AB testing tool himself, formerly at Booking, and we have a bit of a different approach to the AB testing thought process and mentality.

Sorry, Jonas, but he's a super smart guy. Great to talk to, knows a lot about experimentation and AB testing, and is just a wellspring of knowledge.

Guido: [00:43:30] If you can introduce me to both, that would be great, and then we'll invite them for an upcoming episode. Absolutely. Thank you so much, Chad.

Our time is up, unfortunately. Thank you so much for sharing all your insights. I think there's a lot of knowledge in there that we can definitely use and hopefully apply to our own CRO work. Thanks. Awesome. Thank you. Bye-bye. And this concludes season three, episode two of the CRO.CAFE podcast with Chad Sanderson.

Make sure to check out the show notes on the CRO.CAFE website. Next episode I talk with Eltine van der Veer. She's head of experience and insights at Eneco, and we're talking about why, for humans using data, especially in these times of crisis, human contact is so important. Talk to you then, and always be optimizing.
