July 27, 2020

Why statistical knowledge is still crucial when running experiments


Georgi Georgiev


Web Focus LLC


What are the 'gaps' that experimentation tools don't (or can't?) tell us about when running experiments and making decisions?

Episode guest

Episode host

Guido X Jansen

Guido has a background as a cognitive psychologist and has worked as a CRO specialist on both the agency and the client side. In 2020 Guido won the individual Experimentation Culture Award.



Book(s) recommended in this episode

Statistical Methods in Online A/B Testing


Please note that the transcript below is generated automatically and isn't checked on accuracy. As a result, the transcript might not reflect the exact words spoken by the people in the interview.

Guido: Can you give us a short introduction of your journey into CRO?

Georgi: [00:01:16] Thank you for the question, Guido. Well, it's been a long journey for me, actually. I started as a digital marketer doing SEO, AdWords management and actually building and managing websites. And I obviously became aware of analytics, and the utility of it, very early.

And so in the course of my career, I actually transitioned more into this data analyst role over time. But I didn't have much training in statistical analysis of data of any kind. So I was soon hitting limitations where I would be looking at data and not be sure how to interpret it. So many times I would see some trend or some high-level change in, let's say, bounce rate or number of visits from some traffic source.

And I would be like, okay, but how do I know this is not part of the usual variability of the data that I see on a daily or weekly basis? And even if it isn't part of that, how do I attribute a cause to it? How is it connected to underlying causes? And that's been a major driver for me to actually start exploring statistics, design of experiments and all that.

Just the desire to be able to connect causes with effects, or vice versa, if at all possible.

Guido: [00:02:35] I think a lot of people recognize those challenges that you had, but I don't think a lot of people naturally dive into statistics like you did. So what made you think: okay, let's just do this?

Let's become the expert in online experimentation statistics.

Georgi: [00:02:54] Well, for me, it wasn't much of a choice. I mean, once you really look into the details of what you can get out of correlational, just observational, data, and then you understand the power of just running an experiment, it is just nowhere near. I mean, it's just a necessity to run an experiment if you really need the causal link.

And so it's just a non-question for me. If you really need a solid understanding of what's happening, then you run experiments. And then you need to know the stats behind that, because otherwise you would be running experiments but the results won't be speaking to you the way they could. I mean, you will be fooled by randomness, as one book title goes, quite often, even with experiments.

Guido: [00:03:43] Yeah. So 15 or 20 years ago, we needed to run experiments by ourselves, or run statistical models on that by ourselves, and check all the variables and set those ourselves. But nowadays we have a range of A/B testing tools, and those tools neatly tell us: this is the winner, this is the loser.

So a lot of people starting out in this field might assume: okay, this is the winner, this is the loser, period. So why do you think it's still important to have statistical knowledge about what's happening, if those tools just tell us what's happening anyway?

Georgi: [00:04:20] Right. Yeah. It's a complex question, but I'll try to answer it succinctly. We're really blessed in the online marketing field and the usability field because we really have these awesome tools and, while the browsers are still allowing some kind of tracking to go through, we do have amazing abilities to measure so many metrics for pretty much all of our website traffic. That's obviously a huge advantage we have over, let's say, scientists in psychology, consumer behavior, or even some of the hard disciplines like physics.

And that's all great, but unless you understand the rationale behind the statistical models and the design parameters of the experiments that you're running, it's going to be very hard for you to make sense of what you see. And in particular, it's going to be very hard to design efficient tests, because most of the tools out there, and I would even go out on a limb and say all of the tools out there, pretty much do not give you the parameters that you need to design efficient tests.

And statistics and experiments are all about efficient use of data. But how do you define efficiency? That's the problem people face, and they start to wonder, to ask questions like: okay, how long should I run this test for? What should my level of uncertainty be? What's the acceptable uncertainty of the result from this experiment?

These are the two basic questions, but then there are others, such as: should I run just one variant versus a control, or should I run 10 versus a control just to have a better chance of finding out what works best? Should I be looking at the data on a daily basis, on a weekly basis, or should I just start and then examine the data at the end?

And the questions just pile up once you start digging into it. Without understanding the reasoning behind the stats, you really will be subject to this one-solution-fits-all-problems approach, which really doesn't work. So for example, if you're a startup and you're testing something which is probably going to be live on your website for six to 12 months, does it really make sense to run with these standard thresholds, let's say 95 or 99% significance and maybe very high power for a very small effect size, and have a test that takes three or four months, or maybe even six months, to run with these parameters? It doesn't really make sense to do that.

And at the same time, if you are a big company with very well established processes and very well established products, it doesn't really make sense to run with these parameters either, because for many tests you would actually need much higher levels of certainty before you can act. That's something people easily fall into if they don't understand the reasoning behind it. They fall into these black-box solutions, which are really just one-size-fits-all.
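A minimal sketch of why one fixed set of parameters cannot suit every test, assuming a one-sided two-proportion z-test and the textbook normal-approximation sample size formula. The function names, the 3% baseline conversion rate, the 5% relative lift and the 20,000 weekly users are illustrative assumptions, not figures from the conversation.

```python
from math import ceil, sqrt
from statistics import NormalDist


def users_per_arm(base_rate, rel_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm for a one-sided two-proportion z-test."""
    p1 = base_rate
    p2 = base_rate * (1 + rel_lift)          # rate under the minimum effect of interest
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # significance threshold quantile
    z_beta = NormalDist().inv_cdf(power)       # power quantile
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(n_per_arm, weekly_users, arms=2):
    """Whole weeks of traffic needed to fill all arms."""
    return ceil(n_per_arm * arms / weekly_users)


# Hypothetical site: 3% baseline conversion, 5% relative lift, 20,000 users/week.
for alpha, power in [(0.10, 0.80), (0.05, 0.80), (0.01, 0.90)]:
    n = users_per_arm(0.03, 0.05, alpha=alpha, power=power)
    print(f"alpha={alpha:.2f} power={power:.2f} -> "
          f"{n} users per arm, about {weeks_needed(n, 20_000)} weeks")
```

Stricter significance and higher power both push the required duration up, which is the cost a startup and a mature company would weigh very differently.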

You know, even within a company, or within a company unit, or over time, each test will be different. Each test would require different parameters to make for the most efficient way of getting to a decision. And unless you understand the reasoning behind it, the tool will always recommend the same thing over and over. To give you an example, at the moment I'm running a series of A/A tests with Google Optimize and, I mean, the first of 15 tests stopped after 32 days of running.

So if I'm a startup looking to make an agile change, I will be waiting for over a month without any kind of decision. And I'm actually planning to let the rest of the tests run for however long they take, just to see: okay, how long would it take Google Optimize to stop this test? Because it becomes ridiculous at some point, unnecessarily long, and that's a product of them imposing their standards on every single test run on their platform.

And I'm not singling out Google Optimize here. That's something that many other tools do. And even with the ones that give you some opportunity to alter these parameters, if you don't know what you're doing, how are you going to alter them in any meaningful manner? You're just going to be like a blind man trying to feel his way around.

Yesterday's brainstorm was so good. I really liked Step's idea of running that test on the call-to-action buttons. Making them orange will really make them stand out, don't you think?

Guido: [00:09:14] Yeah. Right. Do you want to design real A/B test winners and achieve enormous conversion lifts? Then stop brainstorming and take a scientific approach.

If you can read Dutch, follow the steps from Online Influence, the bestseller, enroll in the authors' course and become an expert in applying proven behavioral science yourself. Go to  dot com for more information and free downloads.

When running experiments, there are inherently unknown things. We accept a certain risk that we make a wrong decision, while the managers of CRO teams or digital marketing teams just want answers. They just want certainty that something will or won't work. So how would you respond to that?

Georgi: [00:10:04] Well, I would say there is always a cost, always a trade-off associated with certainty.

For example, requiring higher statistical significance, a higher level of certainty, limits your post-test risks, the risks after you implement. Let's say you have a winner and you implement it. But it significantly increases the duration of the test, and so it increases the risks during testing. If what you're testing is actually worse than your control, your incumbent solution, then you're going to be incurring these costs over the duration of the test.

In a similar manner, let's say you are requiring higher statistical power, so a higher probability to detect actual effects. That's also going to have this limiting effect on your gains, because it basically requires you to test for longer, just like increasing statistical significance would. And if what you're testing is actually better than the control, that's a longer time period in which you're not reaping the benefits from that improved solution.

So these trade-offs need to be explained to marketers. They need to be explained to higher-ups. And I think most people have an intuitive understanding of that, which might be misleading in some cases, but with a few examples, with a little bit of work, that can be overcome, I think.
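The two costs of a longer test described above can be put into rough numbers. This is a minimal sketch, assuming constant traffic and revenue per user that scales linearly with the true relative effect; the function names and every figure are hypothetical.

```python
def revenue_delta_during_test(daily_users, variant_share, revenue_per_user,
                              true_rel_effect, days):
    """Revenue gained (+) or lost (-) during the test, compared with
    showing the control to everyone."""
    exposed_users = daily_users * variant_share * days
    return exposed_users * revenue_per_user * true_rel_effect


def forgone_gain_during_test(daily_users, variant_share, revenue_per_user,
                             true_rel_effect, days):
    """Gain not realized because a better variant is withheld from the
    users who still see the control."""
    held_back_users = daily_users * (1 - variant_share) * days
    return held_back_users * revenue_per_user * true_rel_effect


# Hypothetical site: 1,000 users/day, 50/50 split, 2.00 revenue per user.
for days in (30, 60, 120):
    loss = revenue_delta_during_test(1_000, 0.5, 2.0, -0.05, days)
    missed = forgone_gain_during_test(1_000, 0.5, 2.0, 0.05, days)
    print(f"{days:>3} days: a 5% worse variant costs {abs(loss):.0f}, "
          f"a 5% better variant leaves {missed:.0f} unrealized")
```

Both amounts grow with the test duration, so every extra week bought for more certainty has a price whichever way the true effect points.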

One way to do that is A/A tests. I think it's great to just run a couple of A/A tests and maybe not even tell your colleagues that these are A/A tests. Just say: yeah, we're running these two or three A/B tests here, testing ideas for X thing. And then you can just present them the results and say: okay, what would you like to implement based on these data?

And when they say yes or no, you can say: well, that's actually an A/A test, so there's nothing to implement. Situations like that are obviously a little bit hard to replicate. But I think even just looking at the data and how it changes over time will help people get this intuitive understanding of the variability, which is not so intuitive for us, simply because most times we live in a world with fixed properties. If I drop my phone, it is going to fall to the ground 100% of the time. So we are not used to dealing in probabilities, but that's something that needs to be explained.
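The variability that A/A tests expose can also be simulated. The sketch below assumes a pooled two-proportion z-test evaluated once at the end of each test, a 5% true conversion rate in both arms, and 1,000 users per arm; these values and the function names are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions at all (or all converted): no evidence
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_tests=500, users_per_arm=1000, rate=0.05,
                      alpha=0.05, seed=1):
    """Share of A/A tests (identical arms) that still look 'significant'."""
    rng = random.Random(seed)
    false_winners = 0
    for _ in range(n_tests):
        conv_a = sum(rng.random() < rate for _ in range(users_per_arm))
        conv_b = sum(rng.random() < rate for _ in range(users_per_arm))
        p = two_sided_p_value(conv_a, users_per_arm, conv_b, users_per_arm)
        if p < alpha:
            false_winners += 1
    return false_winners / n_tests


print(f"Share of A/A tests flagged as significant: {simulate_aa_tests():.3f}")
```

With identical arms, roughly an `alpha` share of the tests should still cross the threshold, which is the kind of "winner" a colleague might ask to implement.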

Guido: [00:12:36] Yeah, you mentioned some people have an innate sense of this, an intuition of how this works, which might not actually be true when we talk about A/B tests. But do you think we can teach teams, or the managers of those teams, this experimentation mindset? Is that teachable, or is it innate?

Georgi: [00:12:58] I definitely think it's teachable. As I mentioned, A/A tests are one way to do it.

Another is to just examine cases where they didn't experiment, where they just went on with it and implemented, and then they had horrible consequences. And that happens, I think, often enough that you'll have plenty of opportunities to do that. I've actually shared one such example from our own website just the other day, a disaster where we lost a lot of money because we were too quick to implement. I'm going to say it's related to access policy compliance, and we were just: okay, well, we will have to implement this, we need to be compliant, push it live, there's no point of discussion here.

And then months later we look at the data, and we think our decline is due to the virus epidemic situation. But no, it's just our own stupidity. So that's it, put briefly.

Guido: [00:14:03] So basically what you're saying is: let them make mistakes. That's probably the best learning they can have.

Georgi: [00:14:08] I think that's just part of it. You need to explain how these mistakes could have been prevented with a very high probability if there were tests in place, if there were procedures to test every decision.

And I think that's something people don't really understand when they go into testing. Most of the time, unless they know the statistical side of things, they think that they're testing ideas. But the truth is they're not. They're testing particular implementations of ideas.

And even if the idea is great, the implementation could be broken or could be suboptimal and might have no effect or even a negative effect. And sometimes even a bad idea, like bad reasoning for why you're doing something, might lead to a really great user experience which actually improves sales for reasons other than the ones that you initially went with.

So I think, yeah, mistakes are a good opportunity to explain that, in the same vein.

Guido: [00:15:06] Now that's a good point. You need a proper hypothesis, and you probably need multiple experiments to validate that hypothesis. It's not one experiment and then you can validate your idea, and you know for certain whether your idea is working, yes or no. It might be that particular implementation that you just happened to use, and it does not necessarily validate that exact idea.

Georgi: [00:15:31] Exactly. So, for example, with your background in psychology, you might have a certain idea of how consumer behavior might be influenced or assisted in a positive manner.

And you can come up with one test, one implementation, one way of changing your website or your checkout experience or whatever to test that, but maybe you've overlooked some other factor there. So if you're really testing the idea that this psychological mechanism will help us, then you will definitely need a bunch of tests to try it at different stages of the website, maybe triangulate with different messaging, with different visuals.

That's just because sometimes it's the implementation which is more important than the idea. It has a swamping effect on the actual effect: there might be an underlying true effect, and then there might be a negative effect from the implementation. And also sometimes, I mean, testing tools are not perfect, or sometimes there would be issues with the test setup itself.

So even if the idea is good, even if the implementation is good, the test is poorly set up. For whatever reason, there are some measurement issues, and you still get biased results, meaning that they don't reflect the underlying reality.

Guido: [00:16:49] Yeah, exactly. And so you've worked with several companies doing this.

Do you see certain companies or verticals or people being exceptionally good at this, or way worse at accepting this mindset?

Georgi: [00:17:06] Yeah, I would say so. I think, kind of naturally, digital companies or data-driven companies, businesses which are data-driven at their core, are much quicker to get on the A/B testing train, simply because they've had a lot of experience with observational data and they are keenly aware of the dangers of over-interpreting what the data says. They are very quick to recognize: okay, if we can run a test, an actual controlled experiment, that's going to be much, much better than just looking at some data in analytics, let's say.

So yeah, I think these companies that traditionally have used data, or where data is their core business, are very quick to come to grasp it. And then on the other side, I think it's e-commerce companies, then lead gen companies, and then publishers and publishing in general. That's the scale I'll put it in, simply because

Guido: [00:18:09] That's like an order?

Georgi: [00:18:11] In a way, from most proficient and quickest to take up testing, to least. Simply because, I think, it's partly having the data, the ability to measure. For e-commerce websites, it's very

Guido: [00:18:26] The conversion point is completely online, right?

So the closer your conversion is, and if it's completely online, the easier it is to adopt that.

Georgi: [00:18:33] It gets much easier to come up with metrics that are both measurable and actionable and speak to the business bottom line. Whereas for publishers, it's much harder because, for example, let's say you're optimizing for the click-through rate of a title.

Well, on some level, yes, higher is always better, but there are situations where that's not necessarily the case. Let's say that your most clicked content actually draws in the least desirable advertisers, so the lowest quality, and therefore they will have the lowest bids for their ads.

And so you might be increasing your number of impressions by, let's say, two-fold, but if at the same time that decreases your average revenue per mille by some margin, then you might actually be losing money by increasing your impressions. So there are these difficulties that I think are part of the reason why such companies are usually slower to adopt a data-driven approach.
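The publisher arithmetic above is simple enough to write down. This is a toy sketch with made-up impression counts and revenue-per-mille (RPM) values, assuming revenue is just impressions divided by 1,000 times RPM.

```python
def ad_revenue(impressions, rpm):
    """Ad revenue given impressions and revenue per thousand impressions."""
    return impressions / 1000 * rpm


# Hypothetical figures: impressions double, but lower-bidding advertisers
# pull the RPM down from 4.00 to 1.50.
before = ad_revenue(1_000_000, 4.00)
after = ad_revenue(2_000_000, 1.50)
print(f"before: {before:.2f}  after: {after:.2f}  change: {after - before:+.2f}")
```

Doubling impressions only pays off while RPM falls by less than half, which is why click-through rate on its own can point a publisher in the wrong direction.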

Guido: [00:19:35] SiteSpect offers a worldwide unique A/B testing, personalization and product recommendation solution. SiteSpect works server-side without any tags or scripts, which guarantees optimal performance. The SiteSpect solution eliminates delays and the chance of any flickering effects. This approach also ensures that current and future browser rules like ITP and ETP don't make an impact on your A/B testing and personalization. For more info, visit  dot com.

Do you also see a difference in this kind of risk management between established businesses and startups?

Georgi: [00:20:14] Sure, sure. Yeah. And I think this speaks to my previous point about one-size-fits-all solutions. Let's say you're a startup. You're much more agile than a well established, mature business with thousands of employees, let's say.

You can turn on a dime. You're happy to implement high-risk solutions simply because, even if it doesn't work, you can just pivot. Obviously the testing environment there would be much more agile. Maybe the risk of missing a true effect would even be of higher priority than the risk of breaking something, especially in those very early stages, whereas for a mature company the equation usually goes the other way.

Another major difference is the period during which whatever you do will persist. So for a startup, you might have a horizon of six months, 12 months, even longer, but that's rare. So what you do now will stay and have an effect for, let's say, a year, but for a mature company, some things that you do might be there for 10 years or 15 years.

Take, let's say, software like Microsoft Office. I mean, how many drastic changes do you see there over the past 10 years? Not that many, right? And so the stakes are higher, and so the risk is larger there. I think that's one of the major differences.

Guido: [00:21:49] Do you also see a difference, or do you think there should be a difference, in the objectives of how those companies run experiments, between, for example, startups and more established businesses?

Georgi: [00:22:01] Objectives?

I'm not sure about objectives. I think on the business side they should be mostly the same. I think over the life of a startup you start measuring different things and caring about different things. Let's say initially you're all about acquisition. At one point, you need to transition to retaining your existing users, existing clients.

That's maybe one of the biggest changes that happens, and it brings with it a whole lot of new metrics and new types of experiments to run. I think other than that, the objectives should be fairly similar.

Guido: [00:22:43] Okay. And for startups, I can imagine, like you said earlier, that sometimes as a startup you don't have a lot of data, maybe not enough data to run the experiments you want to run. Maybe you do. And then sometimes previous research, like offline research, already told you something and you're going to implement something anyway. Would you still advise companies to run an A/B test or an experiment to validate that, even if you're going to implement it anyway?

Georgi: [00:23:12] Yes. Sure. Because the simple fact that a given approach works for others doesn't mean that it will work for you. Or even if it works, speaking to what we previously covered, your implementation might be faulty or might be significantly different, even if you don't think it is.

It might be a simple layout thing or a simple color thing or a simple wording difference that actually was what made it work for those, like, 10 other companies. You're taking their experience, but it's not necessarily the same experience. And I think you know that from psychology experiments, where some of them are highly sensitive to context, and so results won't replicate under mild variations of the context. The variations don't have to be substantial to have substantial effects.

Guido: [00:24:06] Yeah. I literally had a company that was working on a new design of their website and, as an example, they were a big fan of one of the larger clothing companies in the Netherlands doing this online. But they were selling bicycles.


Georgi: [00:24:25] Well,

Guido: [00:24:26] That's it. I expect there to be a difference in how people purchase, like, $20 t-shirts or $2,000 bicycles.

Georgi: [00:24:37] Exactly. And there is also the difference between consumer-oriented and business-to-business products. For example, one test we did was informed by: okay, can we be more aggressive with our button wording? And it turns out we can only go so far because we're B2B, and it just doesn't make sense to be as aggressive as you would be with some really spur-of-the-moment purchase types of items.


Guido: [00:25:07] Yeah, you just mentioned that there's a difference in how well companies are suited to run experiments. And we kind of established that the closer your business metric is to what you can actually measure, the better it is for you. For e-commerce, the purchase is online. That's something you can measure within your analytics, so that's a positive thing. For news websites, those metrics are more long-term, or even offline, so it's harder to measure. So you have metrics that try to indicate a positive behavior, like click-through rate on a title, like you mentioned. But you don't only want people to click on your title. You want them to read the article, to click on the advertisements and so on.

But even for e-commerce, just one purchase might not be the optimal metric to optimize for. So how do you advise companies to pick the metric that they're actually using? For e-commerce, to me it sounds logical to look more at lifetime value, if they even know what the lifetime value of their customers is. So there's a wide range of what you can pick as a metric.

What you optimize for can be click-through rate: very simple, very straightforward, and you don't need a lot of traffic for that. At the other end of the spectrum can be lifetime value, but it might take a very long time to run the experiments, even if you have that metric. Some companies don't even have that.

So what's your advice for companies on picking a metric?

Georgi: [00:26:47] My first piece of advice would be to not go for convenience. Most of the literature, most of the examples out there, most of the tools out there are pretty happy to help you calculate conversion rates and to run tests based on conversion rate as a primary measurement.

Guido: [00:27:06] It's, like, the name of our industry.

Georgi: [00:27:09] Exactly. And that's also the metric which clients would be most comfortable with. They will be familiar with it. So that's convenience, and that's something you should avoid. And the reason is, as you said, the closer the metric is to your business bottom line, the easier it is to interpret it into: okay, do we actually want to make this decision, or do we actually need more data to do it?

And we come to the dreaded trade-off keyword here. So it's a trade-off. If you go by conversion rate, you will know something about the rate at which users purchase, but you will be missing information on how many items they purchased and what these items are worth.

And then, if you have some kind of lifetime value model, as you mentioned, what's the probable lifetime value of those new customers, let's say they're new? With conversion rates you can go quicker, but you have less information to act on. So the trade-off is pretty much a constant thing.

There are no shortcuts. That's something I've been trying to instill in my readers for a long time: if it looks too good to be true, it's too good to be true. If it says that you can run the test in two weeks instead of two months, you're probably trading off something somewhere.

And if you don't understand what it is, you're in danger, because then you might be making decisions based on information which doesn't really mean what you think it means. And that could be worse than not doing experiments at all in some cases, simply because you will make the change and think: yeah, it's solid, we've tested it, it works.

And then, if your observational data following the implementation says otherwise, you will be much less inclined to question that decision because, I mean, you tested it, right? So it has to be something else.

Guido: [00:29:04] Marketing budgets have suffered, and the share for A/B testing has been impacted too. If you want to keep testing to enterprise standards but save 80% on your annual contract, you can consider it. But there's some release you can take advantage of: full stack and hybrid features, strong privacy compliance, no blink and enterprise-grade security. Feel good about your smart business decision and invest what you saved back in your CRO program. Check out the www adults.  2020.

Guido: [00:29:26] Am I right to assume that if you don't have a lot of data, like most startups, the trade-off is: okay, let's go for the easier things, or the things that we can at least test for, that we can validate at a low level? Maybe we need to optimize for click-through rate. And then the more mature you get, the more data you get as a company, the further along in your business life cycle the metric is that you pick to optimize for.

So you go from click-through rates, to conversion rates for maybe your newsletter, to conversion rate of actual orders, to actual lifetime value. Is that right? A simple,

Georgi: [00:30:23] No, I would actually not recommend that. Sorry to disappoint, but my understanding is that you should use the best metric that you have available, and alter your expectation about the uncertainty with which you would need to act.

So if you can measure average revenue per user, and it takes you, like, three months to get to 95% significance, then you can say: okay, three months is way too long. It doesn't make sense for us to test for that long. We should test for just two months, but we will be happy to accept, let's say, a 90% significance threshold as a decision threshold.

I think that's the more honest approach, if you will, because this way you're still measuring what you really want to measure. That's question number one, and only in second place do you start specifying the design of the experiment, which includes things like the statistical significance, the power, the minimum effect of interest and all that.

So first you should be measuring what really matters to the business, what is easy to interpret without doubts. Because let's say you achieve an improvement in conversion rate. Well, is it actually a positive for the business? You can't really say unless you also look at average order value. And once you do that, you're actually looking at the average revenue per user, which is just the product of the two. So what are we doing here? If you do that, you're likely lying to yourself.
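The identity mentioned here, average revenue per user equals conversion rate times average order value, is easy to check on totals. The figures below are invented to show a variant that wins on conversion rate while losing on revenue per user; `ArmTotals` is a hypothetical helper, not something from a testing tool.

```python
from dataclasses import dataclass


@dataclass
class ArmTotals:
    """Totals collected for one arm of a test (illustrative structure)."""
    users: int
    orders: int
    revenue: float

    @property
    def conversion_rate(self):
        return self.orders / self.users

    @property
    def average_order_value(self):
        return self.revenue / self.orders

    @property
    def arpu(self):
        # Equals conversion_rate * average_order_value.
        return self.revenue / self.users


control = ArmTotals(users=10_000, orders=300, revenue=24_000.0)
variant = ArmTotals(users=10_000, orders=330, revenue=23_100.0)

for name, arm in (("control", control), ("variant", variant)):
    print(f"{name}: CR={arm.conversion_rate:.2%} "
          f"AOV={arm.average_order_value:.2f} ARPU={arm.arpu:.2f}")
```

Here the variant converts more users but at a lower order value, so a report based on conversion rate alone would call a winner that earns less per user.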

Guido: [00:31:58] So you're saying you shouldn't make concessions on the metric. The metric should be pretty much the same throughout the life cycle of the company.

That should be a fixed thing: the best thing you can measure that connects with your business goals. But do make those concessions on the confidence level or significance?

Georgi: [00:32:21] Yeah, exactly. I think this way it's easier for everyone involved, testers, usability experts, higher-ups, to understand what's actually going on, because otherwise, if they're smart, they're going to start asking these questions.

Okay, conversion rate is up, but what about order value? What about our revenue at the end? And inevitably your results will be interpreted as if they pertain to the business metric which your higher-ups care about. And believe me, most of them are not measured by conversion rate, they are measured by the money in the bank. And so, yeah, the closer you can get to that, the better.

Guido: [00:33:00] And it can also be different for different business units or different countries that you're in, depending on the situation, right? Or depending on the strategy of the company. I once worked for a company selling flowers online, and in some countries they were the market leader. Then you want to optimize for profit. You're already the market leader, so that's fine. You don't necessarily want to massively expand your customer base, because it's already the largest there is. But if you're a challenger in another country, you might even be fine with losing money for a while. Maybe for the next two years you're fine with playing break-even, not making any profit at all.

But it's still the business that needs to decide on those strategies. A/B testing is not going to give you an answer on that.

Georgi: [00:33:45] Absolutely. Absolutely. And there are also businesses where, let's say, time is essential. So maybe you're in the food business or something else where,

if you don't sell the item, it's gone. Basically it loses its value completely. So you have different things to consider, but that should all go into the metric. That's the idea of Ronny Kohavi's overall evaluation criterion. It should encompass these things. Ideally it could be a composite metric.

It could be one single number which combines different metrics. But yeah, the metric comes first, and then you try to estimate it with certainty in the most cost-efficient way available to you. And that's where statistics comes in.
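
As a rough illustration of "one single number which combines different metrics", the sketch below builds a composite metric as a weighted sum. This is only one possible form, and the function name `composite_oec`, the metric names, the weights and the figures are all hypothetical. Choosing the components and weights is a business decision, not something the code can supply.

```python
def composite_oec(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Combine several per-user metrics into one number via a weighted sum.

    Negative weights penalise metrics that represent costs.
    """
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"metrics missing for: {sorted(missing)}")
    return sum(weights[name] * metrics[name] for name in weights)


# Hypothetical perishable-goods shop: unsold items lose their value,
# so spoilage cost per user is subtracted from revenue per user.
weights = {"revenue_per_user": 1.0, "spoilage_cost_per_user": -1.0}

control = {"revenue_per_user": 1.50, "spoilage_cost_per_user": 0.20}
treatment = {"revenue_per_user": 1.48, "spoilage_cost_per_user": 0.10}

print("control OEC:  ", round(composite_oec(control, weights), 2))
print("treatment OEC:", round(composite_oec(treatment, weights), 2))
```

In this made-up case the treatment earns slightly less revenue per user but wastes less stock, so the composite number favours it even though revenue alone would not.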

Guido: [00:34:34] Yeah. My advice to digital teams when picking their metric is usually to ignore whatever the digital team is doing, and go over to the finance team or the CFO and ask them what they are looking at.

That's great advice, because then you know what your company and the management are actually seeing, and you can try to optimize for that. So you mentioned testing for two or three months. I think that's a hotly debated topic in the CRO industry: how long to run tests for. So what's your answer to that?

How long should I run a test? I mean, there are limits, of course. As you said, we can make concessions on the significance level, but do we also make concessions on how long we can run the test? Is there a maximum or a minimum?

Georgi: [00:35:18] These are basically the two parameters that we need to trade off against each other: there's duration and there's the significance level.

So we can go into all the different costs and benefits that come with longer tests. Obviously, on the cost side, if you're testing something for longer which is hurting your revenue, you're actually losing revenue during the test. And at the same time, if what you're testing is better, you are leaving money on the table, so to say, the longer you're testing.

So these are the trade-offs during the test duration. And then, once you implement, comes exploitation. Let's say that from the moment you start the test to the moment where you expect a complete design overhaul, or something else to change drastically for the business, that's three years.

Well, if you test for three months, then that leaves you two years and nine months to exploit whatever you find. And if you test for six months, then that leaves you just two years and six months. So that's less time to reap the benefits from your test. And these are the basic trade-offs one needs to have in their head when thinking about test duration.

There is no hard and fast rule about that. Actually, I've designed a whole method for assessing what's the optimal balance between the test duration and the significance threshold, and you can actually access that as a tool online. You can plug in a lot of those business metrics,

and also a few statistical parameters, like, let's say, the historical or baseline conversion rate that you expect, the usual things you would use to plan a test statistically. And with the business information provided, it will output an optimal design, optimal in the sense that it balances the duration of the test

with the significance that can be obtained, that is, the certainty that can be obtained. And that's my solution to this problem. It's far from perfect, but I've not seen anything close to it yet, so I hope it's going to be useful to some folks.
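
The exploitation arithmetic from the three-year example can be sketched in a few lines. This only reproduces the subtraction described above; it is not the risk-reward method or the online tool Georgi refers to, and the function name `exploitation_window` is an assumption for illustration.

```python
def exploitation_window(horizon_months: int, test_months: int) -> tuple[int, int]:
    """Return (years, months) left to exploit a result after the test ends.

    horizon_months: time from test start until the expected overhaul.
    test_months: planned duration of the test.
    """
    if not 0 <= test_months <= horizon_months:
        raise ValueError("test duration must be between 0 and the horizon")
    return divmod(horizon_months - test_months, 12)


# Three-year horizon, as in the example from the conversation.
for test_months in (3, 6):
    years, months = exploitation_window(36, test_months)
    print(f"{test_months}-month test leaves {years} years and {months} months")
```

A longer test buys more certainty at the price of a shorter window in which any improvement can pay off, which is the trade-off a full cost-benefit calculation would have to weigh.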

Guido: [00:37:25] Yeah. So there's not necessarily a one-size-fits-all answer to the question: how long should I run a test?

Georgi: [00:37:31] I don't think anyone can give you such an answer. Yeah.

Guido: [00:37:35] Thank you so much for being on the podcast. We're almost running out of time already. I feel like we could talk for another hour, at least. So, obviously, you've written a book, you've created a course at CXL, you run Analytics Toolkit.

What are your plans for the upcoming 12 months? What are you working on?

Georgi: [00:37:54] Actually, I mean, for the past five, six years, I've mostly been working on, okay, what are the best statistical methods that we can apply to online A/B testing? How do we do it actually in practice? How do we overcome practical limitations?

This balancing between risk and reward, like optimal risk-reward calculations, has been a major thing for me in the past several years as well. And now I think the more important work ahead for me is making sure these ideas can reach the largest base possible, making them easily digestible.

So communication, improving the way I can communicate these ideas, is actually going to be a major focus for me. Experiments, A/B tests, blog posts. And obviously talks like the one that we just had, I hope, would help some people too, you know, have an easier time understanding these quite crucial ideas.

Yeah, we'll see.

Guido: [00:38:49] And as a final question, do you have any book recommendations for our audience, besides of course your own?

Georgi: [00:38:55] Over the past six months, the best book, maybe the best thing I've read, has been by Ronny Kohavi and, yep, Diane Tang, and what was the third author? I think "Trustworthy Online Controlled Experiments" is the title. Highly recommended. A lot of good advice, a lot of practical examples from, obviously, Ronny Kohavi's experience. And yeah, it's,

I think, for mid-level and above. Like, you have to have some experience with A/B testing to appreciate it. If you do, then I think it's going to be valuable for you.

Guido: [00:39:28] Yeah. For those that don't know, Ronny worked for Microsoft before and now works for Airbnb, still, I think. I mean, Airbnb had some layoffs, but I think Ronny is still working there. And yeah, we'll link to his book in the show notes.

I think it's available on Kindle. I don't think it's an audiobook, but I think it's definitely on Kindle and of course in physical form, and you can order that. It'll be in the show notes. Georgi, thank you so much for sharing this with all of us. We'll definitely also link to your website,

of course, in the show notes. If people want to learn more about you, they can reach out to you, I think, online. LinkedIn, Twitter, what do you prefer if they have any questions?

Georgi: [00:40:09] LinkedIn is best. And thank you for hosting me. It's been a pleasure and a delight to talk to you today.

Guido: [00:40:15] You're welcome.

Thanks so much. Thanks, bye-bye. And this concludes season two, episode 30 of the CRO.CAFE podcast with Georgi Georgiev. Although we started out as a Dutch podcast, we are putting out more and more English content. If you want to skip all the Dutch content, please go to CRO.CAFE/english to see an overview of our English episodes and

subscribe to get notified about new English content. If you're interested in promoting your product or services to the best CRO specialists in the world, please take a look at CRO.CAFE/partnership to see how we can collaborate. Next week we'll have another English episode, in which we're going to discuss why empathy is probably the most important skill for any CRO professional.

And we'll be doing that together with CRO specialist Armani bolt from the UK. Talk to you next episode.
