While a different application than shown here, my project management class taught me one of the biggest benefits of Monte Carlo simulation - estimating uncertainty.
Traditionally, net present value calculations are done with single point estimates. For example, analyzing a rental property we want to buy, we'd estimate the vacancy rate, interest rate, property appreciation, maintenance expenses, and do all of that on a cash flow time-adjusted basis.
There's variability in every aspect of that, though, and understanding how much can easily swing your decision.
Traditional methodology is to pull the trigger on the action if the net present value is greater than 0. We put in a variety of estimated factors, ran it through Crystal Ball (simulation software), and found a 75% chance the NPV would be negative. It suddenly isn't the safe investment it looked like originally.
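For anyone who wants to see the shape of that analysis without Crystal Ball, here is a minimal sketch in plain Python. Every number and distribution in it (purchase price, rent, the triangular and normal inputs, the 10-year horizon) is a made-up assumption for illustration, not the figures from my class, and `simulate_rental_npv` is a hypothetical helper name.

```python
import random

def npv(rate, cash_flows):
    """Discount cash_flows[t] (t = 0, 1, 2, ...) at a constant annual rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def simulate_rental_npv(n_trials=10_000, seed=0):
    """Return one simulated NPV per trial for a hypothetical rental property."""
    rng = random.Random(seed)
    price, rent, years = 300_000, 24_000, 10  # assumed, illustrative values
    results = []
    for _ in range(n_trials):
        # Assumed input distributions; random.triangular takes (low, high, mode).
        vacancy = rng.triangular(0.02, 0.20, 0.06)
        discount = rng.triangular(0.05, 0.10, 0.07)
        appreciation = rng.gauss(0.02, 0.03)
        maintenance = rng.triangular(2_000, 9_000, 4_000)
        # Year 0 is the purchase; each later year is net rent.
        flows = [-price]
        for _year in range(1, years + 1):
            flows.append(rent * (1 - vacancy) - maintenance)
        # Sell at the end of the horizon at the appreciated price.
        flows[-1] += price * (1 + appreciation) ** years
        results.append(npv(discount, flows))
    return results

results = simulate_rental_npv()
p_negative = sum(r < 0 for r in results) / len(results)
print(f"P(NPV < 0) = {p_negative:.1%}")
```

The point estimate version is the same `npv` call with each input fixed at its most likely value. The simulation just repeats it with inputs drawn from whatever distributions you are willing to defend.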
---
In practice, it's tough to get people in business to understand uncertainty. They gravitate towards "the number", which is almost surely wrong all the time.
The real danger is not understanding that this "uncertainty" estimate is a function of your assumptions. How you model the distribution of your inputs is huge, and often not stated clearly.
"Just take a guess - we have to have a number in this box by the end of the meeting". I was literally just told that yesterday. No one else who would look at that number will know it was a totally random guess pulled out of thin air to satisfy ceremonial box-filling. They will treat it as an estimate, and make plans accordingly.
And if you make up a number that is too obviously small or exceeds some unspoken upper bound, you'll be asked to re-estimate anyway. Sometimes the best way to respond to that is to finesse the discussion into coming up with a number that, while it will have no relationship to the actual effort, at least reflects what the stakeholders hope and desire it to be. At that point the team, if they are smart, will examine scope and re-plan to come up with some level of effort that they are confident can be done in the time hoped for.
In other words, get the people who want a number to tell you what number they want, then use your best efforts to scope the effort to one you can be pretty confident will fit.
GIGO: the first thing I learnt when I entered the industry 20 years ago. This was from a 60-year-old engineer who told me that experience is only a nice name for "all the @#$% I made I will try not to make again".
A very nice thing about Monte Carlo simulation is that, at the end, your results all fall within the feasible range. If you do error propagation using uncertainty on your parameters, you can get nonsense results.
For example, suppose you have a pendulum: simulate it with error propagation and you get a nonzero probability of the energy in the system increasing over time.
It is easy to spot this in such a constrained example, but with more complex models it is sometimes pretty hard to control.
> A very nice thing about Monte Carlo simulation is that, at the end, your results all fall within the feasible range.
I guess Monte Carlo helps provide conservative estimates behind a façade of rigour, but the truth of the matter is that in the end it's still GIGO.
Any empirical distribution only reflects the empirical measures that were used to generate it. If you bundle everything from the time it took employee A to walk the dog while allocated to project Foo to the time it took employee Z to fix a nasty Heisenbug while allocated to project Bar, and Foo and Bar used totally different tech stacks and team members and even approaches to project planning, that distribution is meaningless in estimating, say, how much time it will take employee G to implement a React widget.
I’m smiling a bit while reading your comment. You are completely correct that that estimate is virtually meaningless. And yet… that doesn’t mean it’s completely useless!
I’ve been using a somewhat meaningless Monte Carlo-inspired approach to project planning for a while in my consulting company. The vast majority of the projects I do have only a small amount in common with previous projects, so when I’m estimating I’m informed by past estimate vs actual numbers, but not really relying on much other than intuition and gut feelings from past work.
My basic approach is to estimate 2- or 3-sigma upper/lower estimates for each task, modelled (incorrectly) as a Normal distribution in hours, and then sum the random variables to come up with a final distribution (whose spread, relative to the total, is quite a bit smaller than what you see on the original bag of tasks). From there, if I am making a quote, I’ll quote at +3-sigma x hourly rate as a high-end effort estimate. If it seems like a very meeting-heavy client, I’ll either add meetings in as development tasks or just pad it out by a percentage or fixed hours/week.
This technique has worked amazingly well for me, and it’s been quite rare that I blow the estimate, and in the… one time I can think of, we missed by very little and there were tasks that we hadn’t thought of when we did the estimate.
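A rough sketch of the sum-of-Normals arithmetic described above, in case it helps. It assumes the tasks are independent (that is what lets the variances add) and that each low/high pair is read as the mean minus/plus `k_sigma` standard deviations. The task list and the `combine_tasks` name are hypothetical.

```python
import math

def combine_tasks(tasks, k_sigma=2.0):
    """tasks: list of (low, high) hour bounds, read as mean -/+ k_sigma std devs.

    Returns (mean, sigma) of the total, assuming independent Normal tasks.
    """
    total_mean = 0.0
    total_var = 0.0
    for low, high in tasks:
        mean = (low + high) / 2
        sigma = (high - low) / (2 * k_sigma)
        total_mean += mean
        total_var += sigma ** 2  # variances add only if tasks are independent
    return total_mean, math.sqrt(total_var)

# Hypothetical 2-sigma low/high estimates in hours.
tasks = [(4, 12), (10, 30), (2, 6), (16, 40)]
mean, sigma = combine_tasks(tasks)
quote_hours = mean + 3 * sigma  # the "+3-sigma" high-end effort estimate
print(f"mean={mean:.1f}h sigma={sigma:.1f}h quote={quote_hours:.1f}h")
```

Because the standard deviation of the total grows like the square root of the summed variances, the +3-sigma quote here comes out below the plain sum of the per-task highs. If the tasks are correlated (same tricky subsystem, same flaky client), that shrinkage is overstated.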
To your point though (with walking the dog), there’s a really key thing that this process doesn’t capture, somewhat on purpose: the resulting estimate is in effort-hours, not delivery date. While modelling per-task effort hours as Normal is suspect, modelling task delivery dates as Normal is completely irrecoverably wrong: delivery dates, in my experience, only ever slide in one direction. People get sick and the project slips a week; people don’t ever get super healthy and effectively knock out 80 or 120 hours worth of tasks in a week.
I honestly haven’t found a good way to estimate delivery dates very well. At one point I did put together some regression on my “actual billed hours per week” based on my billing, but ran into the same problem. “Oh, my dog died and my mother-in-law got sick during that project.”
Someone who’s better at statistics might have a better way to model that as a high-skew distribution, but when I tried doing that myself I ended up with a distribution that didn’t feel like it did a good job of capturing the non-negligible long tail of things that slow down calendar estimates without burning hours.
The traditional distribution for costs and dates is Beta-PERT. Douglas W. Hubbard uses log-normal distribution. I am a pessimist, so I like Beta-PERT with a fat, fat tail.
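For reference, a minimal sketch of sampling a Beta-PERT with only the standard library. It uses the usual (low, mode, high) parameterisation with a shape parameter `lam` (4 in the classic PERT; smaller values flatten the peak and put more weight in the tail). The example task numbers and the `sample_pert` name are assumptions for illustration.

```python
import random

def sample_pert(low, mode, high, lam=4.0, rng=random):
    """Draw one sample from a Beta-PERT distribution on [low, high]."""
    span = high - low
    alpha = 1 + lam * (mode - low) / span
    beta = 1 + lam * (high - mode) / span
    # Scale a Beta(alpha, beta) draw from [0, 1] onto [low, high].
    return low + rng.betavariate(alpha, beta) * span

rng = random.Random(42)
# Hypothetical task: best case 2 days, most likely 5, worst case 20.
samples = sorted(sample_pert(2, 5, 20, rng=rng) for _ in range(20_000))
p50 = samples[len(samples) // 2]
p95 = samples[int(0.95 * len(samples))]
print(f"median={p50:.1f} days, 95th percentile={p95:.1f} days")
```

With `lam=4` the mean works out to (low + 4*mode + high) / 6, which is the familiar PERT formula. A log-normal would be a one-line swap to `rng.lognormvariate`, at the cost of having no hard upper bound.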
> If you do error propagation using uncertainty on your parameters, you can get nonsense results.
Only if you do it wrong, generally. Events which are impossible under some hypothesis should have zero probability under a model for it and not be sampled.
Estimating uncertainty is preferred, but if you have people who insist on a single number, then teach them that it should be the median instead of the mean. Means tend to overvalue "moonshots" where there's a 99% chance you lose money, but if the payoff in that 1% where you win is large enough, it can still result in a positive mean. And a mean can often be an outcome that isn't actually possible, but falls between several options, while a median is always an outcome that can actually happen, and there's ~50% chance you'll get a better one.
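A tiny illustration of the moonshot case, with made-up payoffs: 99 outcomes out of 100 lose 1 unit and one wins 200.

```python
import statistics

# Hypothetical bet: 99% chance of losing 1, 1% chance of winning 200.
outcomes = [-1] * 99 + [200]

mean_payoff = statistics.mean(outcomes)      # pulled positive by the single win
median_payoff = statistics.median(outcomes)  # the typical result: a loss

print("mean:", mean_payoff)
print("median:", median_payoff)
```

The mean is positive even though you lose money 99 times out of 100, and no single outcome ever pays the mean. The median reports the loss you should actually expect to see.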
It's not just business people. I have yet to find a task tracking system (Jira etc.) that lets you assign a range of points to a task.
People try and use nonsense like Fibonacci numbers to imply uncertainty, but then just add up all the numbers to get a number with no uncertainty measure.
Story points are a curse on the software development industry. However much people say "they're indicative, and don't map to hours", someone, somewhere, will map them to hours.
The most accurate project plan I was ever involved in had only 3 values that could be assigned to a piece of work during the early estimation phase: hours, days, and weeks. Each of those was then turned into a range of possible hours they could represent, with that range expanding as you got into larger units. You could then slice and dice the numbers however you choose to get anywhere between the most pessimistic timeline to the most optimistic timeline. Probably unsurprisingly the project was delivered somewhere between the two.
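Roughly what that looks like in code. The hour ranges per bucket below are invented for illustration (the real ones on that project were different and I am not reproducing them), and `timeline_bounds` is a hypothetical name.

```python
# Assumed hour ranges per bucket; wider ranges for larger units.
BUCKET_HOURS = {
    "hours": (1, 8),
    "days": (8, 40),
    "weeks": (40, 200),
}

def timeline_bounds(items):
    """items: list of bucket names. Returns (optimistic, pessimistic) total hours."""
    optimistic = sum(BUCKET_HOURS[bucket][0] for bucket in items)
    pessimistic = sum(BUCKET_HOURS[bucket][1] for bucket in items)
    return optimistic, pessimistic

backlog = ["hours", "hours", "days", "weeks", "days"]
low, high = timeline_bounds(backlog)
print(f"between {low} and {high} hours")
```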
That seems reasonable, but I would say you don't have to resort to such crudeness. In my experience I know the difference between a task that will definitely take weeks and a task that might take a day or might take weeks. Just let me write that down!
Even if you don't have any idea about the uncertainty we already have a crude way of measuring it - planning poker! Just record everyone's guesses instead of throwing away the uncertainty information. There's a huge difference between everyone guessing 5, and some people guessing 1 and others guessing 20.
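A sketch of what "record everyone's guesses" could look like, assuming you can get the raw votes out of whatever tool you use. `summarize_votes` and the vote lists are hypothetical.

```python
import statistics

def summarize_votes(votes):
    """Keep the spread of planning-poker votes instead of collapsing to one number."""
    return {
        "median": statistics.median(votes),
        "low": min(votes),
        "high": max(votes),
        "stdev": statistics.pstdev(votes),  # population std dev of the votes
    }

agree = summarize_votes([5, 5, 5, 5, 5])
split = summarize_votes([1, 1, 5, 20, 20])
print("agree:", agree)
print("split:", split)
```

Both rounds have the same median, so a tracker that stores only "5" cannot tell them apart. The low/high and standard deviation are the part that usually gets thrown away.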
I agree about points being stupid though - there's simply no way to avoid it being converted to/from time, because that's the actual unit of work.
Sorry, reading back I definitely come across as disagreeing - I would kill someone for task tracking software which supports giving the degree of uncertainty on an estimate.
> Monte Carlo simulation - estimating uncertainty.
By definition, "uncertainty" is the thing that does not have a PDF ... You can quantify risk based on assumptions you make about underlying probability distributions from which you are drawing. Far too often, even in Monte Carlo, people decide to work with the easy distributions instead of the most appropriate distributions.
The "easy" distributions tend to admit actual mathematical solutions which means Monte Carlo is helpful if you can't do the math but not strictly necessary.
Monte Carlo shines when you cannot get a nice closed-form solution, and that is your opportunity to stop being constrained by the need for one: draw from appropriate distributions.
Also, know the difference between LLN and CLT and when to appeal to which to justify your methods.
I would think that you could have "uncertainty" even if you do have a PDF. Maybe there is a formal definition of uncertainty that I am not aware of. The PDF can describe the uncertainty for every outcome.
Risk is what you can put a probability on (i.e., quantify). Uncertainty is what you can't. Uncertainty encompasses unknown unknowns. See Knight[1] and Keynes[2].
Good rule of thumb even if it sounds overly simplified.
> This simple example shows how the net present value may lead the firm to take unnecessary risk, which could be prevented by real options valuation.
Even with a Monte Carlo on NPV, you may still make a riskier decision than needed, as decisions are rarely just "do or don't" but can be sequenced pending more information.
One of the concepts that many people don’t get is the distinction between risk and uncertainty. These same people will then attempt to “model risk” and end up with (ironically) an “uncertain estimate”.
They define risk as having statistical noise or a varying parameter in the estimate, while uncertainty is lacking information about the target. Doesn't the presence of noise in an estimate imply that there's information you aren't accounting for? I don't see the importance of the distinction.