Investors can obviously improve their returns by buying stocks that perform well. It's also a good idea to avoid stocks that do not. Avoid bad investments, make good ones, it's all so simple.
Let's define a good investment, in this case a stock, as one that delivers above market returns over a defined holding period. It makes sense to define good and bad investments this way. We could just buy the market and get the market return, but we have decided to try and beat it through stock picking, so we had better be good at it.
Let's then imagine there are 1000 stocks in an index, of which 250 will turn out to be good investments and 750 will turn out to be bad ones. The bad investments are those that deliver below average returns over the holding period.
An investor is going to wade through these thousand stocks, buy the ones they think will be good investments and reject the ones they think are duds. Since no investor is perfect, some good stocks will not be bought and some bad ones will be; the good stocks passed over end up sitting alongside the happily ignored duds.
Assuming there are no restrictions on portfolio size, so that every stock in the index is assessed as either good (buy) or bad (don't buy), is it better to get better at picking winners or at avoiding losers if we want to improve portfolio returns?
This article is a more technical repeat of a previous article, which a reader might like to look at to get a taste of what is to come.
Lies, damn lies…
We will be borrowing heavily from statistics, particularly medical statistics, to answer this question, so I thought it worth including a primer on the subject. Feel free to skip forward if you are not interested.
A patient is unwell. Their doctor thinks they have disease X. There is a test for disease X, so the doctor uses it to see whether the patient tests positive. The test reports either positive or negative, but it is not always right. It can read positive when the patient has the disease, but also when they don't, and it can read negative when the patient is disease free, but also when they are not. There are therefore four outcomes: true positives, who test positive and have the disease; false positives, who test positive but don't have the disease; false negatives, who test negative but do in fact have the disease; and true negatives, who test negative and do not have the disease.
| | Disease present | Disease absent |
| --- | --- | --- |
| Positive test | True positive | False positive |
| Negative test | False negative | True negative |
Say we test a thousand people for the disease, and we know for sure that 400 of them have it. We know that the test reads positive 80% of the time when we test someone who actually has the disease; this is the sensitivity of the test. Our test will therefore pick up 320 of the diseased people, but miss 80 of them.
| | Disease present | Disease absent |
| --- | --- | --- |
| Positive test | 320 | 120 |
| Negative test | 80 | 480 |
We also know that the test reads negative 80% of the time when we test someone who does not have the disease. This is the specificity of the test. So our test will correctly read negative for 480 of the non-diseased people, but it will mislabel 120 of them as having the disease.
If we take the 320 true positives, and divide that by the sum of the true positives and the 80 false negatives we get 0.8 or 80%, which is unsurprisingly the sensitivity of the test. The specificity of the test should also be 80%, since we set it up that way, and if we take the 480 true negatives and divide them by the sum of the true negatives and the 120 false positives, we do get 0.8 or 80% as expected.
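As a sanity check, here is a minimal Python sketch that recovers the sensitivity and specificity from the four cells of the worked example. The counts are the made-up numbers from above, not real test data:

```python
# Counts from the worked example: 1,000 people tested, 400 with the disease,
# and a test with 80% sensitivity and 80% specificity.
true_positives = 320
false_negatives = 80
false_positives = 120
true_negatives = 480

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}")  # 80%
print(f"Specificity: {specificity:.0%}")  # 80%
```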
…and medical statistics
There are a few more statistics to consider. One is the test's positive predictive value (PPV), which is the probability that, following a positive test result, the tested person truly has the disease; it is calculated as true positives divided by the sum of the true and false positives. Another is its negative predictive value (NPV), the probability that, following a negative test result, the person truly does not have the disease; it is calculated as true negatives divided by the sum of the true and false negatives.
Defining the PPV and NPV allows us to make some interesting observations. If we have a test of fixed sensitivity and specificity (80% for each) but apply it to different groups of 1000 people with say 400, 200, 100 and finally 50 diseased members, we see the PPV decreases whilst the NPV increases.
| Number of diseased people in 1000 | True positives: number (%) | False positives: number (%) | False negatives: number (%) | True negatives: number (%) |
| --- | --- | --- | --- | --- |
| 400 | 320 (32%) | 120 (12%) | 80 (8%) | 480 (48%) |
| 200 | 160 (16%) | 160 (16%) | 40 (4%) | 640 (64%) |
| 100 | 80 (8%) | 180 (18%) | 20 (2%) | 720 (72%) |
| 50 | 40 (4%) | 190 (19%) | 10 (1%) | 760 (76%) |
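The table above can be reproduced in a few lines of Python. This is nothing more than the arithmetic already described, an 80%/80% test applied to 1,000 people at different prevalences:

```python
population = 1000
sensitivity = 0.8
specificity = 0.8

for diseased in (400, 200, 100, 50):
    healthy = population - diseased
    tp = sensitivity * diseased      # true positives: diseased people flagged positive
    fn = diseased - tp               # false negatives: diseased people missed
    tn = specificity * healthy       # true negatives: healthy people flagged negative
    fp = healthy - tn                # false positives: healthy people mislabelled
    print(f"{diseased:>4} diseased: TP={tp:.0f} FP={fp:.0f} FN={fn:.0f} TN={tn:.0f}")
```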
To understand why, we look to the calculation of PPV and NPV. PPV is calculated as true positives as a proportion of both true and false positives. As the number of diseased people in the population falls from 400 to 50, the number of false positives for every true positive rises from 0.375 to 4.75. We are shrinking the numerator of the PPV equation (the true positives) faster than the denominator (true positives plus false positives), so the PPV falls by definition. It may be clearer to think of it like this: as the disease becomes rarer, the chance of any one person selected at random having it decreases. It becomes like looking for a needle in a haystack; since it is more likely you are looking at a healthy person right off the bat, a positive test result is more likely to be a false positive.
| Number of diseased people in 1000 | PPV | NPV |
| --- | --- | --- |
| 400 | 73% | 86% |
| 200 | 50% | 94% |
| 100 | 31% | 97% |
| 50 | 17% | 99% |
A similar argument can be made for why the NPV rises. It is calculated as true negatives divided by the sum of the true and false negatives. As the disease becomes rarer, the number of true negatives for every false negative climbs from 6 to 76, so the numerator of the NPV calculation grows faster than the denominator. A randomly selected person is much more likely to be healthy when the prevalence of the disease is low, so a negative test result is more likely to be a true negative.
Thus NPV and PPV depend on the prevalence of the disease. The count-based calculations I described earlier are fine when the tested sample reflects the real prevalence; when the sample sizes in the positive (disease present) and negative (disease absent) groups do not reflect that prevalence, the PPV and NPV are instead calculated with formulas that use the sensitivity, specificity and population prevalence.
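For the curious, here is a sketch of those prevalence-based formulas (the standard Bayes'-rule versions), checked against the 400-in-1,000 worked example to show the two routes agree:

```python
def ppv(sensitivity, specificity, prevalence):
    # Probability of having the disease given a positive test.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def npv(sensitivity, specificity, prevalence):
    # Probability of not having the disease given a negative test.
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)

# 400 diseased in 1,000 people is a prevalence of 0.4, with an 80%/80% test.
print(f"PPV: {ppv(0.8, 0.8, 0.4):.0%}")  # 73%, matching 320 / (320 + 120)
print(f"NPV: {npv(0.8, 0.8, 0.4):.0%}")  # 86%, matching 480 / (480 + 80)
```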
Finally, there is the accuracy of a test, calculated as the sum of true negatives and true positives divided by the total of everything (true and false positives plus true and false negatives).
Before we move on to applying this to investing, a quick side note for the curious. One might wonder how a test's sensitivity and specificity can be calculated in the first place. Take sensitivity: you need to test people for a disease, get results, then decide whether those results are true or false positives based on whether the people actually have the disease…but that seems to require determining their disease status from the very test whose sensitivity you are trying to calculate. In practice you have a gold standard test, or set of tests, which reliably detects the disease if present and rules it out if not.
Think of a proposed test for lung cancer that relies on detecting some molecule emitted by tumour cells and present in an afflicted person's breath. You would have a group of people known to have lung cancer because they have symptoms and signs of the disease, imaging that strongly suggests it, and a bronchoscopy and biopsy in which a pathologist has identified lung cancer cells. Then you have people with no symptoms or signs, negative imaging findings, and bronchoscopy and biopsy results showing nothing but normal, healthy lung cells.
You use the breath test on all of these people, record the positives and negatives, then compare the results to their actual disease status and calculate the sensitivity and specificity.
Positively predicting value
Before the aside into medical statistics we defined a good investment as one that delivers market beating returns over some period of time, and a bad investment as one that returns less than the market average.
Let’s say we have some investment process for picking stocks we fancy as market beaters and rejecting stocks we don’t. We pull up a list of FTSE 100 stocks, apply our process and end up with a short buy list. We buy those stocks and then wait for a year.
We will be able to get the index return over the year fairly easily. It will then be straightforward to work out which of the stocks we picked beat that index return. These are the ones predicted to be good investments which actually turned out to be so; these are our true positives. Our false positives are the ones we picked that did not beat the average.
There is another, longer list of all the stocks we did not pick. We can work out which of those beat the average; these are our false negatives. Those that we did not buy and did not beat the average are our true negatives.
We have four possibilities. We predicted the stock to be a good investment and bought it, and it either is, if it beats the average, or is not, if it underperforms it. We also have those stocks that we did not think would be good investments, which we did not buy. They can either be good investments or not as well.
| | Turns out to be a good investment | Turns out to be a bad investment |
| --- | --- | --- |
| Predicted as a good investment | True positive (good decision) | False positive (bad decision) |
| Predicted as a bad investment | False negative (bad decision) | True negative (good decision) |
Now, these measures will not be as robust as those for medical tests, nor have the same applicability in practice, but we have enough information now to calculate the sensitivity and specificity of our investment process, and its positive predictive value (PPV) and negative predictive value (NPV).
How many stocks beat the index?
An example usually makes things clearer. We used our investing know-how to sort the FTSE 100 into a buy list of 25 stocks and a reject list of 75. A Morningstar report reckoned that of the largest 1,000 stocks, only 20% outperformed their indices between 2011 and 2021. Let's run with that, so 20 of our 100 stocks will beat the average. Of the 25 we bought, 15 beat the average and 10 did not. Of the 75 we did not buy, 5 beat the average and 70 did not.
| | Turns out to be a good investment | Turns out to be a bad investment |
| --- | --- | --- |
| Predicted as a good investment | 15 | 10 |
| Predicted as a bad investment | 5 | 70 |
Just based on the raw numbers, it looks like our investment process is pretty stellar. But, we can get a little more clarity by finding the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy of our process to be:
- Sensitivity 75%
- Calculation: 15 / (15 + 5) = 0.750
- Specificity 88%
- Calculation: 70 / (70 + 10) = 0.875
- PPV 60%
- Calculation: 15 / (15 + 10) = 0.600
- NPV 93%
- Calculation: 70 / (5 + 70) = 0.933
- Accuracy 85%
- Calculation: (15 + 70) / (15 + 10 + 5 + 70) = 0.850
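The same arithmetic in a short Python sketch, using the counts from the example above (15 true positives, 10 false positives, 5 false negatives, 70 true negatives):

```python
tp, fp, fn, tn = 15, 10, 5, 70  # counts from the FTSE 100 example above

sensitivity = tp / (tp + fn)    # share of market beaters that we bought
specificity = tn / (tn + fp)    # share of duds that we correctly rejected
ppv = tp / (tp + fp)            # share of our buys that beat the market
npv = tn / (tn + fn)            # share of our rejections that lagged the market
accuracy = (tp + tn) / (tp + fp + fn + tn)

for name, value in [("Sensitivity", sensitivity), ("Specificity", specificity),
                    ("PPV", ppv), ("NPV", npv), ("Accuracy", accuracy)]:
    print(f"{name}: {value:.1%}")
```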
That sensitivity figure tells us that, out of all the stocks that turned out to be good investments, our process, designed to find those stocks, identified 75% of them. Sensitivity relates to type II errors. Formally, a type II error occurs when a false null hypothesis is not rejected; informally, it is when we write a stock off as a dud and it turns out to be a market beater. The higher the sensitivity, the fewer of these misses.
Specificity looks at the other side: of all the stocks that turned out to be bad investments, our process correctly rejected 88% of them, so it was highly specific. Specificity relates to type I errors. These occur when a true null hypothesis is rejected; in stock picking this is when we predict a stock will be a good investment, but it actually turns out to be a laggard.
The positive predictive value tells us that 60% of the stocks the process flags as good investments will actually be good investments. Of the stocks deemed to be poor investments, 93% of them will turn out to be duds according to the negative predictive value.
The accuracy of the test tells us that the process put 85% of the good and bad investments into the correct good and bad buckets.
Miss the top stocks or avoid the duds?
Let’s say we can make some tweaks to our investment process. One tweak will increase its sensitivity from 75% to 85%, the other will increase its specificity from 88% to 93%. For some reason we can only make one of the tweaks, so we try the sensitivity tweak first and move into the next financial year.
The fictional FTSE 100 index we are using always has 20 stocks that beat the average and 80 that do not. Our new process categorises 17 of the market-beating stocks correctly, and we buy them. It misses three, which we do not buy. Of the 80 duds, the process mistakenly labels 10 as good investments, which we buy, because its specificity is unchanged at 88%.
| | Turns out to be a good investment | Turns out to be a bad investment |
| --- | --- | --- |
| Predicted as a good investment | 17 | 10 |
| Predicted as a bad investment | 3 | 70 |
How about the specificity tweak? In that case we end up buying 15 of the market-beating stocks, but only 6 of the duds.
| | Turns out to be a good investment | Turns out to be a bad investment |
| --- | --- | --- |
| Predicted as a good investment | 15 | 6 |
| Predicted as a bad investment | 5 | 74 |
Now let's put it all together, calculate the sensitivity, specificity, PPV, NPV and accuracy of the three investment processes, and tabulate the results.
| | Original | +10% Sensitivity | +5% Specificity |
| --- | --- | --- | --- |
| Sensitivity | 75% | 85% | 75% |
| Specificity | 88% | 88% | 93% |
| PPV | 60% | 63% | 71% |
| NPV | 93% | 96% | 94% |
| Accuracy | 85% | 87% | 89% |
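The comparison table can be reproduced from the three confusion matrices above. A minimal sketch, using the fictional counts from this article:

```python
# (TP, FP, FN, TN) for each version of the fictional process,
# applied to an index with 20 market beaters and 80 duds.
processes = {
    "Original":         (15, 10, 5, 70),
    "+10% Sensitivity": (17, 10, 3, 70),
    "+5% Specificity":  (15, 6, 5, 74),
}

for name, (tp, fp, fn, tn) in processes.items():
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    print(f"{name}: Se {sensitivity:.0%}, Sp {specificity:.0%}, "
          f"PPV {ppv:.0%}, NPV {npv:.0%}, accuracy {accuracy:.0%}")
```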
Look at what has happened to the PPV. Improving sensitivity by 10 percentage points increased it from 60% to 63%. Bumping specificity up by just 5 percentage points increased it from 60% to 71%.
We should care about the PPV of our investment process more than any other metric. It's the hit rate of our stock picking game: the proportion of stocks in the portfolio, picked for being good investments, that actually turn out to be good ones. Thus, in a typical stock market, where the prevalence of good investments (as we have defined them) is around 20%, investors without restrictions on portfolio size are better off improving their ability to identify and reject bad investments. We also get a bigger boost in accuracy from a relatively smaller increase in specificity.
Why is this so? Because there are more bad stocks than good ones. If you picked at random you would be far more likely to end up with a bad stock than a good one, so you had better get good at recognising bad stocks, or you will end up with a portfolio full of them.
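To see the prevalence effect directly, we can reuse the prevalence-based PPV formula from the medical statistics section at a 20% prevalence of good investments. The sensitivities and specificities below are the fictional ones from the example above:

```python
def ppv(sensitivity, specificity, prevalence):
    # Probability a stock flagged as a buy really is a market beater.
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

prevalence = 0.2  # roughly one stock in five beats the index

print(f"Original (Se 75%, Sp 88%):  {ppv(0.75, 0.875, prevalence):.0%}")  # ~60%
print(f"Sensitivity tweak (Se 85%): {ppv(0.85, 0.875, prevalence):.0%}")  # ~63%
print(f"Specificity tweak (Sp 93%): {ppv(0.75, 0.925, prevalence):.0%}")  # ~71%
```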
Portfolio returns
There is an index of 1000 stocks, of which 200 are good and 800 are bad. The good stocks return 10%, the bad 2%, and the market return is 3.6%. There are six investors. One has questionable skill in picking winners and avoiding losers, with a sensitivity and specificity of 50%. Another is a bit better, with the same specificity but a sensitivity of 60%. Yet another has a sensitivity of 50% and a specificity of 60%. Another investor has a sensitivity and specificity of 70%, one has a sensitivity of 80% and a specificity of 70%, and our last investor has a sensitivity of 70% and a specificity of 80%.
How will these investors fare? Well, let's see and build their portfolios. The table below shows the results. Note that Se50 means a sensitivity of 50%, Sp60 a specificity of 60%, and so on.
| | | Se50 Sp50 | Se60 Sp50 | Se50 Sp60 | Se70 Sp70 | Se80 Sp70 | Se70 Sp80 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Portfolio | Good | 20% | 23% | 24% | 37% | 40% | 47% |
| | Bad | 80% | 77% | 76% | 63% | 60% | 53% |
| Remainder | Good | 20% | 17% | 17% | 10% | 7% | 9% |
| | Bad | 80% | 83% | 83% | 90% | 93% | 91% |
| Portfolio return | | 3.60% | 3.85% | 3.91% | 4.95% | 5.20% | 5.73% |
| Portfolio “alpha” | | 0.00% | 0.25% | 0.31% | 1.35% | 1.60% | 2.13% |
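The portfolios in the table above can be rebuilt with a few lines of Python. This is a minimal sketch under the article's fictional assumptions (200 good stocks returning 10%, 800 bad stocks returning 2%, equally weighted holdings), not a model of any real index; it reproduces the table to within rounding:

```python
GOOD_STOCKS, BAD_STOCKS = 200, 800
GOOD_RETURN, BAD_RETURN = 0.10, 0.02
MARKET_RETURN = (GOOD_STOCKS * GOOD_RETURN + BAD_STOCKS * BAD_RETURN) / (
    GOOD_STOCKS + BAD_STOCKS
)  # 3.6%

# (sensitivity, specificity) for each of the six fictional investors.
investors = [(0.5, 0.5), (0.6, 0.5), (0.5, 0.6), (0.7, 0.7), (0.8, 0.7), (0.7, 0.8)]

for sensitivity, specificity in investors:
    bought_good = sensitivity * GOOD_STOCKS       # market beaters correctly bought
    bought_bad = (1 - specificity) * BAD_STOCKS   # duds mistakenly bought
    good_share = bought_good / (bought_good + bought_bad)
    portfolio_return = good_share * GOOD_RETURN + (1 - good_share) * BAD_RETURN
    alpha = portfolio_return - MARKET_RETURN
    print(f"Se{sensitivity * 100:.0f} Sp{specificity * 100:.0f}: "
          f"{good_share:.0%} good, return {portfolio_return:.2%}, alpha {alpha:.2%}")
```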
The portfolio return and “alpha” are what we are really interested in here, but the composition of the portfolio and the remainder (the stocks not bought) are also of interest. Starting with the Se50 Sp50 investor, improving sensitivity by 10 percentage points improves return, alpha and composition, but not as much as improving specificity to 60%.
The Se70 Sp70 investor has better return metrics and composition than the Se50 Sp50, Se60 Sp50 and Se50 Sp60 investors. But improving their return further is better served by increasing the specificity to 80% and leaving the sensitivity unchanged.
So there we have it. Investors should get better at avoiding bad stocks rather than improving their ability to pick stock market winners.