This week, Jeff Skilling received a well-deserved sentence of 24 years for his creativity at Enron. I wouldn’t take the sentence too seriously. Remember that Mike Milken—the Junk Bond King of the 1980s—was sentenced to ten years. After two years, he was diagnosed with terminal cancer, petitioned for early release, received early release, and then had a miraculous recovery. Today, he is hobnobbing with the rich and powerful through his personal foundation.

At least Skilling did us the favor of pushing Amaranth from the popular headlines. But professionals will continue to sift through that blow-up’s aftermath for some time to come. And predictably, the recurring debate about value-at-risk has, well, recurred.

It started when Hilary Till of Premia Capital Management released a damage control piece assessing aspects of the Amaranth case. She assured us that the market move associated with Amaranth’s loss was a nine standard deviation event. If you don’t know what that means, it is statisticians’ speak for “it won’t happen again.”

In somewhat of a rebuttal piece, Chris Finger of the RiskMetrics Group pointed out that Till’s nine standard deviation claim depends on how she chose to calculate one standard deviation. If you calculate a standard deviation using historical data, you will get a different result if you use three months of data or six months. You will get a different standard deviation if you equally weight the data or give greater weight to more recent data.

Finger presented the following chart of the three-year history of a particular natural gas spread Amaranth was exposed to. It also shows the standard deviation of that spread calculated over time using an exponentially-weighted moving average (EWMA). In the chart, the EWMA standard deviation rises over time, reflecting the increasing volatility of the spread. Because it weights recent data more heavily, the EWMA indicates a higher standard deviation in the days leading up to Amaranth’s distress than would a uniformly-weighted moving average (UWMA). With a higher standard deviation calculated using the EWMA, Amaranth’s loss becomes a three standard deviation event rather than the nine standard deviation event Till obtained using (presumably) a UWMA. Anyway, that is what Finger concluded.
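Finger’s point is easy to reproduce with toy numbers. Below is a minimal sketch of the two estimators in pure Python — synthetic data, not the actual natural gas spread, and a RiskMetrics-style decay factor of 0.94 is my assumption:

```python
import math
import random

def uwma_vol(returns):
    """Uniformly weighted standard deviation (zero-mean convention)."""
    return math.sqrt(sum(r * r for r in returns) / len(returns))

def ewma_vol(returns, lam=0.94):
    """Exponentially weighted volatility: recent observations get more weight."""
    num, den, w = 0.0, 0.0, 1.0
    for r in reversed(returns):   # most recent first, weight decays into the past
        num += w * r * r
        den += w
        w *= lam
    return math.sqrt(num / den)

# Toy series: calm early on, volatile recently (mimicking a rising-volatility spread).
random.seed(0)
returns = [random.gauss(0, 0.01) for _ in range(200)] + \
          [random.gauss(0, 0.05) for _ in range(50)]

print(uwma_vol(returns), ewma_vol(returns))
# EWMA reacts to the recent turbulence; UWMA averages it away. The same loss
# therefore looks like fewer "standard deviations" under EWMA than under UWMA.
```

Dividing a fixed loss by the larger EWMA figure is exactly how a nine standard deviation event shrinks to a three standard deviation event.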

Now for some bad news. This is just one example. Before we all go off and implement EWMA in our value-at-risk (VaR) measures, we might want to contemplate other cases. EWMA tends to perform well when volatility gradually increases from a modest level—precisely the case Finger considered. However, EWMA can give a false sense of security if there is a brief lull in the market. In a scenario where a sudden market move occurs in the midst of such a lull, EWMA will perform poorly compared to UWMA. I discuss the relative merits of UWMA and EWMA in my book *Value-at-Risk: Theory and Practice*. Honestly, both UWMA and EWMA are crude. We use them only because there are no good alternatives. What we need are high-dimensional GARCH or stochastic volatility models. The biggest problem with VaR today—and this problem has festered unresolved for ten years now—is the lack of such sophisticated high-dimensional time series models.
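For readers who haven’t seen one, here is a minimal univariate GARCH(1,1) simulation — illustrative parameters of my own choosing, not fitted to any market — showing the volatility clustering that UWMA and EWMA can only crudely approximate:

```python
import math
import random

def garch_path(n, omega=1e-6, alpha=0.10, beta=0.85, seed=42):
    """Simulate a GARCH(1,1) return path:
       sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1].
    Parameters here are illustrative, not calibrated."""
    rng = random.Random(seed)
    sigma2 = omega / (1 - alpha - beta)   # start at the long-run variance
    rets, vols = [], []
    for _ in range(n):
        r = math.sqrt(sigma2) * rng.gauss(0, 1)
        rets.append(r)
        vols.append(math.sqrt(sigma2))
        sigma2 = omega + alpha * r * r + beta * sigma2   # big moves raise tomorrow's vol
    return rets, vols

rets, vols = garch_path(1000)
# Conditional volatility is itself time-varying: it spikes after large returns
# and decays slowly afterward, producing the familiar clusters of turbulence.
```

The hard, unsolved part the paragraph above alludes to is not this one-dimensional recursion but doing something like it jointly across hundreds of risk factors.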

What does this mean for users of VaR measures? That depends on what you use the VaR measures for. VaR is just a tool. Like any tool, it is useful for certain things and not useful for others. Screwdrivers are also tools. They are useful for driving screws but not useful for driving nails. The next time you hear someone dumping on VaR, ask him why he isn’t dumping on screwdrivers as well.

One group whose expectations for VaR far exceed its abilities is bank regulators. They want VaR to represent the 99% quantile of a trading book’s ten day loss. Philosophically, that is like asking for the third decimal place of the probability of rain five days from now. There is no meaningful difference between a 12.4% probability of rain and a 12.7% probability of rain. Likewise, there is no meaningful difference between a ten-day 99% VaR of $4 million and one of $7 million. A ten-day 99% loss is something that happens maybe twice a decade. It is not something you can make precise probabilistic assertions about. I’m sorry if I am bursting any bubbles here. I would be okay with one-day 99% VaR or ten-day 90% VaR, but ten-day 99% VaR isn’t meaningful. We can talk about it the same way we talk about unicorns, but that doesn’t make it real. Of course, bank regulators are in the habit of getting what they ask for, so banks calculate large numbers, dress them up in all sorts of mumbo jumbo about extreme value theory or copulas, and call them ten-day 99% VaR. They even cobble together “backtests” that purportedly “validate” these numbers.
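The “twice a decade” arithmetic is easy to check. A q-quantile VaR should be exceeded in roughly a fraction 1 − q of observations, so — a back-of-the-envelope sketch assuming roughly 250 trading days a year and non-overlapping ten-day windows:

```python
def expected_exceedances(years, obs_per_year, q):
    """Expected number of VaR exceedances: observations times tail probability."""
    return years * obs_per_year * (1 - q)

# Ten-day 99% VaR with non-overlapping windows (~25 per year):
# only about 2.5 exceedances per decade to judge the model by.
print(expected_exceedances(10, 25, 0.99))

# One-day 95% VaR: ~12.5 exceedances per year, enough to track meaningfully.
print(expected_exceedances(1, 250, 0.95))
```

Two or three data points a decade is simply not a sample from which to validate anything.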

That is bank regulation, but I am more interested in financial risk management—there is a difference. For monitoring market risk, one-day 95% VaR is fine. It is also meaningful—in the same way that there is a meaningful difference between a 40% probability of rain and a 70% probability of rain. Also, for risk management purposes, the actual numbers produced by a VaR system aren’t as important as their trend. If you calculate one-day 95% VaR consistently from one day to the next, when the number jumps, you know something is going on. If the number gradually increases, doubling over a two-week period, you know something is going on.

Forget about distinguishing between three standard deviation or nine standard deviation market moves. Forget about the third decimal place of the probability of rain. Forget about ten-day 99% VaR. Instead, watch how your VaR numbers change from one day to the next. There is an old saying on Wall Street that is worth retreading for financial risk management: “The trend is your friend.” For monitoring market risk, the absolute level of your VaR numbers isn’t that important. Watch how they trend.

Glyn, why do you criticise Basel’s use of ten-day 99% VaR? Banks have been calculating it for years. They are backtesting their models too. How can you criticise a backtest? It is empirical. The data speaks for itself.

I don’t want to put words into Glyn’s mouth — but if you look at the 95% confidence bands on the 95% VaR and the 99% VaR you will have an idea why he objects. Also, the smaller the tail that you are looking at, the more the Peso problem rears its head. Missing data is a huge problem when trying to estimate information about the tails.

Glyn’s quote: “I would be okay with one-day 99% VaR or ten-day 90% VaR, but ten-day 99% VaR isn’t meaningful.” Well, I used to be responsible for reporting the 99.5th (not the 99th, but the 99.5th) percentile exposure over the next five-year period. I think all they were really measuring was the model risk of me.

Gosh, 99.5% five-year VaR!? Did they believe the moon is made of cheese as well?

In the blog posting, I described two analyses (by Till and Finger). They looked at identical data, yet one came up with a standard deviation that was three times that of the other. If they expressed their results as value-at-risk figures, one would still be three times the other. Please understand that neither analysis was right or wrong. They were just two different (largely reasonable) approaches to the same statistical problem. There are numerous other sources of variability in VaR results. Perhaps the biggest is the calculation of correlations. That is a huge can of worms. The problems become most pronounced when you consider extreme cases, such as ten-day 99% VaR. While it is pretty reasonable to model asset returns as joint normal for the 90% or 95% quantile, that assumption is really not reasonable at the 99% quantile. But if we don’t assume joint normality, what joint distribution should we assume? Whatever choice we make will profoundly affect results at the 99% quantile. Also, correlations are known to become exaggerated during extreme market moves. Accordingly, correlations you calculate from historical data will likely underestimate what will be experienced in a 99% quantile extreme market event. There are various ways this might be addressed, but as with the disparity between Till’s and Finger’s analyses, those solutions will produce a wide variety of different outputs.

What about the backtesting?

You have two reasonable approaches and get two completely different 99% VaRs. You apply the approaches to a number of historical datasets and you get — guess what — a number of 99% VaR pairs, with huge differences between the numbers in each pair. You cannot tell which of those is closer to the true one, as you do not know the true one. One of the problems with 99% VaR is that you cannot reasonably apply backtesting to assess model risk…

Clearly, banks aren’t actually backtesting ten-day 99% VaR measures. A bank will suffer losses in excess of ten-day 99% VaR only two or three times every ten years. That doesn’t give them enough data points for a meaningful backtest. Here is what is actually going on. Basel gives banks permission to calculate their ten-day 99% VaR as one-day 99% VaR calculated with ten-day standard deviations. I know this sounds bizarre. What it does is save the banks from having to account for events that transpire during a ten-day VaR horizon (like options expiring, coupons being paid, floating rates being reset, dividends being paid). That stuff would make calculating ten-day VaR very complicated. Instead, Basel says the banks can calculate one-day VaR but do so assuming the standard deviations they would use for ten-day VaR. Now, such a VaR measure is impossible to validate because it makes no predictions whatsoever about the actual losses the bank will incur. Banks do not (and cannot) realize one-day losses arising from ten-day standard deviations. The notion is meaningless.

So what do banks do to backtest their VaR measures? That is between them and their regulators. I suppose different banks do different things. In most cases, I suspect, the banks are validating their one-day 99% VaR measures (i.e. their VaR measures using one-day standard deviations) and representing that as a valid backtest for their ten-day 99% VaR measure (i.e. the same VaR measure but with ten-day standard deviations). There are two problems with this. One is that most banks have inadequate data for even a one-day 99% VaR backtest. The other is a HUGE problem: a backtest of the one-day 99% VaR measure is not a backtest of the corresponding ten-day 99% VaR measure. The assumption that ten-day 99% VaR can be calculated as one-day 99% VaR using standard deviations scaled up by the square root of time is simply wrong. The square root of time rule may be reasonable for calculating 90% VaR. It is not reasonable for calculating 99% VaR. The reason is volatility clustering. If the markets experience a 99% quantile move one day, they are far more likely than is typical to experience another one the following day, and the day after that. What this means is that large ten-day market moves are far more likely than the square root of time rule applied to one-day standard deviations would imply. Ten-day 99% VaR figures calculated according to the Basel simplification substantially under-represent the actual ten-day risk. In summary, whatever the banks are doing to “backtest” their ten-day 99% VaR measures is TOTALLY MEANINGLESS.
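The volatility-clustering argument can be illustrated with a Monte Carlo sketch. This simulates GARCH(1,1) return paths with made-up parameters (nothing here is calibrated to real markets) and compares the empirical ten-day 99% loss quantile with the square-root-of-time scaling of the one-day quantile:

```python
import math
import random

def simulate(n_paths=20000, horizon=10, omega=1e-6, alpha=0.15, beta=0.80,
             seed=7):
    """Simulate GARCH(1,1) return paths (illustrative parameters) and compare
    the empirical ten-day 99% loss quantile with the square-root-of-time
    scaling of the one-day 99% loss quantile."""
    rng = random.Random(seed)
    one_day, multi_day = [], []
    for _ in range(n_paths):
        sigma2 = omega / (1 - alpha - beta)      # start at long-run variance
        total = 0.0
        for t in range(horizon):
            r = math.sqrt(sigma2) * rng.gauss(0, 1)
            if t == 0:
                one_day.append(r)
            total += r
            sigma2 = omega + alpha * r * r + beta * sigma2   # clustering
        multi_day.append(total)
    quantile = lambda xs, p: sorted(xs)[int(p * len(xs))]
    loss_1d = -quantile(one_day, 0.01)           # one-day 99% loss quantile
    loss_10d = -quantile(multi_day, 0.01)        # ten-day 99% loss quantile
    return loss_1d * math.sqrt(horizon), loss_10d

scaled, actual = simulate()
# Because large moves beget large moves, the actual ten-day 99% loss quantile
# tends to exceed the square-root-of-time scaling of the one-day quantile.
print(scaled, actual)
```

The sizes of the two numbers will depend on the parameters chosen; the point is only that clustering fattens the tails of multi-day returns in a way the square root of time rule cannot capture.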

Glyn, First of all, thank you very much for this most interesting blog. I would like to mention another group of naive VaR users whose expectations are even more problematic than bank regulators: European Mutual Fund regulators. Funds using derivatives extensively for investment purposes (versus “hedging” purposes, whatever this means…) are classified as so-called “sophisticated” products. The UCITS III Product Directive requires such funds to calculate VaR on a daily basis, that is 30-day 99% VaR… VaR is seen by many people in mutual fund circles as a tool to “measure leverage”. More specifically, leverage is believed to be controlled by the requirement that the 30-day 99% VaR of the fund with derivatives may not be higher than 200% of a 30-day 99% VaR of a so-called “reference portfolio” (basically a portfolio excluding derivatives). I’m looking forward to the “sophisticated” discussions of forthcoming accidents with “sophisticated” mutual funds… I would be very much interested in hearing some opinions of “VaR practitioners” coming from the classical banking/trading book-context on what is going on with mutual fund regulations in Europe.

Hi Andreas: Thanks for your interesting post. Based on your explanation of what the regulators are doing, I think their goals are admirable, but we may take issue with some of the details of implementation. These days, it is hard to define exactly what constitutes an “investment,” but I know that a derivative isn’t one. Any fund that purports to “invest” in derivatives could use a healthy dose of regulatory oversight, and it sounds like that is what European regulators intend. By limiting these “sophisticated” funds to 200% leverage, the regulators are ensuring they don’t go out and do anything too “sophisticated” with investors’ money. Bravo! I also like their thinking in terms of leverage. For investors (as opposed to, say, traders) leverage is a better way to describe their risk than is value-at-risk. Tell a pension fund that their 30-day 95% VaR is $20 million, and they won’t quite know what to make of the number. They will wonder “how does this compare to other pension funds that are similar to us … ?” If you tell them that their portfolio has leverage such that it is 120% as risky as the S&P 500, then that speaks to them. Now “leverage” is one of those words that, like “investment,” is becoming increasingly difficult to define. It used to be that leverage was about borrowing money and placing it at risk. Suppose two investors both have $1000. One invests it in the S&P 500. The other borrows $2000 and invests the full $3000 in the S&P 500. We say the second investor has 300% the leverage of the first. The leverage was achieved through borrowing. But today, the investor could have achieved similar exposure through a whole variety of means — repo, derivatives, securities lending, a structured note, etc. Since we can no longer define leverage in terms of borrowing, how can we define it and encompass all these different forms of leverage?
What the European regulators have done is come up with a wonderful operational definition: “leverage” is the ratio of a portfolio’s VaR to the VaR of some reference portfolio. I like that! Of course, they are using a questionable VaR metric to implement their definition. I won’t recount all that is wrong with 30-day 99% VaR, having done so already for ten-day 99% VaR. The absolute magnitude of the numbers that will be produced will be utterly meaningless. However, there is some good news. Those numbers may still have some relative significance. Let me explain. Suppose we construct two different VaR measures that purport to calculate 30-day 99% VaR. We apply them both to some portfolio, call it Portfolio A. One VaR measure comes up with a VaR of $400,000. The other one comes up with a VaR of $900,000. No surprises in the discrepancy — with 30-day 99% VaR, the absolute magnitudes are meaningless. Suppose, however, we apply both VaR measures to another portfolio, say Portfolio B. For this portfolio, the VaR measures come up with respective VaRs of $1,200,000 and $2,600,000. Again, the VaR measures disagree on the absolute level of the risk, but THEY DO AGREE ON THE RELATIVE LEVEL OF THE RISK. The two VaR measures both indicate that Portfolio B is approximately three times as risky as Portfolio A. The reason is that, if a VaR measure uses, say, very high standard deviations for key factors, this will inflate the VaR numbers it calculates for all portfolios. Such a VaR measure will still be good at saying “this portfolio is more risky than that one.” The bad news is that this nice property of VaR measures tends to manifest itself more with reasonable VaR metrics, such as one-day 95% VaR, than with unreasonable VaR metrics, such as 30-day 99% VaR. Many people think it is somehow conservative to calculate VaR with extreme VaR metrics that produce large numbers. It isn’t. A VaR metric is just a measurement convention.
It doesn’t make any difference whether we calculate distances in inches or meters. People should think of VaR metrics the same way. The choice should be driven by which VaR metric can be most conveniently and meaningfully calculated. I would strongly encourage the European regulators to forget about 30-day 99% VaR and go with one-day 95% VaR. Also, it is absolutely essential that the “sophisticated” funds use the same VaR measure to calculate the VaR of their investment portfolio and reference portfolio.
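The Portfolio A/B example above, in code (the dollar figures are taken straight from the example; the two “measures” are hypothetical):

```python
# Two hypothetical VaR measures applied to portfolios A and B (figures from
# the example above). They disagree on levels but agree on relative risk.
measure_1 = {"A": 400_000, "B": 1_200_000}
measure_2 = {"A": 900_000, "B": 2_600_000}

ratio_1 = measure_1["B"] / measure_1["A"]
ratio_2 = measure_2["B"] / measure_2["A"]
print(ratio_1, ratio_2)  # both roughly 3: B is about three times as risky as A
```

This is why the ratio-based leverage definition can survive a noisy VaR metric better than the raw numbers can — as long as both portfolios are run through the same measure.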

Glyn, while I agree that the notion of comparing the VaR of a portfolio containing derivatives to a reference portfolio without derivatives is an elegant one, I have problems with the idea of “measuring leverage via VaR”. Leverage is essentially a deterministic concept (=a leverage of 4:1 means that it takes an adverse market move of 25% to drive a portfolio/organisation into severe solvency issues), but VaR is a probabilistic concept making a statement about one point located at the far left tail of the profit/loss distribution. The situation is further complicated by the fact that two portfolios with exactly the same VaR (or VaR-to-reference-portfolio ratio) can have very different leverage in the sense of financial gearing. I think what the regulator wants is two things: 1) Keeping an eye on mutual fund solvency with the help of an overall exposure limit of 200% (national implementations of UCITS III usually also specify that derivatives exposure have to be backed by liquid spot market “underlying” positions), 2) Requiring at least “sophisticated” products to publish a market risk indicator (VaR) in order to create at least a minimum of transparency of market risks. Just bouncing thoughts.

This is an interesting topic. I am traveling, so I don’t have time for a reply, but I will write again over the weekend. Thanks for your patience.

Hi Andreas: I think we are struggling with how to define leverage. Ultimately, a definition should reflect how a word is used in practice. In the case of “leverage,” usage seems varied. I don’t have an adequate definition to offer. You offer a definition along the lines of “a leverage of 4:1 means that it takes an adverse market move of 25% to drive a portfolio/organisation into severe solvency issues.” This has some nice elements. It makes leverage a function of the extent to which capital is at risk–without explicitly mentioning capital (which would require defining capital, which is another can of worms). A problem with the definition is the question “25% of what?” What do you mean by a 25% market move? You go on to comment “The situation is further complicated by the fact that two portfolios with exactly the same VaR (or VaR-to-reference-portfolio ratio) can have very different leverage in the sense of financial gearing.” The word “gearing” appears frequently in the British literature but not in the American. I have always assumed that “gearing” was a synonym for “leverage,” but maybe that is not the case. Please let me know. You mention two additional requirements that usually accompany implementations of UCITS III. The first, that “derivatives exposure have to be backed by liquid spot market ‘underlying’ positions,” acknowledges that leverage is not risk, and that two portfolios may have the same leverage but different market risk. I just received a nice e-mail from Hilary Till, and she pointed out a piece by Martin de Sa’Pinto (*VaR at Risk? Models Required, Apply Within*, HedgeWorld, Oct. 31, 2006, password protected at http://www.hedgeworld.com/news/premium/read_news_printable.cgi?section=peop&story=peop2756) that is worth quoting: “leverage can be more or less risky, depending on the instruments used. A 300% leveraged fund that is largely held in U.S.
Treasuries runs little danger of running into a liquidity trap … But the same fund invested entirely in illiquid derivative instruments based on volatile underlying assets becomes highly risky … In other words, there is leverage and leverage, and the same level of leverage can produce vastly different levels of risk.” The second requirement that usually accompanies implementations of UCITS III, that “sophisticated” funds publish a market risk indicator (VaR) in order to create at least a minimum of transparency of market risks, is, as we have already discussed, problematic. If 99% 30-day VaR is used, and each fund has its own model, numbers will be all over the place. I suppose the regulator could specify its own standard VaR model to be used by all funds. Or, they could simply require that leverage be reported rather than VaR …

I fully agree that the meaning of leverage is at the very core of this debate (and several other debates). I can also confirm your observation that usage of “leverage” differs a lot. “25% of what?” – a (combined) 25% movement in the “market risk factor(s)” in the portfolio. “In other words, there is leverage and leverage, and the same level of leverage can produce vastly different levels of risk.” – I agree; I am actually about to sort out the issues and also the terminology here… http://www.andreassteiner.net/performanceanalysis/?Risk_Measurement:Advanced_Topics:%26nbsp%3BLeverage%2C_Exposure_%26amp%3B_Risk …it’s a fragment only, have mercy. If time permits, I would like to expand that into an article. “Or, they could simply require that leverage be reported rather than VaR …” – I think reporting both a leverage and a suitable market risk indicator would be the best solution. But maybe I’m the only one who would prefer a fund a with market risk b and leverage c over a fund x with market risk b but leverage 10*c…