Wednesday, January 6, 2016 - 01:29

This article explains how to predict the amount a company’s law department spends on outside counsel from the company’s revenue. Such a prediction can be useful for law firms, for example, when estimating how much of a given company’s expenditures the firm receives (the so-called share of wallet) or when proposing additional work (share of future wallet). Law departments, for their part, might use it to bootstrap a comparative analysis of their own spending against competitors who will not divulge proprietary information.

To explain how we make these predictions, we draw on data from the benchmark surveys conducted during the past five years by General Counsel Metrics, LLC. Specifically, the data set covers the outside legal expenditures and revenue for 145 submissions by 105 different law departments in the energy or utilities industries (the power industry, when grouped together). Collectively, the companies reported 2,916 lawyers and $670 billion in revenue; 89 of them are U.S. companies, 33 are Canadian and seven are British.

To predict spending on law firms (and the small portion on other service providers), we use a powerful statistical tool called **linear regression**. Regression predicts the value of a

Using the data set described above, software generates the linear regression relationship shown as the boxed **equation**. That equation lets us predict what a particular power-industry law department would spend on law firms once we fill in the revenue for that department’s company. The

**External legal spend = $219,900 + (0.001963 × revenue) + error term**

So, for a hypothetical billion-dollar company, start with the constant ($219,900), then add the product of its revenue and the coefficient (0.001963 × $1,000,000,000 = $1,963,000): the legal department is predicted to spend about $2,182,900 on outside counsel and vendors (typically, law firms make up 90 percent of the external spend). Put another way, each additional $1 billion of revenue is associated with an increase in external spend of $1,963,000, on average.
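The arithmetic can be checked with a few lines of Python. This is only a sketch that evaluates the boxed equation with the error term left out; the function and constant names are ours, chosen for illustration.

```python
# Coefficients as quoted in the article's boxed equation.
INTERCEPT = 219_900   # constant term, in dollars
SLOPE = 0.001963      # predicted external spend per dollar of revenue


def predict_external_spend(revenue: float) -> float:
    """Point prediction of external legal spend (error term omitted)."""
    return INTERCEPT + SLOPE * revenue


one_billion = predict_external_spend(1_000_000_000)
two_billion = predict_external_spend(2_000_000_000)
print(f"Predicted spend at $1 billion revenue: ${one_billion:,.0f}")
print(f"Change per extra $1 billion of revenue: ${two_billion - one_billion:,.0f}")
```

The second printed figure is the coefficient scaled up to $1 billion, which is why the coefficient can be read as the average change in spend per dollar of revenue.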

Enough math. Take a look at the graphic plot with the straight blue linear regression line. In the plot, each law department is a point based on its value for company revenue on the horizontal axis and its external spend on the vertical axis. The plot sorts the revenue figures from the lowest on the left to the highest on the right.
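For readers who want to see where such a straight line comes from, here is a minimal sketch of an ordinary least squares fit. The data are synthetic stand-ins generated from the article’s equation plus random noise, not the survey data, and the noise level and revenue range are assumptions made only for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic stand-in data (NOT the survey data): one point per submission,
# revenue in dollars, spend from the article's equation plus made-up noise.
rng = random.Random(0)
revenues = sorted(rng.uniform(1e8, 2e10) for _ in range(145))
spends = [219_900 + 0.001963 * r + rng.gauss(0, 2e6) for r in revenues]

intercept, slope = fit_line(revenues, spends)
print(f"intercept = {intercept:,.0f}, slope = {slope:.6f}")
```

Because the points are noisy, the fitted intercept and slope land near, but not exactly on, the values used to generate them.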

The line is the **best-fit line**. The software creates it based on

**Confidence intervals** reflect the uncertainty of the regression line, and they show as the lighter gray band above and below the line. You can be 95 percent confident that the vertical range contains the true average external spend for companies with that revenue. If the predictor is indeed associated with the outcome variable, the more data a plot has in an area, the narrower the confidence interval there.
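The gray band can be approximated by hand. The sketch below computes an interval for the fitted line at a chosen revenue value using the textbook standard-error formula. As a simplifying assumption it uses a normal quantile in place of the Student’s t quantile that statistical software would normally apply, which matters little with well over a hundred observations. The function name and the sample numbers are illustrative only.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(xs, ys, x0, level=0.95):
    """Approximate confidence interval for the regression line at x0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    # Standard error of the fitted line; it grows as x0 moves away from mean_x.
    se_fit = s * math.sqrt(1 / n + (x0 - mean_x) ** 2 / sxx)

    # Assumption: normal quantile instead of Student's t (fine for large n).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    fitted = intercept + slope * x0
    return fitted - z * se_fit, fitted + z * se_fit


# Made-up numbers, purely to show the band widening away from the centre.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
near = mean_confidence_interval(xs, ys, 3)    # at the average x
far = mean_confidence_interval(xs, ys, 10)    # well outside the data
print(near, far)
```

The interval at the average x value is the narrowest, and it widens for values far from where the data sit, which mirrors how the band flares out at the sparse ends of the plot.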

The second plot has a curvy line. This plot uses a variation of linear regression called **loess** (locally weighted scatterplot smoothing). Each smoothed portion of the line comes from a least squares regression over a small range of values of the x-axis variable, here revenue (but instead of ordinary least squares, it employs the impressively named weighted quadratic least squares). In plainer words, the software figures out the regression fit line for local regions of the revenue data to produce a more nuanced form than the straight-line regression imposes.
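The idea of fitting locally can be sketched in a few lines. Note the simplifications: this version fits a weighted straight line in each neighborhood rather than the weighted quadratic the plot’s software uses, it applies the common tricube weighting, and it assumes the x values are distinct. The function names and the `frac` parameter are ours.

```python
import math


def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at distance 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_point(xs, ys, x0, frac=0.5):
    """Smoothed value at x0 from a weighted linear fit to nearby points.

    Assumes distinct x values; `frac` is the share of points in each local fit.
    """
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))       # neighbours per local fit
    bandwidth = sorted(abs(x - x0) for x in xs)[k - 1]
    weights = [tricube((x - x0) / bandwidth) for x in xs]

    sw = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / sw
    mean_y = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        return mean_y                             # only one point carries weight
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x0 - mean_x)


# Made-up curved data: the local fits bend with the points.
xs = list(range(10))
curved = [x ** 2 for x in xs]
smoothed = [loess_point(xs, curved, x) for x in xs]
print([round(v, 2) for v in smoothed])
```

Evaluating `loess_point` across the range of x values and joining the results gives the curvy line; a smaller `frac` follows the points more closely, a larger one approaches the single straight line.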

When software calculates the best-fit line, it also provides some additional insights. For example, **adjusted R-squared** tells us what percentage of the variation in external spending is explained by revenue in the model. In our data, it is 41 percent, which means factors other than revenue also influence external legal spending. One of those factors would be the number of lawyers in the department. If we included that figure in our analysis, it would become a multiple linear regression and would likely raise the adjusted R-squared value.
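For the curious, both measures follow from standard formulas. The sketch below shows them with illustrative function names; the adjustment penalizes R-squared for each predictor in the model, so adding a predictor raises the adjusted value only if it explains enough additional variation.

```python
def r_squared(ys, fitted):
    """Share of the variation in ys accounted for by the fitted values."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # unexplained
    ss_tot = sum((y - mean_y) ** 2 for y in ys)              # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """R-squared penalised for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Made-up illustration: a perfect fit versus predicting the average everywhere.
print(r_squared([1, 2, 3], [1, 2, 3]))   # perfect fit
print(r_squared([1, 2, 3], [2, 2, 2]))   # no better than the average
print(adjusted_r_squared(0.5, n_obs=11, n_predictors=1))
```

With 145 observations and a single predictor, as in the data set described here, the adjustment is small, so the adjusted and unadjusted values sit close together.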

To rely on linear regression, the data must conform sufficiently to four requirements or we need to take other actions. (1) The distribution of the residuals must be **normal**, by which we mean that when the residuals are plotted on a graph by count, the shape is reasonably close to the often-seen bell curve (relatively few values far out on either tail and most of them clustered toward the middle of the distribution). (2) The

One final note: How is **correlation** different from regression? Correlation estimates the

Now, that wasn’t too bad, was it? You have learned about regression, a tool that enables an analyst to take a set of data, model it and extract predictions based on new data. Moreover, you can visualize the regression results, interpret the reliability of the range of predictions they produce and quantify their explanatory power. That’s quite a gift to bestow on legal managers!

**Rees Morrison**, *A principal at Altman Weil who also leads General Counsel Metrics, which offers the largest benchmark report of law departments ever done.* rwmorrison@altmanweil.com