Part 1: Introduction – Part 2: Timer – Part 3: Volatile formulas – Part 4: Lookup formulas – Part 5: Sorting source data – Part 6: Table or List? – Part 7: Multiple criteria lookups – Part 8: Sum with multiple criteria – Part 9: Other factors – Part 10: Summary
Part 4: Lookup formulas
So far in this series we have installed the MicroTimer, learned how to identify slow calculating formulas, discovered why volatile formulas can cause slow calculations.
In this part of the series, we are going to start looking at which formulas to use for a simple look-up. All of the data and workings from the tests can be downloaded here.
There are a number of lookup formulas, each with their own advantages and disadvantages. For our purposes today we will not be focusing on which is the best or most versatile lookup formula, but purely on which is the fastest.
The formulas we will be considering are:
I realize that OFFSET is a volatile formula, which in Part 3 we decided we should avoid, but we will include it just to see how it performs in terms of speed. All of the data in our example will be unique, so using SUMPRODUCT, SUMIF and SUM (array) will provide the same result as the other lookup formulas. If the data were not unique we would not be able to use these as comparable formulas.
Setting up the test
I started with 99,999 individual lines of data to represent all the sales of various salesmen. These transactions were based on random data. I removed the duplicate records and totaled the sales for each unique record. This left me with 42,941 unique data lines, which had the same total value as the original 99,999 lines.
From the 42,941 I selected 100 lines at random for the test. I carried out 5 tests, using the MircoTimer, to test how long the 100 lines took to recalculate.
The table below shows the results:
The slowest to fastest formulas were:
|Formula||Average Time||% of slowest time|
Unsurprisingly the SUMPRODUCT and SUM (Array) formulas were the slowest by a long way. These are the slowest because they have to carry out the most internal calculations. The formulas will iterate through every line in the source, calculate a TRUE/FALSE result, multiply the TRUE/FALSE result against the value then SUM all the results – phew that’s a lot of calculations. This means that Excel is carrying out unnecessary calculations to return the value of the single lookup.
SUMIF calculates in less than half the speed of SUM (Array) and SUMPRODUCT. This was a surprise to me, as I was expecting this to be similar to SUMPRODUCT.
INDEX/MATCH and VLOOKUP both performed very well. The time was 20% of SUMPRODUCT and only 2% slower than the fastest. Once these formulas find a matching result they stop iterating through each line of the source. As a result, it can be a very fast formula.
OFFSET/MATCH was the fastest, which was a surprise to me. The time was just 18% of the slowest. As stated in Part 3, this is a volatile formula, which will recalculate every time, even if there are no changes to recalculate. So, whilst it is the fastest formula, I would not recommend using it, as it could cause the rest of the workbook to calculate slowly.
If we are using anything other than VLOOKUP or INDEX/MATCH for a basic lookup then we could be waiting much longer than we need to. Whilst OFFSET/MATCH was the fastest the negative impact of it being a volatile formula is probably not worth the risk.
In the next part, we will take a look to see what impact sorting the data has on the calculation speed of our lookup formulas.