One of the things which scares me in Excel is seeing this message:
So many things can go wrong with linked files, yet unsuspecting users tend to use them with pride. “Look, I’ve linked this so that I don’t have to update a cell, isn’t this cool!”. If they knew the risks they might not be so quick.
Let’s explore this area and hopefully by the end you will see why it’s so risky. Or if not, at the very least, you’ll be aware of the possible issues.
How to set-up a link to an external file
Setting a link to a cell in another file is easy, it’s just like linking to any other cell in a workbook. In a cell type the equals symbol ( = ), click on a cell in another workbook, then press return.
The formula will look something like this.
When you close the linked file the address will change to include the full file path of the linked file.
Provided you can remember the syntax, it is possible to create a link without even opening a file. That means Excel files can be read without opening them, pretty cool eh? Though, Excel will ask for the password when updating the link to a protected file.
Having created a link, the next question is how to update it. There are multiple ways.
If both the linking and linked files are open at the same time, the values will update automatically when a recalculation occurs.
Opening the file
When opening an Excel file which contains links to other files the following message will be displayed.
Clicking Update will update all the links in the workbook, whilst Don’t Update will not retrieve the linked values.
Edit Links window
From the Data Ribbon click Edit Links button.
The Edit Links window will open. This window can also be found in File -> Info -> Edit Links to Files.
Select each link and click Update Values.
Single cell update
A single cell can be updated by entering and leaving edit mode:
- Double-click a cell to enter edit mode, then click away into another cell to leave.
- Press F2 to enter edit mode, press enter to leave.
- Click on a cell and click inside the formula bar to enter edit mode, click away to leave.
The link to a closed workbook will be updated when the cell containing the link requires recalculation. The calculation chain in Excel is a tricky concept. Generally, a cell is only recalculated if it contains a reference to a cell or preceding cell which has changed.
The following link will never update through a standard recalculation as it has no preceding cells in the same workbook.
The following link will recalculate each time Cell B1 (or a preceding cell to B1) changes.
Calculations with linked cells
In the previous section, we saw an IF function with a linked cell included. But we can push this further and use a linked file within other formulas and functions.
The following shows the VLOOKUP function with the table_array argument referencing cells in a linked workbook.
Try it for yourself; the VLOOKUP behaves the same as normal; it does not care if the cells are in a closed or open workbook.
Other functions work in the same way, the SUM function for example.
Named ranges are also available. The function below includes a named range called myLinkedData.
The problem with linked files?
As mentioned in the introduction, there are some significant problems with linked files:
The linked data can change location
If a cell is referenced within its own workbook Excel tracks what happens to that cell. For example, if Cell A1 is dragged to Cell B1, any cells referencing A1 will automatically update to B1. Perfect!
If the linking file is closed Excel has no idea when a cell has moved. For example, if Cell A1 is dragged to Cell B1, the file which links to Cell A1 will continue to link to Cell A1 (which is now the wrong cell – ooops!).
The risk can be mitigated slightly by using Named Ranges. However, if it is a colleagues file, what is to stop them deleting the Named Ranges, or renaming worksheets, just because they want to (work colleagues are nice like that!)
Therefore, linking to any file which is not tightly controlled by you is dangerous.
Sub-links are not updated
When some Excel users discover the ability to link to other files, they go a bit loopy; linking everything in sight. It is not long before they have a spiders web of links across multiple workbooks. When a link updates it does not update any sub-links contained in the linked file.
- Workbook A contains a link to Workbook B.
- Workbook B contains a link to Workbook C.
- When opening Workbook A on it’s own the Update links message box will appear.
- Clicking Update will bring the values from Workbook B into Workbook A.
- Nothing happens to the links from Workbook B to Workbook C. These links are not updated, therefore the final values in Workbook A may not be correct.
- For the links to update correctly, all the files must be open at the same time.
Data from last saved version is retrieved
Data can only be retrieved from the last saved version of a file. If your colleague opens a file and changes some values, the link will not retrieve the updated value until the file is saved.
This is not a problem by itself, as that is how I would expect it to work. However, there is no warning message to indicate the file is currently open, so it may not be obvious if the file is up-to-date or not.
Excel caches the external data
An Excel file is constructed in a specific way to be read by the Excel application. When a link is created, the linking file keeps a copy of the data from the linked file.
We will look into the Excel file format later in the post, but for now, let’s prove that the linking file does cache the data:
- Create a VLOOKUP between two workbooks.
- Save both files and close the workbook with the source data
- Rename or delete the Excel file with the source data
The link is no longer valid as the file does not exist, So what happens if you change the lookup value?
Answer: The new lookup value is returned. Hmm…. that should not work, but it does. This means you can keep retrieving values from files which no longer exist!!!
You might see this message.
If you click Continue the calculation will continue to return values from the cached source file.
If Excel caches external data in this way, it means any file may contain data from linked files even if the file cannot be found. Any users receiving that file could access the cache and see data they should not see.
Let’s suggest a HR department sends workbooks with salary information to each individual employee. If there are links to a file containing all the salaries, the cached salary information will be distributed to every employee!
Where is the cached data?
So, where is this cache of data which I keep referring to.
- Create and save an Excel file with links to other files
- Rename the Excel file created in (1) above to have a .zip file extension, rather than a .xlsx file extension
- Open the zip file and find the \xl\externalLinks\externalLink1.xml file.
- The file will look something link this:
The .xml file above displays all of the linked data. If it were salary information we could see that Jack has a salary of $24,000 and Sally a salary of $45,000.
When is it OK to link files?
As you’ve seen so far, linking files doesn’t always behave the way you might think. So, when is it OK? I use the following guidelines:
- I must ‘control’ all the files involved
- There must not be sub-links (i.e. linked files with links to other files)
- Named ranges must be used, not standard cell references
- If sharing the file, it must not contain sensitive data
These will avoid most of the potential issues. But will also significantly reduce the number circumstances when it is useful. Now that we have PowerQuery, it is worth asking yourself if that is now better solution for your particular scenario.
Linking files is not for the faint-hearted. What seems like a simple solution opens up a whole world of complexity, which most Excel users are completely unaware of.