Spurious correlations and Big Data

I have been a Time magazine subscriber for decades. And while I generally enjoy being informed, I also read Time knowing that they have a definitive ideological slant, which is very evident in their over-reporting of certain topics. So I read Time carefully, knowing that I do not agree with all their viewpoints, and also that I should not believe everything I read. Time is also not generally known for graphical excellence—in fact, you will easily find Time being used as fodder for examples of how not to do charts. And I also find it amusing to look at their charts and infographics, although I must add that there are definitely times that Time does get it right (and in all fairness to them, it is unfortunately so that when they do get it right, we don’t find people singing their accolades).

I am, as I have indicated in an earlier post, doing a lot of work with Google Trends data. One of the biggest challenges with this data is the problem of dimensionality or overfitting, which, simply put, means that when we have data on masses of predictor variables, we are bound to find some which, by chance, correlate well with our variables of interest—in other words, the more dimensions we add, the more likely we are to run afoul of spurious correlations.

I have just again browsed through some of Tyler Vigen’s hilarious spurious correlations, like the 0.99 correlation between the number of lawyers and suicides or the .093 correlation between per capita cheese consumption and people who died by becoming tangled in their bedsheets. (Which in itself raises some interesting questions, like “How do between 400 and 900 people in the US get it right to get so tangled up in their bedsheets each year?” Perhaps the statistic includes infant deaths, which would be very tragic).

In fact, it seems you can “prove” anything with research these days, including, for example, that intelligent people are messy, cursing insomniacs (also here), who like trashy movies, and appear to be lazy (but note, again, how popular media slants research findings to support what they want it to say, not necessarily what it does say).

I also unearthed an old issue of Time (1 August 2016) to see this: Political foods.

   Time 2016-08-01 Vol188 no5 p17a  Time 2016-08-01 Vol188 no5 p17b  Time 2016-08-01 Vol188 no5 p17c

You can also read more on this page and this page.

This has to be one of the most spurious of all spurious correlations, and is a good example of overfitting. They have a big data set (Grubhub’s data). They choose 175 dishes. Then for each dish, they calculate the correlation between the number of orders and the two percentages of Democratic and Republication votes. That’s two correlations for each dish, for a total of 350 correlations. Presumably, each district is a data point: This page notes that they ran “correlations between the share of orders for 175 dishes and the average share of votes going to Democrats or Republicans in each district.” And then they just chose those dishes that showed the highest correlations.

Here, it is much more likely to say that certain foods enjoy more support in certain geographic areas (or are more freely available in certain areas, as Time does acknowledge), just as the Republican and Democratic parties also enjoy more support in certain geographic areas, but it is one of the most trite deductions imaginable to claim that one is more likely to eat certain foods because of one’s political affiliation (or, heaven forbid, that the food you eat determines how you vote—Time suggests, hopefully in jest, that “Of course, we all know that eating a hamburger makes you more likely to vote Republican”). the correlation should at least make sense. For example, certain religions have certain dietary prescriptions, and so an association between religious affiliation and dietary preference makes intuitive sense, but political affiliation being directly associated with dietary preference just makes no sense. Indirectly, maybe (e.g., if certain religious or cultural groupings tend to hold a certain political affiliation), but not directly.

Remember that correlation is only a measurement that shows the degree of association between two variables (not even agreement, as Bland and Altman pointed out). So remember that correlation does not prove anything. That also does not mean that nothing is proved by correlation. The main point is that correlation should be correctly understood. It only shows the association of two variables. But there must at least be some understandable link between the variables, and, more importantly, all spurious variables must be excluded.


Giving feedback in an Excel worksheet UI

I am creating a tool as part of my work at the University of Sydney to extract data from Google Trends via die Google Extended Trends API for Health. Really an exciting project, and a tip of the hat to Google for making this awesome tool available.

Part of the project is getting multiple samples of the data from the Google servers (I am hoping to get a publication out soon about the why and how of this kind of sampling), and then aggregating them (just calculating data-point averages, really), and charting them. Since most of the output will end up in an Excel workbook, it just made sense to put the whole thing in Excel.

The first part of this is creating a UI with which the user can specify the extraction that needs to be done.

I have built a worksheet-UI, in similar fashion to that recommended in the Excel developer’s “bible”, Professional Excel Development. Part of the UI involves giving feedback when input is still required or inappropriate. These feedback messages can, of course, be given in different ways. Using VBA, a message box can tell the user what is needed, but the problem with that approach is that message boxes can be cancelled, and the input still not corrected. So it makes sense to display on-sheet messages which persist for as long as the issue exists, and these can easily be done using in-cells formulas. The question then becomes, where to put these error messages.

An early version (slightly condensed) of the UI looked like this:

Figure 1    Early version of UI error messages

The two salient points from this figure are firstly that there are error messages in several places (below the API_Key, although no error is currently displayed, below the Target, and in the Messages block). This can clutter up the UI very quickly, so perhaps it is best to put all error messages in one location, like the Messages block. But that brings me to the second point, which is that each cell in the messages block contains one error-checking formula. So in Figure 1, the first four error messages are not displayed (because there are no errors), and the fifth is. This just looks weird. Of course, I could write my error checking formulas to display some kind of “All’s well” message for each formula, but my general aim is to reduce clutter, so I prefer to have the blank cells than a succession of “All’s well” messages.

Figure 2 gives an idea of the structure of the worksheet, showing some of the error checking formulas. It can be seen that I have made extensive use of named ranges to simplify things, and that most of the formulas use a simple IF function with a blank string (“”) displayed if there is no error.

Figure 2    Error checking formulas in UI

So, I was a little dissatisfied with the look. If some issues are addressed and others not, then unsightly gaps are created between the various messages.

What I wanted was a list of messages, where the various issues are displayed top-down. That means that each cell in my list (range) of error messages must be able to display any of the error messages, or the list of error messages must “magically” rearrange themselves so that the displayed messages are at the top of the list.

This is how I went about doing it.

  1. The first thing to do was to create an off-area range where each message is displayed—off-area meaning a range of cells that will be hidden when the finalised version of the workbook is released. So I cut the original list of error messages and pasted them in cells $N$16:$N$23 (columns K to XFD will be hidden in the final version), as shown in Figure 3 (which actually shows the end result, but that is useful for understanding the process).

Figure 3    Moving the error checking formulas

  1. I added a single “All is well” message. One is fine, so long as it appears right at the top. In cell $N$24 I used this formula:
    {=IF(SUM(LEN(N16:N23))>0,””,”All inputs are completed. Data extraction can be commenced.”)}
    For the uninitiated, the braces {} indicate that this formula is an array-entered (or CSE) formula. You do not type the braces, but enter the formula and then press Ctrl+Shift+Enter (hence the CSE alias) to “transform” it from a regular formula to an array-entered formula. It essentially looks for zero-length strings in all the error message cells, and if so, returns the “All’s well” message shown.
  2. To figure out which of the error messages to show in the original message area, I added some formulas in columns L (with the heading “Index” in Figure 3) and M (with the heading “Display position” in Figure 3). The text descriptions in the “Refers to” column (Column O) are not used, and are just for my own reference, so that I can debug or modify the formulas later on.
    Column L has the formula =COUNTA($N$16:$N16) in cell L16, and so on down—i.e., L24’s formula is =COUNTA($N$16:$N24).
    Note that the first part is entirely absolute ($ signs before the column and row references), but that the second has a relative row reference. Thus, as it is copied down, the row changes. Essentially, it is counting the number of rows from the fixed starting point if $N$16, and using that as a means of “numbering” (or indexing) my error messages. Simple, really.
  3. Next is the magic bit. The formula in column M that tells Excel whether to display the error message or not, and if so, in what position. It again uses an absolute starting point, with a relative row end: =SUMPRODUCT(IF(LEN($N$16:$N16)>0,1,0)*(LEN($N16)>0))
    Essentially, what this formula does, is count the non-blank error messages. The first part, IF(LEN($N$16:$N16)>0,1,0), basically says, “If the cell length is longer than zero (i.e., not a zero-length string), then return a 1, but if its length is zero, return a zero). The second part, (LEN($N16)>0) returns True or False to the statement “length is greater than zero.” But True, in Excel, equals 1, and False, in Excel, equals 0. So when the position is multiplied by zero (i.e., when there is no error message), it returns zero. When the position is multiplied by 1 (there is an error message), then the position is returned. Perhaps not that simple, but it works. Figure 3 shows that the 1st, 3rd, 5th, 7th, and 8th error messages will be displayed in the 1st through 5th positions.
    the formulas (partly) are shown in Figure 4.

Figure 4    Error message index and display position formulas

  1. On the off chance that I will be adding more error messages, it makes sense to create a dynamic range name that encapsulates my list of error messages, with their indexes and positions. So I added a range name MessageBaseList with this formula:
    =’Query specification’!$L$16:INDEX(‘Query specification’!$O$16:$O$30,
    COUNTA(‘Query specification’!$N$16:$N$30))
    I somewhat arbitrarily used row 30 as the end point, as that is near the end of my visible interface, and I judged it unlikely that I would be adding more error messages than that!
  2. All that remains is to use a formula in my original message area to look up the error messages that have display positions (i.e., those values in column M greater than 0), and return them.
    Again, I use the absolute anchor with a relative end to allow the formula to adjust as it goes down:
    This formula essentially self-determines its position in the message display area using the COUNTA function, and then uses the INDEX function to check if there is an error message with the corresponding position, and if so, displays that. Again, perhaps not quite that simple, but it works!
    Note that the message area is essentially E16:E30, although I merged cells to get enough space to show the messages. Generally, I would advise against merging cells like this, but the layout and the fact that I had to make provision for up to 30 search terms (the limit allowed by the Google API) meant that I really had to make the best use of the space I had at my disposal. The solution works, so I will settle for that, but I do confess that it would be simpler if I had not merged cells.

The basic concept is that the error messages that are active will be displayed, from the top down, in the display area. I can finally add my single “All’s well” message to indicate when there are no more errors, which does help with feedback but still minimises clutter (Figure 5).

Figure 5    “All’s good” message

I will be publishing the complete tool when I am done, and I will update this blog post with a link when I do, so that you, dear reader, can see the effect in action!


Not all keyboard shortcuts are created equal

I thought I would do something different today for IKSD. Like so many things in life, one may find a whole spectrum of keyboard shortcut users. Let me elaborate:

First, there are those who don’t (but, I would say, who should). These are the people who use the keyboard to type letters and numbers. For everything else, there’s a mouse. Want to move the insertion point one character to the left? Use the mouse—there’s no need to bother with tapping that pesky little left arrow on the keyboard when you can reach over, drag the mouse to where you want it, click, and then move the mouse pointer out of the way!

Then there are those who discover one or two keyboard shortcuts that they find really useful, but never really venture into the realm of learning more. These are very often the Alt-set (not the alt-right!)—those who use Alt+xxx to enter special characters not available on the keyboard. You will often find a petite list of Alt codes pasted onto their PC box or the bottom of their monitors. They may even get someone more knowledgeable to help them create some custom shortcuts for certain special uses.

The next level is those who start learning keyboard shortcuts, and use the most basic set for certain tasks. These are the ones who learn about Ctrl+C/V/X (but somehow never seem to grasp the Cut/Copy/Paste—XCV keyboard layout connection) and some others (Ctrl+P, Ctrl+O, etc.). They generally only use system-wide shortcuts that work in all their programs for the most common tasks. And perhaps some program-specific exceptions like learning about Shift+F3 in Word (but don’t ask them what it does in Excel [it’s Insert Function, by the way—one I never use). These all, are in reality (or is that metaphorically), like the laity in the world of keyboard shortcuts.

After that, we move to the functionaries. Even here, we find different levels of progress, just like one progresses through degrees at university or coloured belts in karate. But all of these are the people who have embarked on the journey. The journey of self-improvement, or (before I start sounding like a movie-martial-arts-mystic), the journey of actively discovering and using more shortcuts. However, I recently came across, and while I didn’t try it, I like the idea. Here, you can envelop yourself in the misty mountain, just you and your keyboard, no mouse to be heard lowing in the fields outside, and become one with the keyboard. Ok, I’ll stop now.

The point I want to make is that, while I readily admit that I am not the supreme grandmaster of keyboard shortcut-dom, there is a sure sign that someone has progressed beyond the level of just being an initiate, but is well on the way to becoming a master. And that is when they start understanding, not only that different programs have different shortcuts (that’s really for those who have just moved past initiation), or that certain shortcuts may do different things in different programs (getting into the coloured belts now…), but that even the same shortcut might work differently in different contexts within the same program (surely this must be black belt knowledge by now?).

Blackbelt keyboard shortcuts to master

I want to illustrate just a few of those, and the tool I will use for this is Excel, since it is probably the program I use that does this the most. I suppose I could list a number of others, but this is more of an effort to illustrate the point, than to provide a comprehensive list. Here is a simple workbook that you can use to practise these shortcuts and see the differences. It contains a small data range, duplicated across three worksheets (Normal sheet, Filtered list, and Table). Also, outside of that are list of numbers for rows ($J$1:$J$12) and columns ($A$17:$E$17) so that you can see when a worksheet row/column or only table row/column are being inserted deleted. Also, note that the discussion below is as for Excel 2016.

Consider the following keyboard shortcuts:

Shortcut: Ctrl + D

Normal data range

Fills cell from row(s) above. This includes all cell attributes (formatting (direct and conditional), validation, etc.) and contents.

Can fill down multiple rows and across multiple columns (although direction is always to fill down, not right).

Normal data range (filter applied)

When used on a single cell, fills cell from row directly above, even if that row is hidden by the filter.

When used down several rows, fills the top row down to all cells displayed by the filter, but leaves cells hidden by the filter unchanged.

Table (unfiltered)

As for normal data range.

Table (filter applied)

As for normal data range with filter.

Normal data range (and table) Normal data range and
table (filter applied)

Figure 1    Ctrl + D

Shortcut: Ctrl + `

Normal data range

Copies down a cell’s contents from the cell directly above. Only contents are copied, no other attributes.

Can only copy down from one row and one column (i.e., the cell above the active cell).

Normal data range (filter applied)

Copies down the value from the first row above not hidden by the filter (i.e., no longer the row directly above)

Table (unfiltered)

As for normal data range.

Table (filter applied)

As for normal data range with filter.

Normal data range (and table) Normal data range and table (filter applied)

Figure 2    Ctrl + ‘

Shortcut: Ctrl + +

Normal data range

Displays the Insert cells dialog, dynamically choosing between shifting cells down or right to make place for inserted cells.

Normal data range (filter applied)

Does nothing.

Table (unfiltered)

When one cell or several cells across one row are selected, automatically inserts a table row (not a worksheet row).

When and several cells across one column are selected, automatically inserts a table column (not a worksheet column).

When several cells across rows and columns are selected, inserts multiple rows or columns, based on which have more selected. In other words, if a symmetric range is selected (x rows by x columns), then x rows are inserted. If an asymmetric range is selected (x rows by x columns), then x rows will be inserted if x > y, and y columns will be inserted if x < y).

Table (filter applied)

Does nothing.

Normal data range (and table) Table (unfiltered)

Figure 3    Ctrl + +

Shortcut: Ctrl + –

Normal data range

Displays the Delete cells dialog, dynamically choosing between shifting cells up or left to fill up the deleted cells.

Normal data range (filter applied)

Deletes the entire row (but asks for confirmation).

Table (unfiltered)

When one cell or more than one cell within a single row are selected, automatically deletes a table (but not worksheet) row.

When cells across more than one row are selected, automatically deletes the table column (not the worksheet column).

When several cells across rows and columns are selected, deletes all selected rows or columns based on which have more selected (as with Ctrl + +).

Table (filter applied)

When one cell or more than one cell within a single row are selected, deletes the entire worksheet row (but asks for confirmation).

When cells across more than one row are selected, automatically deletes the table column (not the worksheet column). If the deleted column was filtered, then the filtering in the table for that column is removed (i.e., all rows hidden based on the filtering criteria specified for that column are shown).

When several cells across rows and columns are selected, deletes all selected columns (note the discrepancy with Ctrl++). If any of the deleted columns were filtered, then the filtering in the table for those columns is removed (i.e., all rows hidden based on the filtering criteria specified for those columns are shown).

Normal data range (and table)
Table (unfiltered) Table (filter applied)

Figure 4    Ctrl + +

I did not discuss Ctrl+R, which fills to the right, just as Ctrl+D fills down, because its functionality is not affected by context.

Another set which I could get into, but which I will for another day, is how copying and pasting to- and from filtered lists/tables differs from copying to- and from normal ranges. I think for IKSD 2018, I may do a post just on filtered list/table shortcuts, and their idiosyncrasies. If you don’t want to wait that long, you can embark on your own journey of discovery—the truth is, however, not within, but the truth is out there!

These are the first ones I could think of as I sat down to write this post. Are there others that you can add? Let me know in the comments of others that you would add to my list.


Getting back the old Word Spell Checker

If it ain’t broke, don’t fix it. This, I suppose, is easier to say than to do. Still, it is something that I have, on more than one occasion, wished that Microsoft would take to heart. Items from recent memory include replacing the ultra-efficient Print dialog and Print Preview with the ultra-inefficient Print-Preview-and-Print in Word 2010, which is jack of all trades and decidedly master of none. Or removing the Autocorrect option from the spelling error context menu in Word 2013. Or removing the red/green/blue wavy underline colour scheme for spelling/grammar/contextual errors (admittedly, problematic for the colour-blind) and replacing it with red for spelling and blue for grammar and context combined (which I am still not certain works for all types of colour blindness, although that is not my area of expertise). Or the way they keep messing with the most-recently-used (MRU) file list—each version since Word 2007 has seen this change, and from my perspective, the iteration that was most useful to me was Word 2010, now two versions back already.

The problem

Another of these “improvements-as-regressions,” which I am writing about here, is replacing the Spelling and Grammar dialog with the Spelling pane (also in Word 2013, and persisting in Word 2016). Microsoft has had a long history with panes, a history that, to me, seems to have been very conflicted. For example, moving the Mail Merge wizard to the Mail Merge pane, and then obviating the need for it with the ribbon (the pane can still be accessed, though). Personally, I find panes to be, err, a pain!

The Spelling pane is especially irritating.

Compare the pane to the old dialog:


Figure 1    Spelling and Grammar dialog vs Spelling pane

First let me say what is nice about the pane (to be fair). If you open the pane and leave it there (which I am very unlikely to do), then as you move around your document, when you place the I-beam on an error (misspelled word or grammatical/contextual error) then you immediately get the spell checking options for that error. And (in an American accent), the pronunciation of the suggested word you have selected if it was a spelling error (I do not know how this is going to help me make up my mind which is the correct word, but hey, it’s there!). And, in an insincere nod to the disabled, the Spelling pane options have accelerator keys highlighted (e.g., Change) which you cannot use (or at least, I have not figured out how to use them). Yes, I am being sarcastic. The “features” of the new Spelling pane are not really, in my mind, all that great. I could easily get by without them. Really.

What is so bad about the pane, you may ask? Well, it’s not what it’s got that’s bad, but rather what went missing from the dialog to the pane. As with the context menu, AutoCorrect is missing. As is Undo. Microsoft might argue that Undo was not used here a lot, but when I was zipping along through a long document, trying to whip the spelling into shape, then I found that I have often clicked the Change button a bit too hastily (perhaps it’s just me). Having Undo there was really a time saver. I could also immediately go to Options…, which, in all truth, I can still do in the usual way when using the pane. I could also focus only on spelling by turning the grammar checking off with the Check grammar box, which also helped me manage my workload. So yes, I miss those tools. Oh, and the keyboard accelerators actually work. How about that?

The solution

So what do we do when Microsoft throws us a curveball like this?

Do what I did with the Print dialog (you can read about it in my book or see the discussion I have in the Word uTIlities Help file)—take back what is (rightfully) ours!

The nice part about this is that you can decide how fully you want to revert—you can keep the Spelling pane and add the Spelling and Grammar dialog, or you can more fully replace the Spelling pane with the dialog.

The first part of this is to add a macro that will give us the old dialog. So, press Alt+F8. Then, in the Macro name box of the Macros dialog, type SpellingAndGrammarDialog and click on Create (Figure 2).


Figure 2    Creating the SpellingAndGrammarDialog macro

This will open the VBA Editor, with the Sub and End Sub statements for the macro already created. All you need to add is one line of code:

Sub SpellingAndGrammarDialog ()
End Sub

Press Alt+Q to get back to Word and just remember to save the Normal Template.

To add this newly created tool over and above the Spelling pane, just add it to the QAT. To replace the Spelling pane, assign the F7 key to the macro, instead of ToolsProofing. If you really can’t live without the Spelling pane now and then, you can still access it from the Review ribbon tab (modifying the ribbon to point to the macro is beyond where I want to go with this post). Now, to get back to real efficient spell checking, just press F7!


Announcing Word uTIlities 2.0

Well, it’s been two years since the last version of the Word uTIlities was released. Note: that means no new versions released to the adoring public constituency (all ten of you J) but not, by any means, no programming being done on my side: Considering the amount of code, I have almost doubled the work done for the first versions. But now the Word uTIlities are being released in an update that represents such a reworking of much of the core of the tools that I have decided it is time to increment the main version. So welcome to Word uTIlities 2.0!

What’s new in the Word uTIlities?

Word 2016 compatible

First up, all the tools have been tested and work with Word 2016, 32-bit and 64-bit.

New dialogs

Thanks to code posted by Dean Kinnear, I was able to capitalise on changes to Word 2013 to give users of Word 2013 and 2016 better message boxes. So if you are using Word 2013/2016, some of your dialog boxes will look like this:


Instead of like this:


They might even look like that in Word 2007 and Word 2010, depending on some system settings, as I have tried to keep things consistent across versions.

New reports

The reporting tools have been totally redone. Three reports are generated at present: The style summary, the list of bookmarks, and the list of index entries. In fact, the three separate reporting tools have now been replaced with one central tool that generates consistent reports for all three functions (and will allow me to easily add to the list of reports in the future). Now users are presented with a choice of whether they want the report in Excel or Word. The information-gathering process for the reports has also been streamlined for faster reports, although the report generation is a bit slower initially because the program first checks to see if it can connect to Excel. Once it knows what to do, though, the final report is also generated quicker than in previous versions.

Improved Backup

Even though it was working well, I have totally rebuilt the Backup tool. It is not only faster, but also now no longer needs to close and reopen the document when the backup is being made. The tool also now searches for existing backups and can then automatically determine how to name successive backups. What this means is that when you do your first backup on any given day, the process will be just as normal. If you make the second backup, it will, as in previous versions, ask you whether you want to make an alphabetic-suffix or time-stamp suffix second backup. However, what is new is that from there, it will find those backups, and will then automatically continue with your choice. So if you make a third backup on the same day, then it will automatically use the next in the sequence of the same suffix (alphabetic or time stamp). Therefore, making multiple backups on the same day will be much simpler. It also does a lot of other background stuff, like checking the amount of disk space for the backup file, etc.

I still have more plans for this tool, but that will come in a later update.

Document Statistics

Okay, I really didn’t do much here (i.e., nothing, really). This is a Microsoft Word tool (not mine at all), but it is useful in certain contexts, and it is so deeply hidden that I thought it deserves a little more attention. So all this tool does is open the Word Document Statistics summary.

Style Summary

Thanks to a very insightful collaboration with Howard Silcock in 2014 (Thanks, Howard!), I was able to squeeze a bit more out of this tool in terms of overall speed and functionality. Not much, but generating a style summary is such a laborious process that I felt every bit helps. I also have things that I want to try with this, as I really believe that it can be improved a lot, but I will need to do a lot of learning before I get there.

Numbered equation

Changes Microsoft made to Word 2013 document formats meant that this tool was not well behaved with Word 2013 any more, nor with Word 2016. But that has been fixed. I have had to sacrifice the centred equation (which you still get in Word 2010 and Word 2007), but there really is not much that I can do about it without Microsoft making an about turn on the changes they have made to the document format, which I consider unlikely.

Change Revision authors

Some tweaks to work around idiosyncrasies in Word document structure that caused an error.

Change Comment authors

In a moment of weakness, I forgot that when an input box gets cancelled, it takes a blank string as its argument, which meant that when you open the Change comment authors tool, and cancel it, all your comment authors get removed regardless. This is probably not what should be happening, so I changed the functioning of the tool so that cancelling the dialog, or emptying the input box and clicking on OK, will result in nothing being done. To change the comment authors, enter the new name and click on OK (as in previous versions). To remove the authors completely, enter DELETE in the input box (which is also now the default when the tool is launched).

Fit Hyperlink

Some minor tweaks to improve how it works when used inside a table. Still not perfect, but I’m getting closer….

And what’s in store

I am working on two new tools which I think are really great. These will be released as versions 2.1 (and possibly 2.2). I will give you a hint as to what’s in store. Think revisions and think dates…. Once again, the impossible will be made possible right here.

I also have a number of other things that I would like to include, but it takes time…. I will, of course, post about them here when I eventually get them done.

So, without further ado, go take a look!


Correcting the Windows 10 number formatting settings for the South African region

A while back I decided to bite the bullet and go for Microsoft’s free Windows 7 to Windows 10 upgrade offer.

I did cause myself a bit more pain than I needed to, because I decided it was also time to declutter my pc a bit, so I went for a full format-and-install (although the other PCs in my home I just put through the normal Windows Update upgrade routine).

As I started settling in, I soon realised that there we things about Windows 7 that I missed (like the Recent Places feature in Windows Explorer which has survived several Windows versions, but has been “upgraded” in Windows 10’s File Explorer. But there are also features in Windows 10 that are welcome additions (Virtual Desktops are very nice).

However, when I settled down to getting some serious work done, I quickly realised that there was a serious mistake in the regionalisation settings for Windows 10. Instead of the comma for the list separator (which it has always been in previous versions of Windows set to the South African region, and which, in fact, it was in Windows 7 before the update/upgrade), Windows 10 substituted the semicolon. And the decimal separator, which was always the period, was now the comma. This may seem like a small thing, but it led to all kinds of problems: Comma-separated files were not being imported properly into Excel or Access. Bank statement values were not being recognised, because the financial amounts used a period to indicate the decimal (as they should), Excel formulas now had to be typed with semicolons instead of commas to separate the arguments (it’s amazing how ingrained that habit of using commas was), etc. So I needed a solution.

A quick search on the Microsoft Community site revealed that I was not alone: But that, of course, didn’t help. The big issue was that when I wanted to change the items in question, the values I was looking for were not there. Specifically, in the Customise Format dialog (which I will show soon), the Decimal symbol and List separator boxes show only the comma and the semicolon, respectively, and clicking on the dropdown arrow for those boxes do not reveal the items I wanted (the period and the comma, respectively). I started preparing myself to edit the registry (and I will show this in a moment too), but just when I thought I was stuck, it occurred to me that these might not be list boxes, but combo boxes, and that even if what I wanted was not in the list, I could just type it in. And so, the (anticlimactically simple) solution is to just open the Customise Format dialog, and type in the correct values.

These are the steps: Right click on the Start button, and choose Control Panel (Figure 1).

Fig 1
Figure 1    Opening the Control Panel

In the Control Panel, choose Change date, time or number formats under the Clock, Language and Region group (Figure 2).

Fig 2
Figure 2    Control Panel

This opens the Region dialog. Here, click on the Additional settings… button (Figure 3).

Fig 3
Figure 3    Region dialog

This opens the Customise Format dialog (Figure 4), where you can type in the correct Decimal symbol and the correct List separator (the other settings are fine).

Fig 4
Figure 4    Customise Format dialog

Of course, it is so that you would have to do that for each user on the PC, and if it happens to be a PC with numerous users registered, and you are comfortable with editing the registry, then it may be quicker to correct these values for all users on the registry. To do this, right click on the Start button, and choose Run (Figure 5).

Fig 5
Figure 5    Opening the Run dialog

The type Regedit and click OK (Figure 6).

Fig 6
Figure 6    Run dialog

After asking your for the Administrator’s permission, this will open the Windows registry (Figure 7). Where you need to go is as follows: First, go to HKEY_CURRENT_USER>Control Panel>International.

Fig 7
Figure 7    Registry changes for Decimal symbol and List separator

There, double click on the sDecimal and sList keys, and type in the correct values (Figure 8).

Fig 8
Figure 8    Editing the sDecimal key

Next, go to HKEY_USERS, and then repeat the process for each listed user (e.g., .DEFAULT>Control Panel>International, etc.).

Simple as that.

However, remember these steps, as I would not put it past Microsoft to “correct” these settings during some or other update (although I do say that without any warrant).


Custom number formatting: Getting Excel to show date and day in one cell

If there is one skill you have to learn that will really make a different to the quality of your Excel work, it is learning to use custom number formatting. Here is one example.

It often happens that I have a list of dates against which data are going to be entered, but I would like to see the weekday as well (this will help me with the data entry).

An example is shown in Figure 1. My traditional approach, which is evident there, was to add a column and use the formula =TEXT(B3,”DDDD”) or =TEXT(B3,”DDD”) to show the day.

Date and day in two columns

Figure 1    Column to show day of week

About a year ago, I thought to myself that this extra column was really unnecessary, and that I should try to remove it. My immediate thought was to try and combine the date and day in the Date column, and I realised immediately that the only way for me to do that would be with custom number formatting. I knew that dates are stored in Excel as a serial number counting from 1 January 1900 (albeit with one intentional error—see here as well for some more info, and note Microsoft’s “diplomatic” choice of words!). So, for example, the date in cell B3 (2015-01-01) in Figure 1 is stored as the serial 42005. The display of the date is added with cell number formatting. So then the thought came to me that I could try using two different date-style custom number formats in one cell together, and it worked.

So the date format already added by Excel is yyyy/mm/dd, and I decided to try something like ddd yyyy/mm/dd, as seen in Figure 2, which also shows the result in column B (and I have already deleted the “Weekday” column which was in column C). the ddd format tells Excel to show the cell value in the short day format—I could also use dddd for the full day.

Figure 2    Adding a custom number format to show both day and date

That’s a simple change, but it is very effective. Now I only need the date column, and don’t have to copy down additional values in the next column to keep up to date with it, if you will excuse the pun.


Using keyboard shortcuts for Windows Explorer

In support of International Keyboard Shortcut day, I want to increase everyone’s productivity in using Windows Explorer.

Now I must add that one of the best productivity steps you can make is not to use Windows Explorer! I use Total Commander for most of my file management duties, and there are several (hundred) good reasons why I do. But having said that, if you work on Windows, you cannot really escape Windows Explorer completely, as it is the native file management tool, so most programs, when opening or saving files, simply make use of Windows Explorer for that file management portion. So even if you are working in Word or Excel or a multitude of other programs, when you open or save a file, you are actually using Windows Explorer for that action. And, I have found that some these basic principles described below, with a little creativity, can also be used when doing file management on a Mac (although I am not showing that in this post).

Thus, learning to use Windows Explorer effectively will prove to be a big boost to your productivity (just think how many times you browse to open or save a file each day, and the gains will be readily apparent).

To distinguish between these contexts I will speak, below, of working in Windows Explorer or of working in File Management (some of the shortcuts only work in one of those contexts, or work differently between the two).

I will also not cover “standard” keyboard techniques, like using Tab to move between parts of the window, pressing Enter as the default action, selecting items (files/folders, in this instance) using Ctrl or Shift in combination with the mouse, etc.

One last note. These are specifically for Windows Explorer, meaning they will work for Windows 7, most of them will work for Vista, and a smaller subset will also work for Windows XP. Having said that, apart from the ribbon addition with its own set of navigation tools, most of these functions will also work with File Explorer (i.e., Windows 8 or Windows 10), although I might miss something new from those over here.

Just to help with the explanations below, Figure 1 shows my names for the various parts of the interface, so that you know what I am referring to.

Figure 1    Windows Explorer Interface

Getting there

Charity begins at home, they say. So keyboard shortcuts should start with actually accessing the tools. For many programs (including all the Office programs), these keyboard shortcuts will get you started:

Listing 1    Shortcuts for Opening or Accessing Windows Explorer

File Save As dialog (from most applications)

Ctrl + O
File Open dialog (from most applications)

ÿ + E
Open Windows Explorer application

The basics

Like most Microsoft programs (I hate working in programs where the developers have not included a good stable of keyboard shortcuts), the Windows Explorer interface comes with a good set of keyboard shortcuts, only they are perhaps less easy to discover than the more traditional interfaces.


Listing 2    Shortcuts for Working with the Windows Explorer Interface

Alt + D or F4
Select Folder Address box (Alt + D selects the entry in the box, F4 selects the box and shows a history of previous entries)

Ctrl + F or F3 or Ctrl + E
Select Search box

Refresh the current window

F6 or Tab
Moves among panes in Windows Explorer (Adding Shift reverses the direction)

Alt + P
Toggle the preview pane (one of my favourite “hidden” shortcuts)

Ctrl + Scroll
Switch between views (e.g., List, Details, Small/Medium/Large/Extra Large icons). (I wish there were a keyboard only version of this, so if you know of one, enlighten me. I find the scroll functionality on this particular tool a bit fiddly).

Cancel in Windows File Management

Ctrl + N
Opens a new window in Windows Explorer

Ctrl + W
Closes the current window in Windows Explorer

Toggle full screen view—can also be done using standard Windows 7 navigation tools, ÿ+Up Arrow for full screen (maximize), or ÿ+Down Arrow for normal size (restore)


Listing 3    Shortcuts for File and Folder Management

Ctrl + Shift + N
Create new folder


Delete selected files or folders (to recycle bin)

Shift + Del
Delete selected files or folders (semi) permanently

Alt + Enter
Open Properties dialog for selected item

Ctrl + Click
Selects multiple individual items

Ctrl + Click and drag with mouse
Copy the item (Windows appends ” – copy” to the file or folder name)

Ctrl + Shift + Click and drag with mouse
Create a shortcut (*.lnk) for the selected item (Windows appends ” – shortcut” to the file or folder name)


Listing 4    Shortcuts for Windows Explorer Navigation

Alt + Right Arrow
Follow breadcrumbs forward

Alt + Left Arrow
Follow breadcrumbs backward

Alt + Up Arrow
Moves up one folder level in Windows Explorer

Displays the previous folder in Windows Explorer (i.e., same as Alt + Up Arrow).
Moves up one folder level in Windows File Management

A short explanation of the breadcrumbs is in order. As you navigate through your various folders, Windows Explorer maintains a list of all the locations you visited. These locations are not always in linear succession (i.e., in a succession of child- or parent folders). For example, if you are in a certain sub-sub-sub-sub-folder of an external drive, and you click on one of the shortcuts in the Favorites section of the Windows Explorer folder pane (don’t forget these, they represent one place where the mouse can save you a lot of time, although they can be accessed in about the same amount of time with the keyboard shortcuts already shown). This move will have taken you to a different folder on a different drive. You can click on the Back button (rather press Alt + Left Arrow) to go back to that sub-sub-sub-sub-folder of your external drive). And having done that, you can click Forward (rather press Alt + Right Arrow) to go back once more to Documents. Pressing F4 (not Alt + D will show you the complete list of your previous locations, and you can move up and down that list with the Up Arrow and Down Arrow arrows, and press Enter to select the location of your choice.

Folder pane

These shortcuts work only in the folder pane on the left of the file browser window:

Listing 5    Windows Explorer Folder Pane Shortcuts

Right Arrow
Expands the subfolders of the currently selected folder or goes to the first child if already expanded

Numeric Keypad +
Expands the subfolders of the currently selected folder

Numeric Keypad *
Expands everything under the currently selected folder (don’t do this on C:\)!

Left Arrow
Collapses the subfolders of the currently selected folder or goes to the parent if already collapsed

Numeric Keypad –
Collapses the subfolders of the currently selected folder

Ctrl + Shift + E
Expands the Navigation pane to the folder in which the selected item is found.

More advanced bits

Remember that if you have clicked in the file list of Windows Explorer, you can easily jump to certain files by typing the first letters of a name (you have to type reasonably quickly). So, for example, in Figure 1 I can get to the WordPress folder (and bypass the Word folder) by typing “wordp.” Also, if I wanted to move through the various folders starting with a “c,” I could just press “c” repeatedly with a short break in between.

Figure 2    Selecting an item by typing its name

Then, of course, one of my personal favourites, is typing the destination I want to get it, instead of browsing by a series of mouse clicks. This relies on using the file syntax that derives pretty much all the way from MS-DOS. The first thing you need to know is that you can specify lower-level folders by typing their names. You specify (see below) higher level folders with “..\” and the root with only “\” (all sans the quotes, obviously). You can introduce a succession of lower-level folders by typing the folder names, separated by “\” (again sans quotes).

Listing 6    Text Entries Used for Specifying Higher-Level Folders

Move one folder up (can be combined x number of times for x number of levels).

In Windows Explorer, takes you to the root of the main drive.
In Windows File Management, takes you to the root of the current drive.

Specifies folder ccc which is a subfolder of bbb, which is itself a subfolder of aaa, which is itself a subfolder of the current folder.

The best way to demonstrate the power of this technique is with a short video. I am saving a page from the Daily Dose of Excel blog by printing it as a pdf from Chrome. This is relevant for a tool that I am developing, and so I want to keep this, and look at it when I work on this tool. I obviously have to indicate where the pdf must be saved. I last saved a pdf in another folder two levels down in the Documents folder, so I first have to move two levels up (which I do by typing “..\..\”), and then down to “Computer Training\Excel\Tips – Excel\Macros\.” This is a lot of scrolling and clicking with the mouse. However, note that as I type the address into the file name box, Windows Explorer gives me all folder and files that match what I have typed. I then use this to select the folders I want in a fraction of the time it would take to do with the mouse (on an unrepresentative sample of one trial, 13s with the keyboard vs 20s with the mouse, for a 35% increase in efficiency!). Also note that by typing the folder names, I “lose” the file name, but Windows Explorer “remembers” it, so that when I eventually reach the folder I am aiming for, the name is given back (I can, of course, then type a different name if I want).

Video 1    Using typing to select a folder or file

One last tip. Sometimes, I have to “load” a file to a certain program. As one example: When I using the Spreadsheet Compare tool to compare two workbooks. I already have Windows Explorer open at the location where the file is (actually, I can do this trick much quicker in Total Commander, but this blog post is about Windows Explorer). In my program (e.g., Spreadsheet Compare) I have to click on the browse button and this opens the Windows Explorer file browser at a predefined location—e.g., a default location like “Documents” or the last used location, neither of which is what I want. But I can copy the address of the location for my file (in Total Commander I can copy the file name with its location, which means I often do not even have to click on Browse). I do this either by clicking in the Windows Explorer Folder Address box and copying the full location there (see Figure 3), or I can, of course, eschew the mouse and do it all with Alt + D, Ctrl + C, Esc (and I’m done before you even had a chance to see what I’m doing!). [As a side note, F4 doesn’t work as well as Alt + D for this one.] Now I can paste the location into the Spreadsheet Compare file browser (of course, I jumped there from Windows Explorer with Alt + Tab), press Enter, and jump straight to the location where the file is. Better yet, I can type “\” and start typing the file name, and Windows Explorer will give me the names that match—even quicker!

Figure 3    Selecting a folder address from the Address box

One last trick. You can also apply filters in the file name box. So, for example, in Figure 4 I want to see only the Excel files, not all the other files, so I type *.xlsx and press Enter. This will filter the display and show me only Excel (and only xlsx Excel) files.

Figure 4    Applying a filter in the file name box


A cool browser keyboard shortcut

I am working on a big keyboard shortcut post for 4 November—International Keyboard Shortcut Day in 2015 ;-)—but in the mean time, I made a little discovery today that I felt I should share.

Some time ago, I wrote about learning keyboard shortcuts. And although I often fail, I do still try to practice what I preach (in fact, I have used three already in the first two paragraphs of this post—two for adding the hyperlinks and one for the italics).

In that previous post, I wrote that one way we often discover keyboard shortcuts is by accident—for example, when we press something on the keyboard we did not intend to. You may argue that that makes keyboard shortcuts dangerous—we intend pressing one thing, but press something else, and then we have something bad happen. I counter by pointing out that I am hard pressed to think of a keyboard shortcut for which the handy undo (Ctrl+Z) does not apply. Furthermore, and more to the point, the mouse is also dangerous, if not even more dangerous. I cannot think when I have caused “catastrophic” (i.e., unrecoverable, or at least hard-to-recover) damage with the keyboard, but I can think of several instances with the mouse. How many times have you dragged a file or an e-mail, and dropped it (accidentally, of course) into the wrong folder, and spent ages trying to find your now missing item? Enough said.

Today, I discovered another shortcut like that—I wanted to close a tab in Chrome, and instead of pressing Ctrl+F4, I pressed Ctrl+4 (just a little South of where I wanted to be…). Now when I have multiple browser tabs open (which is all of the time!), I navigate between tabs with the old faithful Ctrl+Tab (or Ctrl+Shift+Tab to reverse direction). However, through this fortuitous mistake, I discovered that when I have multiple tabs open, I can easily jump to a tab by pressing Ctrl+x, where x is the tab’s position number (from left to right). Yes, this will probably not work when you have a gazillion tabs open, but if you have a handful open that will allow you to count at a glance, this will halve the tab navigation time.I also tend to have some standard tabs open in standard positions (GMail, my website dashboard, etc.), so this will help me jump to those tabs very quickly from now on.

A little further testing has shown that the way this shortcut works is that Ctrl+1 through Ctrl+8 take you through the first 8 tabs, while Ctrl+9 handily takes you to the very last tab.

My next step was to immediately test it in other browsers, and I am happy to report that, like numerous other browser shortcuts (e.g., Ctrl+T to open a new tab, Alt+D to select the address bar), it works in Internet Explorer (I’ll test it in Edge later) and Firefox as well. So there you have it: One new shortcut in three new contexts.

So this is one small incremental improvement in productivity (made more so by the multiple browsers in which it works), but the important thing to remember with using keyboard shortcuts is that it all adds up—the gain from using one shortcut one time is minimal, but the gain from using many shortcuts multiple times is huge!


Calculating the mean from a frequency table: Array formula example

Array formulas are difficult to get your head around, but once you start understanding the way in which they work, you will find that there is a lot to love about array formulas (sometimes called CSEs because you have to type the formula and then press Ctrl+Shift+Enter to have it entered as an array formula). Of course, the converse sometimes also happens—once someone “gets” array formulas, they start using them for all kinds of tasks, even where non-array equivalents exist. That is generally not a good thing, as array formulas are computationally cumbersome, and also have other drawbacks, chief of which is that they can be edited and entered with only the Enter button instead of Ctrl+Shift+Enter. Sometimes, this results in an error value, which immediately shows the problem, but other times, the non-array formula returns a result, although that result is different (and therefore incorrect) from the same formula array-entered.

Here, however, is one nice little application of an array formula which shows their power.

It is not uncommon for me, as a statistician, to receive a table like this:

Figure 1    Frequency table

Granted, I tend to work with raw data, not processed data as in the table, but the simple fact is that tables like this do come across my path from time to time.

How, then, can I calculate something like the mean age of the respondents listed in this table? I cannot simply calculate the mean of the age column, nor the mean of the counts column, as the following figure shows (I am using one of Excel 2013’s great new functions, FORMULATEXT, to display the formulas in A16:C16):

Figure 2    AVERAGE function gives totally wrong answer when applied to wrong values

But with a very simple array formula, I can “expand” the data in the table, and then calculate the mean of that (here, FORMULATEXT shows the formulas in C1:C2):

Figure 3    Simple array formula “expands” the frequency table and allows me to calculate the average from the expanded results

I should add, as a side note, that {=AVERAGE(A4:A14*B4:B14)} does not give me the desired results, in case you were wondering why my contrived formula for calculating the average was necessary.

Simple, but very effective. And that is why I love array formulas!