Using Wildcard Search and Replace to switch from manual numbering to automatic numbering

This is again in response to a question from the Word-PC e-mail list.

You have a document where the numbering has been entered manually, and you would like to convert this into a document which uses the different heading styles with associated numbering.

This is a slightly modified version of Example X (that’s ten, by the way) from Application 10.3 (Practical wildcard searches) in my book Doing your dissertation with Microsoft Word.

Students have manually numbered and formatted their thesis before learning about the principles taught in this book which allows Word to do their numbering for them. They want to start using this system (e.g., so as to still be able to use Word to create their table of contents), but cannot simply apply the heading styles, as all the old numbers still have to be deleted manually, and in any case it would be great if Word could actually find all their headings and update them to the styles automatically, without the students having to apply the style to each heading.

The secret is to start with the lowest heading level, hierarchically speaking. Thus, if your numbering is like this, then level 4 is your lowest hierarchical level:

Level

Numbering example

1

Chapter 1

2

1.1

3

1.1.1

4

1.1.1.1

Since the principle is the same for each level, I will only do levels three and higher in this example.

Also note that this assumes that the students very consistently did their numbering—they missed no separating periods, made no formatting mistakes, etc. Every time one of those mistakes is made, it means that the search and replace might miss something. In short, given the human propensity for error, it is best to first make a backup of the document (use my Word uTIlities to make that easy), and then to do a thorough check once the search and replace process has been completed.

For heading level three, the search specification is “([1-6].[1-9].[1-9] )(*^13)”—obviously without the quotes, here and further—which assumes that the students did not ever use more than nine third level subheadings to a single second level heading, the same for second to first level headings, and that the student has six chapters in the thesis. When in doubt, go big and specify it as follows: “([1-9].[0-9]{1,2}.[0-9]{1,2} )(*^13)” which allows for up to a single digit first level, a two digit second level, and a two digit third level (i.e., 1.1.1 through 9.99.99). Very important here is to note the space before the first right parenthesis.

To explain what this is doing, the search specification consists of two groups (as defined by the two sets of parentheses): The first group consists of the number, which is, in the first example, a digit between 1 and 6 for the chapter, followed by a period, a digit between 1 and 9 for the second level heading, followed by a period, and a digit between 1 and 9 for the third level heading, followed by a space. The second group is any text (in this case, the heading text) up to a paragraph mark. The replace with criteria is simply \2 with the formatting specification of Style: Heading 3 (Click on Format, Style in the Find and Replace dialog, and choose Heading 3). This (\2) puts only group 2 (the text and the paragraph mark) back in the replace action. The style is what adds the numbering.

Thus the number is stripped away, and the style applied, all at the same time.

In addition, the formatting for the Find What box should be set to the formatting used for the third level heading. This is actually quite necessary, as without that, the search and replace will also find cross-references in the text to those heading numbers, which is not what you want to do!

So it would look something like this:


The process can then be repeated, with ([1-6].[1-9] )(*^13) for the second level heading, and exactly the same replace with specification, except that now the formatting is set to Style: Heading 2, and of course the appropriate formatting for the Find what box too.

For the Chapter titles, again the appropriate formatting for the Find what box is set, and then the specification is (Chapter [1-6] )(*^13) with exactly the same replace with criteria and with the formatting specification of Style: Heading 1. In this way, an entire manually numbered thesis can be transformed into an automatically numbered thesis in a matter of minutes.

Since you have made a backup of the document, give it a bash, thrash it out, and see if you can get it working. I have actually used this technique on a number of theses already, and while I do concede that it is complex and heady, I can also attest that it is amazingly effective, giving great results and saving hours of manual work. I have sometimes just done a few Find Next operations until I am certain that it is finding what it needs to be finding, and is not finding the cross references, and then, when I feel bold enough, I hit Replace All (easy enough to do if you have a backup!).

Share

Writing and publishing WordPress Blog posts in MS Word

I suppose the first question I have to ask, before getting to the ‘How,’ is the ‘Why’ question: Why write blog posts in Word in the first place?

Why Blog with Word in the first place?

Well, I can’t really recommend Word as a tool (e.g., in my book), if I’m not willing to use it. I am also practising what I preach, and this blog post you are reading now was written in Word 2013 (the first few blog posts on my site were done with the native WordPress tool).

And I must also admit that the native WordPress tool for creating posts is quite nice. And the WordPress team has really gone to a lot of effort to see to it that you can create typographically professional blog posts.

But there are some distinct advantages to using Word, especially if you write the kind of blog posts that I do. I know that there are probably good responses to each of these, but the fact is that they do act as motivators to me, and I believe will also for many (but not all) other users.

Firstly, I can format my text quickly in Word with Styles. Yes, WordPress also has styles, but it’s quicker and easier to create and use new styles in Word.

Secondly, I have much more keyboard shortcuts in Word, and all told, Word generally gets the job done quicker than WordPress (compare adding a hyperlink in the two, as an example). I don’t have time for doing things slowly.

Third, I can work on the blog post while offline in Word, which is not so easy to do with WordPress.

Fourth, and this is a biggie, adding pictures is a pain in WordPress. Each time the picture needs to be added to the gallery, and then inserted into the post. Yes, WordPress has structured it so that you can do this in one smooth operation, but it just takes far too long. With Word, it’s just a case of simply inserting the pictures where I want them. When I publish the post, all the pictures are uploaded to the gallery in one go.

Fifth, I can use automatic numbering features (such as automatically numbering my figures, using cross-references, etc.). If I rearrange things in my post, Word sorts out the numbering for me. That’s no small advantage!

Sixth, I can do more advanced kinds of spelling and grammar checking, etc.

Seventh, and this is another biggie, tables are a cinch in Word, and a pain in most other programs. Having “cut my teeth” on Word, I sort of took doing advanced tables for granted. Word really is great with tables, all my gripes about minor things aside. It wasn’t until I started looking into the printing of that book, and developing websites with basic tools like Google Sites and even FrontPage, that I began to appreciate how far ahead Word is with tables. Want a table like the one in this post? Word should be your tool of choice.

I suppose I could come up with even more, but I think this pretty much covers my main motives.

How to set up the WordPress Blog account in Word

The first thing to do is to set up the WordPress account to which you are blogging.

Microsoft gives some very short and cryptic advice here:

http://office.microsoft.com/en-us/word-help/help-with-blogging-in-word-HA010164021.aspx

WordPress’s basic guide is here:

http://codex.wordpress.org/Weblog_Client

The basic idea is that you have to tell Word where to find the xmlrpc.php on your site, but both of the above assume that your blog is on the front page of your site—admittedly, many sites are set up like that.

Apart from locating the file, you also have to prepare your site to receive the blog posts. If you have WordPress 3.5 or later, this is enabled by default, but for earlier versions, you need to enable the Remote Publishing setting in the Writing section of the site settings. So step 1 should be: Upgrade to the latest version of WordPress.

Step 2, then, is find the xmlrpc.php file. However, my blog was a bit atypical, in that I wanted a static front page, and my blog page is not the main page. This caused some problems, but that helps explain to you, dear reader, how to get yours set up. Also, if you use some file browsers, the file path given makes (to me) no sense. For example, Figure 1 shows the CPanel file browser. If I had to infer the file path from that, it would be /home/trueinsi/public_html/xmlrpc.php. But when I connect with my trusty Total Commander‘s FTP client, I see that it is actually public_html/insight/xmlrpc.php (Figure 2). This equates to http://trueinsight.za.com/insight/xmlrpc.php. The /insight/ part of the location is, of course, the subdomain.

Figure 1    CPanel file browser

Figure 2    FTP via Total Commander

Now I’m getting somewhere. After having found the location of the xmlrpc.php file, the first thing you want to do in Word is to actually open a blank blog post. This is the easiest way to access the blogging tools in Word. Yes, you can add the tools from the ribbon to a new ribbon of your choice in Word, or to the QAT, but since you want to write a blog post anyway, this is just a lot less effort. Note that this setup needs to be done only for the first post. Word then remembers all your settings, and they are available each time you start your next post.

So, click on New, Blog post (Figure 3). Word opens a new blog post, with a trimmed down set of ribbons (too trimmed down, if you ask me)—Figure 4.

Figure 3    Creating a new blog post in Word

Figure 4    Blog Post ribbons

To set up your accounts (I eventually want to maintain at least two regular blogs) click on Manage Accounts (Figure 5), which will open the Blog Accounts dialog (Figure 6).

Figure 5    Manage Accounts

Figure 6    Blog Accounts dialog

From there, click on New, and in the New Blog Account dialog, select WordPress as your Blog type (Figure 7).

Figure 7    New Blog Account dialog

This, in turn, opens the New WordPress Account dialog (Figure 8), where you have to enter the location of your xmlrpc.php file (as discussed above) and your user name and password.

Figure 8    New WordPress Account dialog

You can also set the Picture options (Figure 9).

Figure 9    Picture options

Figure 10 shows the dialog with all the settings completed.

Figure 10    Completed account settings

When you have added the details and clicked on OK, Word will warn you about the security of the post (Figure 11).

Figure 11    Sending information

So now you’re all set, and from now on, it should be as simple as just selecting your account and writing your blog entry.

Setting up individual blog posts

When the Blog template is opened, it displays a title with the account you want to use (Figure 12). If you have multiple accounts, you can select the right account from the dropdown that will appear when you click on the account line (Figure 13).

Figure 12    Blog title and account

Figure 13    Selecting a different blog account

It is also a useful idea to add categories to your posts. For example, I have set up my menu to include several categories, so that they serve as handy shortcuts to people wanting to read posts related to that category. So, for example, Figure 14 shows my category for Excel blog posts.

Figure 14    Example of a blog-post category in a site menu

Adding categories is easy—just select Insert Category from the Blog Post ribbon tab (Figure 15). The list of categories will be displayed underneath the account information (see Figure 13 above).

Figure 15    Adding categories

For the rest, simply write the blog post, as per normal. The dearth of tools caused by Word hiding normal editing ribbon tabs is a bit annoying, but is easily overcome by adding the tools you want either to the Blog Post ribbon or the QAT.

Publishing the blog post

Once the post is ready, you can publish it straight from Word (Figure 16). I generally recommend first publishing it as a draft, which is exactly the same as saving a blog post as a draft from the WordPress editor. The advantage to this approach is that it allows me to go to WordPress and satisfy my paranoid self that everything looks right before going public with the post (for example, with this post, I forgot to update my fields, so my figure numbering and cross-referencing was out—I could then go back to Word and correct it). If you’re braver, you can go straight to publishing from Word. It stands to reason that while you can write the post offline, you need to be online to publish the post.

Figure 16    Publishing a blog post

Editing existing blog posts

What I find really nice is the fact that you can also open existing posts in Word and edit them. Simply click on Open Existing (Figure 17), choose the post you want to edit from the Open Existing Post dialog (Figure 18), and you’re good to go.

Figure 17    Opening existing blog posts

Figure 18    Open Existing Post dialog

Conclusion

In short, while there are many Blog writing tools out there, I find that for the kind of work I tend to want to include in my Blog posts—captioned figures, tables, fast editing, etc., Word is a good tool of choice, albeit not the only one. After some simple setup, blogging with Word is clean and simple, the way it should be.

Share

Manipulating text data with Excel: Using Excel to reverse engineer Mendeley’s documents database

Some introductory notes

Perhaps “reverse engineer” is a bit of a strong term. Bit it sketches the general idea I want to illustrate—how I tried to figure out in what way Mendeley stores document information in its database.

Also, this is a long blog post. I have a friend who has “coached” me a bit on bloging, and he warned that if I write long posts, people won’t read them. Something about short attention spans and clicking out of long web pages. While he doesn’t fit into that category himself, here’s my limerick to those people:

In an age of Twitter
With minds fully a-flutter
Every single Twit
Will continually want to flit
From this to that quick wit!

Nonetheless, be warned: This is a long post. The aim is to show you how some of Excel’s techniques and functions are used in practise to manipulate text data. Secondarily also, it explains how I derived the various Mendeley document field tables in my forthcoming book on Mendeley. For each Excel function or tool mentioned in this post, I have provided a link to the Microsoft support site explaining that function or tool. So because this post is so long already, I have not included explanations of the functions themselves, as that would really make it too long. But the reality is that the explanation below probably won’t make sense if you don’t have an inkling of how the various functions work.

In the discussion below, I have indicated each Excel function/technique using Strong style formatting.

Also, you will want to download the example workbook [Mendeley fields for different source types (example for book)] and look at it as you read through the discussion. The discussion uses the names of the worksheets for headings, so it helps you find your place in the workbook.

The following functions and techniques were used in the creation of this workbook (just kidding about CONCATENATE—no one in their right mind uses concatenate!). Each item contains a link to the Microsoft support site, so that you can learn more about that feature from there, which I hope will help you understand how I used it in the example workbook:  Mendeley fields for different source types (example for book):

And now, without further ado:

Background

For my book on Mendeley, I had to explain what all the fields in the Mendeley document database were about, so that readers could understand what information belongs where. The problem, of course, is that Mendeley has not made this kind of information freely available. So I had to do some guesswork and some sleuthing. And that’s where Excel comes in. I thought demonstrating the process I used would:

  1. Illustrate that Excel can be a powerful tool for working with text data
  2. Be a good demonstration of some advanced Excel techniques like macros, array formulas, conditional formatting, etc.

The first point is interesting. Excel gets used and abused with numerical data all the time. But few people see it as a good tool for working with text data, which is an unfortunate misconception. Excel has its flaws when it comes to text, but in my work as a research consultant, I work with a lot of text data together with a lot of numerical data. Knowing how to work with text in Excel can be very useful.

So here’s the first challenge—figuring out which fields are actually in the database. This is challenging, not only because Mendeley has not published their database design. Rather, two more complex reasons tend to befuddle things: Firstly it is clear on inspection that, for different document types, they have named the same fields differently (i.e., even though the field appears, in the interface, to be called one thing, the same field, for a different document type, is called something else). Secondly, the interface only gives you the relevant fields for each document type, which means that with no document type are you ever looking at all the fields. To get a clear picture of what all the fields are, you have to piece together the information from all 20 document types provided by Mendeley and build a coherent whole. That’s where Excel comes in.

Doing the work

Worksheets:
Document types–Mendeley to CSL
Mendeley to CSL Fields
Field mapping–Mendeley to CSL

So I started by going to this page, which maps the Mendeley document types against the CSL types:

https://github.com/citation-style-editor/csl-editor/wiki/CSL-Editor-Type-and-Field-Mappings-for-Mendeley-Desktop

The page contains two tables. The first maps the Mendeley document types against the CSL document types. The second maps the Mendeley fields against the CSL fields. I copied and pasted these into two worksheets (note that it is a general principle/best practice of mine to keep one block of data to one worksheet, hence the two sheets). The first I formatted as a table. The problem with the second one (the Mendeley to CSL Fields worksheet) is that some of the rows show that multiple Mendeley fields get mapped against single CSL fields. This means that Mendeley’s database has a lot more detail in terms of field distinctions than currently exists in the CSL specification. Although the process is not shown, a quick Text to columns using the commas as the delimiter, and then copy with Paste SpecialTranspose, allowed me to get each Mendeley field in one row. All I had to do then was use Fill Down (Ctrl+D) to fill down the CSL fields for the multiple Mendeley fields against which they were mapped (thus, from the Mendeley to CSL Fields worksheet, to the desired end result in the Field mapping–Mendeley to CSL worksheet). This was then also formatted as a table. Getting from the original table pasted into the worksheet, to the one-field-per-row version, in other words, took probably much less than two minutes (sorry, I didn’t record a screen cast, and I didn’t run my ManicTime stopwatch, so you’ll have to take my word for it).

This then accounts for the first three sheets.

Worksheets:
Generic
to Working paper (20 in total)

Next I had to create a worksheet for each document type. Fortunately, I had written a macro a long time ago that creates worksheets using the names in a selected range of cells (you don’t need to write these, you can search for one on the net easily). But the code could be as simple as:

Sub AddWorksheetsFromCells()
For Each c In Selection.Cells
With ActiveWorkbook.Worksheets
Set ws = .Add(After:=ActiveWorkbook.Worksheets(.Count))
ws.Name = c
End With
Next c
End Sub

Note that this code does the job, but doesn’t check for error such as worksheet names that contain illegal characters, are too long, or already exist, amongst others. But it does the job.

Using the table of Mendeley document types (from the Document types–Mendeley to CSL worksheet), I have my 20 worksheets in just a few seconds. Not too bad. Probably five minutes into the process in total, and I have 23 worksheets, 22 of which I will be using, all of which have been named, and two of which have already been populated.

The next step was more tedious, and took a lot longer. I went to the Mendeley web interface and stepped through the process of adding a document, for all twenty document types. I selected each document type in turn, activated the Additional Fields option, and copied all the fields that Mendeley gave me, pasting them into their host worksheet (the worksheet with the same name as the current document type). When you copy and paste information from the net like that, a lot of crud comes with it in the form of the html formatting, but this is quickly removed by applying the Normal Cell Style (on Excel’s home ribbon).

On each of these twenty sheets, I added two columns. Column C (Web: Basic/Additional) shows whether the field is in the initial listing, or only available once the Additional Fields button has been selected. This was easy to do, as the top items were the initial ones, and the lower items the additional ones. So it was a simply matter of adding the two terms (Basic or Additional) and then filling down for the other rows. Some minor changes were made later: The Note field was added (at the end), as was the Tags field (at the end of the basic list).

Column B was more work. I opened the Document Details pane in the Mendeley Options dialog on the Mendeley Desktop application, and noted which fields were displayed as marked. These were marked as “Core.” Then I also scanned the full list and indicated which fields I, personally, would recommend also using, or would also consider using. So far, so good.

The next column (D) was to show which CSL field this particular Mendeley field was mapped against. The Mendeley Support site lists the fields mapped for each document type here: http://support.mendeley.com/customer/portal/articles/364144-csl-type-mapping. I copied and pasted each list to the relevant worksheet. Using Text to Columns from the Data ribbon quickly gave me two columns (Columns G:H), one for Mendeley, one for CSL. Once I had that information, it became quite easy to grab that and populate column D with a VLOOKUP:

=IFERROR(VLOOKUP(A2,G:H,2,0),””)

However, note the use of the IFERROR function (introduced with Excel 2007), which is a great wrapper to use with VLOOKUP (I would guess that more than half of my VLOOKUPs these days are wrapped in IFERRORs). I also grouped the twenty sheets and added the formula once for all the worksheets together, thus the slightly excessive use of the whole of columns G:H for my lookup table (the exact number of rows in G:H varies amongst the different worksheets). The formula was filled down, and there it was. I had one problem, though. The main table in columns A:D was showing all possible fields, and the mapping provided by Mendeley was only for the fields they map against CSL for that document type. But the other fields get mapped for other document types, and these are not shown. This was corrected by adding the Filter, and filter out the successfully matched items (filter to show only blanks).

The information for the full field mapping was already available in the table on the Field mapping–Mendeley to CSL worksheet, so I just added a new VLOOKUP:

=VLOOKUP(A9,’Field mapping–Mendeley to CSL’!$A$1:$B$89,2,FALSE)

This formula was added in the top row after the filter had been applied, and filled down across all the unfiltered rows (thus not wiping out the formulas that have already returned something).

Thus, if you scan down column D, you will see these two different VLOOKUP formulas intermingled. Where they appear is determined by how the filter filtered out non-blank lookups from the first one.

One last column (E) was added, to indicate whether I wanted to include this field in the table for that document type in my book (see Appendix A of the book). Here the OR function comes in handy—if the Mendeley field is mapped against a CSL field, or is one of the fields listed in the new document dialog in the desktop application (‘Core’) or is not shown only when the Additional fields link in the web application is clicked (not ‘Additional’), then it must be added:

=OR(B2=”Core”,C2<>”Additional”,LEN(D2)>0)

I had an additional check in cell F1 (using the COUNTA and COUNTBLANK functions), but to tell the truth, it became moot when I started adding the extra ‘Also consider’ or ‘Recommended’ options in column B.

Worksheets:
 Field listings for each type
All fields from all types

 Fields by type

One thing I noticed was that for each of the twenty document types, the Web client showed, when additional fields were displayed, a total of 67 fields were returned (I also later added the Files field from the Desktop application). The worksheet Field listings for each type shows the use of an INDIRECT function to pull these fields over from each sheet (and the ROW function tells the INDIRECT function where to look):

=INDIRECT(“‘”&A$1&”‘!a”&ROW())

However, demystifying the Mendeley database was not simply a matter of reading across the rows, since the initial items (the Web client’s ‘Basic’ fields) were different. So the same item (e.g., Length, or Series Editor, to mention a few) would not always be in the same row). Because of the field name changes for different document types, an alphabetical sort would also not solve the problem.

So now the last trick was to pull all this together. The challenge, of course, was Mendeleys renamed fields—those ‘fields’ that are actually the same field, but have different field names for different document types—I had to find these, and that meant figuring out when the different name was an alias, or an actual different field.

First another macro that was gleaned (and then modified) many years ago from the Excel-G list and stored in my personal macro workbook copied the contents of each of the twenty worksheets and dumped it into one sheet—All fields from all types. I didn’t actually do all that much with this worksheet—it just served as a starting point for another worksheet. What I did do was to copy and paste the worksheet names (from column A of the same sheet) to Column H, and then use Remove duplicates to list only the unique items. Then a quick MATCH function confirms that each of them is in the document type mapping table (if the MATCH returns #N/A then the item is not found):

=MATCH(H2,’Document types–Mendeley to CSL’!$A$2:$A$21,0)

I then did the same Copy and Paste and then Remove duplicates operation with the fields, so that column K listed all the unique field names across all document types. At this stage, the list would consist of ‘real’ field names as well as aliases, but the important thing is that it was complete—it listed 90 entries, to Mendeley’s 67, meaning I had to find and eliminate 23 aliases.

This list of unique field names and aliases was copied and pasted to a new sheet (Fields by type), which would be the one mapping fields across document types. This list was somewhat longer than the previous lists, but it would help me sort out what was going on. Again a VLOOKUP wrapped in an IFERROR would tell if this field was, for this document type, a core field or not. But oh no! I now had to VLOOKUP to twenty different worksheets! That would mean twenty different formulas, one for each column, to refer to the right worksheet. Or maybe not…. After all, each worksheet is named for the document type (thanks, in part, to my macro, and also to good naming conventions). So with one formula that includes an INDIRECT function to build the worksheet name into the reference, I can VLOOKUP the right sheet for that column:

=IFERROR(VLOOKUP($A2,INDIRECT(“‘”&C$1&”‘!$a$1:$b$100″),2,FALSE),””)

Note that the function will return a blank if the field is not found (the IFERROR sees to that), but if the field is found, then the function will return either the text from column B of the lookup sheet (if it has text), or a zero. Zero thus means the item is on that sheet, but it is not a core field. I could have “translated” that into something else with my function, but I was happy to leave it as is for the moment. What I did want to do was to use Conditional Formatting to show me on which sheets a field is not found. This would help me to see how commonly a field was being used, which could also give me clues as to when the field name was actually an alias (it would not appear in many document types ‘under the alias’). To quantify this, Column B counts the various text options and displays them separated by slashes:

=COUNTIF(C2:V2,”Core”)&”/”&COUNTIF(C2:V2,”Recommended”)&”/”&COUNTIF(C2:V2,”Also consider”)

Worksheets:
Field listings for each typeFIN

Almost done. I just need to look at this, and figure out what was being used where. Some were obvious (e.g., the Abstract field, which is there in all twenty document types). The worksheet Field listings for each typeFIN is just the content of the Field listings for each type worksheet copied and pasted as values with Paste Special. Here I used the ‘Generic’ document type as my reference point, and I Sorted alphabetically. Then I looked for mismatches—fields that did not match the same name as was in the Generic field. This was easy to do with another application of Conditional Formatting. Note how the conditional format uses a partially relative reference, so that I could set up one conditional format, that is row-specific—each column in the table is compared to the field name for the Generic field in Column C. Now the gumshoe work came in. For the mismatches, I had to look at the previous sheet, and try to discern what was the most likely name for that field by reading across all the columns. So, for example, for the Case document type, I was able to figure out that Reporter is not an actual field, but is the alias for Authors, as was also Date Decided the alias for Last Updated. So Mendeley, when defining the document information capture for the Case document type, captures the Reporter data to the Authors field, and the Date Decided field to the Last Updated database field. The Film and Television Broadcast fields do the same. To make this task easier, I used the filter to help me zoom in on the fields that need attention. Column B counts how many of the field names for the various document types are the same as for the Generic document type—if it was nineteen, then all twenty agreed, and that field needed no work:

=COUNTIF(D3:V3,C3)

So I filtered out all the 19s, and worked on the rest. What the sheet does not show is that each time I looked at a name (I started with those with the lowest occurrences), I tried to figure out what that field was an alias for. I then entered that alias in the row for the field for which I believed it stood (e.g., adding the value ‘Reporter’ for the Case document type in the ‘Authors’ field), and then I deleted that field row. This process of manual elimination whittled down the fields until I was left with 68 (the 67 former field count, plus 1 for Files).

Worksheets:
Field listings for each typeCOR

The last thing to be done was to map all of this out sensibly, so that it could become a table in the book. The worksheet Field listings for each typeCOR shows again all the document types across the columns, and the now-finalised list of fields down the rows. Again a VLOOKUP to the previous worksheet with an INDIRECT to reference the right worksheet for that column, brings in the name of the document type when the field is listed as core or recommended (not when listed as “also consider”), but to execute that decision, it requires the use of the OR function, nested in an IF function: Quite a mouthful:

=IF(OR(VLOOKUP(‘Field listings for each typeFIN’!C3,INDIRECT(“‘”&D$2&”‘!$a$1:$b$100″),2,FALSE)=”Core”,VLOOKUP(‘Field listings for each typeFIN’!C3,INDIRECT(“‘”&D$2&”‘!$a$1:$b$100″),2,FALSE)=”Recommended”),D$2,””)

Then I used Column A to count for how many document types the field was listed. This is an interesting formula, as I could not simply use a COUNTA or even a COUNT minus a COUNTBLANK, because the VLOOKUP formulas meant that the cells were not actually empty. Thus I had to use an Array formula, testing the length of the value returned by each cell’s formula—one of those instances where an array formula is the only thing that can get a result:

{=SUM(IF(LEN(D3:W3)>0,1,))}

This told me in how many document types, that field was used by Mendeley. However, presenting such a big table (essentially 69 rows x 22 columns) in a book is problematic. For my final step, I had to summarise columns D:W of this worksheet. I would have thought to use CONCATENATE, except that I still believe it is the most useless Excel function ever (ill-)conceived. Now I have a confession to make: I actually used Charles Williams’ excellent CONCAT.RANGE (which forms part of the FastExcel SpeedTools which I was beta testing). But if I distributed this workbook as a demo, that wouldn’t do, since most people don’t have those tools (and I won’t soon, either!). So I had to come up with an alternative. So the original:

=IF(A3=20,”All”,CONCAT.RANGE(D3:W3,”; “, , TRUE,,))

became:

=IF(A3=20,”All”,IF(D3=””,””,D3&”; “)&IF(E3=””,””,E3&”; “)&IF(F3=””,””,F3&”; “)&IF(G3=””,””,G3&”; “)&IF(H3=””,””,H3&”; “)&IF(I3=””,””,I3&”; “)&IF(J3=””,””,J3&”; “)&IF(K3=””,””,K3&”; “)&IF(L3=””,””,L3&”; “)&IF(M3=””,””,M3&”; “)&IF(N3=””,””,N3&”; “)&IF(O3=””,””,O3&”; “)&IF(P3=””,””,P3&”; “)&IF(Q3=””,””,Q3&”; “)&IF(R3=””,””,R3&”; “)&IF(S3=””,””,S3&”; “)&IF(T3=””,””,T3&”; “)&IF(U3=””,””,U3&”; “)&IF(V3=””,””,V3&”; “)&IF(W3=””,””,W3&”; “))

Now that’s a huge formula, and I am way too lazy to do all that typing. But here’s the last technique I used: Excel can actually be used to (sort of) build its own formulas.

So I entered this into D73 (D72 shows a simplified version):

=”if(“&MID(CELL(“address”,D3),2,1)&ROW(D3)&”=”””””&”,”””””&”,”&MID(CELL(“address”,D3),2,1)&ROW(D3)&”&””; “”)&”

Note how the ” has to be preceded by another ” if it is to be displayed in the cell results (it always takes me a moment to get my head around it when I have to write formulas like that). The formula results in this text:

if(D3=””,””,D3&”; “)&

Now all I did was fill D73 across to W73, then copy D73:W73, paste it into my text editor (UltraEdit), where each cell is separated by a tab and do a search and replace to strip out the tabs. I copied the result, and went to cell A3, where I typed the =IF(A3=20,”All”, bit, pasted my text, backspaced out the last ampersand (I could have done that in UltraEdit too), and added the final parenthesis. All this is a lot quicker than typing all of what is in cell A3. The formula then just gets filled down, and there I have my information for the book’s table!

After all of this, I was in the position to simply copy the relevant parts out of this Excel workbook, and paste them into the Word document for the book, formatting them with a table style. If you want to see what these final tables look like, you would, of course, have to get your hands on the book (due out in about a month and a half). The table names (I won’t give table numbers right now, as these are not yet final) as they appear in the book are:
Mapping of Mendeley Document Types to CSL Document Types
M
endeley Document Information Fields
Mendeley Fields Mapped to CSL fields
Mendeley Field Types Showing Core Document Types and Alternative Names
And then all the tables in Appendix A: Mendeley fields per document source type.

Share

Separate table (or figure) numbering for appendices

A question recently came up in the Word-PC e-mail list which is quite typical of something a student doing a thesis or dissertation might come across. I will also modify this post to that context. I was quick to give an answer, and only later thought through all the implications, so that my initial answer would have been less than satisfactory. Here is, what I believe, a more robust solution.

The requirement is as follows:

  1. A dissertation is using chapter-based numbering instead of sequential numbering for its tables (i.e., in Chapter 3, the tables would be 3-1, 3-2, etc.). Note of course, that the student can set the delimiter (here a hyphen) in the process of activating chapter based numbering (the dialog is shown below)—many students use a period.
  2. The dissertation also has appendices, which must be numbered with uppercase letters (Appendix A, Appendix B, etc.).
  3. Those appendices contain tables (or figures, or any other label created with the Insert Caption tool).
  4. Those tables need to be numbered in the chapter-based style, but now reflecting the Appendix number. Thus A-1, A-2, etc. for Appendix A, and so on.

Parts of these are very easy to do. It’s the combination that is a headache.

Requirement 1 is simple. Just use the Insert Caption tool to add the numbers, and set the table-based numbering. See p. 162–163 of Doing your dissertation with Microsoft Word if you have the book (if you don’t some screen caps below will help).

Requirement 2 is equally simple. I have created, in the two templates (Thesis 2010.dotx and Thesis up 2009.dotx) which can be downloaded from the examples page, a style called “Appendix Title” that uses normal upper case alphabetic numbering. This is what the style looks like, showing its definition:

Some pointers. When modifying the style, be sure to go to the paragraph dialog and set its outline level to Level 1. This will ensure that it should be included in your Table of Contents (mine is based on Heading 1, which gives the same end result). Using this style for your appendix headings gives them the desired numbering scheme, thus satisfying requirement 2.

Requirements 3 and 4 are dealt with together, as these are the more tricky ones.

Here are the problems:

When Word adds caption labels, and those caption labels use chapter-based numbering, Word actually adds two fields: One is a STYLEREF field which is set to automatically pull in the heading level 1. The second is a SEQ field, which is set to restart after each occurrence of heading level 1.

Furthermore, whenever you add a table anywhere in the dissertation, Word runs through all instances of the table, and resets all of them. If you thus make changes to the switches of these caption labels in the appendix, they can get nixed by adding any table anywhere else in the dissertation.

To get started, use the Insert Caption tool to insert the caption label for the first table in the first appendix. We will modify the fields it adds afterwards (or, if you are confident writing out fields by hand, you can just skip ahead).

Click on Insert Caption (Captions group, References ribbon). From the Caption dialog, choose Numbering… to open the Caption Numbering dialog, and include the chapter number, also choosing the delimiter (separator). No need to change the Chapter start with style setting here, since all that is allow are the nine built-in heading levels. We will modify this manually later:

Once the chapter-based numbering has been set, the Caption dialog should look more or less like this:

Clicking on OK let’s Word add the caption label, using two fields.

Let’s first look at those fields Word adds. This is what they look like before and after being selected and set to Show field codes (Shift + F9):

So. The field codes (the bits between the braces) and text Word adds, is:

Table { STYLEREF 1 \s }–{SEQ Table \* ARABIC \s 1}

We can’t just change the STYLEREF field from 1 (Word shorthand for the built-in style “Heading 1”) to “Appendix Title,” as Word will nix this each time it updates the caption labels when a table is added somewhere. We also cannot change the restart switch for the SEQ field to anything other than 1-9 (i.e., Word’s nine built-in heading styles).

So we need a slightly different approach.

The first thing is to keep Word from automatically changing these table captions. The second is to get it to restart with each new appendix. Both of these are accomplished by setting the first appendix table caption to:

Table { STYLEREF “Appendix Title” \s }–{SEQ AppendixTable \* ARABIC \r 1}

The change to the STYLEREF field will pull in the required appendix number (from my custom style–if you are using a different style, then you would have to include whatever that style name would be). But note that it is the change to the SEQ identifier (from Table to AppendixTable) that essentially creates a new sequence, and prevents Word from reverting the STYLEREF field from STYLEREF “Appendix Title” back to STYLEREF 1. The SEQ field switch \r 1 which has replaced the \s 1 switch tells Word to restart at 1 here. Note that the 1s in the two switches have vastly different meanings. The \s 1 means restart numbering if there is an instance of “Heading 1” style between me and the preceding similar caption. The 1 refers to Heading 1. The \r 1 means restart right here, using the number 1. Here the 1 refers to 1.

But still more needs to be done. For each subsequent table in each appendix, the caption must be:

Table { STYLEREF “Appendix Title” \s }–{SEQ AppendixTable \* ARABIC \n}

Here the \n switch just tells Word to continue the numbering sequence from the previous similar caption label.

So the first table caption of each appendix will be:

Table { STYLEREF “Appendix Title” \s }–{SEQ AppendixTable \* ARABIC \r 1}

And the table caption of all other tables in each appendix will be:

Table { STYLEREF “Appendix Title” \s }–{SEQ AppendixTable \* ARABIC \n}

Probably the best way to achieve this is to simply copy and paste the caption paragraph for each new table. Yes, it’s a little manual work, but it’s not that bad. If you’re work averse, you could always record a macro writing out the fields (remember to use Ctrl + F9 to add the field braces) and then run that macro each time. Remember to also set the paragraph style to Caption, thus ensuring consistent formatting for your chapter and appendix tables.

Unfortunately, the job is not quite done.

Because we changed the SEQ field from SEQ Table to SEQ AppendixTable, the tables in the appendices will not appear in the list of tables. So we need to fix that.

To get that done, insert (or re-insert) the list of tables.

From the Table of Figures dialog, select the Options button:

In the Table of Figures Options dialog, specify that the table of figures should be built from the Caption Style:

Once you click on OK, notice that back in the Table of Figures dialog, Word will have set the Caption label to (none). This won’t do, because that means that the table will include, tables, figures, and any other caption labels. Compare the Table of Figures dialog below to the one shown above:

Just reset this to Table, and click on OK:

The end result is a list of tables that includes both your chapter and appendix tables:

The great thing about this approach is that it is stable and it works. It will withstand the updating caused by adding new tables or deleting existing tables, updating all fields in the document, closing and re-opening the document, etc.

It does require a small amount of manual work, but I think it is well worth the effort.

Update (2014-03-11):
If you want to do this with multiple captions (e.g., Tables and Figures in your appendices), then a little more work is needed. Read about it here.

Share

Change revision author names

Announcing a new addition to the True Insight Word uTIlities

Change revision authors

It seems that people often encounter problems with the revision author name when using track changes to review a document.

The problem may stem from various circumstances:

  • They forget to set the user name before making their revisions
  • They work on a single document from multiple computers where the registered user name is not the same on the various computers
  • They have made their revisions, but wish to anonymise them before sending them to another person

Regardless of these various situations, the desired end result is the same–to change the author names of revisions once they have been made.

To fully understand the solution, we just need to understand something about reviewing in Word. Essentially, reviewing a document can entail two things: Commenting on the text (done through comments), and suggesting actual changes (either insertions or deletions) through revisions. Interestingly, if you look at the navigation buttons on the Review ribbon, then you will notice that there are buttons that take you from one comment to the next (or previous), and buttons that take you to the next (or previous) revision.

However, while the buttons for the comments do exactly that–take you only to comments, bypassing revisions; the buttons for revisions take you to both revisions and comments. Granted, you can choose to ignore comments (and some other options–even insertions and deletions) from the Show Markup options:

 

In short, while comments are part of the reviewing process, they are viewed differently in the Word object model.

When faced with this problem, the first thought, of course, is to do this manually. However, you will notice that this is not possible, for both comments and revisions.

The next thought is to do it via VBA (if that is an option–if not, see below).

For comments, this is entirely possible. The Comment Object has an Author property, which is read/write. Thus, code like this will change the author name of comments:

For i = 1 To .Comments.Count
  .Comments(i).Author = str_authorname
  .Comments(i).Initial = str_Initial
Next i

But for some odd reason which only the people at Microsoft will know (or sometimes I wonder if even they will know the reason for decisions like this), the Revision Object also has an Author property, but it is read only. Thus, a snippet of code like this, which Word’s VBA IntelliSense will happily allow one to write, will result in an error:

For i = 1 To .Revisions.Count
  .Revisions(i).Author = str_authorname
Next i

This is a problem, since often, being able to change the comment author but not the revision author will not be a satisfactory solution.

What to do?

Well, there are other ways to approach the problem. firstly, if the aim of the exercise relates to my third bullet above (but not the first two), then the document comments and revisions can be anonymised using the Document Inspector:

Note that you want to use the Remove Document Properties and Personal Information button, not the Remove Comments, Revisions, Versions, and Annotations button–you want the revisions and comments to remain, but not be attributed to any author. This approach can be seen in Office help articles like this one: http://office.microsoft.com/en-us/word-help/change-the-author-name-for-review-comments-HA010036415.aspx

 Ok. what about the first two problems I mentioned above? This sort of change can also be made manually, as described by Andrew Savikas and Andy Bruno in Andrew Savikas’ book Word Hacks: Tips & Tools for Taming Your Text. The basic technique is described here: http://oreilly.com/catalog/wordhks/chapter/hack41.pdf, but even this process can seem daunting to less advanced users.

After all that, it does seem as if this new tool in the uTIlities set is a bit moot, but at the end of the day, it has firstly been requested by some users, and secondly, presents what I believe Microsoft should have allowed for these functions. It is simple and effective, and requires you the user, not to have to use VBA or a relatively complex hack to get the job done.

The use of the tool is simple. Once the document is open, select the tool from the uTIlities ribbon, decide how to manage the document (make a backup and do the changes in the original, or do the changes in a renamed copy of the original):

And then choose the author names to replace (any number of names can be replaced with a single new name), add the new author name, and decide whether to change the comments too (and if so, what Initials to use):

If all goes well, clicking OK will do the job. Simple enough.

To date, I know of no other tool that automates this process, and yes, I did get it right to make the change using VBA (naturally!), but of course, I had to take the long way around to get past that read only property. I’ll leave you to speculate about the details!  🙂

So, if you want to give it a spin, go visit the Word uTIlities page and download it today!

Share