Manipulating text with Excel: Take II

Quite some time ago, I did a post in which I demonstrated some techniques for manipulating repetitive text blocks. However, I will confess, as my Excel-L friend Brian Canes also pointed out, that the example is somewhat contrived, as it would probably have taken less time to use the CONCATENATE function and just to click on all the cells. What I don’t like about doing something like that is the possibility of error—of clicking on one cell twice, or skipping a cell.

I want to go just a bit further with the idea of using Excel to manipulate text (also using a less contrived example), and show how I use Excel to create repetitive (or even partially repetitive) text blocks. Granted, people tend to think of Excel as a tool for working with numerical data (which, of course, it is), and granted, Excel really has some big problems working with rich text, but people tend to miss the fact that the tabular layout of a spreadsheet can still be a useful palette for working with text. For example (and before I get to my post), if I have to create something like an attendance list (a table with space for names, signatures, and perhaps some other information to be completed, but nothing else), then I prefer to do it in Excel. The page layout is a bit trickier, but I find I can do it quicker in Excel than, for example, by creating a table in Word.

Here, then, are four somewhat less contrived (or not!) examples of using Excel to manipulate text. They all relate to using Excel to create programming code. I used the first one to create VBA code, and the last three to create SAS code.

Example 1

Here is an example where I used Excel to write VBA code for the Word utilities. I wanted to be able to determine exactly what is selected, especially if an object is selected. I copied a table from the Microsoft Word help file (the example below is from the Word 2010 help file, but the same list can be found for Word 2013 at https://msdn.microsoft.com/en-us/library/office/ff860759.aspx). The table consists of three columns: the name, the value, and a description. I wanted to use it as the base for my programming, so I copied the table, pasted it into Excel, and added some extra columns to do all my “writing” for me. The process is explained below.

Figure 1    Word Help file text pasted into Excel and modified for further programming

The process was simple. My three original columns from the table were pasted into worksheet columns B:D. I then inserted two columns between B and C. I typed “elseif selection.shaperange(1).type=” into A2 and filled it down, and typed “Then” into C2 and filled that down. Then I added one double quote to D2 and filled that down, and merged those values with the aforementioned utility of mine (note that the actual values are now, in Figure 1, in Column D and no longer in E). I then typed “WhatIsSelected = WhatIsSelected & " in an” in F2 and filled that down, and again added one double quote to H2 and filled that down.

The double quote in D2 deserves an explanation. Because the single quote is Excel’s cue to treat a number as text, if I had added a single quote and copied the cells across to the VBE, only the number would have been pasted, not the single quote (which, as you will see in the code snippet below, I wanted to use for commenting out the number). A simple search and replace could change that double quote to a single quote once the text had been copied and pasted. One more search and replace allowed me to break each Excel row into two code rows. And I just had to delete the very first “else” to make it the starting “if”.

So, in a matter of a minute or two, I was able, from the help file, to produce the code below. Counting only the text from Excel, that’s 367 words, 2603 characters in fifty lines of code. You would have to be a very fast typist to beat that (and I am not a very fast typist, so it helped me all the more).

intTestForError = .ShapeRange(1).Type
If Err.Number = 0 Then
  '1
  If .ShapeRange(1).Type = msoAutoShape Then
      WhatIsSelected = WhatIsSelected  _
       & " in an AutoShape"
  '2
  ElseIf .ShapeRange(1).Type = msoCallout Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Callout"
  '20
  ElseIf .ShapeRange(1).Type = msoCanvas Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Canvas"
  '3
  ElseIf .ShapeRange(1).Type = msoChart Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Chart"
  '4
  ElseIf .ShapeRange(1).Type = msoComment Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Comment"
  '21
  ElseIf .ShapeRange(1).Type = msoDiagram Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Diagram"
  '7
  ElseIf .ShapeRange(1).Type = msoEmbeddedOLEObject Then
      WhatIsSelected = WhatIsSelected  _
       & " in an Embedded OLE object"
  '8
  ElseIf .ShapeRange(1).Type = msoFormControl Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Form control"
  '5
  ElseIf .ShapeRange(1).Type = msoFreeform Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Freeform"
  '6
  ElseIf .ShapeRange(1).Type = msoGroup Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Group"
  '24
  ElseIf .ShapeRange(1).Type = msoSmartArt Then
      WhatIsSelected = WhatIsSelected  _
       & " in a SmartArt graphic"
  '22
  ElseIf .ShapeRange(1).Type = msoInk Then
      WhatIsSelected = WhatIsSelected  _
       & " in an Ink"
  '23
  ElseIf .ShapeRange(1).Type = msoInkComment Then
      WhatIsSelected = WhatIsSelected  _
       & " in an Ink comment"
  '9
  ElseIf .ShapeRange(1).Type = msoLine Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Line"
  '10
  ElseIf .ShapeRange(1).Type = msoLinkedOLEObject Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Linked OLE object"
  '11
  ElseIf .ShapeRange(1).Type = msoLinkedPicture Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Linked picture"
  '16
  ElseIf .ShapeRange(1).Type = msoMedia Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Media"
  '12
  ElseIf .ShapeRange(1).Type = msoOLEControlObject Then
      WhatIsSelected = WhatIsSelected  _
       & " in an OLE control object"
  '13
  ElseIf .ShapeRange(1).Type = msoPicture Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Picture"
  '14
  ElseIf .ShapeRange(1).Type = msoPlaceholder Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Placeholder"
  '18
  ElseIf .ShapeRange(1).Type = msoScriptAnchor Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Script anchor"
  '-2
  ElseIf .ShapeRange(1).Type = msoShapeTypeMixed Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Mixed shape type"
  '19
  ElseIf .ShapeRange(1).Type = msoTable Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Table"
  '17
  ElseIf .ShapeRange(1).Type = msoTextBox Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Text box"
  '15
  ElseIf .ShapeRange(1).Type = msoTextEffect Then
      WhatIsSelected = WhatIsSelected  _
       & " in a Text effect"
  End If
End If
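
For readers who prefer to see the column-filling idea as a script, here is a minimal Python sketch of the same pattern. It is not what I used (that was Excel plus search and replace); the function names `article` and `build_if_chain` are made up for this illustration, and picking “a” or “an” from the first letter is my own assumption about how the article would be chosen.

```python
def article(phrase):
    """Pick 'an' before a leading vowel, otherwise 'a' (a rough rule of thumb)."""
    return "an" if phrase[0].lower() in "aeiou" else "a"


def build_if_chain(rows):
    """Turn (constant name, value, description) rows into a VBA If/ElseIf chain."""
    lines = []
    for index, (name, value, description) in enumerate(rows):
        # The first row opens the chain; every later row continues it.
        keyword = "If" if index == 0 else "ElseIf"
        lines.append(f"  '{value}")
        lines.append(f"  {keyword} .ShapeRange(1).Type = {name} Then")
        lines.append("      WhatIsSelected = WhatIsSelected  _")
        lines.append(f'       & " in {article(description)} {description}"')
    lines.append("  End If")
    return "\n".join(lines)


# Three rows copied from the help-file table, as (name, value, description).
rows = [
    ("msoAutoShape", 1, "AutoShape"),
    ("msoCallout", 2, "Callout"),
    ("msoCanvas", 20, "Canvas"),
]
print(build_if_chain(rows))
```

Each tuple plays the role of one worksheet row, and the fixed strings in the function play the role of the filled-down helper columns.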

Example 2

The first SAS example concerns the scaling of values in a questionnaire for a paper we were writing. Essentially, we had thirty-three items, each of which was divided into four sub-questions. The respondents had to indicate to what level they felt their mothers were involved in various aspects of their sport (A), to what level they wanted their mothers to be involved (B), and then again to what level they felt their fathers were involved in various aspects of their sport (C), and to what level they wanted their fathers to be involved (D). For each parent, we had to calculate a discrepancy score: the degree to which the perceived behaviour differed from the desired behaviour (i.e., A-B for mothers, and C-D for fathers).

This could also, admittedly, be done in a different way. I must confess that my SAS macro skills are not what my Excel skills are, and even in Excel I’m probably still not at the true level of guru-like enlightenment. Presumably, this could be done with a SAS macro. However, the simple macro I first tried didn’t work, because my incrementer variable I is used next to other characters, and so SAS doesn’t recognise it. Contrary to my own advice, I did not go and teach myself how to conflate the incrementer with the succeeding text into a new variable name, but rather thought to myself: “I’m under a tight deadline, and I can do this in a minute or two in Excel.” Which is what I did.

So the first thing I did was enter this in Row 1:

Figure 2    Starter line of code in Excel

Next, copy all of that down to row 2 and change the value of B2, E2 and I2 to 2:

Figure 3    Second line of code, with 1 changed to 2

Now select it all, and use autofill to drag and fill it down to row 33.

Now copy all 33 rows, paste below, and in that, use search & replace to replace ‘Mom’ with ‘Dad.’

The last step can be done in one of two ways. If I copy all of that and paste it into UltraEdit, which I use for my SAS programming, the column values arrive separated by tabs. A simple search and replace in UltraEdit, replacing tabs with nothing, then gives me the final result I want:

B_1_Mom_Discrep=B_1A-B_1B;

However, I used one of my own Excel utilities (which I have still not released to the public!) to merge all that information into Column A (see the before and after views below). From there, I could just copy that into UltraEdit and I was good to go. Sixty-six lines of code written in about two minutes, from when I started typing in Excel until I had the finished product in UltraEdit!

Before

After

Figure 4    Excel text before and after merging cell contents

As it turns out, though, a period marks the end of a macro variable reference in SAS, so the counter can be joined directly to the text that comes after it. Had I known this at the time, I could just have written the macro like this:

%MACRO Discrep;
    %do i=1 %to 33;
        B_&i._Mom_Discrep=B_&i.A-B_&i.B;
    %end;
%MEND;
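
For comparison, the fill-down and search-and-replace steps can also be sketched in a few lines of Python. This is only an illustration: the function name `discrepancy_lines` is made up, and I am assuming that the father lines use sub-questions C and D, as the description of the discrepancy scores implies.

```python
def discrepancy_lines(items=33):
    """Build one SAS assignment per item and parent, mirroring the Excel rows."""
    # (parent label, perceived sub-question, desired sub-question)
    parents = [("Mom", "A", "B"), ("Dad", "C", "D")]
    lines = []
    for parent, perceived, desired in parents:
        for i in range(1, items + 1):
            lines.append(
                f"B_{i}_{parent}_Discrep=B_{i}{perceived}-B_{i}{desired};"
            )
    return lines


# Print all the generated statements, ready to paste into a data step.
for line in discrepancy_lines():
    print(line)
```

The loop counter does the job of the autofilled numbers, and the list of parents does the job of the copy, paste and replace step.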

Example 3

The third example, I will confess, is again a bit more contrived, but this is something I actually did, so it is worth mentioning. It also employs a technique I learned on Excel-G from the Excel guru, Bob Umlas, which, I think, is featured in his book This isn’t Excel, it’s Magic!

The scenario is this:

I was writing a SAS program to analyse a student’s data. The student found that some of the data points were incorrect (his error in the coding), and sent me an Excel file with the corrections. However, instead of taking the data file, and correcting it, he sent me a file containing only the corrections, and blanks for all the rest. The (de-identified) corrections look like this:

Figure 5    Data which has to be merged into SAS program

The file indicates the respondent number, as well as the items in which errors were found, and the correct values. Unfortunately, I could not simply read the data file into SAS and merge it with the existing data, because the layout was not ideal (SAS would read all the blank cells as missing values). I could certainly try to manipulate the layout and create a useable data set, and either import that from Excel, or read it into SAS as a data card. But that may have been about as much effort as the solution I chose:

The first thing I did was to delete all the empty columns, i.e., those columns to which no corrections applied. This could easily be done for a large dataset by adding a new top row, and using a COUNTA function in each column—those columns with a count of zero are empty and can be deleted (and even that process can be expedited by sorting horizontally). The result now looked like this:

Figure 6    Data with empty columns removed

Next, I needed labels for each row (this was optional, I could have written a more complex formula that pulled everything out of the top row, but I chose the simpler option for its speed). So here I used Bob Umlas’ trick. I added another column (optionally called “Order”). As before, I added 1 and 2 in the first two rows of that column (L2:L3):

Figure 7    Creating the order column

With autofill, that was quickly copied down all 33 rows. That was then copied again and pasted directly below itself, so that I had two number sequences across the rows, from 1 to 33, right below each other. Then I copied the labels from row 1 and pasted them below the data (i.e., next to the second 1 in my sequence, which was in row 35):

Figure 8    New labels added

Autofill copied that down all the remaining rows (up to row 67) and I quickly sorted on the order and No columns, to give me this:

Figure 5    Labels positioned above each row

Note that, in terms of time, this is probably about one minute’s worth of work.
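
The interleaving trick can be sketched outside Excel as well. The Python below is a rough illustration only: `interleave_labels` is a made-up name, and the explicit 0/1 tie-breaker is my assumption standing in for the second sort column, so that each label row lands above its data row.

```python
def interleave_labels(labels, rows):
    """Place a copy of the label row above every data row via an order key."""
    # Data rows get order 1..n, exactly like the autofilled "Order" column.
    keyed = [(order, 1, row) for order, row in enumerate(rows, start=1)]
    # The pasted label rows reuse the same order numbers; the 0 sorts them first.
    keyed += [(order, 0, labels) for order in range(1, len(rows) + 1)]
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [row for _, _, row in keyed]


labels = ["No", "B1", "B2"]
rows = [[8, 3, 2], [10, 3, 3]]
for row in interleave_labels(labels, rows):
    print(row)
```

Sorting on a repeated sequence number is what pairs each copy of the labels with exactly one data row.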

Next, I added a new column A. Now I was ready to create my lines of SAS code.

The interesting thing is that I even wrote my cell formulas with the method I am describing in this post. I know it will seem contrived, but essentially, I wrote these formulas (note the cells in which I wrote them) in a blank worksheet. Some cells contain text and others contain formulas, which I have highlighted, but note that B2 starts with an unseen single quote, so it is not counted as a formula. Note also that this is much less typing than it seems, as C2 and F3 contain exactly the same formula, which is only modified slightly for B3 and D3 respectively:

Figure 9    Formulas to write an Excel formula

The result is shown below:

Figure 10    Results from formulas to write an Excel formula

Then I again use autofill to copy them down to row 12 (L is the 12th letter) and just add one snippet in B13:

Figure 11    Final text for Excel formula

Now comes the fun bit. I copy that (B2:G13), paste it into my handy text editor UltraEdit, copy it again, and paste it (F2, Ctrl+V) into cell A1 of my worksheet containing the data. The result is this:

Figure 12    Formula in first cell

Now I select A1:A2, and fill down (A2 so that the formula is filled down to every second row):

Figure 13    Final code

I can copy that, and paste it directly into my SAS program, and it will make all the corrections I want.

if no eq 8 then do;B1=3;B2=2;end;
if no eq 10 then do;B1=3;B2=3;B3=3;B6=3;B7=3;B16=3;B17=1;B18=2;B22=3;end;
if no eq 11 then do;B1=3;B2=3;B6=3;B7=3;B16=3;B18=2;B22=3;end;
if no eq 12 then do;B1=3;B2=3;B3=2;B6=3;B7=2;B16=3;B17=1;B18=2;B22=3;end;
if no eq 13 then do;B1=3;B2=3;B3=2;B6=3;B16=3;B17=2;B18=2;B22=3;end;
if no eq 14 then do;B1=3;B2=3;B3=2;B6=3;B16=3;B17=1;B18=3;B22=2;end;
if no eq 15 then do;B1=3;B2=3;B3=3;B6=3;B7=1;B16=3;B17=1;B18=2;B22=3;end;
if no eq 16 then do;B1=1;B2=3;B3=3;B6=3;B7=1;B16=3;B17=3;B18=3;end;
if no eq 17 then do;B1=3;B2=2;B3=2;B6=3;B7=3;B16=3;B17=3;B18=3;B22=3;end;
if no eq 18 then do;B1=3;B2=3;B3=3;B6=3;B16=3;B17=3;B18=2;B22=3;end;
if no eq 19 then do;B1=3;B2=3;B3=3;B6=3;B7=2;B16=3;B17=1;end;
if no eq 20 then do;B1=3;B2=3;B3=3;B6=2;B16=3;B17=1;B18=2;B22=3;end;
if no eq 21 then do;B1=3;B2=3;B3=2;B6=3;B7=3;B16=3;B17=3;B18=3;B22=3;end;
if no eq 22 then do;B1=3;B2=3;B3=3;B6=3;B7=2;end;
if no eq 26 then do;B16=3;end;
if no eq 29 then do;B1=3;B14=1;B16=3;B17=3;end;
if no eq 33 then do;B1=3;end;
if no eq 36 then do;B16=3;end;
if no eq 38 then do;B22=3;end;
if no eq 43 then do;B16=3;end;
if no eq 45 then do;B16=3;end;
if no eq 48 then do;B22=2;end;
if no eq 57 then do;B16=3;end;
if no eq 59 then do;B16=3;end;
if no eq 63 then do;B17=1;end;
if no eq 69 then do;B6=3;end;
if no eq 72 then do;B6=3;B16=3;B18=3;end;
if no eq 73 then do;B6=3;end;
if no eq 83 then do;B16=3;end;
if no eq 84 then do;B6=3;B18=1;end;
if no eq 89 then do;B3=2;end;
if no eq 99 then do;B17=1;end;
if no eq 111 then do;B18=2;end;

The process may be crude, but it is surprisingly effective and efficient: this kind of programming takes just a minute or two to complete.
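
The same output can be sketched in Python for anyone who would rather script it. This is not the Excel route described above, just a minimal illustration; `correction_statements` is a made-up name, and I am assuming that blank cells arrive as `None` or empty strings.

```python
def correction_statements(header, rows, id_name="no"):
    """Build one SAS 'if ... then do; ... end;' line per respondent row."""
    statements = []
    for row in rows:
        respondent, values = row[0], row[1:]
        # Keep only the cells that hold a correction; blanks are skipped.
        assignments = "".join(
            f"{name}={value};"
            for name, value in zip(header[1:], values)
            if value not in (None, "")
        )
        if assignments:
            statements.append(
                f"if {id_name} eq {respondent} then do;{assignments}end;"
            )
    return statements


header = ["No", "B1", "B2", "B3"]
rows = [[8, 3, 2, None], [89, None, None, 2]]
for statement in correction_statements(header, rows):
    print(statement)
```

The header row supplies the variable names, just as the interleaved label rows do in the worksheet.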

Example 4

This example is much the same as before: Data corrections (yes, it seems as if I spend more time correcting data than analysing data, but let’s just say that proper preparation leads to efficient execution).

The basic layout is as shown below: almost 70 rows of corrections that need to be made. The trick here is that it is not always the same variable that is to be corrected. The variable names are listed in B1:J1, and if a column has a value, that variable must be set to that value for that respondent (as listed in Column A).

Figure 14    Listed corrections

So I use this formula, and the results are shown in Column K in the final figure.

="if Number eq " & A2 & " and " & $B$1 & " eq " & B2 & " and " &OFFSET($B$1,0,MATCH(MAX(C2:H2),C2:H2,0),1,1) & " eq " & MAX(C2:H2)& " then do; " & $B$1 & "="&I2&";" &OFFSET($B$1,0,MATCH(MAX(C2:H2),C2:H2,0),1,1) & "="&J2&"; end;"

Figure 15    Formula result showing code which can be copied to SAS

Essentially, my formula finds which cells have values, and then pulls the column headings in for those cells, setting them equal to the value found in the cell. Using this method, I could get Excel to create 6557 characters of code for me by typing 228 characters (the length of the formula) and doing a little bit of layout.
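
To make the logic of that formula easier to follow, here is a rough Python rendering of it. Everything about the layout is read off the formula itself and is therefore an assumption: column A holds the respondent number, B the current value of the first variable, C:H the current value of one other variable, and I and J the two new values. The variable names Q1 to Q7 and the function name `correction_line` are hypothetical.

```python
def correction_line(headers, row):
    """Mirror the worksheet formula for one row.

    headers: variable names from B1:H1; row: cell values from columns A..J.
    """
    number, first_old = row[0], row[1]
    others = list(row[2:8])                 # columns C..H
    new_first, new_other = row[8], row[9]   # columns I and J
    # MAX(C2:H2) ignores blanks; MATCH(..., 0) returns the first position.
    # An entirely blank C:H range would raise ValueError here.
    other_old = max(value for value in others if value is not None)
    position = others.index(other_old)
    first_name = headers[0]                 # $B$1
    other_name = headers[1 + position]      # OFFSET from B1 by the MATCH result
    return (
        f"if Number eq {number} and {first_name} eq {first_old}"
        f" and {other_name} eq {other_old} then do; "
        f"{first_name}={new_first};{other_name}={new_other}; end;"
    )


headers = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7"]
row = [5, 2, None, 4, None, None, None, None, 3, 1]
print(correction_line(headers, row))
```

Like the formula, the sketch assumes that only one cell in C:H is filled per row; with more than one, it picks the largest value.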

Conclusion

In closing, this is definitely not a conventional usage of Excel, but in terms of getting “repetitive” (actually, “patterned” is a better term) text created, it is hard to beat for speed if you know Excel well. I can create hundreds of lines of code, if need be, in a fraction of the time it would take to type it.