An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

Manipulating Text with the Power of Regular Expressions


A regular expression (also called regex) is a powerful tool for finding and manipulating text.[1] At its simplest, a regular expression is just a way of looking through texts to locate patterns. A regular expression can help you find every line that begins with a number, or every instance of an email address, or whenever a word is used even if there are slight variations in how it’s spelled. As long as you can describe the pattern you’re looking for, regular expressions can help you find it. Once you’ve found your patterns, they can then help you manipulate your text so that it fits just what you need. Regular expressions can be difficult, but they are worth it.

This section will explain how to take a book scanned and available on the Internet Archive, Diplomatic correspondence of the Republic of Texas, and manipulate the raw text into a format that you can use in the Gephi network visualization package. We will end with a correspondence network. In this section, we begin with a simple unstructured index of letters, and then use regular expressions to turn the text into a spreadsheet that can be edited in a program like Excel.

Regular expressions can look pretty complex, but once you know the basic syntax and vocabulary, simple ‘regexes’ will be easy. Regular expressions can often be used right inside the ‘Find and Replace’ box in many text and document editors, such as Notepad++ on Windows, or TextWrangler on OS X. You cannot use regex with Microsoft Word, however! You can find Notepad++ at http://notepad-plus-plus.org/ and TextWrangler at http://www.barebones.com/products/textwrangler/. Both are free and well worth downloading, depending on the platform that you use. This lesson is designed to work with these editors. Results may differ with other editors.

You type the regular expression in the search bar, press ‘find’, and any words that match the pattern you’re looking for will appear on the screen. As you proceed through this section, you may want to look for other things. In addition to the basics provided here, you will also be able to simply search regular expression libraries online: for example, if you want to find all postal codes, you can search “regular expression Canadian postal code” and learn what ‘formula’ to search for to find them.

Let’s start with the basics and say you’re looking for all the instances of “cat” or “dog” in your document. The vertical bar on your keyboard (it looks like |, shift+backslash on Windows keyboards) means ‘or’ in regular expressions. So, if your query is dog|cat and you press ‘find’, it will show you the first time either dog or cat appears in your text. Open up a new file in your editor and write some words that include ‘dog’ and ‘cat’ and try it out.

If you want to replace every instance of either “cat” or “dog” in your document with the word “animal”, you would open your find-and-replace box, put dog|cat in the search query, put animal in the ‘replace’ box, hit ‘replace all’, and watch your entire document fill up with references to animals instead of dogs and cats.

The astute reader will have noticed a problem with the instructions above; simply replacing every instance of “dog” or “cat” with “animal” is bound to create problems. Simple searches don’t differentiate between letters and spaces, so every time “cat” or “dog” appear within words, they’ll also be replaced with “animal”. “catch” will become “animalch”; “dogma” will become “animalma”; “certificate” will become “certifianimale”. In this case, the solution appears simple; put a space before and after each word in your search query, so now it reads:

 dog | cat 

With the spaces (including one at the very start and one at the very end of the query), “animal” replaces “dog” or “cat” only in those instances where they’re definitely complete words; that is, when they’re separated by spaces.

The even more astute reader will notice that this still does not solve our problem of replacing every instance of “dog” or “cat”. What if the word comes at the beginning of a line, so it is not in front of a space? What if the word is at the end of a sentence or a clause, and thus followed by a punctuation mark? Luckily, in the language of regex, you can represent the beginning or end of a word using special characters.

\b

marks a word boundary. Placed in front of your query, it means the beginning of a word. So if you search for \bcat it will find “cat”, “catch”, and “catsup”, but not “copycat”, because your query searched for words beginning with “cat”. For patterns at the end of a word, you would use:

\b

again, this time after your query. If you search for

cat\b

it will find “cat” and “copycat”, but not “catch”, because your query searched for words ending with “cat”.

Regular expressions can be mixed, so if you wanted to find only the whole word “cat”, no matter where in the sentence, you’d search for

\bcat\b

which would find every instance. And, because all regular expressions can be mixed, if you searched for

\b(cat|dog)\b

and replaced all with “animal”, you would have a document that replaced all instances of the words “dog” or “cat” with “animal”, no matter where in the sentence they appear. The parentheses matter here: without them, \bcat|dog\b is read as “words beginning with cat, or words ending with dog”, because the vertical bar splits the whole query in two.
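The difference between these queries can be checked outside a text editor. The following is a minimal sketch in Python, assuming its built-in re module behaves like your editor's regex search for these simple patterns; the sample sentence is made up for illustration.

```python
import re

text = "The cat sat. A dog barked at the catch; dogma is not a dog."

# Plain alternation also matches inside longer words such as "catch" and "dogma".
naive = re.sub(r"dog|cat", "animal", text)

# Grouping the alternatives lets both word boundaries apply to each option.
whole_words = re.sub(r"\b(cat|dog)\b", "animal", text)

# Without the group, the bar splits the query into "\bcat" OR "dog\b".
ungrouped = re.sub(r"\bcat|dog\b", "animal", "catch a bulldog")

print(naive)
print(whole_words)
print(ungrouped)
```

Only the grouped query leaves “catch” and “dogma” untouched.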

You can also search for variations within a single word using parentheses. For example, if you were looking for instances of “gray” or “grey”, instead of the search query

gray|grey

you could type

gr(a|e)y

instead. The parentheses signify a group, and like the order of operations in arithmetic, regular expressions read the parentheses before anything else. Similarly, if you wanted to find instances of either “that dog” or “that cat”, you would search for:

(that dog)|(that cat)

Notice that the vertical bar | can appear either inside or outside the parentheses, depending on what you want to search for.
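As a rough illustration of the two placements of the vertical bar, here is a small Python sketch. The sample strings are invented, and re.finditer is used (rather than re.findall) so that the whole match is returned instead of the contents of the groups.

```python
import re

# The bar inside the group: only the vowel varies.
spellings = [m.group(0) for m in re.finditer(r"gr(a|e)y", "grey skies, gray seas, a groy typo")]

# The bar outside the groups: each whole phrase is an alternative.
phrases = [m.group(0) for m in re.finditer(r"(that dog)|(that cat)", "I saw that dog chase that cat.")]

print(spellings)
print(phrases)
```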

The period character . in regular expressions directs the search to just find any character at all. For example, if we searched for:

d.g

the search would return “dig”, “dog”, “dug”, and so forth.

Another special character, the plus +, instructs the program to find one or more of the previous character. If we search for

do+g

it would return any words that looked like “dog”, “doog”, “dooog”, and so forth. Adding parentheses before the plus would make a search for repetitions of whatever is in the parentheses, for example querying

(do)+g

would return “dog”, “dodog”, “dododog”, and so forth.

Combining the plus ‘+’ and period ‘.’ characters can be particularly powerful in regular expressions, instructing the program to find any amount of any characters within your search. A search for

d.+g

for example, might return “dried fruits are g”, because the string begins with “d” and ends with “g”, and has various characters in the middle. Searching for simply “.+” will yield query results that are entire lines of text, because you are searching for any character, and any amount of them.
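The queries above can be tried on a handful of made-up words. This is only a sketch in Python: the helper name whole_matches is hypothetical, and \b has been added around the first three patterns (an addition to the queries in the text) so that only complete words are reported.

```python
import re

def whole_matches(pattern, text):
    """Return the full text of every match, ignoring what the groups captured."""
    return [m.group(0) for m in re.finditer(pattern, text)]

words = "dg dog doog dooog dodog dododog dig dug"

any_middle = whole_matches(r"\bd.g\b", words)         # exactly one character between d and g
one_or_more_o = whole_matches(r"\bdo+g\b", words)     # one or more "o"
repeated_do = whole_matches(r"\b(do)+g\b", words)     # one or more "do"
greedy = whole_matches(r"d.+g", "dried fruits are good")

print(any_middle, one_or_more_o, repeated_do, greedy)
```

Note that “dg” is never matched: the plus requires at least one occurrence, and the period requires exactly one character.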

Parentheses in regular expressions are also very useful when replacing text. The text within a pair of parentheses forms what’s called a group, and the software you use to search remembers which groups you queried in order of their appearance. For example, if you search for

(dogs)( and )(cats)

which would find all instances of “dogs and cats” in your document, your program would remember “dogs” as group 1, “ and ” as group 2, and “cats” as group 3. Your text editor remembers them as “\1”, “\2”, and “\3” for each group respectively.

If you wanted to switch the order of “dogs” and “cats” every time the phrase “dogs and cats” appeared in your document, you would type

(dogs)( and )(cats)

in the ‘find’ box, and

\3\2\1

in the ‘replace’ box. That would replace the entire string with group 3 (“cats”) in the first spot, group 2 (“ and ”) in the second spot, and group 1 (“dogs”) in the last spot, thus changing the result to “cats and dogs”.
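The same swap can be sketched in Python, whose re.sub function accepts \1, \2 and \3 in the replacement string in the same way; the sample sentence is invented.

```python
import re

sentence = "It is raining dogs and cats; dogs and cats everywhere."

# Groups 1, 2 and 3 are written back in reverse order.
swapped = re.sub(r"(dogs)( and )(cats)", r"\3\2\1", sentence)

print(swapped)
```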

The vocabulary of regular expressions is pretty large, but there are many cheat sheets for regex online. One that we sometimes use is http://regexlib.com/CheatSheet.aspx. Another good one is at http://docs.activestate.com/komodo/4.4/regex-intro.html.

To illustrate, we’ve included an example of searching using regular expressions that draws from these cheat sheets, to provide a sense of how you would use it to form your own regular expressions. It uses a corpus of diplomatic correspondence in 19th-century Texas. Using regular expressions, we will turn an unformatted index of correspondences, drawn from a book, into a structured file that can be read in Excel or any of a number of network analysis tools. A portion of the original file, drawn from the Internet Archive, looks like this:

Sam Houston to A. B. Roman, September 12, 1842 101
Sam Houston to A. B. Roman, October 29, 1842 101
Correspondence for 1843-1846 —
Isaac Van Zandt to Anson Jones, January 11, 1843 103

By the end of this workflow, it will look like this:

Sam Houston, A. B. Roman, September 12 1842
Sam Houston, A. B. Roman, October 29 1842
Isaac Van Zandt, Anson Jones, January 11 1843

While the changes appear insignificant, they will allow us to turn this index into something that a network analysis program (for instance) could read and make visual sense of. It is, in fact, turning an OCR’d page of text into a csv file!

Begin by pointing your browser to the document at https://archive.org/stream/diplomaticcorre33statgoog/diplomaticcorre33statgoog_djvu.txt,[2] then copy and paste the text into your text editor, either Notepad++ (PC) or TextWrangler (Mac).[3] You can select all of the text before copying by pressing Ctrl+A (Windows) or Cmd+A (Mac). It’ll look a bit messy, but that’s where the next step comes in.

Remember to save a spare copy of your file before you begin – this is very important, because you’re going to make mistakes that you won’t be sure how to fix. Now delete everything but the index where it has the list of letters. Look for this in the text, as depicted in figure 3.9, and delete everything that comes before it (what you’re looking for begins at approximately line 260 and continues until line 2670, depending on how carefully you selected the text when you copied and pasted). This can be a bit time consuming, so feel free to skip this step and download our online version at http://themacroscope.org/2.0/datafiles/raw-correspondence.txt. If you run into problems, try using our ‘raw-correspondence.txt’ file instead of copying and pasting from your browser.

[Figure 3.9: Screenshot of the metadata to delete in the archive.org file on the diplomatic correspondence of the Republic of Texas]

If you want to do it yourself, however, you’re looking for the table of letters, starting with ‘Sam Houston to J. Pinckney Henderson, December 31, 1836 51’ and ending with ‘Wm. Henry Daingerfield to Ebenezer Allen, February 2, 1846 1582’. There are, before we clean them, approximately 2400 lines’ worth of entries indexed in this table!

Notice that there is a lot of text that we are not interested in at the moment: page numbers, headers, footers, or categories. We’re going to use regular expressions to get rid of them. Our end goal is to end up with a csv file, which when opened in a spreadsheet would have three columns called:

Sender, Recipient, Date

We are not really concerned about dates for this example, but they might be useful at some point, so we’ll still include them. We’re eventually going to use another program, called OpenRefine, to fix things up further.

Scroll down through the text; notice there are many lines which don’t include a letter, because they’re either header info, or blank, or some other extraneous text. We’re going to get rid of all of those lines. We want to keep every line that looks like this:

Sender to Recipient, Month Date, Year Page

This is a complex process, so first we’ll outline exactly what we are going to do, and then walk you through how to do it. We will start by instructing our text editor, using a regular expression, to find every line that looks like a reference to a letter, and put a tilde (a ~ symbol) at the beginning of it so we know to save it for later. Next, we get rid of all the lines that don’t start with tildes, so that we’re left with only the relevant text. After this is done, we format the remaining text by putting commas in appropriate places, so we can import it into a spreadsheet and do further edits there.

There are lots of ways we can do this, but for the sake of clarity we’re going to just delete every line that doesn’t have the word “to” in it (as in sender TO recipient). We will walk you through the seven-step plan of how to manipulate these documents. At the end of each section, the regular expressions and commands are summarized. Read the step first to understand the logic of what’s going on, and then try the summarized commands at the end. Save often!


Step One: Identifying Lines that have Correspondence Senders and Receivers in them

Move your cursor to the start of the file. In Notepad++, press Ctrl+F or search->find to open the find dialogue box. In that box, go to the ‘Replace’ tab, and check the radio box for ‘Regular expression’ at the bottom of the search box. In TextWrangler, hit Command+F to open the find and replace dialogue box. Tick the ‘grep’ radio button (which tells TextWrangler that we want to do a regex search) and the ‘wraparound’ button (which tells TextWrangler to search everywhere).

Remember from earlier that there’s a way to see if the word “to” appears in full. Type

\bto\b

in the search bar. This will find every instance of the word “to” (and not, for instance, also ‘potato’ or ‘tomorrow’).[4]

We don’t just want to find “to”, but the entire line that contains it. We assume that every line that contains the word “to” in full is a line that has relevant letter information, and every line that does not is one we do not need. You learned earlier that the query “.+” returns any amount of text, no matter what it says. If your query is

.+\bto\b.+

your search will return every line which includes the word “to” in full, no matter what comes before or after it, and none of the lines which don’t. It would not find lines that began with ‘to’, because the first “.+” needs at least one character in front of the word. That is good in this case; for future reference, if you did want to find those lines too, you could just remove the first “.+”.

As mentioned earlier, we want to add a tilde ~ before each of the lines that look like letters, so we can save them for later. This involves the find-and-replace function, and a query identical to the one before, but with parentheses around it, so it looks like

(.+\bto\b.+)

and the entire line is placed within a parenthetical group. In the ‘replace’ box, enter

~\1

which just means replace the line with itself (group 1), placing a tilde before it. Make sure you use the digit ‘1’ not the letter ‘l’. In short, that’s:

STEP ONE COMMANDS:

Find: (.+\bto\b.+)

Replace: ~\1

Click ‘Replace All’.
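For readers who would like to see this step outside a text editor, here is a minimal Python sketch of the same find-and-replace. It assumes Python’s re module, in which the period does not match a line break by default, so each match stays within one line. The sample lines are adapted from the excerpt shown earlier.

```python
import re

index_text = """Sam Houston to A. B. Roman, September 12, 1842 101
Sam Houston to A. B. Roman, October 29, 1842 101
Correspondence for 1843-1846
Isaac Van Zandt to Anson Jones, January 11, 1843 103
"""

# Put a tilde in front of every line that contains the whole word "to".
marked = re.sub(r"(.+\bto\b.+)", r"~\1", index_text)

print(marked)
```

The heading line contains no “to”, so it is the only line left without a tilde.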


Step Two: Removing Lines that Aren’t Relevant

After running the find-and-replace, you should note your document now has most of the lines with tildes in front of them, and a few which do not. The next step is to remove all the lines that do not include a tilde. The search string to find all lines that don’t begin with tildes is

\n[^~].+

A \n at the beginning of a query searches for a new line, which means it’s going to start searching at the first character of each new line.

However, given the evolution of computing, it may well be that this won’t quite work on your system. Linux-based systems use \n for a new line (which refers to a “line feed” character), while Windows often uses \r\n (the \r refers to a “carriage return”), and older Macs just use \r. These are the sorts of things that digital historians need to keep in mind! Since this will likely cause much frustration, your safest bet will be to save a copy of what you are working on, and then experiment to see what gives you the best result. In most cases, this will be:

\r\n[^~].+

Within a set of square brackets [] the caret ^ means search for anything that isn’t within these brackets; in this case, the tilde ~. The .+ as before means search for all the rest of the characters in the line as well. All together, the query returns any full line which does not begin with a tilde; that is, the lines we did not mark as looking like letters.

STEP TWO COMMANDS (SAVE before you run this command so you can undo if necessary!):

Find: \r\n[^~].+

(Remember, you may need to search something like: \n[^~].+ instead, depending on your system)

Replace: (leave this box empty)

Click ‘Replace All’.

By finding all \r\n[^~].+ and replacing it with nothing, you effectively delete all the lines that don’t look like letters. What you’re left with is a series of letters, and a series of blank lines.
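If the line-ending differences prove frustrating, one alternative is to filter the lines in a short script instead. The sketch below uses a different technique from the find-and-replace above: it keeps the lines that start with a tilde rather than deleting the ones that do not. It relies on Python’s str.splitlines(), which recognises \n, \r\n and \r alike; the sample text is adapted from the excerpt shown earlier and uses Windows line endings on purpose.

```python
marked = (
    "Correspondence for 1843-1846\r\n"
    "~Sam Houston to A. B. Roman, September 12, 1842 101\r\n"
    "\r\n"
    "~Isaac Van Zandt to Anson Jones, January 11, 1843 103\r\n"
)

# Keep only the lines we marked; unmarked and blank lines are both dropped.
kept = [line for line in marked.splitlines() if line.startswith("~")]
cleaned = "\n".join(kept)

print(cleaned)
```

Because blank lines do not start with a tilde either, this approach leaves no blank lines behind.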


Step Three: Removing the Blank Lines

We need to remove those surplus blank lines. The find-and-replace query for that is:

STEP THREE COMMANDS:

Find: \n\r

(In TextWrangler on OS X): ^\r

Replace: (leave this box empty)

Click ‘Replace All’.


Step Four: Beginning the Transformation into a Spreadsheet

Now that all the extraneous lines have been deleted, it’s time to format the text document into something you can import into and manipulate with Excel as a *.csv, or a comma-separated-value file. A *.csv is a text file which spreadsheet programs like Microsoft Excel can read, where every comma denotes a new column, and every line denotes a new row.

To turn this text file into a spreadsheet, we’ll want to separate it out into one column for sender, one for recipient, and one for date, each separated by a single comma. Notice that most lines have extraneous page numbers attached to them; we can get rid of those with regular expressions. There’s also usually a comma separating the month-date and the year, which we’ll get rid of as well. In the end, the first line should go from looking like:

~Sam Houston to J. Pinckney Henderson, December 31, 1836 51

to

Sam Houston, J. Pinckney Henderson, December 31 1836

such that each data point is in its own column.

Start by removing the page number after the year and the comma between the year and the month-date. To do this, first locate the year on each line by using the regex:

[0-9]{4}

In a regular expression, [0-9] finds any digit between 0 and 9, and {4} will find four of them together. Now extend that search out by appending .+ to the end of the query; as seen before, it will capture the entire rest of the line. The query

[0-9]{4}.+

will return, for example, “1836 51”, “1839 52”, and “1839 53” from the first three lines of the text. We also want to capture the comma preceding the year, so add a comma and a space before the query, resulting in

, [0-9]{4}.+

which will return “, 1836 51”, “, 1839 52”, etc.

The next step is making the parenthetical groups which will be used to remove parts of the text with find-and-replace. In this case, we want to remove the comma and everything after the year, but not the year or the space before it. Thus our query will look like:

(,)( [0-9]{4})(.+)

with the comma as the first group “\1”, the space and the year as the second “\2”, and the rest of the line as the third “\3”. Given that all we care about retaining is the second group (we want to keep the year, but not the comma or the page number), the find-and-replace will look like this:

STEP FOUR COMMANDS:

Find: (,)( [0-9]{4})(.+)

Replace: \2

Click ‘Replace All’.
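The effect of keeping only the second group can be sketched in Python on the first line of the index. The second sample line, with no page number, is a made-up case added to show one limitation of the query.

```python
import re

pattern = r"(,)( [0-9]{4})(.+)"

line = "~Sam Houston to J. Pinckney Henderson, December 31, 1836 51"
# Keep group 2 (space plus year); drop the comma and the page number.
trimmed = re.sub(pattern, r"\2", line)

# Hypothetical line with nothing after the year: (.+) needs at least one
# character, so the query does not match and the comma stays in place.
no_page = re.sub(pattern, r"\2", "A to B, May 1, 1840")

print(trimmed)
print(no_page)
```

Lines like the second one are part of the reason a hand-check of the results is still worthwhile afterwards.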


Step Five: Removing Tildes

The next step is easy; find the tildes we added at the beginning of each line, and replace them with nothing to delete them.

STEP FIVE COMMANDS:

Find: ~

Replace: (leave this box empty)

Click ‘Replace All’.


Step Six: Separating Senders and Receivers

Finally, to separate the sender and recipient by a comma, we find all instances of the word “to” and replace it with a comma. Although we used \b to denote the beginning and end of a word earlier in the lesson, we don’t exactly do that here. We include the space preceding the “to” in the regular expression, as well as the \b to denote the word ending. Once we find instances of the word and the space preceding it, “ to\b”, we replace it with a comma “,”.

STEP SIX COMMANDS:

Find:  to\b

(Remember, the query begins with a space: “ to\b”)

Replace: ,

Click ‘Replace All’.
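Steps Five and Six can be sketched together in Python. This is an illustration only: the anchored pattern ^~ with the MULTILINE flag is an assumption of this sketch (it removes a tilde only at the start of a line), and the two sample lines follow the format produced by Step Four.

```python
import re

text = (
    "~Sam Houston to J. Pinckney Henderson, December 31 1836\n"
    "~Isaac Van Zandt to Anson Jones, January 11 1843"
)

# Step Five: drop the tilde at the start of each line.
no_tildes = re.sub(r"^~", "", text, flags=re.MULTILINE)

# Step Six: replace " to" (with its leading space) by a comma.
rows = re.sub(r" to\b", ",", no_tildes)

print(rows)
```

Including the leading space in the query is what stops the result from reading “Sam Houston , J. Pinckney Henderson”.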


Step Seven: Cleaning up messy data

You may notice that some lines still do not fit our criteria. Line 22, for example, reads “Abner S. Lipscomb, James Hamilton and A. T. Bumley, AugUHt 15, ”. It has an incomplete date; these we don’t need to worry about for our purposes. More worrisome are lines like line 61, “Copy and summary of instructions United States Department of State, ”, which include none of the information we want. We can get rid of these lines later in Excel.

The only non-standard lines we need to worry about with regular expressions are the ones with more than 2 commas, like line 178, “A. J. Donelson, Secretary of State [Allen,. arf interim], December 10 1844”. Notice that our second column, the name of the recipient, has a comma inside of it. If you were to import this directly into Excel, you would get four columns, one for sender, two for recipient, and one for date, which would break any analysis you would then like to run. Unfortunately these lines need to be fixed by hand, but happily regular expressions make finding them easy. The query:

.+,.+,.+,

will show you every line with more than 2 commas, because it finds any line that has any set of characters, then a comma, then any other set, then another comma, and so forth.

STEP SEVEN COMMANDS:

Find: .+,.+,.+,

Fix the line by hand, and then click ‘Find Next’.
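A short Python sketch of the same search, listing the line numbers that need attention. The two sample rows are taken from the examples quoted in this lesson; the variable names are illustrative.

```python
import re

rows = [
    "Sam Houston, J. Pinckney Henderson, December 31 1836",
    "A. J. Donelson, Secretary of State [Allen,. arf interim], December 10 1844",
]

# Report (line number, text) for every row with more than two commas.
too_many_commas = [
    (number, row)
    for number, row in enumerate(rows, start=1)
    if re.search(r".+,.+,.+,", row)
]

print(too_many_commas)
```

The script only finds the lines; deciding which comma to replace is still a job for a human reader.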


Celebrate! You’re almost done!

After using this query, just find each occurrence (there will be 13 of them), and replace the appropriate comma with another character that signifies it was there, like a semicolon. While you’re searching, you may find some other lines, like 387, “Barnard E. Bee, James Treat, April 28, 1»40 665”, which are still not quite perfect. If you see them, go ahead and fix them by hand so they fit the proper format, deleting the lines that are not relevant. There will also be leftover lines that are clearly not letters; delete those lines. Finally, there may be snippets of text left over at the bottom of the file. Highlight these and delete them.

At the top of the file, add a new line that simply reads “Sender, Recipient, Date”. These will be the column headers.

Go to file->save as, and save the file as cleaned-correspondence.csv.
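Before opening the file in a spreadsheet, it can be worth confirming that every row really has three columns. The following is a rough sketch using Python’s standard csv module; to keep it self-contained it reads a small in-memory sample rather than the saved file, and skipinitialspace=True is used because our commas are followed by a space.

```python
import csv
import io

csv_text = """Sender, Recipient, Date
Sam Houston, J. Pinckney Henderson, December 31 1836
Isaac Van Zandt, Anson Jones, January 11 1843
"""

reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
header = next(reader)

# Any row whose column count differs from the header still needs fixing.
bad_rows = [row for row in reader if len(row) != len(header)]

print(header)
print(bad_rows)
```

To check your own file, you could replace the io.StringIO(...) call with open("cleaned-correspondence.csv", newline="").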

Congratulations! You have used regular expressions to extract and clean data. This skill alone will save you valuable time. A copy of the cleaned correspondence file is available at http://themacroscope.org/2.0/datafiles/cleaned-correspondence.csv. Note that the file online has been fixed by hand, so it is well formatted. The file you worked on in this walkthrough still needs some additional cleaning, such as the removal of line 61, “Copy and summary of instructions United States Department of State, ”.


[1] Regular expressions are sometimes implemented differently depending on which program you are working with. They simply do not work in Microsoft Word. For best results, try TextWrangler (on Mac) or Notepad++ (in Windows).

[2] We have also lodged a copy of this file at http://themacroscope.org/2.0/datafiles/source-texas-correspondence.txt

[3] Notepad++ (for Windows) can be downloaded at http://notepad-plus-plus.org/. TextWrangler (for Mac) can be found at http://www.barebones.com/products/textwrangler/

[4] Remember, these are markers for ‘word boundaries’. See http://www.regular-expressions.info/wordboundaries.html


Source: http://www.themacroscope.org/?page_id=643