An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

Breaking a CSV file into separate txt files

Sometimes it is necessary to break a csv file into separate text files before you can move on to the next stage of your analysis. For instance, imagine that you had scraped John Adams’s diaries into a single csv file, but what you really wanted to know was his ego network: his relationships with other people, institutions, and organizations. This becomes an exercise in named entity recognition and network extraction. RezoViz, a tool within the Voyant Tools platform, can parse individual documents and tie together any named entities within them (by using the Stanford Named Entity Recognizer).

Here’s how the workflow might go.

1. Scrape your source into a single csv file.
2. Run a script to break each row into a single text file.
3. Zip the resulting files into a zipped folder.
4. Upload the zipped folder to http://voyeurtools.org/tool/RezoViz/
(4a. If you have an existing corpus in Voyant Tools, you can instead copy and paste its corpus number, available under the cogwheel icon at the top right of the page when you click on ‘url for export’, like so: http://voyeurtools.org/tool/RezoViz/?corpus=1392142656545.494)
5. Click the cogwheel icon to get the results as a .net file for import into a network analysis program like Pajek or Gephi.
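Step 3 can be done by hand in your file manager, but it can also be scripted. The following is a minimal Python sketch, not part of the original workflow: the function name `zip_text_files` and the folder and archive paths are hypothetical, and it assumes the separate files all end in `.txt`.

```python
import zipfile
from pathlib import Path


def zip_text_files(folder, archive_path):
    """Add every .txt file in `folder` to a new zip archive; return the names added."""
    folder = Path(folder)
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sort so the archive lists the files in a predictable order.
        for txt_file in sorted(folder.glob("*.txt")):
            # arcname drops the folder part, so the zip holds bare file names.
            archive.write(txt_file, arcname=txt_file.name)
            names.append(txt_file.name)
    return names
```

Calling `zip_text_files("rows", "rows.zip")` (both names are placeholders) would produce a single archive ready for upload in step 4.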

(There are of course other times when you might want a folder of separate documents, for instance when topic modeling with MALLET from the command line.)

Let’s focus on step two. In Excel, one can record macros (or write them from scratch) to achieve this. Another route to explore, when we don’t know exactly how to achieve what we’re trying to do, is to search the question-and-answer site Stack Overflow. A user named Eke had this exact question: ‘how do I write each Excel row to new .txt file with ColumnA as file name?’ If you examine his question on Stack Overflow, you will see that Eke has precisely described what he is trying to achieve, has cited other threads on the forum that seem relevant to his question even if they are not exactly what he wants, and has given examples of his data and his ideal output. This is a very good example of how to get the most out of this (or indeed any) forum.

If we examine the thread, we see a lot of back and forth with various users as suggestions are made and Eke reports back his results. A good citizen of the site, Eke also posts his complete code once he finds the solution that works best for him.

To use this script, we have to make a new macro in Excel and then edit it. Find the Macros button for your version of Excel, click ‘view macros’, and then click ‘create’. (Sometimes the ‘create’ button is greyed out. If this is true for you, click ‘record new macro’ instead, hit the start and then the stop button, and the new macro will appear on the list. You can then click ‘edit’.) A new window opens with ‘Microsoft Visual Basic’ in the title. You can now copy and paste Eke’s code into this window. You don’t need to hit save, as anything you do in this window will save automatically.

Go back over to your normal Excel spreadsheet, which contains your table of data. Click on Macros and select the one now listed as ‘SaveRowsasTXT’. The macro will automatically copy and paste each row into a new worksheet, save it as a txt file, close that new worksheet, and iterate down to the next row. If you get an out-of-range error, make sure your worksheet in Excel is named ‘Sheet1’, so that this line in the script is correct:

Set wsSource = ThisWorkbook.Worksheets("Sheet1")

Note the line that reads

filePath = "C:\Users\Administrator\Documents\TEST\

You will want to change everything after the quotation mark to point to the location where you want your separate files stored.
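If you would rather not work in Excel at all, the same step can be sketched in a few lines of Python. To be clear, this is not Eke’s macro but an alternative route to a similar result, written under stated assumptions: the first column of each row supplies the file name (as in Eke’s question), the remaining columns become the body of the text file, and the csv is encoded as UTF-8. The names `safe_name` and `split_csv_rows` are made up for this illustration.

```python
import csv
import re
from pathlib import Path


def safe_name(text):
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]+", "_", text.strip()) or "row"


def split_csv_rows(csv_path, out_dir, has_header=True):
    """Write each row of a csv to its own .txt file, named after the first column."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row, if there is one
        for row in reader:
            if not row:
                continue  # ignore blank lines
            target = out_dir / (safe_name(row[0]) + ".txt")
            # The remaining columns become the body of the text file.
            target.write_text("\n".join(row[1:]), encoding="utf-8")
            written.append(target.name)
    return written
```

As with the `filePath` line in the macro, you would point `out_dir` at the location where you want your separate files stored. Note that in this sketch two rows with the same first column would overwrite one another, so the first column needs to be unique.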

Eke’s code is also available here via GitHub. Try it yourself on John Adams’s Diary (which you can download as a csv file here: right-click on the link and save it as .csv). Incidentally, having the csv broken into separate files arranged in chronological order means that when the zipped folder of these files is uploaded to Voyant Tools, you can explore trends over time, reading from left (older) to right (newer), in the basic Voyant Tools interface, as here: http://voyant-tools.org/?corpus=1389375873945.8996 and here it is in the RezoViz interface: http://voyeurtools.org/tool/RezoViz/?corpus=1389375873945.8996.
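One practical wrinkle with chronological order: if your files are named by row number, plain alphabetical sorting puts ‘10’ before ‘2’. The Python sketch below shows one common remedy, zero-padding the numbers. It assumes that the rows of your csv are already in chronological order and that whatever reads the files lists them by name; neither assumption comes from the Voyant Tools documentation, and the function name `numbered_names` and the `entry` prefix are hypothetical.

```python
def numbered_names(count, prefix="entry"):
    """Return file names whose alphabetical order matches their numeric order."""
    width = len(str(count))  # digits needed for the largest number
    return [f"{prefix}_{number:0{width}d}.txt" for number in range(1, count + 1)]


padded = numbered_names(12)
unpadded = [f"entry_{number}.txt" for number in range(1, 13)]

# Padded names stay in row order when sorted as text; unpadded names do not.
print(sorted(padded) == padded)
print(sorted(unpadded)[:3])
```

If your first column already holds dates written year-month-day, the file names will sort chronologically without any padding step.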

To export this data for further analysis, click on the cogwheel icon in the top right and select the format that you’d like. Note that this selection doesn’t download the data in that format; rather, it opens a window from which you must copy the data into your favourite notepad application. If you click on the left-pointing arrow at the top right of the page, you can also edit the data directly in the browser and see the results immediately. This works best in the Chrome browser.



Source: http://www.themacroscope.org/?page_id=418