An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

All Comments

Comments on the Pages

  • Regex (36 comments)

    • Comment by Ben on June 24th, 2014

      This use of | works for Notepad++ but not for MS Word 2010. Haven’t tried Textwrangler. Might be better to say something like “While MS Word has some basic search and replace functions, other programs for editing text take full advantage of regular expressions in their search and replace tools.”

      Comment by Ben on June 24th, 2014

      odd location of space around parenthesis here:

       

      , \bcat )it will

      Comment by Ben on June 24th, 2014

      \<cat|dog\> will also match these forms

      - catch

      - houndog

      Is that intended?
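
A minimal sketch of why those extra matches appear, written in Python rather than Notepad++. Two assumptions: Python's re module has no \< or \>, so \b stands in for both word-boundary markers, and the word list is made up for illustration. Alternation binds more loosely than everything around it, so the ungrouped pattern reads as "cat at the start of a word" OR "dog at the end of a word":

```python
import re

# Hypothetical word list, chosen to include the two surprising matches.
words = ["cat", "dog", "catch", "houndog", "concatenate"]

# Ungrouped: "\bcat" (cat at a word start) OR "dog\b" (dog at a word end).
loose = re.compile(r"\bcat|dog\b")

# Grouped: both word boundaries now apply to each alternative.
strict = re.compile(r"\b(?:cat|dog)\b")

loose_hits = [w for w in words if loose.search(w)]
strict_hits = [w for w in words if strict.search(w)]

print(loose_hits)
print(strict_hits)
```

If only the whole words are wanted, wrapping the alternatives in parentheses is the usual fix.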

      Comment by Ben on June 24th, 2014

      the ‘a’ is hyperlinked to something on Shawn’s computer…

       

      Comment by Ben on June 24th, 2014

      Also weird hyperlink in here

      Comment by Ben on June 24th, 2014

      The sentence “Notepad++ remembers them as “″, “″, and “″ for each group respectively.” is a bit hard to parse with those odd clusters of punctuation! Is it possible to express it in some other way? Or is something missing?

      Comment by Ben on June 24th, 2014

      There seems to be nothing in this paragraph… can that be right? If you mean to leave it empty then it might be good to say that to reduce ambiguity.

      Comment by Ben on June 24th, 2014

      A few other nice regex testers:

      http://regexpal.com/
      http://regexone.com/

      Comment by Ben on June 24th, 2014

      It would be good to mention where this list ends as well. The savvy user might want to copy and paste the list into another doc rather than delete the other text (which is quite a lot to select!) and leave the list in situ.

      Comment by Ben on June 24th, 2014

      Footnotes links are not taking me anywhere (Chrome/Windows)

      Comment by Ben on June 24th, 2014

      Notepad++ has a checkbox “. matches newline” that has to be un-checked for this to work.
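
For anyone checking this behaviour outside Notepad++, here is a rough Python equivalent. It assumes that Python's re.DOTALL flag plays the same role as the ". matches newline" checkbox, and the two sample lines are purely illustrative:

```python
import re

# Two illustrative lines separated by a newline.
text = "~ Ebenezer Allen, September 30\n~ James Buchanan, September 23"

# Default behaviour: "." stops at the end of a line,
# so "~.+" produces one match per line.
per_line = re.findall(r"~.+", text)

# With DOTALL, "." also matches "\n", so the first match
# runs on through the newline and swallows both lines.
across_lines = re.findall(r"~.+", text, flags=re.DOTALL)

print(len(per_line), len(across_lines))
```

With the flag on, a greedy .+ can consume the rest of the file in one go, which is why the box needs to be un-checked for line-by-line replacements.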

      Comment by Ben on June 24th, 2014

      What am I doing wrong? This is what I get after this line:

      ~ Ebenezer Allen, September 30
      ~ James Buchanan, September 23
      ~ Ebenezer Allen, October 8
      ~ David S. Kaufman, October 15

       

      The years have been removed…

      Comment by Ben on June 24th, 2014

      This doesn’t seem to do the trick for me. (,)( [0-9]{4})(.+) removes the year, which I thought we wanted to keep

      Comment by Ben on June 24th, 2014

      It’s not really clear why we’re adding tildes here… do they have a special regex function? I don’t know.

      Comment by Ben on June 24th, 2014

      I’d suggest introducing the comment symbol # and using that to spell out these slightly ambiguous expressions, eg.

      Find:  to    # there’s one space before the ‘to’
      Replace: , # a comma with no spaces around it
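
One way to try annotated expressions like these is Python's verbose mode, sketched below. This is an assumption about tooling, not something the chapter uses: the Notepad++ search box is not involved, the sample line is hypothetical, and the pattern assumes one space on each side of "to". In verbose mode plain whitespace is ignored, so each literal space is written as [ ]:

```python
import re

find = re.compile(
    r"""
    [ ]to[ ]    # one literal space, the word "to", one literal space
    """,
    re.VERBOSE,
)

line = "Alice Smith to Bob Jones"  # hypothetical sample line

# Replace with a comma that has no spaces around it.
result = find.sub(",", line)
print(result)
```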

      Comment by Ben on June 24th, 2014

      Great chapter! I learned a lot and the example is a compelling demonstration of the power of regex.

      Comment by Shawn on June 25th, 2014

      Thanks Ben!

      Comment by Shawn on June 25th, 2014

      Good point Ben – thanks!

      Comment by Shawn on June 25th, 2014

      Hi Ben,

      It doesn’t have any particular special regex function, but it helps us to mark off the bits of text that we are particularly interested in, so that we can find it easily later on. Since the tilde is not often used, we’re probably not going to grab any extraneous text in any subsequent operation.

      At least, we hope!

      Comment by Shawn on June 25th, 2014

      Hi Ben,
      Thanks for catching this. It should be, replace with \2. For whatever reason, WordPress is blanking out anywhere I write \2. I’ve inserted a screenshot instead.

      Comment by Shawn on June 25th, 2014

      Are you using Notepad++ or Textwrangler or something else? I think the issue might be the whole line endings thing (new lines versus carriage returns) – http://stackoverflow.com/questions/1761051/difference-between-n-and-r

      Comment by Ben on June 25th, 2014

      Thanks for responding, I’m using Notepad++.

      Comment by Shawn on June 25th, 2014

      Ok. I’ll switch to a Windows machine and run through this again with Notepad++ and see if I can figure out what’s going on or what we’ve neglected to relate. Stay tuned!

      Comment by Shawn on June 26th, 2014

      Ok, I think it was that darned WordPress glitch that stops \2 (backslash-two, in case it happens in comments too) from displaying. Looks like you replaced with a blank, which would remove the comma, the year, and the page number. So replace with the second group and it should be fine.
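
The difference between the two replacements can be sketched in Python. The pattern is the one quoted in this thread; the sample line, with its year and trailing page number, is a hypothetical stand-in for a line of the index, and Python's re.sub stands in for the Notepad++ replace box:

```python
import re

# Pattern from the thread: a comma, then a space and four digits, then the rest.
pattern = r"(,)( [0-9]{4})(.+)"

# Hypothetical index line: name, date, year, page number.
line = "~ Ebenezer Allen, September 30, 1845. 127"

# Replacing with nothing drops the comma, the year, and the page number.
blank = re.sub(pattern, "", line)

# Replacing with the second group keeps the year and drops the rest.
keep_year = re.sub(pattern, r"\2", line)

print(blank)
print(keep_year)
```

The first result matches the year-less lines reported above; the second keeps the year.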

      Comment by Shawn on June 26th, 2014

      Good catch, thanks.

      Comment by Shawn on June 26th, 2014

      Ah, that’s because I cut and pasted from Word. Darn.

      Comment by Shawn on June 26th, 2014

      The one regex that deletes everything not preceded by a tilde (see below) sorts this issue out though. Really, one could just open the txt file directly from the website and run all this regex on it with no cutting and pasting involved, but I wanted to give an indication of what exactly it was we were trying to grab/clean, etc.

      Comment by Shawn on June 26th, 2014

      Thanks!

      Comment by Shawn on June 26th, 2014

      Ah crap. WordPress zings me again: it should read

      \3\2\1

      WordPress is not displaying that!

      Comment by Shawn on June 26th, 2014

      Something is missing. At every point in this document where I’ve written \1 or \2 or \3, WordPress has stripped it out. Should read:

       

      Notepad++ remembers them as “\1”, “\2”, and “\3” for each group respectively.

      Comment by Shawn on June 26th, 2014

      ha, stupid bloody Word.

      Comment by Shawn on June 26th, 2014

      Did you test that? If so, then that wasn’t quite what was intended. I’ll have to check with Scott, as he’s much more au fait with regex than I am.

      Comment by Shawn on June 26th, 2014

      yep. Can’t type.

      Comment by Shawn on June 26th, 2014

      That’s good phrasing, thank you. Will incorporate that.

      Comment by Shawn on June 26th, 2014

      Folks, apologies for some of the layout. For reasons that I can’t explain (but are probably explained somewhere at wordpress.com), WordPress strips out the regex indicating a group every time we type it. If you put your search string within parentheses ( ), regex will remember that as a group. So in your replace, you can indicate which group you’re interested in like so: \1 for the first group, \2 for the second, etc.
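
A small sketch of groups and backreferences, using Python's re.sub in place of the Notepad++ replace box (an assumption about tooling; the name is just sample text). Three parenthesised groups are captured and then written back in reverse order with \3\2\1:

```python
import re

# Group 1: first word, group 2: the comma and space, group 3: second word.
pattern = r"(\w+)(, )(\w+)"

# Writing the groups back as \3\2\1 swaps the two words around the comma.
swapped = re.sub(pattern, r"\3\2\1", "Allen, Ebenezer")

print(swapped)
```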

  • The Third Wave of Computational History (29 comments)

    • Comment by Shawn on September 6th, 2013

      These few paragraphs remind me of John Bonnet’s work on Harold Innis. I’ll need to chase that thought down. But something of its form might be found at:

      http://www.humanities.utoronto.ca/event_details/id=753

      and

      http://www.academia.edu/2465986/Harold_Innis_Information_Management_and_the_Topographic_Revolution_in_Communication

      p6 “To survive, Innis believed that humans have to think consciously about their instruments and signs of communication. They have to change them, play with them, complexify them, even violate them if they are to adapt to new challenges and environments.”

      and p8: “Information and increasing returns are two neglected constituents of Innis’ thought, and they shouldn’t be.”

      and finally p9: “Innis thought deeply about the dynamics of information, and how said dynamics impinge on the function and dysfunction of complex systems. In so doing, Innis independently replicated the accomplishment of an important contemporary: Norbert Wiener, one of the founders of the field of cybernetics.”

      Something to think about, but we are working in a tradition that doesn’t just begin with Busa, but also has other roots that connect with present day concerns – thinking too about our thoughts on the ethics of big data.

      We might (?) want to work that bit of intellectual history in somewhere else.

      Comment by Shawn on September 6th, 2013

      Footnote maybe on what WAIS is?

      Comment by Shawn on September 6th, 2013

      Expand a bit more on what this dark side is?

      Comment by Shawn on September 6th, 2013

      Might be interesting to contrast what ‘archiving’ means in digital historical/humanities circles versus what archivists themselves say about what an ‘archive’ is, versus a ‘collection’.

      Comment by Shawn on September 6th, 2013

      Socrates was famously not a fan of writing things down.

      Comment by Ian Milligan on September 6th, 2013

      Incorporated a quick description in footnote 14. It’s pretty cool actually how in 1994, before the Internet Archive, it’s seen as a tool to retrieve cultural information.

      Comment by Ian Milligan on September 8th, 2013

      Yes, yes, and yes – this is a fantastic article. It’s not the sort of thing that lends itself to easy incorporation, as Innis’ thought is quite complicated…

      Comment by José Igartua on September 11th, 2013

      Information becomes “historical” only when historians conceive of it as answering questions about the past that they deem relevant to answer for themselves and for their public.

       

      Massive amounts of data are not new. For years, archivists have had to decide whether to keep or let go of miles and miles of paper trails. In other words, they have had to anticipate which kinds of historical questions are likely to be answered by the documents in their charge. An example is the thousands and thousands of parking tickets any large city has issued. Archivists have to decide whether to keep this information in the medium and long term.

      Comment by José Igartua on September 11th, 2013

      Strange that, as fellow historians, you seem to ignore the rapid rise, internationally, of quantitative history in the 1970s. It covered economic history, of course, but also social and demographic topics. Are these topics to be excluded from “History” because they are not “Humanities” topics?

      Humanities computing, of course, goes back to Father Busa’s machine-readable concordance of the works of St. Thomas Aquinas in … 1951 (see http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1069&context=classicsfacpub).

      Comment by Ian Milligan on September 11th, 2013

      Ack, this is a problem with having snippets out of context (and is a good reminder that we need to do this). When we write

      “Above, we have seen how the digital humanities have developed and flourished.”

      we are referring to a section that we’ve written but not posted yet, which is on the early emergences – we begin with Father Busa, move into the 1960s/70s wave of computational history (the Federalist Paper authorship dispute, journals such as Computers and the Humanities, and closer to home here in Canada, studies like Katz’s People of Hamilton Canada West). That’s the ‘first wave,’ and then we have the ‘second wave’ as GIS, H-Net, Personal computing, ACH, etc.

      I apologize that we didn’t make that clear and I’ll move it up in the release calendar. In the future, in such contexts, we should provide an editor’s note making this clearer.

      That said, looking at the draft, we do need to really stress the international dimension of it and I’ll apply your comment to that as well.

      Thanks for your comments, which are really helpful.

      Comment by José Igartua on September 11th, 2013

      On a more general note, I think you should acknowledge that each generation of historians claims to be doing something new which their predecessors have failed to notice or do, and that you are indeed casting your argument for the book in this vein, for whichever reasons you care to invoke.

      You will sense from my comments some irritation at this strain of argument when it appears to ignore what has come before (I see elsewhere in the draft that you don’t, but that’s the problem with non-sequential reading, isn’t it?). A successful rhetorical strategy of this type has to be explicitly concerned with the solidity of the claim of doing something “new”. I see on the earlier pages that you are aware of this, but I haven’t seen an explicit acknowledgement of it. Just a suggestion….

      Comment by Ian Milligan on September 11th, 2013

      Good point, and one that we’ll flesh out. I was actually thinking about this earlier this week, along the lines of Chad Black’s “How Far Have We Come in the Digital Humanities” (http://parezcoydigo.wordpress.com/2010/10/14/how-far-have-we-come-in-the-digital-humanities/), and yeah – mea culpa – we do need to make sure that we make this far more explicit. While we do think we have something new to add to the discussion, an explicit section in the book about this is in mind. We will chat about adding this to Chapter One as a subheading.

      Non-sequential reading is tough (tougher than we thought at the conceptual level, to be honest) but you’re definitely speaking at a bigger issue that we need to address.

      Thanks for your engagement and honesty.

      Comment by Shawn on September 11th, 2013

      Very true. What’s different, perhaps, is the intentionality of that process. We’re not used to thinking that, in digital terms, we still have to make those decisions, even when we let robots do a lot of the heavy lifting. Perhaps what is genuinely ‘new’ in this process is that it is shared between human and machine. The decision-making element might reside in ‘when’ to sweep, more so than in ‘what’. A good point – thank you.

      Comment by Scott Weingart on September 11th, 2013

      I do note that while, as Ian said, we do mention all of this in the as-yet unpublished other section, we should probably dwell longer than we do on cliometrics, as well as put in a bit about the longue durée and other specific statistical predecessors.

      Thank you for your careful read-through, this is exactly what we need to make as good a finished product as possible. Clearly we also need to be more careful about what is or is not currently included in the public draft, as well.

      Comment by Jonathan McQuarrie on September 22nd, 2013

      The point about Yahoo being ‘Exhibit A’ of ‘bad historical citizens’ seems to be unfinished. It mentions that Yahoo bought Geocities, and I know (thanks to Ian) that they ended up eliminating a lot of that user content. It alludes to this act on paragraph 22, but the connection remains a bit unclear.

      Comment by Ian Milligan on September 23rd, 2013

      Great catch, you’re completely right here. Thanks Jonathan, it’s sincerely appreciated.

      Comment by Michael W. Kramer on March 29th, 2014

      As mentioned, influential individuals in positions of power within corporations indeed do not realize the importance of preserving historical data. However, I believe that preserving historical data created by influential individuals or groups within governments is also of high importance. Historians need to take heed and recognize that influential government personnel also play a significant role in how the past will be remembered.
      I did a quick search on this site of the term “intranet” and to my surprise the word has yet to be used. Therefore, in regards to big data, I think it is important that future action must be taken to preserve protected and secure data created by government employees and stored privately on intranets. Historians should work with governments and take collective efforts to not only preserve publicly accessible digital data, but also privately held data distributed internally throughout an intranet.
      Imagine if you will, that top-secret government documents were collected and stored regarding an event of historical importance. The traditional method of storing classified documents was to take the printed paper, store it in a locked archive, and hide the key until public demand was high enough to warrant declassification. If we take a monumental event such as the JFK assassination as an example, all paper files and reports of the event were stored away for declassification at a later date. What were the repercussions? Historians and the public have and continue to have different understandings of what actually happened and whether there was a conspiracy. At that time, we must remember the government documents existed in only paper format. But now, any information on an event of historical importance would more than likely be created, stored, and backed up in electronic format and left either on government intranets and/or storage media.
      When information is declassified by governments, historians can analyze the new data to gain new perspectives about the past. However, influential individuals always have the potential to destroy evidence, resulting in nobody knowing the true history of the past. This is not a new phenomenon because, as we know, companies and governments have always destroyed documents to protect themselves. As an example, in the 1970s when the US embassy was being overrun in Vietnam, government workers scurried to destroy classified information. Therefore, the evidence allowing us to better understand what happened was lost for all time (unless the shredded documents were stitched back together, of course).
      In the era of big data, we must be aware that one individual with access to a digital intranet database can dramatically alter how we remember the past and the present by simply pressing the delete key. Take the movie The Departed for example, where with one click of the mouse the undercover agent’s identity was completely erased. I believe that steps should be taken to work not only with private corporations but also with governments to perform nightly backups of data (as some companies, to my knowledge, do) so future historians can write more accurately about the past.
      In the article “Big? Smart? Clean? Messy? Data in the Humanities,” Christof Schoch brilliantly explains how digital data exists as either big data (unstructured) or smart data (clean, structured, organized, and sorted for study). What he does not mention is that both types concern strictly public data (such as that preserved at the Internet Archive). I propose another type of data of equal if not more importance to be preserved: concealed data. I argue that scholars need to advocate methods to keep digitally concealed data on government intranets preserved for future generations. Otherwise, to use a phrase I created and hope catches on, the possibility exists that “whoever has their finger on the delete button controls how the history of the present will be remembered.”
       
      Source:
      Christof Schoch, “Big? Smart? Clean? Messy? Data in the Humanities,” Journal of Digital Humanities, Vol. 2, No. 3 Summer 2013. http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/
       

      Comment by Amanda Seligman on March 29th, 2014

      Not sure if you’ll get to this, but one question to ask is whether this is the information we want archived. Sure, we want some of it. But where are the institutional histories, neighborhood records, etc., that are the stuff of my own particular corner of history? So much of what the Internet captures is front-facing rather than inward facing, that this direction presents a problem for historians who aspire to look inside organizations.

      Comment by Amanda Seligman on March 29th, 2014

      This role of curation is key.

      Comment by Bricelyn on April 1st, 2014

      Is it possible to elaborate on the “dangerous” characteristic of long-term retention and preservation?  You write that it would require hard work and needed to be done quickly, but how is this dangerous?

      Comment by Nicole Spangler on April 1st, 2014

      When examining the role of internet data and print data, how does an organization, like the Library of Congress, curate the digital collection? Do the collections simply contain all the data, or should historians require some weeding through the data? I think of the LOC’s Twitter collection, and am confused about how this entire database will be useful in the future. I am aware that there are times when certain tweets have historical value, but others, like my tweets for example, contain little historical value. How does digital historical scholarship exist and continue to remain valid with the overload of information accessible to scholars?

      Comment by Karlie on April 2nd, 2014

      I believe we need more “forward thinking” people who look at what all of this digital history could potentially mean for future historians and future generations. Preserving all of what the internet is made of seems near impossible, but I know it can be done and is being done. The Internet Archive is extremely important when it comes to preserving the digital past and these new storage mediums with high standards of longevity and storage are astounding. My question is: Who decides on what to archive and what not to archive?

      Comment by Zakea Jones on April 3rd, 2014

      In regards to Jason Gleick’s comment on the lost plays of Sophocles… this is a prime example of why the documenting of history is essential to our past and future. It is heartbreaking to know that out of 123 plays that were written only 7 have survived in a complete form. Now, for someone who has no interest in Greek tragedians this is no big deal; however, to a person who is spending the majority of their life researching the Greek era this is a huge deal! So in my opinion Big Data is an essential tool, and since it’s available it should be utilized for historical purposes in regards to preserving large amounts of data.

      Comment by Ian Milligan on April 4th, 2014

      Good question. They can’t really curate them, given the size (or aren’t, in any case). We’ll get a lot of tweets about what people ate for lunch and dinner, for example. But I think on a large scale, those sorts of tweets will be an incredible bread and butter resource for social historians in the future. We can reconstruct online daily habits of at least a percentage of the population with this sort of data, even if the individual tweets don’t meet a ‘remarkability’ threshold.

      Comment by Ian Milligan on April 4th, 2014

      Thanks, Karlie – that’s one of the aims of this project. As for who decides what to archive and what not to archive, I suppose it’s up to two things: the crawl engineer, and the decisions they may have to make about how deep a web crawler goes; and us, since we can technically opt out of this process by putting a little file called robots.txt on our servers.
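
To make the robots.txt opt-out concrete, here is a small sketch using Python's standard urllib.robotparser module. Everything in the sample file is hypothetical: the rules, the example.org addresses, and the crawler names (ia_archiver is used only as an illustrative user-agent string, not as a claim about how any particular archive crawls today):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut one named crawler out entirely,
# and keep every other crawler out of /private/ only.
robots_txt = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Ask what each crawler would be permitted to fetch.
archiver_allowed = parser.can_fetch("ia_archiver", "https://example.org/index.html")
other_allowed = parser.can_fetch("SomeOtherBot", "https://example.org/index.html")
other_private = parser.can_fetch("SomeOtherBot", "https://example.org/private/notes.html")

print(archiver_allowed, other_allowed, other_private)
```

A well-behaved crawler checks these rules before fetching a page, which is how a site owner's decision ends up shaping what gets archived.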

      Comment by Brandon Locke on February 15th, 2015

      It’s *James* Gleick, not Jason Gleick. Footnote name is correct.

      Comment by Ian Milligan on February 18th, 2015

      Thanks for the catch on this! Corrected.

  • The Limits of Big Data, or Big Data and the Practice of History (26 comments)

    • Comment by Scott Weingart on September 5th, 2013

      Re: last sentence, big data does shift the rhetorical weight of historiography from one based in positive examples to one based in trends, and I feel like we should touch on this early on. It might not be as high a stake issue as the epistemological foundations of the field, but it is a big change affecting the nature of argument and evidence.

      Comment by Scott Weingart on September 5th, 2013

      Include a comment here about Aviezer Tucker’s Our Knowledge of the Past, regarding how historians triangulate evidence into insight and degrees of certainty.

      Comment by Scott Weingart on September 5th, 2013

      Did we mean what’s said in the last sentence here?

      Comment by Ian Milligan on September 5th, 2013

      Hah! No, a typo that dramatically changed the meaning. (for those interested, it was implying that computers had to put everything together – sort of an artificial intelligence approach to history). :)

      Comment by Shawn on September 6th, 2013

      ‘noted by noted’

      Comment by Shawn on September 6th, 2013

      See Michael Widner’s recent blog post ‘debating the methods in Matt Jockers’ Macroanalysis’ re the digital humanities reading group at Stanford

      https://digitalhumanities.stanford.edu/debating-methods-matt-jockerss-macroanalysis

      Comment by Jonathan McQuarrie on September 9th, 2013

      Is it possible to *very* briefly define what tokens are here, despite the fact that they will be expanded on in chpt. 2?

      Comment by Ian Milligan on September 9th, 2013

      Great idea… it is a little confusing now so this would really help. Thanks Jonathan, much appreciated.

      Comment by Jim Clifford on September 14th, 2013

      The other thing to acknowledge here is that a large proportion of historical records are not digitized and likely will not be digitized for a long time to come. There is a danger that, as Ian demonstrated for Canadian newspapers, we as historians will adapt our research to study topics where the newspaper or court proceedings are online. This gets even more complicated with text mining, as we’re forced to work with the data that we can get our hands on and not the ideal dataset. The Economist would have been great for our project, but Gale never agreed to let us use it.

      Comment by Jim Clifford on September 14th, 2013

      I like this paragraph. Good balance.

      Comment by Ian Milligan on September 14th, 2013

      Good call, Jim – and that’s a really good applied example (and thanks for the plug on my own forthcoming work on newspaper citations). I think right now we are planning to flesh this out in Ch 2, but with your comment and on a closer re-read, I think comments along these lines belong here.

      Comment by Christina on March 27th, 2014

      Did any other type of historical research, besides quantitative, face criticism in the early stages of  computer technology?

      I think this is a very interesting fact to add at the end of the paragraph, but I guess I am confused about what type of criticism the historians face.

      Comment by Ian Milligan on March 28th, 2014

      Thanks for your feedback, Christina – a good question. Certainly when I think of computers and technology, I think of Cliometrics and the backlash (which we discuss elsewhere here). But qualitative.. hmm. Certainly criticism, but not because of computers, I don’t think.

      That being said, I’m picking up a methods book as I type this so I can explore your question.

      Comment by Amanda Seligman on March 29th, 2014

      I think it would be very useful to further explicate the comparison with the 1970s and 1980s and quantitative history.

      Comment by Amanda Seligman on March 29th, 2014

      One thing that might help this manuscript is concrete examples from working historians who have already used Big Data to make their arguments. Are there such?

      Comment by Amanda Seligman on March 29th, 2014

      I don’t think I’m persuaded by this argument.

      Comment by Amanda Seligman on March 29th, 2014

      The reason that I’m not persuaded is that I don’t see the “shadow” argument as fully responsive to the methodological questions raised in this project. How we know is as much about how we go about putting the past together as it is about the distance from the past. If you do push this, though, I would suggest drawing the macroscope argument back in.

      Comment by Cassidy on March 31st, 2014

      You stated that having more data is not a bad thing, but what about information overload? Do you think that future historians might find the accumulation of too much information to be a problem?

      Comment by Arlardy on April 3rd, 2014

      Although digital methods may not merit a transformation in the foundation of history, could they not transform modern approaches to it? In fact, haven’t they already? I don’t know any history student who doesn’t use cyber resources to at least begin their research on a specific topic.

       

      Thanks,

      Amelia

      Comment by Jessica Daase on April 3rd, 2014

      I agree that having more data is not a bad thing.  Working with big data is a tough task because much of the information could be useless to the research topic at hand; however you can also happen upon one paragraph or even one sentence that is a key point in the topic you are researching.

      Comment by Ian Milligan on April 4th, 2014

      Thanks, Cassidy!

      Good point.. we need to highlight that. I don’t think it’s a problem, per se, but it certainly means that we need new tools to deal with the problem of TMI.

      Comment by Ian Milligan on April 4th, 2014

      Well put!

      Comment by Amanda Workman on April 4th, 2014

       

      The point you make here is really important to a student like myself who is just beginning to study and pick apart the methodologies of digital history. It’s often easy for more inexperienced students of history (especially those born in the digital age) to separate the present and future of digital methods from those of the past because we were not an active part of that generation of historians or historical work. It is stated here that the historical tradition will not change, but do you think there is a chance that digital history will drastically change the teaching of history to students in the future?

      Comment by James Waller on April 4th, 2014

      Hello guys, I would like to give you my opinion and ask yours. I am a nursing student at UWM, and I am only doing this because it is part of my grade. With that being said, I am so glad I read this part of the book. I am not a history major, nor do I really think about history a lot, but when I do, or have in the past, this section of the book is what I have always pondered. Decisions have been made on what to document or record in history from the time we started recording it. So now what I think digital history really gives us is freedom of choice. I can now look at YouTube, do a search on Benghazi, and get 348000 results. The results are all from “historians”; some are professionals, others are just people. My question is: which source do you think is more reliable for people of the future to base their opinions on?

      Comment by David Baglien on May 16th, 2014

      You mention that prior to big data there isn’t a lot of history recorded. Who’s to say that something like big data is permanent? Shouldn’t data that is important be not only electronic but also physical to ensure its longevity?

  • Building the Historian's Toolkit (22 comments)

    • Comment by Christi Bose on March 26th, 2014

      I found this section really interesting, yet I’m not sure how exactly it would be useful to me as a college student.  I am far from tech savvy and would say I only have a vague and basic understanding of the digital world and the topics talked about in this book/section.

      Reading through the second half of this section on normalizing and tokenizing data, I realized I already do this, sometimes unconsciously, in order to find the theme or importance in an article and I can see how a more advanced version of this would be useful for historians digging through an enormous amount of information.  But as a college student with a basic understanding of technology, how could this programmed form of normalizing and tokenizing data be simplified and useful to me?

      Comment by Amanda Seligman on March 29th, 2014

      This story nicely captures the “big toolkit” problem. As I teach DH for the first time this semester, one of my nagging fears is about the tools that are out there that I don’t know to point my students to. Is there a canonical set of big data tools that you think all historians should have in their toolkits, and a lesser set that we should know about? And a set coming that we have to be alert for?

      Comment by Amanda Seligman on March 29th, 2014

      I’m curious about your identification of a list of names as a tool here. I would have thought that a list of names was a dataset.

      Comment by Amanda Seligman on March 29th, 2014

      This paragraph points to your small audience of DH converts. There will be lots of historians who might benefit from what you are teaching in this book who have never done this at all.

      Comment by Amanda Seligman on March 29th, 2014

      It strikes me that you might write about how the three of you got qualified both to be historians and to be able to write this book. What was the shape of your education and career? First person narratives of that sort might make your goals seem more achievable for us regular non-coding historians…

      Comment by Amanda Seligman on March 29th, 2014

      I’m not sure I accept the word “easiest” here.  (Un)conferences and summer camps might also be reasonable routes.

      Comment by Amanda Seligman on March 29th, 2014

      I’m just reading through in order, so this thought is more about what I haven’t seen so far than about this paragraph. What I see you doing here is arguing that historians need to grapple with big data. So far, I haven’t seen the nuance that says historians need to learn to grapple with big data in order to answer these specific (or kinds of) questions.

      Comment by Amanda Seligman on March 29th, 2014

      This paragraph needs a bit more to fully explain tokenization. I’ve read it a couple of times and still feel that I am missing something.

      Comment by Johanna on March 30th, 2014

      I am relatively new to this field and have been browsing this site for a class.  I chose this segment in particular because it allowed me to process through one of my fears in regards to the digital realm.  I fear that future generations will become lazy and less involved in information; the concept of digital tools making research easy really frightens me.  However, this particular section raised a valid point: digital tools can be used as learning curves and can aid in the overall process of gathering–at sometimes painstaking cost–information and applying it.  I appreciate the opportunity to process my internal dilemma and view the situation from a different perspective.

      Comment by Will Tchakirides on March 31st, 2014

      Gee, I wish I knew about the programminghistorian.org when I first tried experimenting with QGIS a few years back. Although, the site may not have been released yet. Part of what makes this website so attractive and, I think, usable is its sleek and simple design. I wonder to what extent minimalistic, yet responsive web design structures–like that of the programming historian–make it easier for digital novices to learn how these tools work.

      Comment by Eric Gajdostik on March 31st, 2014

      I like the way that this chapter assuages some of the fears that non-coding historians could have, and also embraces the collective goals of digital history by being collaborative. One thing I worry about as a non-coding historian has nothing to do with coding; indeed I am very interested in the mixing of these techniques.

      I worry that with the streamlining of big-data, historians will gain a more efficient way to sort out relevant data, but this could come at the cost of seeing how it fits with other phenomena. Historians will become increasingly specialized while losing their intimate connection with the subject matter, and the grander schemes; personal accounts become impersonal, and human events become less human.

      Comment by Ruth Jones on April 1st, 2014

      Hello,

      I am a student in Amanda Seligman’s Digital History Class.  I am trying to install Homebrew on my Mac, but the ruby command on line 14 fails.  On the Homebrew site the command string includes ‘install’ after /go.  When I attempted that, I received the following messages.

      Press RETURN to continue or any other key to abort
      ==> /usr/bin/sudo /bin/mkdir /usr/local

      WARNING: Improper use of the sudo command could lead to data loss or the deletion of important system files. Please double-check your typing when using sudo. Type “man sudo” for more information.

      To proceed, enter your password, or type Ctrl-C to abort.

      At this point I questioned whether a non-programmer should be running such commands.  Luckily, I have some background in Linux and understood how to read the man page for sudo and check that the directory /usr/local did not exist.

      It was also quite a challenge to find the terminal command.

      While I agree that historians need to become more tech savvy, I question whether this type of command should be included in this type of book.

      Comment by Beth Mudlaff on April 3rd, 2014

      I get that this may be beyond the scope of what you are intending in this chapter but as decisions are made about normalization and tokenization are issues of gender and ethnicity worth at least mentioning or acknowledging here? If fewer women are contributing to programming (and by many measures they are, AP high school testing, degrees conferred or other markers) does that shape the analysis? I will admit, being someone with a very limited programming background I don’t really understand and wonder about these things.

      Comment by Benjamin G on April 3rd, 2014

      Hey authors!

      I think what you all are doing is incredible and fascinating.  The digital history world is vast in the amount of sources both digitized and that are digitally born.  Just learning about the digital humanities in general has been almost overwhelming, luckily my professor has put together a great deal of information about this fascinating area of history.

      Anyways, I have a small question.  Is there any sort of a certificate or official recognition that is not a degree or masters program etc…?  I ask this actually in reference to the first Programming Historian in paragraph 23 when it was described as a kind of crash course and how Programming Historian 2 is described as teaching the user how to interact with big data in hours.  I also ask this for those who have nearly finished college and may not have the time/money/resources to continue on and become fluent in big data.

      I look forward to hopefully hearing from you all and diving further into your awesome ongoing project.

      -Thanks

      Ben

      For historians, individual research is one of the primary practices used for data gathering, and in this day and age the data growth (or amount of data that can be stored) seems to be growing closer and closer to an infinite value. In the Welcome section of the book, S. Graham, I. Milligan, & S. Weingart explain how technology needs to be reevaluated by many historians for the purpose of understanding its role in society and culturally, as well as its usefulness in academia. As historians, they must adapt and observe how the role of technology has a grasp on the attention of the large populations of people, and how they can change their mediums of teaching in order to create a cohesive learning experience that has an engaging effect.
      I found the WGET example, in the section of the Building the Historian’s Toolkit very interesting because of the nature of the program. Firstly the program is free and open sourced, which creates access for millions of teachers around the world. It’s open on many platforms, and functions as a data retrieval tool, which takes content from websites and gathers data, on a constant and swift pace. Furthermore, once this process is finished you can further refine the content by computational science and data analysis, which is fascinating.

      In earlier discussions in my class, there was the issue of people criticizing the Internet’s data as being “mostly” illegitimate. The fact that there are programs that make searches more productive as well as refined begins to counter this argument through a practical application. Since this program has the ability to refine and retrieve, it brings up my question about other by-products that might spring up from this use. Do you think that this data retrieval will lead to more exposure for young and upcoming academic professionals? All of their work can be distributed on a worldwide scale by the click of an upload. Do you think that tools like WGET and others similar have the capability to promote and speed up the process of spreading academic work (if approved by filtering) for all people?

      Comment by Christian Wilhelm on April 3rd, 2014

      I appreciate the story regarding the importance of having a variety of tools available to handle any situation at hand. This is critical not only for the issue of big data, but practitioners in the IT industry as a whole. The tools you have discussed are open source, which is great for individuals without other resources or for an introduction to a particular concept. But many of these tools have a steep learning curve and lack the easy to use interfaces of commercial endeavors. Do you believe that the progression of this area will be stunted and slowed by a lack of access to the commercial tools developed for the business world? I think that this in addition to the lack of access to infrastructure with the digital horsepower needed to process and analyze such mind bogglingly large data sets are the two major barriers for most historians.

      Comment by Marcus Van Grinsven on April 3rd, 2014

      I am new to digital history, and am currently taking my first DH class.  The word “programming” has always been a little scary to me.  Perhaps it is because when I hear the word, I tend to think of hardcore geeks who write operating systems and other complex programs.  While I was reading this, I remembered that when I was in elementary school back in the 1980s our class actually did some simple programming, and it wasn’t that bad.  It seems that it has gotten easier to do more, as you say in paragraph 23, “the barriers have never been lower.”  The other fear I have of technology is the cost of tools, gadgets, and software.  For instance, I know people who have $500 smart phones, and $200/month bills for the service.  I am also a Geography major, specializing in Geographic Information Systems, so I know the software can get quite expensive.  In my DH class, and through reading this page, I have been learning about open source technology, which is usually free and easy to use.  My experiences as a Digital History student, and reading your work in progress have eased many of my apprehensions about technology.

      Comment by Ian Milligan on April 4th, 2014

      Thanks Ruth – I’ll double check that command when I can hop on a computer that doesn’t have brew installed. And good cautionary note, we’ll think about directions to the terminal command and also maybe document sudo a little bit..

      Comment by Ian Milligan on April 4th, 2014

      This certainly belongs somewhere.. I’ll chat with the other editors. We make these sorts of assumptions all the time, and they can sometimes be built into the syntax of the code itself..

      Comment by Ian Milligan on April 4th, 2014

      Thanks for the feedback, Ben. I don’t know off hand if there are certificates in this field.. Programming Historian 2 doesn’t offer any, although the skills are very valuable. Perhaps some of the Coursera MOOCs have a certificate, but I’m not sure how well received those are.

      You make a very good point here!

      […] The Historians Macroscope, ‘The Historian’s Toolkit’, http://www.themacroscope.org/?page_id=330; consulted on […]

      […] ‘Building the Historian’s Toolkit’, The Historian’s Macroscope: Big Digital History, http://www.themacroscope.org/?page_id=330; consulted 29 April […]

  • Networks in Historical Research (21 comments)

    • Comment by Clement Levallois on September 16th, 2013

      Ref?

      Comment by Scott Weingart on September 16th, 2013

      Whoops, good catch. Will leave this here as a placeholder: Ben Fry, Visualizing Data, 2008.

      Comment by Scott Weingart on September 16th, 2013

      We should stick a more recent example here that includes a very large network viz of thousands / hundreds of thousands, to show that there are (limited) applications where those visualizations and analyses can also lead to insight.

      Comment by Clement Levallois on September 16th, 2013

      An interesting ref in intellectual history:

      http://www.amazon.com/The-Sociology-Philosophies-Global-Intellectual/dp/0674001877

      I missed a bit of a historical, contextualized view explaining *why* network analysis developed (or not that much) in history as a discipline. I imagine that other types of quantitative methodologies in history (cliometrics, les Annales…) acted as models, references or foils to the adoption of network analysis.

      The developments in other social and natural sciences might have played a role too. Wider cultural phenomena played a role too (Facebook and Internet are key, the generalization of computational methods across life and social sciences since the 1980s might also be worth mentioning.)

      Another factor to consider is funding and applications: as for previous experiences, it would be interesting to look at how military / strategic / business use cases fostered the development of network research in all disciplines, not just the ones you’d expect to be directly associated with these areas.

      Oh, and visualization. Not sure I fancy the “careful, use with care” way to present it. There is much to be written and elucidated on why visualization developed so much in recent years – not just with networks. The multiplication of screen interfaces must have played a role, as well as the fact that science is expected to reach out faster, to wider audiences (internal and external) : visual forms of expressions can help in that. What happened with neuroimaging offers interesting parallels: brain scans are seen alternatively as frontier science / and spoof evidence (see eg, “Picturing Personhood” by Dumit and “The space inside the skull” by Beaulieu for nuanced analyses).

      Anyway, looking forward to the next pages!

      Comment by Scott Weingart on September 16th, 2013

      Good points Clement. You’re right that we’re missing a huge portion here; I was thinking it would be covered in earlier chapters, but now that you bring it up in this section, I definitely think we should dedicate more time to the history of networks specifically and its interplay with various quantitative historical turns over the years. You also bring up a great point regarding the modern web; I’ve long thought the reason the republic of letters research is becoming so popular is because people connect it with the internet.

      Expect some expanded sections, and perhaps a bit more nuance in the discussion of visualizations as well (although I think that may be beyond the scope of the book, lest the published version become too large…) Many thanks!

      Comment by peter Holdsworth on September 16th, 2013

      While historical sociologists and social scientists often use historical topics for network methods, these works are based more on the views of facts and knowledge construction of science and social science and less on the variable nature of the humanities that many historians use. Many of these works are organized scientifically through process and method and rely less on documentary or historical knowledge than if formal network analysis were approached with historical training as the lens. Padgett and Ansell is referenced in paragraph 13, but possibly a reference to the strength of the network from its data could help there, rather than being explained a few paragraphs later, given the reliance on the data for that work.

      Comment by peter Holdsworth on September 16th, 2013

      Where does the role of earlier prosopography fit in this, especially with genealogical work? A network that uses names in historical social network analysis could possibly be seen as following in the trend of prosopography, not formal social network analysis.

      Comment by Scott Weingart on September 16th, 2013

      We were purposefully eschewing titles such as SNA/Network Science/Graph Theory/Citation Analysis/Prosop/Etc, because we felt the focus on similar method was more important than diverging tradition, from the perspective of the book. That said, I don’t think we touch on prosopography enough earlier on, and your comment has inspired me to remedy that.

      Comment by Scott Weingart on September 16th, 2013

      Good point, thanks. We’ll be sure to add a bit about the rich source of the Florentine data, as well as the different approaches between social scientists and historians. That said, I’ve honestly seen more historically dynamic network approaches from social scientists than I have from historians, who are still learning how to add that dimension to network analysis. We’ll have to add a more nuanced discussion to that effect.

      Comment by Zack Batist on September 23rd, 2013

      This study takes on a much broader geographical scale than the other studies mentioned here, and I think that this is an opportunity to point out some issues relating to scale. Some questions to consider might include:

      Can the data being used provide reliable insight concerning larger-scale processes? What geographical constraints must be considered when interpreting large-scale networks? How are geographical and relational distance comparable?

      Comment by Marten on September 26th, 2013

      I would add that “network thinking”, i.e. the observation that relations between people matter and have profound effects is universal.

      In my mind our generic skill to make sense of social ties in groups is another essential starting point for network analysis – even before the concepts were used in sociology. That said, you could mention Durkheim and Simmel as well – possibly in the overall “Networks” chapter.

      For a first overview take a look at Sebastian Gießmann’s talk at our conference:

      Sebastian Giessmann (Germany):
      Network Paradigms: From Textile Objects to Complex Networks

      He recently published his PhD thesis on this, German only unfortunately.

      For me, the extraction and analysis of network data is actually closer to this common place metaphorical meaning than the mathematical one. Albeit I make use of their tools of course.

      Comment by Marten on September 26th, 2013

      I agree with both of you. Social scientists don’t care too much about writing history and historians don’t yet have the skill by and large.

      The last 2 sentences are important, where they stand now they might be easily overlooked, maybe move to a more prominent place.

      Comment by Marten on September 26th, 2013

      ..why not the other way round: essence first, case study second?

      Comment by Marten on September 26th, 2013

      It might be worthwhile to include Claire Lemercier’s work in this context as well

      Comment by Marten on September 26th, 2013

      Overall, I miss a bracket to these paragraphs, maybe an outline in the beginning of the chapter can provide this. Alternatively separate the overview from the insights and label them accordingly

      Comment by Marten on September 26th, 2013

      I miss references to social and quantitative history, in my mind 2 crucial roots of what we try to do today. Also, the criticisms have not changed since the 19th century. Droysen already brushes off statistics..

      Other thoughts:

      . thanks to the reference to my site, there is also http://reshist.hypotheses.org/

      . overall there is an emphasis on history of science – economic history deserves to have equal representation I think

      . this might be too early for this but my colleagues Uli Eumann, Linda v Keyserlingk, Martin Skoeris and myself work with a combination of hermeneutics, qualitative data analysis and SNA to extract relations from texts – afaik a new approach, closer to trad. history. Others say microhistory. A few texts in English but not much but maybe worth a mention

      . I am also a fan of the idea to combine the reflected use of network theory and trad history methods. Irad Malkin’s book “A small greek world” comes to mind, as well as Wolfgang Reinhardt’s early works, again, German only sorry

      . What about Bearman’s paper where he treats historical events as nodes and causal links as ties? Not sure if this ever took off but I love the concept

      Finally: I enjoyed reading it and I think this is a great initiative and very cool project. Looking forward to seeing this evolve into a pillar of Digital History – keep up the great work!

      Comment by Caleb McDaniel on September 30th, 2013

      I was scanning the chapter quickly, but is this the first time a distinction is clearly made between a network and “its visualization”? The heavy use of graphs earlier in the section may lead readers to conclude a network just is its graph.

      The statement that graphs are little understood and often overused also raised for me the question of what means (other than visualizations) historians can use to analyze networks.

      Again, apologies if I’ve missed earlier comments on these issues.

      Comment by Scott Weingart on September 30th, 2013

      That’s a great point, Caleb. We purposefully did not differentiate between the two in this introductory section because we felt like it would bog the “what’s interesting about historical networks” examples down with details, but since the absence is clearly notable, we’ll re-evaluate that decision. And the noted lack of other-means-to-analyze-networks was definitely unintended, we’ll remedy that in the next pass. Thanks!

      Comment by Amanda Seligman on March 31st, 2014

      This is a very helpful example.

      Comment by Amanda Seligman on March 31st, 2014

      I like this quotation.

      Comment by martemya on January 30th, 2015

      Nice quotation, thanks

  • Welcome! (19 comments)

    • Comment by Scott Weingart on September 5th, 2013

      Did you intend for the entire section to be italicized, or just the book title?

      Comment by Shawn on September 5th, 2013

      Just the book title.

      Comment by Amanda Seligman on January 17th, 2014

      Greetings! I wanted to let you know that I am putting this project into my graduate-level Digital History syllabus for spring 2014. My students will be assigned to read it for our class meeting on April 1!

      Comment by Shawn on January 18th, 2014

      That’s fantastic! Thank you Amanda. By April 1st, we hope to have the manuscript completed (fingers crossed!). If any of your students wanted to look and comment on any part of it in the next few weeks, we’d really appreciate the feedback.

      Comment by Amanda Seligman on January 19th, 2014

      Shawn, right now I have it scheduled for later in the semester–after April 1, I think. Can you still use the feedback after April 1? Will the comment feature still be open? I had pedagogical reasons for putting the Big Data section later in the semester, but it’s possible that the spirit of DH can persuade me to move the schedule around. One of the virtues of this assignment from my point of view is that it permits students to experience reading a project in progress.

      Incidentally, I’m also assigning your essay on the Wikiblitz.

      Comment by Shawn on January 20th, 2014

      Hi Amanda,
      No need to change your plans! We’ll keep the comments open and we’ll respond to anything your students post.

      Comment by Rebecca Greer on March 28th, 2014

      I like what you authors are doing.  I like the fact that you authors have a view on the things people need in order to participate in the world of big digital history. I see from the previous comments that you guys hope to have a manuscript done by April 1st. May I ask how long this process of putting the book together took?  After reading this book as a college student it made me think about some things twice.  For instance, when deciding on a software package, I never thought that it made a difference which software package I chose.  I had no clue it could dictate what data I’ll be collecting and how I plan on collecting it.

      Comment by Jasmine Alinder on March 28th, 2014

      Hi Shawn, I’m a colleague of Amanda Seligman’s at Univ of Wisconsin Milwaukee. I am teaching an undergraduate course on digital history this semester and have asked my students to post comments on your book over the next few days. Please let me know if there is something I should do to facilitate that process. Best, Jasmine Alinder

      Comment by Shawn on March 31st, 2014

      Hi Jasmine,

      Fantastic! If you could let your students know that we’re really interested to hear about the ‘gaps’ – the things that we’ve left implicit that they’d really like to know explicitly, that’d be most helpful. That’s always a big problem with writing about anything technical – we’ve looked at the forest for so long, we can’t see the trees anymore.

      Thank you for playing along!

      Comment by Shawn on March 31st, 2014

      Hi Rebecca,

      Thank you for the note. We were first approached about this volume a year ago March. Could’ve been February. We put together a proposal, which was peer reviewed anonymously by folks selected by the publisher. We then revised the proposal to take into account those comments, and we had a contract signed in about August. This website went up in early September. We currently have a draft of around 80 000 words that we are tightening up and intend to now submit before July 1st. So from start to finish, our writing process (once everything was signed and agreed upon) will have taken us 1 year: which is ridiculously quick, as these things go! But on the other hand, given the nature with which tools appear, disappear, and so on, we felt we needed to be fast. Writing this in public too has helped us pivot so that we can be as timely as possible.

      Comment by Zakea on April 3rd, 2014

      This will be great!

      Comment by Jasmine on April 3rd, 2014

      Let me start by saying that prior to taking “Digital Humanities” I had never really thought about what goes on inside the Internet, or how much data is out there. It’s all very mind-boggling and I have a hard time wrapping my mind around many of these concepts and terminology related to the inner workings of technology. Often times I find myself reading and re- reading the same articles, as if by memorizing them I can force myself to understand a concept that is seemingly written in a foreign language. However, when I read your article “How big, is big data?” I found myself not only reading, but also understanding. Reading this article gave me at least a starting point, explaining to me the truly impressive size of data- something that made sense to me and didn’t make my head spin! In reading this article I even found myself nodding along at times, identifying all this work and no proper direction to take it in- I’ve been there. It’s important, as an expert and as someone who is passionate about what they do, that they make their subject matter understandable to more than just one specific group of people. As I mentioned previously, I’ve read many articles written in a computational language that, as someone with no prior background in technology or digital history- I have a hard time understanding or being able to relate to on a personal level. Reading your work I didn’t have that challenge- which leads me to my point -which is that to bring awareness to your cause or your goal, it is important that the information which you put out to the world, can be understood by more than just digital historians or very technologically savvy people- and you accomplished that! 
      Writing and explaining things in such a way that everyone can understand will bring more readers and more interest than simply going over their heads, which leads me to my question: What are digital historians and those well versed on the subject matter of big data doing to raise awareness to this cause without alienating those less learned on the subject? What are you, as authors, doing to help others like myself understand your cause and want to learn more about it?

      Comment by Gerard Lewis on August 21st, 2014

      Understanding the past is thought to help us prepare for the future; from  macroscope to the  Epediascope. Fascinating field and approach. I will enjoy being a fly on the wall of your lab.

      Comment by Brittany Reichel on September 2nd, 2014

      Hello! I am a student at Boise State University and I am taking a Digital Humanities course this semester. My first assignment is to find a digital humanist and interview them! I was wondering if Milligan or Weingart would be interested or available to do that? Please let me know; I would really like to know more about what it is a digital humanist does and how it benefits the humanities.

      Thank you,

      Brittany Reichel

      Comment by Brittany Reichel on September 2nd, 2014

      Hello!

      My name is Brittany Reichel and I am a student at Boise State University. I am taking a Digital Humanities course this semester and my first assignment is to interview a Digital Humanist. I was wondering if Milligan or Weingart would be available for an interview. I would very much like to know what it is a Digital Humanist does and how it benefits the Humanities. My email address is reichelbrittany@gmail.com. The website for my class is: http://digitally.doinghistory.com/. Please let me know what you think.

      Thank you,

      Brittany Reichel

      Comment by Shawn on September 3rd, 2014

      Hi Brittany,

      I’ll ping Ian and Scott today, letting them know your request.

      Cheers,
      Shawn

      Comment by Shawn on September 9th, 2014

      Hi Brittany, I passed your note on to Ian. Best, Shawn

      Comment by Amanda Seligman on April 11th, 2015

      Sorry to hear you lost the 8000 Canadians. For a Big Data novice like me, that was the most accessible part. I just used it with my undergraduates a couple weeks ago.

      Thanks for turning the comments back on for me!

      Comment by Shawn on April 11th, 2015

      Hi Amanda,

      It will live on, on our v2.0 of this site – the companion site. All our code snippets, things we liked but had to cut from the ‘official’ version etc will still be accessible.

  • 8,000 Canadians (19 comments)

    • Comment by Ian Milligan on September 6th, 2013

      Could we flesh this step out just a little bit more? Might leave readers behind.

      Comment by Ian Milligan on September 6th, 2013

      Will have to find a good OS X equivalent for N++…

      Comment by Ian Milligan on September 6th, 2013

      Might want to divide into two paragraphs, perhaps on ‘this network diagram’

      Comment by Ian Milligan on September 6th, 2013

      ‘of’ or ‘or’ in list of positions?

      Comment by Ian Milligan on September 6th, 2013

      Just a random comment: as a Canadianist, I find this so fascinating!

      Comment by Shawn on September 6th, 2013

      And perhaps a sidebar that explains this process in more detail.

      Comment by Shawn on September 6th, 2013

      “Fitted a topic model by running the R topic modeling script, using Mimno’s Mallet package, available on the Macroscope Github page and discussed in the previous section. We ran it several times, trying to settle on the number of topics that seemed to balance human intelligibility against machine efficacy. The choice of 30 topics reflects nothing more than a number that seemed to provide results that made sense. The problem of how many topics to run, and what seemingly sensible topics might conceal, is one discussed by Ben Schmidt in ‘Words Alone: Dismantling Topic Models’ in The Journal of Digital Humanities 2.1 Winter 2012 http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/”.

      Comment by Ian Milligan on September 6th, 2013

      Should we have a link to the interactive right at the top - http://themacroscope.org/interactive/dcbnet/ – it’s a beautiful visualization and really helps make clear (IMHO) what’s going to follow.

      Comment by Ian Milligan on September 6th, 2013

      Perfect!

      Comment by Shawn on September 6th, 2013

      See what people think.

      Comment by Shawn on September 6th, 2013

      Of.

      Comment by Shawn on September 6th, 2013

      This paragraph will tie in well with issues and examples raised in our discussion of social networks analysis, too.

      Comment by Scott Weingart on September 11th, 2013

      TextWrangler, maybe?

      Comment by Ian Milligan on September 11th, 2013

      Can give this a try in TextWrangler, looks like it’s relatively simple enough? http://bcdwp.web.tamhsc.edu/webmaster/2011/08/12/how-to-strip-html-code-using-grep-in-text-wrangler/

      Comment by Amanda Seligman on March 31st, 2014

      Unlike the other images in this manuscript, the visualization here can’t be clicked to enlarge. Experiencing this problem with both Chrome and Firefox.

      Comment by Amanda Seligman on March 31st, 2014

      As an encyclopedia editor, I read this section with special interest. In developing our TOC for the Encyclopedia of Milwaukee, we did something that sounds like this in advance, deciding on the “rubrics” or big categories we are using as subject areas. Then an editorial board elaborated each rubric into a table of contents for that area. With a project like a multi-year biographical dictionary that is being done alphabetically, I wonder what the internal process for deciding on categories looks like, and to what extent the categories changed over time.

      Comment by Amanda Seligman on March 31st, 2014

      I agree that this is interesting. I wonder how they generated the list for inclusion…

      Comment by Amanda Seligman on March 31st, 2014

      Ah, this is contrary to what I imagined the structure was. I assumed that they had alphabetical volumes, which is what I think the ANB has. It makes it much less likely they are working with a master list. Again, though, I wonder about the process for identification of subjects to be treated. If this were the US context, would the midwife that Laurel Thatcher Ulrich wrote about be included, because she has a full-length biography, even given her lack of notoriety?

      I suspect these questions aren’t relevant for what you are trying to do, but this interests me.

      Comment by Amanda Seligman on March 31st, 2014

      This paragraph makes a short but important point.

  • Putting Big Data to Good Use: An Overview (12 comments)

    • Comment by Scott Weingart on September 16th, 2013

      We should link to Underwood’s post ( http://tedunderwood.com/2013/02/20/wordcounts-are-amazing/ ) or something similar for “One often unspoken tenant of digital history is that very simple methods can produce incredibly compelling results, and the Google Ngrams tool exemplifies this idea.”

      Comment by Ian Milligan on September 16th, 2013

      Good point – I keep coming back mentally to that post whenever I’m counting words.

      Comment by Amanda Seligman on March 29th, 2014

      Do you guys know this video about a big data thesis, put together by Jorge Cham of PhDComics?

      http://phdcomics.com/comics/archive.php?comicid=1628

      The references to the Old Bailey keep bringing it to mind.

      Comment by Amanda Seligman on March 29th, 2014

      The role of collaboration, as praised in this paragraph, seems to be in tension with another idea in this manuscript–that the barriers of access to DH have been lowered enough that historians can get their feet wet without too much difficulty, an approach that suggests they might be going it alone much of the time.

      Comment by Amanda Seligman on March 29th, 2014

      seventeenth misspelled

      Comment by Amanda Seligman on March 29th, 2014

      You want “tenet” not “tenant.”

      Comment by Amanda Seligman on March 29th, 2014

      Thank you for that line about “the hubris around Culturomics.” It did indeed rankle me when I heard them present on it.

      Comment by Elizabeth Y on April 3rd, 2014

      I find it fascinating that not only were they able to do a keyword search, but also match up similar words. I’m wondering if it was sophisticated enough to match up, say, foot  – as in a measurement – versus foot – body part?

      The amount of data available in this one example is truly remarkable.

      Comment by Sharon Howard on December 22nd, 2014

      A brief, slightly pedantic historical point – the judges presiding over Old Bailey trials weren’t magistrates but professional elite judges.

      Comment by Shawn on March 18th, 2015

      ah. thank you!

      […] ‘Putting Big Data to Good Use: An Overview’, The Historian’s Macroscope: Big Digital History, http://www.themacroscope.org/?page_id=246; consulted 29 April […]

  • The Joys of Abundance: The Era of Big Data (12 comments)

    • Comment by Amanda Seligman on March 29th, 2014

      The formulation of the definition of big data was snappier in the proposal.

      Comment by Amanda Seligman on March 29th, 2014

      Add this to your list of fluencies? Have to learn to read non-native languages?

      We debate the merits of the language requirement over and over again in my department’s doctoral program. Or did you mean that the language requirement is unshakable?

      Comment by Amanda Seligman on March 29th, 2014

      An idea worth addressing somewhere (maybe in your epilogue?) is how developing these skills fits into the curriculum for graduate education in history. There’s certainly a case to be made in this era for graduate students developing transferrable skills, but what are the trade offs?

      Comment by Ian Milligan on March 31st, 2014

      Thanks, Amanda – I think you’re right after a re-read and have reincorporated some of that punchier language.

      Comment by Tony Hugill on April 3rd, 2014

      I can appreciate the magnitude of the project you are trying to take and don’t take it for granted that three of you got together and collaborated to get this up and running.

      Already, I can see how this is superior to a general weblog post (even scholarly blogs) with a generic comments section at the bottom.

      Here, comment tabs next to each paragraph allow us to chime in precisely where we want to. I also like the fact that the comment “bubbles” show the quantity, so you can hone in on which paragraphs generate more buzz than others.

      This section in particular drew my attention because we have spent the first half of the semester dreading how much information is out there — gated or otherwise. Having a format like this almost does make having “TMI” a good thing.

      Comment by Danielle Alvaro on April 3rd, 2014

      I like that you go in depth about how the term “big data” varies with perspective. It’s true that depending on one’s stance on a topic, or one’s place in the historical world of research, one may feel entirely differently about the term. For example, to a college student, “big data” could be anything, such as entering a search in a database and receiving 37,000 different sources to use. Going through that much data is entirely too much for any one person. Though there is something to be said for the fact that, when searching for “scholarly articles”, that giant number is cut at least in half. This goes to show how much quantitative data there is compared with qualitative. Is it valid to compare that of a college student?

      Comment by Zakea Jones on April 3rd, 2014

      What are some of the dangers of information overload that is of concern, if any? What are some of the solutions that are being tossed around that can possibly solve this problem?

      Comment by Charles Mehlberg on April 3rd, 2014

      I agree with you in the fact that historians must be open to the digital turn, however, I personally believe that all historians will have to become fluent with data publishing and analysis at one point or another in the distant future. Through all of the disciplines of historical analysis, all will have to become fluent in data dissemination to publish their work and findings. I believe there will come a point in time when the internet and the “digital” will be the only reliable and practical medium to disseminate their findings. I think that the only question is, when will historians be required to become fluent and embrace data to make their findings known?

      Comment by Jasmine on April 3rd, 2014

      Let me start by saying that prior to taking “Digital Humanities” I had never really thought about what goes on inside the Internet, or how much data is out there. It’s all very mind-boggling and I have a hard time wrapping my mind around many of these concepts and terminology related to the inner workings of technology. Often times I find myself reading and re- reading the same articles, as if by memorizing them I can force myself to understand a concept that is seemingly written in a foreign language. However, when I read your article “How big, is big data?” I found myself not only reading, but also understanding. Reading this article gave me at least a starting point, explaining to me the truly impressive size of data- something that made sense to me and didn’t make my head spin! In reading this article I even found myself nodding along at times, identifying all this work and no proper direction to take it in- I’ve been there. It’s important, as an expert and as someone who is passionate about what they do, that they make their subject matter understandable to more than just one specific group of people. As I mentioned previously, I’ve read many articles written in a computational language that, as someone with no prior background in technology or digital history- I have a hard time understanding or being able to relate to on a personal level. Reading your work I didn’t have that challenge- which leads me to my point -which is that to bring awareness to your cause or your goal, it is important that the information which you put out to the world, can be understood by more than just digital historians or very technologically savvy people- and you accomplished that! 

      Writing and explaining things in such a way that everyone can understand will bring more readers and more interest than simply going over their heads, which leads me to my question: What are digital historians and those well versed on the subject matter of big data, doing to raise awareness to this cause without alienating those less learned on the subject? What are you, as authors, doing to help others like myself understand your cause and want to learn more about it?

      Comment by Dan Shore on April 28th, 2014

      I like your definition of “big data” as “information that requires computational intervention to make new sense of it” because it’s a clear threshold clearly stated.  At the same time, I doubt whether it’s true.  Is it really the case that the only way to make sense of more than one can read is through computation (which I take to mean counting followed by some sort of calculation)?  What about searching and sorting?  What about pattern matching for purposes other than counting?  What about, for example, constructing a network (an activity that may allow quantitative forms of network analysis, but is not itself  quantitative, since it is a matter of establishing nodes and edges, things and their relations)?  Computation, in other words, doesn’t begin to exhaust the ways to make new sense of big data.

      Comment by Shawn on June 24th, 2014

      Hi Dan,
      Thanks for the note. I’d argue that all of those things are still computation.

      Comment by Amanda Seligman on April 11th, 2015

      Persuaded that historians will have to become fluent enough to read scholarship that uses Big Data. Not convinced that we will all need to develop the tools to do that research ourselves.

  • Basic Text Mining: Word Clouds, their Limitations, and Moving Beyond Them (11 comments)

  • Basic Concepts & Network Varieties (9 comments)

    • Comment by Amanda Seligman on March 31st, 2014

      I didn’t realize there was a term for people who studied citations…

      Comment by Amanda Seligman on March 31st, 2014

      Someone will undoubtedly tell you not to start a section with a negative, but this warning is a very helpful one.

      Comment by Amanda Seligman on March 31st, 2014

      This is a helpful example.

      Comment by Amanda Seligman on March 31st, 2014

      This paragraph is begging for someone to say “all roads lead to Rome.”

      Comment by Amanda Seligman on March 31st, 2014

      These three succinct paragraphs are very good.

      Comment by Amanda Seligman on March 31st, 2014

      Not just collecting data, but the process of preserving primary sources in the first place.

      Comment by Scott Weingart on March 31st, 2014

      Thanks for the comment, Amanda. We were trying to figure out the best way to tell a reader some of the things to try to avoid in network analysis, but we can try to integrate it a bit better.

      Comment by Scott Weingart on March 31st, 2014

      Or maybe all roads lead away from it!

      […] this makes no sense, read my earlier Networks Demystified posts (the first two posts), or the our Historian’s Macroscope chapter, for a primer on networks. If it does make sense, excellent! The rest of this post will hopefully […]

  • Principles of Information Visualization (7 comments)

    • Comment by Ian Milligan on April 28th, 2014

      +1000 – as a colour blind person myself, I’m so happy that we’ve got this in the book. :)

      Comment by Angela Z. on May 28th, 2014

      I’d argue that this isn’t the strongest example of a histogram. True, the categories are ordered and probably shouldn’t be shifted around, but I’ve always understood “histogram” to refer to charts that use bars to represent bins along the range of a numerical variable, not just an ordered categorical variable. The typical test for me is, does it make sense to have spaces between the bars or not? For this grade chart, a space between A and B would be (arguably) fine as a stylistic choice, because there isn’t a numerical continuity on the x axis. If the bars represented bins like “100-90.0”, “89.999-80”, etc., they should be touching because there is (practically) no space between them.

      I’m probably following Naomi Robbins’ definition.  http://www.forbes.com/sites/naomirobbins/2012/01/04/a-histogram-is-not-a-bar-chart/
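
      The distinction drawn above can be illustrated with a minimal Python sketch. It is only an illustration: the function name `histogram_counts`, the bin edges and the sample scores are all made up here and do not come from the chapter. A bar chart counts labels; a histogram counts numeric values falling into adjacent bins that share their edges, which is why its bars touch.

```python
from bisect import bisect_right
from collections import Counter


def histogram_counts(values, edges):
    """Count numeric values in adjacent bins defined by sorted edges.

    Each bin is half-open, [edges[i], edges[i+1]), except the last one,
    which also includes its right edge. Values outside the range are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the covered range
        # bisect_right gives the position just past the last edge <= v
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


# Hypothetical numeric scores and bin edges (assumptions for illustration).
scores = [55, 60, 69.9, 70, 85, 90, 100]
edges = [0, 60, 70, 80, 90, 100]
binned = histogram_counts(scores, edges)

# Hypothetical letter grades: ordered categories, no numeric axis,
# so this is bar-chart data rather than histogram data.
letters = Counter(["A", "B", "B", "C"])
```

      Here `binned` holds one count per numeric bin, while `letters` holds one count per label with no claim about the distance between labels.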

      Comment by Angela Z. on May 28th, 2014

      Actually, using dots instead of bars even when you have one categorical axis is not uncommon, especially when you have data where a full numerical axis (going to 0) would obscure small differences.  Leland Wilkinson uses this technique a lot in The Grammar of Graphics, I believe.

      Comment by Jim Clifford on December 4th, 2014

      I’m not sure what you’re trying to say here. Maps are representations or models of geography and they are loaded with cultural meaning and errors related to flattening a sphere. Projections are very complicated and we make choices when we map latitude and longitude with different projections. Webmaps generally use WGS1984, which distorts the sizes of Africa and Greenland so that they look almost the same size. Each projection has strengths and weaknesses, and when we make maps we need to choose which aspect of a map we want to be more accurate (for example, do we want area or the distance between places to be more correct?). You also need to consider what a lat/long represents. The center of a city or the center of a country is useful for some things, but useless for others. Polygons are generally better, but they are hard to find for historical boundaries. Cartography is as much an art as it is a science.

      Comment by Amanda Seligman on April 12th, 2015

      I can’t tell you how useful it is to discuss historians’ research and thinking methods explicitly.

      Comment by Amanda Seligman on April 12th, 2015

      This is a very helpful caveat.

  • Breaking a CSV file into separate txt files (6 comments)

    • Comment by Ben Marwick on February 28th, 2014

      I think this might not be the best URL, it’s not a csv file but a link to the github repo page of the CSV file… probably the URL to offer readers is this one, the raw text, ready to work with on their computers:

      https://raw.github.com/shawngraham/hmbook-data-examples/master/johnadams-diary.csv

      My guess is that most users will expect this URL to be the file, and click ‘save as’ and be expecting a CSV file. Currently you link to a github page and the user would have to know to download the raw there, you might lose a few people at that point…

      Comment by Ben Marwick on February 28th, 2014

      And here’s an R function to do step 2:

      https://gist.github.com/benmarwick/9266072

      Feedback welcome! Not sure I’ve accounted for all the possibilities…

      I see there’s a bit of a mix of excel files and CSV files in this section. We can write an R function to handle the excel files too, if they’re more commonly used in this context…

       

      Comment by Shawn on February 28th, 2014

      Hi Ben – this is fantastic, thank you! Being able to handle excel files too would probably be handy.

      Comment by Shawn on February 28th, 2014

      Hi Ben – good point. I’ll change that url accordingly.

      Comment by Ben Marwick on February 28th, 2014

      Here’s the excel equivalent: https://gist.github.com/benmarwick/9278490 These R scripts have the advantage of being cross-platform (unlike the bat file mentioned in the other section). In this excel script, there’s also some flexibility about choosing which column is used for the file names. I’ve added that to the csv script also, since I’ve been testing with the John Adams csv file which has the dates in the second column. In fact they probably need a bit of testing on typical use-cases before I get too carried away… if you have any other typical files I’d be happy to test them.
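
      For readers who want to see the general idea without opening the gists, here is a minimal Python sketch of splitting a CSV into one text file per row. It is not a port of the R scripts linked above. The function name `split_csv_rows`, the default column positions (file name from the second column, text from the third) and the sample rows are assumptions made for illustration only.

```python
import csv
import io
import re


def split_csv_rows(csv_text, name_col=1, text_col=2, has_header=True):
    """Return a dict mapping a safe .txt file name to the text of each row.

    name_col and text_col are zero-based column positions (assumed layout).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]
    files = {}
    for n, row in enumerate(rows, start=1):
        if len(row) <= max(name_col, text_col):
            continue  # skip blank or short rows
        # keep only characters that are safe in file names on most systems
        stem = re.sub(r"[^A-Za-z0-9_-]+", "_", row[name_col]).strip("_") or f"row{n}"
        name = f"{stem}.txt"
        if name in files:
            name = f"{stem}_{n}.txt"  # avoid overwriting a repeated name
        files[name] = row[text_col]
    return files


# Made-up sample data, not taken from the John Adams file.
sample = (
    "id,date,entry\n"
    '1,1753-06-08,"First, made-up entry."\n'
    "2,1753-06-09,Second entry.\n"
    "3,1753-06-09,Third entry.\n"
)
files = split_csv_rows(sample)

# To write the files to disk, something like this would do:
# from pathlib import Path
# for name, text in files.items():
#     Path(name).write_text(text, encoding="utf-8")
```

      Returning a dict first, and writing to disk only as a separate step, makes it easy to check the file names before anything is created.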

      […] you’re on a PC, the instructions we posted here: http://www.themacroscope.org/?page_id=418 work. It’s a macro, in visual basic, for excel. But after a long back and forth yesterday […]

  • Intro to Some Key DH Terms (6 comments)

    • Comment by Amanda Seligman on March 29th, 2014

      This section seems to me to cry out for examples of existing DH projects that use these tools.

      Comment by Amanda Seligman on March 29th, 2014

      In my DH project, we are paying campus IT experts to design and implement the website. For us, the start-up time required to come up to speed is too great in light of the other demands of project management.

      Comment by Amanda Seligman on March 29th, 2014

      Nice point.

      Comment by Amanda Seligman on March 29th, 2014

      Having reached the end of this section, I again feel that examples of these in DH practice would be very helpful. A reader is more likely to retain the points if they are illustrated with examples of what you can do with them.

      Comment by Amanda Seligman on April 12th, 2015

      Does this mean that you think that scholars should pay for open-access publishing out of their own pockets rather than from institutional or grant sources?

      Comment by Shawn on April 12th, 2015

      I’m not sure what the solution is, but I’m not convinced that author-pays is the way forward. At Carleton U, our library has invested in open journal systems to support OA publishing http://www.library.carleton.ca/services/open-journal-hosting

      …I like this model.

  • Original Proposed Chapters (6 comments)

    • Comment by Amanda Seligman on March 29th, 2014

      I like this definition! Of course, what counts as a “reasonable amount of time” can vary. When I was in my 20s in graduate school, it seemed perfectly reasonable to while away many hours in the archives without counting the days or the costs or the opportunity costs. Now that I’m in my 40s and have a job and a family, what seems reasonable is quite different. The computational intervention perspective is key here as well.

      Comment by Amanda Seligman on March 29th, 2014

      The “renewed” importance of librarians and archivists? I’d like to know when they were less important.

      Comment by Amanda Seligman on March 29th, 2014

      This paragraph raises the question about the accessibility of this book, which is something that I’ve been wondering about as I read the proposal. Who is the audience? Mostly professional historians? Do you mean chapter 3 to be a pull-out article that students can make sense of as well?

      Comment by Nieya Dudley on April 13th, 2014

      I thought this section was very interesting. I agree that before you can compose a digital toolbox that historians would be able to utilize, we should be aware of what “big” means for everybody. One might assume big is the large amount of research you receive looking through a library, while people in this day and age would agree that might be small, and big is in fact all the information that is now at our fingertips through technology.

      Comment by Amanda Seligman on April 11th, 2015

      A year later I read the “renewed” comment as reflecting on how successfully librarians and archivists are adapting to the demands of research in the digital era.

      Comment by Shawn on April 11th, 2015

      Ack! I haven’t looked at this page in a long time I’m afraid, and can see how ‘renewed’ might be faint praise… I can’t emphasize enough the critical importance of librarians in things digital! Very first thing I did when I started at Carleton in 2010 was map out who was working on what in things digital at CU. I did a network analysis and the keystone, the pivot point, the crux of all our projects, across departments, was our university archivist & librarian, Patti Harper.

      That could be a useful exercise for grad students – map out the landscape of digital folks, network analysis to figure out the key players at a given institution. Dollars to donuts: the library!

  • Topic Modeling with R (5 comments)

    • Comment by William Denton on September 9th, 2013

      Do you mean RStudio? It sits on top of R and has a GUI etc. Great tool.

      Comment by Shawn on September 9th, 2013

      Hi William – no, just the basic R environment. But looking at RStudio http://www.rstudio.com/ide/screenshots/ I can see where that’d be a very useful thing indeed. I’ll start exploring!

      Comment by Scott Weingart on September 11th, 2013

      I second William’s comment - http://doingbayesiandataanalysis.blogspot.com/2012/01/complete-steps-for-installing-software.html Kruschke has some good, simple comments for installing R that we can take inspiration from, particularly some of the tips below.

      Comment by Ben Marwick on October 8th, 2013

      Following William’s comment, I think this paragraph needs to direct the reader to download and install RStudio after they’ve got R. Especially since you mention RStudio in paragraphs 4 and 7, currently the reader might be a bit confused with no prior mention of RStudio…

      Comment by Shawn on October 8th, 2013

      Hi Ben,
      You and William are quite right. Version two of this section will be much clearer about installing and using RStudio.
      Thanks!

  • Original Proposal (4 comments)

    • Comment by Zakea Jones on April 3rd, 2014

      This is an awesome way to build a community format, what better way than to open up the floor for peer review before a piece is published. I completely like the idea that anyone can pose a question or leave comments to the authors for feedback or review on their actual work in progress. This has definitely confirmed to me that we are in the digital age and historians and scholars should be grateful for these tools.

      Comment by Zakea on April 3rd, 2014

      Original Proposal

      Comment by Amanda Seligman on April 11th, 2015

      I have the sense, on my second pass through this manuscript, that those of us who are big data history novices may still need a more basic introduction to what Big Data is and what its potentials are.

      Comment by Shawn on April 11th, 2015

      Here I think Guldi & Armitage’s history manifesto would make a good paired reading.

  • Networks in Practice (4 comments)

    • Comment by Amanda Seligman on March 31st, 2014

      What’s the logic for the order of these packages? It’s discouraging to start out with two that you don’t recommend for historians.

      Comment by Amanda Seligman on March 31st, 2014

      This begs for more explanation.

      Comment by Amanda Seligman on March 31st, 2014

      Yes, it’s true that until I read this section, tree layouts represented networks to me.

      Comment by Scott Weingart on March 31st, 2014

      That’s a great point, we’ll re-organize it in the next run through.

  • Topic Modeling with the Stanford TMT (4 comments)

    • Comment by Diana Moreno on January 22nd, 2014

      What does the number by the topic (ex. Topic 29 391.695…) mean?

      Comment by Shawn on January 29th, 2014

      Hi Diana!

      Good question! The documentation provided by Stanford TMT isn’t precisely clear on that point, but it can be generally understood as the overall weight of that topic in the corpus as a whole.

       

      Comment by Joyce Zhou on March 27th, 2014

      Hi,

      I am trying to run the sample scripts of the TMT tool as you said in this article, but always run into below errors:

      scalanlp.serialization.TextTableParseException: Unexpected quote in unquoted cell at line 915 column 149

      at scalanlp.serialization.TextTableReader$RowReader$CellReader$.read(TableSerialization.scala:450)

      at scalanlp.io.TextReader$class.readRemaining(TextReader.scala:128)

      I have no idea why this is happening, and seems no one is maintaining the tool now.

      Could you please help me with it?

      Thanks!

      Comment by Shawn on March 27th, 2014

      Hi Joyce, Hmmm. I’m not altogether certain what this error might mean! Can you run me through at which step you’re encountering the error? If it’s happening in the very first block, could there be a problem with your csv?
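
      One way to check the csv for the kind of problem the error message describes (a quote character in the middle of a cell that did not start with one) is a small scan like the following Python sketch. It is independent of the Stanford TMT and does not reproduce its parser. The function name `find_stray_quotes` is made up here, and the sketch assumes comma-separated cells, double-quote quoting with doubled quotes as escapes, and no cell spanning more than one line.

```python
def find_stray_quotes(lines, delimiter=",", quote='"'):
    """Return 1-indexed (line, column) pairs where a quote character
    appears inside a cell that did not begin with a quote."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        in_quoted = False      # currently inside a cell that began with a quote
        at_cell_start = True   # the next character would begin a new cell
        skip_next = False      # set when the next character is half of ""
        for col, ch in enumerate(line.rstrip("\r\n"), start=1):
            if skip_next:
                skip_next = False
                continue
            if in_quoted:
                if ch == quote:
                    # line[col] is the character after this one (col is 1-indexed)
                    if line[col:col + 1] == quote:
                        skip_next = True   # doubled quote = escaped quote
                    else:
                        in_quoted = False  # closing quote
            elif ch == delimiter:
                at_cell_start = True
            elif ch == quote and at_cell_start:
                in_quoted = True
                at_cell_start = False
            elif ch == quote:
                problems.append((line_no, col))
            else:
                at_cell_start = False
    return problems


# Made-up sample rows; only the third contains stray quotes.
sample = [
    'a,b,c',
    '"ok, fine",x',
    'he said "hi",y',
    '"say ""hi""",z',
]
stray = find_stray_quotes(sample)

# With a real file (the path is a placeholder):
# with open("my-data.csv", encoding="utf-8") as f:
#     print(find_stray_quotes(f))
```

      The reported positions can then be compared with the line and column given in the error message, and the offending cell either wrapped in quotes or stripped of them.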

  • Topic Modeling by Hand (4 comments)

    • Comment by José Igartua on September 12th, 2013

      I don’t know where to put this suggestion, but it would be useful for you, if you are not familiar with it, to look at the Prospéro software programme, which I used for part of an article (« The Genealogy of Stereotypes: French Canadians in Two English-language Canadian History Textbooks», Journal of Canadian Studies / Revue d’études canadiennes, 42, 3 (Automne 2008): 106-132) and for which I prepared a brief online presentation at http://www.er.uqam.ca/nobel/r12270/textes/prospero.pdf.

      Before undertaking that research I had examined a variety of text analysis packages, and none had embodied a diachronic mode of exploration as well as Prospéro. It has now become open-source, with data dictionaries in English, French, and Spanish. The authors’ description of the software is available at http://prosperologie.org/?sit=22#5.

      Comment by Ian Milligan on September 13th, 2013

      Fascinating stuff, José (and a reminder that I need to get Parallels up so I can run this sort of Windows-only software). Sincerest thanks for sharing it with us: we’re planning to feature software platforms like MALLET, Stanford topic modelling, Gephi, etc., so this would be a very useful complement. Thanks for sharing the online link, which I think shows the utility of the platform really well.

      Comment by John Laudun on September 23rd, 2013

      Simple copy edit: remove “by hand” in first sentence to eliminate redundancy.

      Comment by Ashley Carlson on April 1st, 2014

      I found it interesting that the computer wasn’t able to be programmed to tell the difference between a word being used as a noun vs a verb. Are there limits, then, to how much of what a historian would do without digital history could be done digitally? I’m sure in the future everything will probably be done digitally, but until then it seems that digital methods can only take a historian so far, though that is already a great distance from where we came from.

  • On Topic Modeling (3 comments)

    • Comment by Amanda Seligman on March 29th, 2014

      I’m curious about why you have a video introduction to this section. I sure wasn’t expecting it.

      Comment by Amanda Seligman on March 29th, 2014

      Also, I’m hoping this section will be expanded to say what Topic Modeling is, since I don’t know after reading this iteration.

      Comment by Shawn on March 29th, 2014

      Hi Amanda,

      At the time, I was reading a lot about MOOCs, and I thought it’d be nice to expand the possibility of this format by actually speaking to the reader about what the section was meant to be about…

       

  • Topic Modeling as an Integral Part of the Historian's Macroscope (3 comments)

  • Reflecting on our process (2 comments)

    • Comment by Arianna Ciula on September 18th, 2013

      I find the experiment fascinating – not only sharing some of the dirty (well pretty polished but still unfinished) laundry, but also the new directions that you as authors might pursue based on other people’s comments.

      I don’t understand though why you (or the publisher?) chose to show only one page per chapter. This makes it difficult for somebody external to the writing process to get a grasp of the whole argument and make more in-depth comments – or is it just me not being used to having my reading chopped when others – and not myself – decide?

      Comment by Shawn on September 18th, 2013

      Hi Arianna,
      Thank you for your comments! It is a bit of a disjointed reading experience, we know. We expect that readers who start at the beginning and work their way through to the end will be in the minority. We think of the individual pieces here as sections that a reader might dip into when they have the finished book in front of them. Some of our sections are comparatively short, around 1000 words, while others push 4000 or 5000. A typical page in a printed book might be 300 words, so what appears on the website as a single page will in the end play out across several.

      As for the overall argument, yes I take your point. We’ve been writing in a more coherent fashion, behind the scenes with Scrivener, but what appears here does appear out of sequence. Ideally though each piece will stand on its own. If you click on the ‘original proposal’ (especially the backstory) and ‘original proposed chapters’, that might give a better sense of how the pieces are all meant to go together.

      It’s interesting that this platform we’re using in a way takes away the decision of how to read in a way that we are all of us not used to. Something we need to reflect on more! (You might also be interested, from that perspective, in Web Writing, a similar open peer-review experiment).

  • Preface (2 comments)

    • Comment by Scott Weingart on September 5th, 2013

      We need to bring this back together to the original section of the example, where our intrepid historian has her interest initially piqued by the language used. Ideally we can end this example showing that she doesn’t just have a macroscopic sense of the trials, but also a contextual sense of the small set of documents she was working on, and how they fit in the larger whole.

      Comment by Amanda Seligman on March 29th, 2014

      This question immediately brings to my mind a question about the relationship between these tools (which I don’t know anything about…yet) and the kinds of tools that qualitative social scientists have long been using to analyze their data.  I believe, for example, that two common tools are NUDIST and NVIVO. I’ve never quite understood why they would need computer programs to read their (say) interview transcriptions when historians just read and infer with our own brains.

  • Compiling several text files into a single CSV file (2 comments)

    • Comment by Ben Marwick on February 28th, 2014

      I wrote a little function in the ‘ugly language’ that might be relevant here:

      https://gist.github.com/benmarwick/9265414

      It gives the full text of each text doc in a row of the CSV file and isn’t sensitive to line breaks – you get all the text in the file. So that’s slightly different from your bat file where you need to know about line breaks in advance.

      Comment by Shawn on February 28th, 2014

      That’s fantastic. Thank you! We’ll update accordingly.
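
      As a rough illustration of the idea discussed in this thread (one row per text file, with the whole text in a single cell regardless of line breaks), here is a minimal Python sketch. It is not the linked gist, which is written in another language, and the function name, column names, and folder layout are assumptions made purely for illustration.

```python
import csv
from pathlib import Path


def texts_to_csv(input_dir, output_csv):
    """Write one CSV row per .txt file in input_dir: (filename, full text).

    Hypothetical helper for illustration; returns the number of files written.
    """
    rows = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        # Read the whole file at once, so embedded line breaks
        # stay inside a single cell.
        text = path.read_text(encoding="utf-8")
        rows.append((path.name, text))

    # newline="" lets the csv module manage line endings and quote
    # multi-line fields itself.
    with open(output_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filename", "text"])
        writer.writerows(rows)
    return len(rows)
```

      Calling `texts_to_csv("my_texts", "combined.csv")` would, under these assumptions, collect every `.txt` file in a folder called `my_texts` into `combined.csv`. Because each file is read whole, there is no need to know in advance where its line breaks fall.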

  • Topic Modeling with Paper Machines (2 comments)

    • Comment by Amanda Seligman on March 31st, 2014

      Here again I find myself wondering about qualitative social scientists and the tools they use to analyze interview data.

      Comment by Amanda Seligman on March 31st, 2014

      I like that phrase “generative direction finders.”

      Reading this passage it also strikes me that such visualizations could be very useful for teaching demonstrations with undergraduates. Perhaps some of you have experience with having done exactly this that you could incorporate into the manuscript.

  • Topic Modelling with the GUI Topic Modelling Tool (2 comments)

    • Comment by Sue Hemmens on September 1st, 2014

      Having used the GUI tool and the command line on the same set of documents (a late seventeenth century correspondence corpus), I got more useful results almost as quickly from the command line method by following http://programminghistorian.org/lessons/topic-modeling-and-mallet. The only addition to the default was to use --optimize-interval as suggested. I’m not saying that the documents are an ideal set to explore with these methods (that’s debatable!), but it seems worth the extra trouble to use the command line tool.

      Comment by Shawn on September 3rd, 2014

      Hi Sue,

      Oh yes, the command line allows for more nuance etc. The GUI in our view is probably best used for quick-and-dirty exploration and its browser functionality. Perhaps as a classroom exercise.

      Thanks!

  • Early Emergences: Father Busa, Humanities Computing, and the Emergence of the Digital Humanities (2 comments)

    • Comment by Amanda Seligman on March 29th, 2014

      Why is Literary capitalized?

      Comment by Will Tchakirides on April 1st, 2014

      I wonder where the digital turn in public history fits into this discussion and if public access to digital tools and resources contributed in any way to the nomenclature shift? Perhaps this is unrelated, but I would think it made a difference in how historians have thought about the “digital humanities” and its various applications.

  • How To Become A Programming Historian, a Gentle Introduction (2 comments)

  • Network Analysis Fundamentals (2 comments)

    • Comment by Amanda Seligman on March 31st, 2014

      Here too I suspect there is a large social science literature to draw on.

      Comment by Amanda Seligman on March 31st, 2014

      Having read the next section, I see that this comment was right on but useless to  you.

  • General Comments (2 comments)

    • Comment by Shawn on September 3rd, 2013

      On this page, please feel free to leave us any general reflections on our process, our content, our aims, our shortcomings…. we appreciate all constructive feedback.

      Comment by Zakea Jones on April 3rd, 2014

      I’m attempting to leave feedback to specific paragraphs and getting “ajax” errors, so I am going to leave a general comment and see if this works, and then attempt again to leave specific comments.

      The overall process, I feel, is a good start. I’m not able to be very critical since this is a very new area for me. Since I am personally among the generation that sees the value in the digital era, I do not by any means try to take away from the concerns that others have regarding data overload. For me, experiencing authors presenting their work during the “in process” stage and opening it up to online “peer review” is a valuable way to cut down time and get instant feedback. So overall I like the process, and I feel that it is very informative and not too technical for non-technical people.

  • Automatic Retrieval of Data (1 comment)

  • Dynamic Networks in Gephi (1 comment)

    • Comment by Shawn on June 25th, 2014

      I’ll just also mention that you can take your csv file and bang it into http://palladio.designhumanities.org/ for some instant visualization, as well as timelines, geographic mapping, etc.

      Palladio also does timelines in networks. But you have to format the date Year-Month-Day (2014-01-01). But, having gotten to this point, you should be able to work out what regex will achieve that?
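
      For readers who want to check their answer, here is one possible sketch in Python. The comment above does not say what format the dates start in, so the Day/Month/Year input format (for example 25/06/2014) is an assumption, as are the names `DATE_PATTERN` and `to_iso_date`.

```python
import re

# Assumed (hypothetical) input format: Day/Month/Year, e.g. "1/1/2014"
# or "25/06/2014". Adjust the pattern if your dates look different.
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def to_iso_date(text):
    """Rewrite every Day/Month/Year date in text as Year-Month-Day."""
    def reorder(match):
        day, month, year = match.groups()
        # Zero-pad day and month so 1/1/2014 becomes 2014-01-01.
        return f"{year}-{int(month):02d}-{int(day):02d}"

    return DATE_PATTERN.sub(reorder, text)
```

      The three capture groups pick out the day, month, and year, and the replacement simply puts them back in a different order. The same grouping idea carries over to search and replace in a text editor that supports regular expressions.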

  • Basic Scraping: Getting Your Data (1 comment)

  • When Not To Use Networks (1 comment)

    • Comment by Isaura Leon on April 3rd, 2014

      I really liked the example of how a flight network works, and the description of how each airport’s network affects the network analysis. However, I feel that it needs to be clearer about how this affects network analysis in general.

      Does it have an impact on accumulation of digital data? Are these methods helping to use networks?

  • Slicing a Topic Model (1 comment)

    • Comment by Amanda Seligman on March 31st, 2014

      Here I am wondering whether topic modeling is something that historians do intuitively as we read, and whether these tools are a formalization of something that we normally leave up to our brains.

  • Sidebar: Advanced Text Mining (1 comment)

    • Comment by Amanda Seligman on March 29th, 2014

      We so rarely get to see the word “fulsome” in historical scholarship…

  • 8,000 Canadians - adding new columns to a network file (1 comment)

    • Comment by Ian Milligan on September 6th, 2013

      Will have to make an OS X/Linux implementation.

  • What this site is not (1 comment)

  • Chapter Two Conclusion: Bringing It All Together: What's Ahead in the Great Unread (1 comment)

Comments on the Blog

  • Announcing our project to the world (1 comment)


Source: http://www.themacroscope.org/?page_id=6