An experiment in writing in public, one page at a time, by S. Graham, I. Milligan, & S. Weingart

Making Your Data Legible: A Basic Introduction to Visualizations

By this point in the book, you know how to gather data, and we are beginning to explore various ways to visualize it. While historians often make use of graphs and charts, the training of an historian typically includes very little on the principles of information visualization. Here we provide guidance on the various issues at play when the historian turns to data visualization.

It should now be clear that reading history through a macroscope will involve visualization. Visualization is a method of deforming, compressing, or otherwise manipulating data in order to see it in new and enlightening ways. A good visualization can turn hours of careful study into a flash of insight, or can convey a complex narrative in a single moment. Visualizations can also lie, confuse, or otherwise misrepresent if used poorly. What are the types of visualizations? Why might you choose one kind of visualization over another, and to what end? How can we use visualization techniques most effectively? We will also explore several visualizations which have been used to great rhetorical and analytical effect by historians.

Why Visualize?

A 13th-century Korean edition of the Buddhist canon contains over 52 million characters across 166,000 pages. Lewis Lancaster describes a traditional analysis of this corpus as follows:

The previous approach to the study of this canon was the traditional analytical one of close reading of specific examples of texts followed by a search through a defined corpus for additional examples. When confronted with 166,000 pages, such activity had to be limited. As a result, analysis was made without having a full picture of the use of target words throughout the entire collection of texts. That is to say, our scholarship was often determined and limited by externalities such as availability, access, and size of written material. In order to overcome these problems, scholars tended to seek for a reduced body of material that was deemed to be important by the weight of academic precedent.[1]


As technologies advanced, the old limitations were no longer present; Lancaster and his team worked to create a search interface (figure 5.1) that would allow historians to see the evolution and use of glyphs over time, effectively allowing them to explore the entire text all at once. No longer would historians need to selectively pick which areas of this text to scrutinize; they could quickly see where in the corpus their query was most often used, and go from there.

[Figure 5.1]

This approach to distant reading (that is, seeing where in a text the object of inquiry is densest) has since become so common as to no longer feel like a visualization. Amazon’s Kindle has a search function called X-Ray (figure 5.2) which allows the reader to search for a series of words, and see the frequency with which those words appear in a text over the course of its pages. In Google’s web browser, Chrome, searching for a word on a webpage highlights the scroll bar on the right-hand side such that it is easy to see the distribution of that word’s use across the page.

[Figure 5.2]
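The idea behind such displays can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how X-Ray, Chrome, or Lancaster's interface is actually built: the function names, the number of slices, and the sample text are all hypothetical. The sketch cuts a text into equal slices, counts how often a query term falls into each slice, and prints a one-line strip that is darker where the term is denser.

```python
import re


def term_density(text, term, bins=10):
    """Count occurrences of `term` in each of `bins` equal-sized slices of `text`."""
    words = re.findall(r"\w+", text.lower())
    counts = [0] * bins
    if not words:
        return counts
    target = term.lower()
    for position, word in enumerate(words):
        if word == target:
            # Map the word's position to a slice index between 0 and bins - 1.
            counts[position * bins // len(words)] += 1
    return counts


def density_strip(counts):
    """Render the counts as one row of characters, darker where the term is denser."""
    shades = " .:*#"  # hypothetical five-step scale, lightest to darkest
    peak = max(counts) if counts else 0
    if peak == 0:
        return " " * len(counts)
    # Ceiling division so that any non-zero count gets a visible mark.
    return "".join(
        shades[(c * (len(shades) - 1) + peak - 1) // peak] for c in counts
    )


# Hypothetical sample text: "war" clusters at the start and reappears at the end.
sample = "war war war peace peace peace peace peace peace war"
print("[" + density_strip(term_density(sample, "war", bins=5)) + "]")
```

On a real corpus the number of slices would be chosen to match pages, chapters, or volumes rather than an arbitrary count.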

The use of visualizations to show the distribution of words or topics in a document is an effective way of getting a sense of the location and frequency of your query in a corpus, and it represents only one of the many uses of information visualization. Uses of information visualization generally fall into two categories: exploration and communication.


When first obtaining or creating a dataset, visualizations can be a valuable aid in understanding exactly what data are available and how they interconnect. In fact, even before a dataset is complete, visualizations can be used to recognize errors in the data collection process. Imagine you are collecting metadata from a few hundred books in a library collection, making note of the publisher, date of publication, author names, and so on. A few simple visualizations, made easily in software like Microsoft Excel, can go a long way in pointing out errors. In the chart in figure 5.3, it is easy to see that whoever entered the data on book publication dates accidentally typed “1909” rather than “1990” for one of the books.

[Figure 5.3]
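The same kind of check can be approximated outside a spreadsheet. The sketch below is a minimal illustration, not a reconstruction of the chart in figure 5.3: the list of years is invented, and the 50-year threshold is an arbitrary assumption you would adjust for your own collection. It prints a text histogram of books per decade, in which a stray entry shows up as an isolated bar, and then lists any years that sit unusually far from the median.

```python
from collections import Counter
from statistics import median


def decade_histogram(years):
    """Return one text line per decade, with one '#' per book published in it."""
    if not years:
        return []
    counts = Counter((year // 10) * 10 for year in years)
    lines = []
    # Include empty decades so that gaps in the data stay visible.
    for decade in range(min(counts), max(counts) + 10, 10):
        lines.append(f"{decade}s | {'#' * counts[decade]}")
    return lines


def far_from_median(years, max_gap=50):
    """List years more than `max_gap` years from the median (threshold is an assumption)."""
    if not years:
        return []
    middle = median(years)
    return [year for year in years if abs(year - middle) > max_gap]


# Hypothetical publication years as typed in by a cataloguer; 1909 is a slip for 1990.
entered_years = [1985, 1987, 1988, 1990, 1991, 1909, 1992, 1989]

for line in decade_histogram(entered_years):
    print(line)
print("Check these entries:", far_from_median(entered_years))
```

A flagged year is only a prompt to go back to the book itself; a collection may legitimately contain one much older volume.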

Similarly, visualizations can be used to get a quick understanding of the structure of data being entered, right in the spreadsheet. The visualization in figure 5.4, of salaries at a university, makes it trivial to spot which department’s faculty have the highest salaries, and how those salaries are distributed. It utilizes basic functions in recent versions of Microsoft Excel.

[Figure 5.4]

More complex datasets can be explored with more advanced visualizations, and that exploration can be used for everything from getting a sense of the data at hand, to understanding the minute particulars of one data point in relation to another. The visualization in figure 5.5, ORBIS, allows the user to explore transportation networks in the ancient Roman world. This particular display shows the most likely route from Rome to Constantinople under a certain set of conditions, but the user is invited to tweak those conditions, or the starting and ending cities, to whatever best suits their own research questions.

[Figure 5.5]

Exploratory visualizations like this one form a key part of the research process when analyzing large datasets. They sit particularly well as an additional layer in the hermeneutic process of hypothesis formation. You may begin your research with a dataset and some preconceptions of what it means and what it implies, but without a well-formed thesis to be argued. The exploratory visualization allows you to notice trends or outliers that you may not have noticed otherwise, and those trends or outliers may be worth explaining or discussing in further detail. Careful historical research of those points might reveal even more interesting directions worth exploring, which can then be folded into future visualizations.


Once the research process is complete, visualizations still have an important role to play in translating complex data relationships into easily digestible units. The right visualization can replace pages of text with a single graph and still convey the same amount of information. The visualization created by Ben Schmidt, reproduced in figure 5.6, for example, shows the frequency with which certain years are mentioned in the titles of history dissertations.[2] The visualization clearly shows that the great majority of dissertations cover the years after 1750, with spikes around the American Civil War and the World Wars. While our description of the chart does describe the trends accurately, it does not convey the sheer magnitude of difference between earlier and later years as covered by dissertations, nor does it mention the sudden drop in dissertations covering periods after 1970.

[Figure 5.6]
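To give a sense of the counting that sits underneath a chart of this kind, here is a minimal sketch. It is not Schmidt's own method, which is described in his post; the titles are invented, and the pattern simply assumes that any standalone four-digit number from 1000 to 2099 is a year. It also counts only the years actually written in a title, so a range such as 1861-1865 contributes its two endpoints and not the years in between.

```python
import re
from collections import Counter

# Assumption: a standalone four-digit number between 1000 and 2099 is a year.
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def count_year_mentions(titles):
    """Tally how often each year is written out across a list of titles."""
    counts = Counter()
    for title in titles:
        for match in YEAR_PATTERN.findall(title):
            counts[int(match)] += 1
    return counts


# Hypothetical dissertation titles, for illustration only.
titles = [
    "Labor and Politics in Chicago, 1861-1865",
    "Rebuilding the South after 1865",
    "A Study of 12 Villages",
]

for year, mentions in sorted(count_year_mentions(titles).items()):
    print(year, "#" * mentions)
```

Plotting these tallies against the year axis would give a rough version of the chart's shape; deciding how to treat ranges, centuries, and named periods is where the historical judgment comes in.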

Visualizations in publications are often, but not always, used to improve a reader’s understanding of the content being described. It is also common for visualizations to be used to catch the eye of readers or peer reviewers, to make research more noticeable, memorable, or publishable. In a public world that values quantification so highly, visualizations may lend an air of legitimacy to a piece of research which it may or may not deserve. We will not comment on the ethical implications of such visualizations, but we do note that such visualizations are increasingly common and seem to play a role in successfully analyzing data, proving your case for peer review, or helping to make your work accessible to a general public. Whether the ends justify the means is a decision we leave to our readers.

[1] Lewis Lancaster, “From Text to Image to Analysis: Visualization of Chinese Buddhist Canon,” abstract for Digital Humanities 2010, King’s College London, 7th – 10th July 2010, http://dh2010.cch.kcl.ac.uk/academic-programme/abstracts/papers/html/ab-670.html

[2] Ben Schmidt, “What years do historians write about?” Sapping Attention, May 9, 2013, http://sappingattention.blogspot.ca/2013/05/what-years-do-historians-write-about.html


Source: http://www.themacroscope.org/?page_id=837