Guest Posts: III. Using Digital Humanities Tools to study Spanish Influenza in State Medical Journals

This is the third of four guest posts from a class taught by Tom Ewing at Virginia Tech in fall 2017. Read the other parts here: I and II.

The state medical journals can be analyzed using tools and methods from the digital humanities and data analytics to provide unique insights into the historical significance of the 1918 Spanish Influenza. These methods include term frequency, text visualizations, and network analysis of term collocations. The tools used for this analysis, each easily accessible to scholars and students, include Voyant and Google Sheets.

The full text of the state medical journals can be accessed through the Internet Archive, which provides links to a full-text version of each volume. The text version often includes some additional text from the digitization process, which can be deleted from the text version of the document to keep it faithful to the original. Some visualization tools, such as Voyant, allow users to enter a URL, which can be taken directly from the Internet Archive. In this case, the additional text may appear in the visualization, but can be deleted or ignored.

III.1

The six state medical journals for the period most directly related to the Spanish Influenza, 1918-1919, include more than four million total words and 86,000 unique terms. Illustration III.1 is a cirrus cloud representation using Voyant showing the 125 most common terms (excluding common stop words) across all four million words. Some of the terms are obviously relevant to any medical field (medical, cases, treatment, patient), while others are perhaps more specific to state medical journals (county, state, society, and member). The character a^ appears in these texts as an artifact of optical character recognition software, and thus is not consistent with the original text. While this visualization is useful for showing the most common terms across all the journal texts, it provides little analytical insight into the Spanish flu epidemic (or any other specific topic).
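
The term-frequency count behind a word cloud of this kind can be sketched in a few lines of Python. This is only an illustration of the general technique, not Voyant's implementation: the tokenizer, the very short stop list, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "were",
              "for", "with", "on", "by"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def top_terms(text, n=125, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop-word terms as (term, count) pairs."""
    counts = Counter(t for t in tokenize(text) if t not in stop_words)
    return counts.most_common(n)

# Invented sample sentence, not taken from the journals.
sample = ("The cases of influenza in the county were reported to the state "
          "society. Cases of pneumonia followed the influenza cases.")
print(top_terms(sample, n=2))  # [('cases', 3), ('influenza', 2)]
```

Run over a full-text volume instead of the sample sentence, the same function would return the list of terms that a word cloud displays by size.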

III.2

Using the Contexts and Cirrus Cloud tools in Voyant to analyze the same corpus from six state medical journals provides a more insightful visualization of texts. The Contexts tool extracts the fifteen words on either side of the keyword, “influenza” (Illustration III.2), thus producing a corpus of 60,000 total words and 6,000 unique words. The advantage of using this tool is to produce a corpus of terms that are all directly related to influenza in the context of these six journals at the most relevant time. The Cirrus cloud of top 125 words produced from this corpus is more insightful in terms of understanding how state medical journals reported on the Spanish flu (Illustration III.3). The two most prominent terms illustrate key aspects of this disease: the widespread impact that led to the frequent designation of an epidemic and the association with pneumonia, which actually caused most deaths during the epidemic. Other medical terms that appear frequently in the same context as influenza, such as disease(s), patient(s), case(s), and hospital, are not specific to the epidemic. By contrast, the frequency of the term “bacillus” reveals how medical journals used this term to explain the causal agent of influenza. As will be discussed below, the appearance of the term vaccine(s) among the top 125 words is perhaps the most suggestive aspect of this visualization.

III.3
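
The keyword-in-context extraction described above, taking a fixed number of words on either side of “influenza,” can be approximated as follows. This is a minimal sketch under stated assumptions: the function names and sample sentence are invented, and it does not reproduce how Voyant tokenizes text or handles overlapping windows.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def contexts(tokens, keyword, window=15):
    """Return one token list per keyword hit: up to `window` tokens on
    each side of the hit, plus the keyword itself."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            start = max(0, i - window)  # do not run past the start of the text
            hits.append(tokens[start:i + window + 1])
    return hits

# Invented sample sentence, not taken from the journals.
text = ("During the epidemic of influenza many cases developed pneumonia "
        "and the hospital wards were full")
for ctx in contexts(tokenize(text), "influenza", window=3):
    print(" ".join(ctx))  # the epidemic of influenza many cases developed
```

Joining all of the returned windows gives a smaller corpus of the kind described here. Note that when two hits fall close together their windows overlap, so some words are counted more than once in this sketch.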

III.5

As suggested above, collocation is a useful tool for understanding the meaning of terms within a particular context. Illustration III.4 shows the terms most commonly collocated with the term influenza across all six state medical journals (Pennsylvania, Virginia, Missouri, Indiana, Kansas, and Kentucky). As was evident in the Cirrus cloud discussed above, the most frequent terms include seemingly obvious terms, such as case(s), epidemic, disease(s), and pneumonia. Yet here again, the frequency of the term vaccine(s) is notable, as it suggests that the state medical journals often reported on influenza and vaccines in the same context.

The relative importance of these terms can be documented by examining the frequency of terms in the context of vaccination (using the truncated term, vaccin*, which includes both vaccine and vaccination). Illustration III.5 identifies medical terms most frequently collocated with vaccin*, indicating that influenza was the most common term, followed by typhus, with pneumonia, smallpox, and tetanus appearing less frequently.
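
Counting collocates of a truncated term such as vaccin* amounts to matching every token that begins with the prefix and tallying its neighbors. The sketch below illustrates the idea; the five-word window, the list of disease terms, and the sample sentences are assumptions for the example, since the window used for the illustrations is not stated here.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def collocates(tokens, prefix, window=5, targets=None):
    """Count words within `window` tokens of any token starting with `prefix`.

    If `targets` is given, only words in that set are counted.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok.startswith(prefix):  # matches vaccine, vaccines, vaccination, ...
            start = max(0, i - window)
            neighbours = tokens[start:i] + tokens[i + 1:i + window + 1]
            for word in neighbours:
                if targets is None or word in targets:
                    counts[word] += 1
    return counts

# Hypothetical target list and invented sample text.
diseases = {"influenza", "typhus", "pneumonia", "tetanus"}
text = ("The influenza vaccine was tested in the laboratory. "
        "Vaccination against influenza and pneumonia was recommended.")
print(collocates(tokenize(text), "vaccin", window=5, targets=diseases))
# Counter({'influenza': 2, 'pneumonia': 1})
```

Because the window is measured in tokens, it can cross sentence boundaries, which is one reason collocation counts are better read as pointers to passages than as findings in themselves.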

III.6

Illustration III.6 compares how frequently the nine most common disease terms were collocated with vaccin* in the entire corpus of the six state medical journals for 1918-1919. The fact that “influenza” appears most commonly across this corpus indicates that discussion of vaccines during this two-year period across these six journals was very much about influenza, even though no effective vaccination against influenza was available at the time.

Table III.1

This approach has the potential to offer new ways to interpret sample texts extracted from a very large corpus. Starting with the 60,000 words identified as the context for the term influenza across these six medical journals, it then becomes possible to identify contextual phrases surrounding the term “vaccin*”. These phrases were then classified as having a positive or negative statement about vaccination, or a meaning that is either neutral or indecipherable. Approximately 20% of the nearly 300 phrases could be identified as either negative or positive; a sample is listed in Table III.1. These phrases suggest some important issues related to vaccines during the 1918 Spanish flu, including questions about therapeutic benefits, the quest for preventions and cures, and safety of serums. Terms such as reliability, results, investigation, evidence, recommendations, and laboratory suggest the ways that medical journals engaged the scientific method in testing vaccines. Yet removing these phrases from their context, both the complete sentences or paragraphs and the date and journal title, together with the additional problems associated with optical character recognition and text encoding, makes these interpretations more suggestive than conclusive. In this sense, the digital humanities tools are most valuable for identifying important phrases and patterns, while actual interpretation of historical significance requires more traditional methods of close reading and analysis.

III.7

State medical journals can also be used to examine trends in keywords. Medical Heritage Library and the Internet Archive make it possible to examine several volumes of the same journal over a period of years by entering the URLs. Illustration III.7 shows the distribution of the term “influenza” each year from 1916 to 1922, a seven-year period with several years on either side of the 1918-1919 epidemic, for state medical journals from Indiana, Missouri, and Kansas. As clearly indicated in this chart, all three journals provided little reporting on influenza in 1916 and 1917, the years prior to the Spanish influenza epidemic. The frequency of influenza in the Journal of the Indiana State Medical Association increased sharply in 1918. By contrast, the Journal of the Missouri State Medical Association and Journal of the Kansas Medical Society showed slight increases in 1918, followed by much greater increases in 1919. The frequency of influenza decreased in all three journals in 1920, and by 1922 had fallen to rates only slightly higher than those reported in 1916. This distribution of terms suggests two important interpretations of how the medical profession responded to the Spanish influenza. First, the epidemic prompted a period of intense attention to the disease, as reflected by the fact that almost 60% of the appearances of “influenza” fell in the years 1918-1919, which account for less than one-third of the seven years covered in this chart. Yet the second interpretation is equally important, as it appears that attention to influenza quickly dissipated after the epidemic ended, at least as measured by the frequency of these keywords.
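
The comparison between the share of mentions and the share of years is simple arithmetic, sketched below. The yearly counts are invented purely to illustrate the calculation and are not the journals' data; the function name is likewise hypothetical.

```python
def share_in_years(counts_by_year, years):
    """Return the fraction of all keyword occurrences that fall in `years`."""
    total = sum(counts_by_year.values())
    if total == 0:
        return 0.0
    selected = sum(counts_by_year.get(y, 0) for y in years)
    return selected / total

# Hypothetical counts, invented for illustration only; not the journals' data.
counts = {1916: 10, 1917: 10, 1918: 60, 1919: 60, 1920: 30, 1921: 15, 1922: 15}
epidemic_years = {1918, 1919}

print(f"share of mentions: {share_in_years(counts, epidemic_years):.0%}")  # 60%
print(f"share of years: {len(epidemic_years) / len(counts):.1%}")  # 28.6%
```

Two of seven years is about 28.6% of the period, so a share of mentions near 60% means the term appeared roughly twice as often in those years as an even distribution would predict.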

III.8

A comparison of term frequency between the Journal of the Kansas Medical Society and a newspaper, the Topeka State Journal, available from Chronicling America, provides further evidence of the scholarly value of state medical journals (Illustration III.8). The two measures in this chart differ slightly: the journal figure counts every occurrence of the keyword, whereas the newspaper figure counts pages on which the keyword appeared, so the actual count of keywords in the newspaper would likely be higher. Both the state medical journal and the newspaper showed a significant increase in 1918-1919, followed by a gradual decrease over the next several years. Whereas the newspaper showed a dramatic increase in 1918, and almost the same level of reporting in 1919, the increase in term frequency in the journal occurred in 1919, as discussed above. While the newspaper also revealed a decrease in 1920, the relative decrease following the peak of the epidemic was less consistent than in the data from the journal, suggesting the possible appearance of advertisements in this popular medium containing the word “influenza.” In each publication, the years 1918-1919 accounted for just over 60% of all the appearances of the term “influenza” in this seven-year period.
