Thick Networks

In this next series of images I’ve shifted the emphasis from the relationship between people and organizations to the links that people share by virtue of belonging to the same organizations. In the former case, both people and organizations are nodes of the network connected by lines (edges). In the latter case, only people are nodes. Organizations are represented by lines connecting people. Each person is connected to every other person with whom they share an organization. The result is a much thicker field of relationships–or at least the appearance of it.
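For readers who want to see the mechanics, here is a minimal sketch in Python of that shift from a two-mode network (people and organizations) to a one-mode network (people only). The names and memberships are made up for illustration and are not taken from the directory, the function name is my own, and this is not the tool that produced the images. It also shows why the field thickens so quickly: an organization with n members produces n(n-1)/2 lines, so a ten-member organization alone accounts for 45 of them.

```python
from collections import defaultdict
from itertools import combinations


def project_to_people(memberships):
    """Turn (person, organization) pairs into person-to-person ties.

    Returns a dict mapping each pair of people (as a sorted tuple)
    to the set of organizations they share.
    """
    # Group people by organization.
    members = defaultdict(set)
    for person, org in memberships:
        members[org].add(person)

    # Every pair of people inside the same organization gets a tie.
    ties = defaultdict(set)
    for org, people in members.items():
        for a, b in combinations(sorted(people), 2):
            ties[(a, b)].add(org)
    return dict(ties)


# Hypothetical memberships, for illustration only.
memberships = [
    ("Person A", "Union X"),
    ("Person B", "Union X"),
    ("Person C", "Union X"),
    ("Person C", "Party Y"),
    ("Person D", "Party Y"),
]

ties = project_to_people(memberships)
for (a, b), orgs in sorted(ties.items()):
    print(a, "--", b, "via", sorted(orgs))
```

In this toy example, Person C is the only bridge between the two organizations, which is the kind of in-between position that stands out in the one-mode charts.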

When I cycle through this set of images, I am reminded of brain scan imagery in which various types of stimuli ignite neurons in different parts of the brain. (In the gallery images, the colored dots are connected to the selected person while the white dots are not.) There are four major clusters. None is completely homogeneous, but in the interest of description I’ll name them as if they were. From left to right, we begin with the immigrant trade union militants (e.g., Pauline Newman, and Rose Schneiderman, who is slightly more connected to the WTUL group), then move through the central group made up of important union and Socialist leaders (e.g., Debs, Berger, Maurer, Dubinsky), which also includes African American unionists Chandler Owen and Frank Crosswaith. Moving further to the right, we come to the mainstream, unremarkable AFL unionists on the bottom right, and finally, at the top right corner, we finish with the railroad brotherhood leaders.

You can see the visual effect more strongly, although more slowly, in the interactive version of the chart, “One node network chart.”

Networked Labor Movement: I reach an impasse, and go around

This is the fourth in a series of posts I am writing to help me think through the use of network analysis and visualization.

A simplified network chart based on the complete ALWW directory. The chart shows only individuals with 3 or more connections.

About seven months ago, I was merrily chugging along on this series using the index of the 1925 American Labor Who’s Who as a database for network analysis when I hit an impasse. I was using the list of names and organizations from the book’s index to build network charts. However, the simple structure of the index, so handy for the analog book, adds a layer of abstraction/interpretation that gets in the way of analysis.

The Labor Who’s Who index presents names according to two types of categories. The first might be called “varieties of organization” and includes American Federation of Labor Affiliated Bodies, Independent Unions, Political Parties, and Miscellaneous. Of these, only “AFL-affiliated” is an organic category. “Political Parties,” on the other hand, is a conceptual category, not an entity that the Socialist Party or the Republican Party affiliated with. At the next level down things get more complicated. Things get even messier in the Miscellaneous category, which includes Journalists and Writers, Negro Progress, Workers Education, and a few others. Unfortunately, the index doesn’t tell us the particular newspapers and organizations that make up these sub-groupings in Miscellaneous.

Nor does the index list all the organizational affiliations given in individual entries; it is more of a snapshot of what the compilers thought were the most important memberships of each person. The result is a simplified, and perhaps distorted, image of the network of associations, and my research impasse. I was at the point of pulling out particular sections of the network chart (those individuals who sat between the two main groupings), but it seemed better to stop and develop the full database than continue with the index alone.

Easier said than done. The complete directory of over 1,000 names is much messier than the index (see the post “Old Book, New Data”). In addition to basic OCR scanning errors there are a few missing and torn pages in the scanned version. The enormity of the task of cleaning the data myself loomed. One solution was to “crowd source” the data cleaning, but that might take a long time and who would really be interested? Another potential solution was to deploy undergraduate students as a “curated crowd.” Because I was already scheduled to teach an upper division lecture course on American Working Class Movements in the fall of 2014, I developed a course project that included a small amount of data cleaning for students–and (as it turned out) a lot of help from two graduate students in the UCLA Center for Digital Humanities. I’ll write about what went right and wrong with that process in a later post, but the upshot is that now I have a working version of the complete directory.

And with that news, I will begin to post more regularly over the next month.

 

Old Book, New Data

Labor Who's Who title page

(Originally posted on bughousesquare.wordpress.com)

Over the past year or so I’ve been working on a digital history project that aims to convert a 1925 American Labor Who’s Who into a research and teaching database and wiki. It continues to be “a learning experience,” as my mother used to call all the unpleasant encounters of childhood. Not all bad, to be sure, but not all good. Since I have versions of the data up on the internet, I thought I should post some reflections.

Labor historian Jon Beck from the Michigan State Industrial Relations program started my thinking about the Labor Who’s Who around 2007 when he suggested it might be useful for my project on working class autodidacts. The Rand School of Social Science sponsored the compilation of the Who’s Who in 1925 under the direction of Solon De Leon (son of famed radical Daniel De Leon). De Leon and his colleagues threw open the front door to the House of Labor, so to speak, including among the roughly 1,300 U.S. entries activists in the fields of immigrant rights, civil liberties, cooperatives, and progressive and radical politics, as well as the to-be-expected trade unionists (there are 300 additional non-US activists, a few of whom were deported or self-exiled US activists).

Nineteen twenty-five was a curious moment for the American labor movement. The industrial union upsurge of the 1910s was sputtering under the weight of repression, factionalism, and failure. The powerful unions of the CIO were a decade or more in the future. Meanwhile, conservatives held a tight, if a bit desperate, grip on the political machinery of trade unionism at the national level, antiunion Republicans were in the White House, and reactionary groups like the KKK and American Legion were popular. And yet, there was a great deal of activity and organizational creativity in some unions, and there was a blossoming network of labor colleges training the leaders of the ’30s.

The Labor Who’s Who is a snapshot of this contingent moment and some of the people who lived it. Each entry is a telegraphic biography. Some provide only name, professional title and address at the time of publication. But many sketch rich life histories. Nearly all provide details on birth date and place, family background, education, migration, and work histories, as well as key organizations, events and publications. It includes both long-serving elders whose careers stretched back to the 1870s, and emerging leaders who would continue to be active into the second half of the 20th century.

For years I had a library copy of the book on my office shelf, thinking I would get to the project eventually. Then in 2012 I discovered the book had been scanned by Google and was sitting behind the access wall in the HathiTrust (HT) digital collection. You could search keywords, but the search only returned a few words and a page number. From my keyword searches, I knew that about 40 individuals identified themselves as “self-educated,” but learning more about the educational and organizational matrix represented in the directory was just beyond reach. Hoping to avoid the wrath of Disney and other commercial publishers, HT takes a defensive approach to copyright. Most things published after the easy cutoff for public domain (before 1923) go behind the access wall.

Very frustrating. And ironic. Here was a book published by a radical college, locked behind a copyright wall at the behest of capitalist media corporations. Not that these corporations give a hoot about the Labor Who’s Who, it’s just structural. Everything after 1922 goes behind the wall unless someone specifically requests it be freed.

Thus was born what I’m now calling the “HathiTrust Liberation Project.” Hundreds and hundreds of labor and leftist volumes published between 1923 and 1963 are in the public domain unless their copyright holders renewed the copyright (there is an online database to check for renewed copyrights: http://comminfo.rutgers.edu/~lesk/copyrenew.html ). Unlike literary works, mundane works of non-fiction and social movement publications are usually not renewed. Many of these volumes are already digitized, but are blocked. Likewise, a surprising number of post-1923 government documents are behind the access wall.

The Labor Who’s Who was my first foray into old book liberation. Through the good graces of the UCLA Library, I was able to convince HT that the copyright on the Labor Who’s Who probably wasn’t renewed, and in any case the socialists won’t kick if you open it up. Somebody flipped a switch and the volume appeared. This was in the spring or summer of 2012.

The next task was extracting and cleaning OCR’d text. This turned out to be a little more complicated than I expected. In the end, I downloaded an EPUB version of the Who’s Who, and copy-and-pasted the text into a separate file. So far, so good. But this was a long way from a database. With the help of UCLA librarian Zoe Borovsky and Miriam Posner of the Center for Digital Humanities, I broke the text up into discrete entries and, eventually, data fields. However, there were many, many text recognition errors. I probably could have hired someone to fix them (if I had had the money), but in the end I did most of the corrections myself. Let’s just say I became intimately familiar with the contents of the book. And isn’t that the traditional activity of scholarly humanists after all, even if this mode of familiarity generally is not recognized as such by personnel committees?

So by the late fall of 2012, I had a relatively clean text file with entries broken into fields: name, titles, birthplace, birth date, father’s occupation, and a residual field that was too irregular to easily parse that included things like education, organizations, activities, publications, home and work address. Next came the task of reorganizing this information from a flow of text into a spreadsheet, rather tediously done by cutting and pasting in Microsoft Excel.

From the start, I had envisioned the Who’s Who database as a teaching tool, as well as a research project. I imagined students using the entries as a starting place for biographical papers, so I needed a student-friendly interface. I had experimented fitfully with having students write or edit Wikipedia entries in my classes, so it seemed natural to put the Who’s Who data in a wiki. A regular wiki is searchable, but doesn’t really have database functions. To get those, I used the MediaWiki extension bundle Semantic MediaWiki. The semantic wiki allows you to define data fields and relationships, import data, search across data fields, and enable students or other users (if you wish) to edit the data through forms.

I also loaded the data into a Google Fusion Table, which allows you to quickly make maps from any geographic data (e.g., birthplaces). Fusion Tables is easy, but limited in terms of customizing. My students used the filtering and mapping functions to produce in-class reports on the demographics of various organizations represented in the directory. Semantic MediaWiki is much more flexible. But for the non-expert it was one of those “learning experiences.” Many late nights, crashes, and frustrations before ultimate success. In the future I hope to use it in my labor history classes to train students how to use a wiki before I set them off on the actual Wikipedia.

What remains to be done is the “Other” field (education, organizations, publications), which holds lots of good stuff. I’m currently working with folks at the Center for Digital Humanities, and hope to have that done by late winter. In the meantime, I’m doing some analysis of subsets of the Who’s Who, particularly the organizational networks. And that presents me with my next “learning experience,” Gephi.