DÍScoLA: Developing Digital Scholarship Initiatives

SALALM 61, Panel 1, May 11, 2016, 11:00am-12:30pm
Moderator:       Irene Munster, The Universities at Shady Grove
Rapporteur:      Lara Aase, University of Washington

Panelists:       Alex Gil, Digital Scholarship Coordinator, Columbia University
                 Marisol Ramos, Subject Librarian for Latin American, Caribbean and Latina/o Studies, University of Connecticut
                 Alison Hicks, Romance Languages, Literatures and Cultures Librarian, University of Colorado Boulder
                 Melissa Gasparotto, Librarian for Latin American Studies, Spanish and Portuguese, Rutgers University Libraries

Because of technical difficulties with recording equipment, unfortunately we are unable to provide reporting on the first two presentations, Alex Gil’s “Where is Digital Humanities in Latin America and the Caribbean?” and Marisol Ramos’ “Damas y Señoritas: Visualizing Printing Presses and Editorial Offices of 19th Century Spanish Women’s Magazines in Spain and Puerto Rico.” This report is a summary based on presenters’ slide presentations, drafts, and notes, and on wording from the HathiTrust websites. Many apologies for the error in recording.

Digital Scholarship as Researcher Practices

Alison Hicks presented on the concept of digital scholarship and the role of the librarian as scholar and facilitator of networked participation. She framed her discussion using questions, and she referred to recent research and publications to explore answers to those questions. For a preprint of Hicks’ chapter on this topic, including her bibliography, please see http://scholar.colorado.edu/cgi/viewcontent.cgi?article=1075&context=libr_facpapers

Who or what is a digital scholar, when all scholars these days use technology in one way or another? To unpack this question, Hicks discussed both scholarship and digitality. Scholarship, according to Boyer in 1990, goes beyond just the generation of new knowledge to involve discovery, integration, application and teaching. Thus, modern (digital) scholarship is more than the adoption of new research methods, skill sets, tools, or technology; it necessitates new practices and habits in outreach, engagement, and education, as well as the willingness to grapple with ideas about the nature and purpose of scholarship, accountability, impact, and control. Similarly, digitality is more than just the use of new technologies to enhance research (making scholarship faster and more collaborative); it also embraces the value of openness, participation, and informal collaboration.

Digital scholarship goes beyond the activities listed by the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities & Social Sciences, which are the following:

  1. building a digital collection of information for further study and analysis,
  2. creating appropriate tools for collection building,
  3. creating appropriate tools for the analysis and study of collections,
  4. using digital collections and analytical tools to generate new intellectual products, and
  5. creating authoring tools for these new intellectual products, either in traditional form or in digital form.

Digital scholarship is not merely a new publishing model, for instance; it can in fact change the structure of scholarly norms. And the development of digital scholars cannot rely on a mere fixed set of functional digital literacy skills; instead, scholars will have to develop the ability to learn and teach continually in a dynamic, ever-changing environment. The new, decentralized medium, with different kinds of gatekeepers, is less exclusionary and may encourage more inclusive research and publicization. Such openness is key to digital scholarship. As Veletsianos and Kimmons pointed out in 2012, digital scholarship is enacted through three major forms: open access and open publishing; open education, including open educational resources and open teaching; and networked participation (e.g., a scholar who has uploaded a manuscript for feedback to Academia.edu to share ideas with broader audiences before formal publication; an educator using Twitter to engage in professional or social commentary with others in the field; a PhD student blogging on WordPress to discuss emerging ideas from her thesis).

What is the impact of these scholarly practices within traditional academic reward and promotion structures? It seems too soon to tell, because although academics use new open, collaborative, and participatory technologies to build influence and reputation, academia does not currently have a way of measuring this impact and engagement. Traditionally, value has been measured by other academics (e.g., through peer review) in terms of exclusivity, knowledge scarcity, and economic productivity. Openness is therefore both a source of opportunity and a point of tension. What do these new developments mean for a digital scholar’s identity? Personal and professional identities and boundaries can be blurred through social networks and lead to a lack of privacy, where one’s academic identity may be “undermined” by networked participation and even online harassment, magnified by the long memory of archived online content.

What about the tools that we use for participatory scholarship? They are not neutral platforms from which we can engage in networked practices; instead, they can be seen as unstable, biased, privileged, and money-making platforms that may reinforce existing structures and social norms. Internet information “bubbles” can bring together like-minded individuals, and the personalizing algorithms of social media and even search engines can filter information and make it seem as if other points of view do not exist. Veletsianos and Kimmons warn that the ideals of educational justice assumed to drive networked participatory scholarship may not be intrinsic but in fact characteristics of early adopters.

What are librarian roles within these new landscapes? Librarians can do more than manage repositories or facilitate scholarly communication; they can also provide instruction in practices that are dynamic, flexible and subject to change. Here are four ways that librarians can contribute to digital scholarship.

  1. Re-center workshops around the practices of networked participation. Rather than focusing on demonstrating a tool, explore how the tool might fit into the researcher’s existing practices. At the University of Colorado Boulder, workshops have focused on workflow and software feature comparison rather than on technical features (e.g., comparing Mendeley, Zotero, Endnote, and Papers, rather than siphoning learners into learning one specific or institutionally mandated technology). Workshops start with a series of questions about attendee needs (research/study practices, disciplinary norms/constraints), then highlight how each tool could match participant needs rather than vice versa.
  2. Develop learner awareness and facility with new scholarly practices. Workshops should emphasize critical appraisal of the pitfalls as well as the opportunities of digital technologies; for instance, UCB’s “Creating a digital identity” workshop questions the purpose and goals of an online identity and discusses the benefits and drawbacks of using commercial sites. The workshop on “Improving your impact” critically engages with the concepts of outreach, public discourse and measurement, uncovering assumptions, fears, and concerns about success in the academy, rather than just assuming that the use of different technology or metrics will automatically lead to greater representation or quality of opportunity within higher education.
  3. Create public digital scholarship discussion fora. Librarians at UCB have partnered with educational technology staff to create events, open to the entire campus, to discuss digital scholarship. In 2014, Academics Online week brought scholars and librarians together to exchange ideas about the nature of digital scholarship and its potential impact on their work, to test technology, and to raise awareness in an open conversation.
  4. Involve undergraduates. Many universities now publish undergraduate theses and senior projects online through open access institutional repositories. The concept of networked participation can also lead to redesign of undergraduate research assignments, which tend to focus on a final essay or research project. Librarians can work with faculty to construct scaffolded assignments focusing on the intermediary steps that lead to a final paper (for example, by following a Twitter hashtag, or mapping a scholar’s informal online conversations), making questions of inquiry, as well as authority and evaluation, more visible to students.

HathiTrust for Latin American Studies Research: Building and Mining Thematic Collections

Melissa Gasparotto spoke about the collection-building features in the HathiTrust Digital Library and the textual analysis tools available in the HathiTrust Research Center. She described the specific ways in which Spanish-language content is treated by the HathiTrust algorithms.

HathiTrust Digital Library provides a familiar search experience for its catalog as well as the ability to search the full text of its digitized documents. Users can build, share, and search collections and export metadata. The search engine, Solr, however, can have trouble dealing with the HathiTrust index because it is so large (more than 4 billion words) and because it is multi-language. Stop-words in one language may be keywords in another (e.g., lo, die, is), and Spanish-language corpora (lists of words to train algorithms) in particular are small and error-filled, so searches in Spanish are not as accurate as in English. Indexing multiple languages together also affects inverse document frequency (IDF) scores, which weight a term according to how few documents in the entire collection contain it. Further, errors in OCR (optical character recognition, the conversion of images to machine-encoded text) and in tokenization (breaking up a stream of text into meaningful elements like words and phrases, which presents problems with contractions and hyphenated words) clutter the index with junk “words.” That clutter negatively affects searching, ranking, and speed. HathiTrust is not yet searched enough for it to obtain much user data for algorithm improvements.
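
As a rough illustration of the stop-word, IDF, and tokenization issues described above, the following Python sketch may help. It is not HathiTrust's or Solr's actual implementation: the sample documents, the deliberately naive tokenizer, the function names, and the textbook formula log(N / df) are all assumptions made for the example (production search engines use smoothed variants). It shows how a hyphenated word broken across a line produces junk tokens, and how "die" looks like a distinctive keyword in an English-only collection but loses most of its weight once German texts, where it is a stop-word, share the same index.

```python
import math
import re


def tokenize(text):
    """Lowercase and split on runs of non-letter characters (a naive tokenizer)."""
    return [t for t in re.split(r"[^a-záéíóúüñ]+", text.lower()) if t]


def idf(term, docs):
    """Textbook inverse document frequency: log(N / df).

    N is the number of documents; df is the number of documents containing the term.
    """
    n_docs = len(docs)
    df = sum(1 for d in docs if term in set(tokenize(d)))
    return math.log(n_docs / df) if df else float("inf")


# A word hyphenated across a line break becomes two unrelated index entries.
junk_tokens = tokenize("digital scholar- ship")  # ['digital', 'scholar', 'ship']

# Made-up sample documents, for illustration only.
english_docs = [
    "the roses die in winter",
    "a history of printing presses",
    "magazines of the nineteenth century",
    "the archive is open",
]
german_docs = [
    "die Geschichte der Presse",
    "die Frauen und die Zeitschriften",
    "die Bibliothek ist offen",
    "die Rosen im Winter",
]

idf_english_only = idf("die", english_docs)         # log(4 / 1), about 1.39
idf_mixed = idf("die", english_docs + german_docs)  # log(8 / 5), about 0.47

print(junk_tokens)
print(round(idf_english_only, 2), round(idf_mixed, 2))
```

In the mixed collection the English keyword "die" is scored almost like a stop-word, which is one way a multi-language index can degrade ranking for any single language.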

HathiTrust Research Center (https://analytics.hathitrust.org/) provides services for public-domain texts only, but it supports a variety of functions. Currently the Research Center is only available to scholars from non-profit institutions of higher education. Using Workset Builder, researchers can run a set of prepared algorithms to extract names, dates, and places, to classify volumes, and to chart results on a timeline. Such algorithms are useful, work well with Zotero, are a good entry-point to the use of digitized texts, and facilitate digital humanities projects, but the algorithms work best on small collections, and results vary depending on the language of the books in the target workset.

The Research Center also supports functions called the Extracted Features Dataset and the Data Capsule. The Extracted Features Dataset is useful for linguistic analysis. Scholars can download a dataset of page-level information culled from a workset, including automatic language detection, token count, part of speech, and frequency, and then export only the information they need for their particular analysis. The Data Capsule is a secure virtual computer that users can check out for a limited time for analytical access to the digitized public works of the HathiTrust. At the time of this presentation, access through the 15 extant Capsules was only to public-domain works, but access to in-copyright digitized works should be available as of the summer of 2017. Users can perform computational analysis within the secure Data Capsule environment and then export the results of their analysis, but they cannot export volume content; this restriction allows for computational access to restricted texts without violating copyright law. The Data Capsule supports non-consumptive research (in which computational analysis is performed on one or more books) rather than research in which a scholar reads or displays a substantial amount of text to understand the intellectual content presented within the book. Non-consumptive analytics includes image analysis, text extraction, textual analysis and information extraction, linguistic analysis, automated translation, and indexing and search.
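
To give a sense of the kind of analysis that page-level data such as the Extracted Features Dataset makes possible, here is a minimal Python sketch. The record layout, field names, part-of-speech tags, and counts below are hypothetical and simplified for illustration; they are not the dataset's actual schema, and the function name is likewise an assumption. The sketch aggregates per-page token counts into volume-level frequencies, optionally keeping only one part of speech, without ever needing the running text of the book.

```python
from collections import Counter

# Hypothetical, simplified page-level records: each page lists its detected
# language and, for each token, a count per part-of-speech tag.
pages = [
    {"seq": 1, "language": "es",
     "tokens": {"la": {"DET": 4}, "prensa": {"NOUN": 2}}},
    {"seq": 2, "language": "es",
     "tokens": {"la": {"DET": 3}, "damas": {"NOUN": 1}, "prensa": {"NOUN": 1}}},
]


def token_frequencies(pages, pos=None):
    """Sum token counts across pages, optionally restricted to one POS tag."""
    counts = Counter()
    for page in pages:
        for token, pos_counts in page["tokens"].items():
            for tag, n in pos_counts.items():
                if pos is None or tag == pos:
                    counts[token] += n
    return counts


all_counts = token_frequencies(pages)
noun_counts = token_frequencies(pages, pos="NOUN")

print(all_counts.most_common())
print(noun_counts.most_common())
```

Because only counts are handled, an analysis of this kind fits the non-consumptive model described above: word order and readable text are never reconstructed.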

HathiTrust is a useful tool for digital humanists, providing access to a huge number of digitized texts and the ability to extract and manipulate data. In the future, SALALMistas can hope that HathiTrust’s search engine and Research Center will improve their algorithms and services for Spanish- and Portuguese-language speakers and scholars.

