Abstracts of the accepted papers are given below the list of speakers.
Peter Auger, University of Oxford
Snapshots of Early Modern English Responses to French Poets
Alistair Baron and Andrew Hardie, Lancaster University
Prerequisites to a corpus-based analysis of EEBO-TCP
Giles Bergel, University of Oxford
The Politics and Poetics of Transcription
Daniel Carey and Anders Ingram, National University of Ireland, Galway
Richard Hakluyt’s Principal Navigations: TCP and the Development of a Critical Edition
Simon F Davies, University of Sussex
EEBO-TCP in reception studies: reading demonology in early modern England
Alison Findlay and Liz Oakley-Brown, Lancaster University
Jacob J S Halford, University of Warwick
The emergence of “new philosophy” in the discourses of seventeenth-century philosophy
Heather Froehlich, Richard J Whitt and Jonathan Hope, University of Strathclyde
EEBO-TCP as a tool for integrating teaching and research
Mark Hutchings, University of Reading
Editing the Renaissance in the Classroom
Leah Knight, Brock University
“EEBO-Driven”: Ten Years of Test-Driving
Marie-Hélène Lay, Université de Poitiers/University of Poitiers
VariaLog: how to locate words in Early Modern Stages of French and English
Martin Mueller, Northwestern University/University of Poitiers
Towards a Book of English: A linguistically annotated corpus of the EEBO-TCP texts
Michelle O’Callaghan and Alice Eardley, University of Reading
Using TCP Files in Digital Editions: Introducing Verse Miscellanies Online
Stephen Pumfrey and Paul Rayson, Lancaster University
The semantics of liberty in Early Modern English
Sebastian Rahtz and James Cummings, University of Oxford
Kicking and Screaming: Challenges and advantages of bringing TCP texts into line with the Text Encoding Initiative
Helen Sonner, Queen’s University Belfast
The ‘Popular Construction’ of Meaning in Early Modern Print (Or: How I Learned to Stop Worrying and Love the Full-Text Search)
Matthew Steggle, Sheffield Hallam University
Lost plays and EEBO-TCP: a case study from Dekker
Rebecca Welzenbach, University of Michigan
Transcribed by hand, owned by libraries, made for everyone: EEBO-TCP in 2012
Mary Erica Zimmer, Boston University
From Aspiration, Through Education: Revisiting Spenser’s “Letter of the Authors”
Posters have been accepted from:
James Cummings, University of Oxford
Re-use, enhancement, and exploitation: An investigation of projects using EEBO-TCP materials
Ian Gadd, Bath Spa University, Giles Bergel, James Cummings and Pip Willcox, University of Oxford
Digitizing the Stationers’ Register
Jayne Henley, Llyfrgell Genedlaethol Cymru/National Library of Wales
Editing Welsh texts for EEBO-TCP
Jim Kuhn, Sarah Werner and Owen Williams, Folger Shakespeare Library
F21 – Interoperable Digital Editions of Early English Plays
Judith Siefring, University of Oxford
Introducing SECT: Sustaining the EEBO-TCP Corpus in Transition
Digital resources have the potential to transform early modern reception studies: it is now possible to locate, within minutes, a set of printed references to an author or work that would once have taken months to gather. In this paper I assess how far EEBO-TCP can advance our understanding of the English and Scottish reputations of early modern French poets.
TCP texts allow us to supplement the findings of earlier research on English responses to canonical French poets like Joachim Du Bellay and Pierre de Ronsard. In the case of Guillaume de Saluste Du Bartas (1544-90), EEBO-TCP keyword searches add about 25 new references in print before 1641 to the 105 gathered by Anne Lake Prescott and other scholars. As valuable as such new material is, however, it only advances our knowledge when analyzed using other modern critical tools and methods, initially to identify gaps in the database’s coverage (e.g. Latin treatises, pamphlets) and then to contextualize references. EEBO-TCP complements existing research techniques in helping us to develop a more detailed picture of these poets’ reception histories.
More revolutionary is the potential for electronic searches to offer us snapshots of previously neglected English interactions with French poets, and so to reshape the framework within which we approach early modern Anglo-French literary relations. With Du Bartas, for example, TCP texts reveal almost 90 further references in print between 1641 and 1700, few of which have been discussed before. These references challenge the view that Du Bartas’ reputation simply collapsed after the Civil War, and encourage us to examine more closely his poetry’s connections with non-fictional treatises and other prose works in this later period. Searches can also stimulate research on how less familiar French poems were circulated, translated and read in Stuart England.
For example, TCP texts reveal that Guy du Faur, Seigneur de Pibrac, a Catholic apologist for the St Bartholomew’s Day Massacre, had a presence in seventeenth-century English literary culture, both in histories of the Wars of Religion and as author of a set of moral Quatrains. Similarly, we can quickly gauge how widely other French religious poets little known today, such as Odet de la Noue, Pierre Matthieu, Jean Bertaut and Pierre Duval, were being read abroad. The EEBO-TCP corpus could further assist Anglo-French reception studies if its coverage and its links with the Universal Short Title Catalogue develop to facilitate multilingual research that draws attention to Latin and French texts as well as English responses.
Most valuable of all, EEBO-TCP could provide a platform for scholarly collaboration; for example, annotations and cross-referencing between texts would make it possible to verify, collate and publicize lists of printed references for use by others. In such ways, technology could facilitate cooperation between scholars of different nationalities and interests, as well as helping individual researchers to ask important questions more quickly.
With Phase 1 containing over 25,000 transcribed texts (approximately 700 million words) and Phase 2 already containing nearly 15,000 texts (over 200 million words, and growing), EEBO-TCP offers an unrivalled resource for corpus linguistic studies of Early Modern English in terms of scale and coverage. However, several preparatory steps are necessary to enable meaningful corpus analyses of the EEBO data. This paper describes these processes and some of the concomitant difficulties, whilst also outlining the analysis methods offered by the corpus analysis tool CQPweb.
Historical corpus linguistics is a well-established research area, and a wide range of corpora have been developed. These corpora have tended to be carefully designed and finely structured, and to focus on a particular genre or text-type; in consequence most are relatively small in size, e.g. the Corpus of English Dialogues at 1.2 million words (Culpeper & Kytö, 2010). By contrast, the scale of the EEBO-TCP data is comparable to the very largest modern corpora now available, e.g. web-crawled corpora of billions of words (Baroni et al., 2009).
Powerful software tools are required to process corpora on this scale. One such tool is CQPweb, a user-friendly web-based interface to the IMS Open Corpus Workbench. CQPweb’s analysis functions include concordancing (displays of word occurrences with their immediate context), collocations (statistical analysis of word co-occurrence) and keyword analysis (words significantly more frequent in one set of texts compared to another). To get the most out of these analysis techniques, corpus metadata is needed to allow users to narrow analyses to a particular set of texts or to compare sub-corpora defined by metadata-based filtering. We have been able to extract various metadata fields from the EEBO-TCP headers of each text, most notably the date and place of publication, which allows diachronic analysis. We have also implemented various types of corpus annotation of the EEBO-TCP data, notably part-of-speech tags (grammatical labels assigned to each word) and semantic tags (topic or concept labels assigned to each word). This allows CQPweb analyses to be performed at the annotation level rather than the word level; for example, a researcher could look at which topics are more prevalent in different time periods.
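As a rough illustration of two of the functions described above, concordancing and metadata-based filtering, the following Python sketch builds keyword-in-context (KWIC) lines from texts restricted by date or place of publication. It is not CQPweb’s implementation (CQPweb queries an indexed corpus through the IMS Open Corpus Workbench); the `Text` record, its fields and the two sample sentences are invented stand-ins for EEBO-TCP texts and their header metadata.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical stand-in for one EEBO-TCP text plus two header fields."""
    text_id: str
    year: int
    place: str
    body: str


def tokenize(s):
    # Naive tokenizer: runs of letters and apostrophes; punctuation is dropped.
    return re.findall(r"[A-Za-z']+", s)


def kwic(texts, node, span=3, year_from=None, year_to=None, place=None):
    """Return (text_id, left, node, right) for every occurrence of `node`
    in texts whose metadata passes the optional date and place filters."""
    lines = []
    for t in texts:
        # Metadata filtering: skip texts outside the requested sub-corpus.
        if year_from is not None and t.year < year_from:
            continue
        if year_to is not None and t.year > year_to:
            continue
        if place is not None and t.place != place:
            continue
        toks = tokenize(t.body)
        for i, tok in enumerate(toks):
            if tok.lower() == node.lower():
                left = " ".join(toks[max(0, i - span):i])
                right = " ".join(toks[i + 1:i + 1 + span])
                lines.append((t.text_id, left, tok, right))
    return lines


texts = [
    Text("A1", 1590, "London", "The liberty of a Christian man is freedom from sin"),
    Text("B2", 1680, "London", "Every subject hath liberty and property by law"),
]
for line in kwic(texts, "liberty", span=2, year_from=1650):
    print(line)  # ('B2', 'subject hath', 'liberty', 'and property')
```

Filtering before tokenizing keeps the sub-corpus definition separate from the search itself, which is the same division of labour the abstract describes between metadata and analysis functions.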
One particular issue with the computational analysis of historical texts is the large amount of spelling variation generally present. We have previously shown that, like all Early Modern English corpora, EEBO-TCP contains a large amount of spelling variation (Baron et al., 2009). It has also been shown that this spelling variation has a detrimental effect on the accuracy of various corpus linguistic techniques, e.g. part-of-speech annotation (Rayson et al., 2007) and keyword analysis (Baron et al., 2009). Here we show how the Variant Detector (VARD) tool (Baron & Rayson, 2009) can be used on EEBO-TCP to automatically insert modern equivalents alongside the original word-forms.
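VARD itself is described by its authors as combining a list of known variants with letter-replacement rules, phonetic matching and edit distance, weighted using training data. The Python sketch below illustrates only the output idea, keeping every original word-form paired with a proposed modern equivalent. The tiny variant list, the stand-in modern dictionary and the single letter rule (word-initial v for u) are assumptions for illustration, not VARD’s code or rule set.

```python
import re

# Hypothetical variant list; VARD's own list and rules are far larger.
KNOWN_VARIANTS = {"haue": "have", "loue": "love", "onely": "only", "euery": "every"}
# Tiny stand-in for a modern dictionary used to vet rule output.
MODERN_WORDS = {"unto", "upon", "us", "under", "have", "love", "only", "every"}


def modernise(token):
    """Return a proposed modern form, or None if no change is proposed."""
    low = token.lower()
    if low in KNOWN_VARIANTS:
        return KNOWN_VARIANTS[low]
    # Assumed rule: word-initial v before a consonant often stands for u
    # (vpon -> upon); the candidate is kept only if it is a known modern word.
    if len(low) > 1 and low[0] == "v" and low[1] not in "aeiouy":
        candidate = "u" + low[1:]
        if candidate in MODERN_WORDS:
            return candidate
    return None


def normalise(text):
    """Pair every original word-form with its proposed modern equivalent."""
    return [(tok, modernise(tok)) for tok in re.findall(r"[A-Za-z]+", text)]


print(normalise("Vnto vs they haue shewed loue"))
# [('Vnto', 'unto'), ('vs', 'us'), ('they', None), ('haue', 'have'), ('shewed', None), ('loue', 'love')]
```

Keeping the original form next to the modern one means searches and counts can run on either layer; "shewed" is left untouched here simply because this toy list does not know it.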
The preparatory steps of metadata extraction, spelling modernization and corpus annotation allow significantly more powerful and accurate computational analysis to be performed on EEBO-TCP.
Baron, A. and Rayson, P. (2009). Automatic standardization of texts containing spelling variation: how much training data do you need? In M. Mahlberg, V. González-Díaz and C. Smith (eds.) Proceedings of the Corpus Linguistics Conference, CL2009, University of Liverpool, UK, 20-23 July 2009.
Baron, A., Rayson, P. and Archer, D. (2009). Word frequency and key word statistics in historical corpus linguistics. International Journal of English Studies 20(1): 41-67.
Baroni, M., Bernardini, S., Ferraresi, A. and Zanchetta, E. (2009). The WaCky Wide Web: A Collection of Very Large Linguistically Processed Web-Crawled Corpora. Language Resources and Evaluation 43(3): 209-226.
Culpeper, J. and Kytö, M. (2010). Early Modern English Dialogues: Spoken Interaction as Writing. Cambridge: Cambridge University Press.
Rayson, P., Archer, D., Baron, A., Culpeper, J. and Smith, N. (2007). Tagging the Bard: Evaluating the accuracy of a modern POS tagger on Early Modern English corpora. In M. Davies, P. Rayson, S. Hunston and P. Danielsson (eds.) Proceedings of the Corpus Linguistics Conference: CL2007, University of Birmingham, UK, 27-30 July 2007.
Transcription has received relatively little scholarly attention, even within fields such as editorial theory or information management. Often characterised as “data capture”, a preliminary to higher-level analysis, it has typically been outsourced in the production of critical or documentary editions since long before the advent of the Text Creation Partnership. Scholars seeking reliable transcriptions can choose between outsourced keyboarding managed by a project such as EEBO-TCP, crowdsourcing, or doing the work themselves or within an academic project. While outsourcing offers compelling economic advantages, the intellectual consequences of partitioning transcription from notionally higher-level editorial or critical knowledge-work have not been widely addressed. This paper will offer some perspectives on (with apologies to Peter Stallybrass and Allon White) the “politics and poetics of transcription”, based on the proposer’s practical experience in sourcing transcriptions of early English printed ballad-texts from several quarters, including TCP, for digital humanities projects. It will argue that transcription, far from being a mechanical or mundane activity, can be a demanding intellectual discipline; that it benefits from being carried out in close proximity to textual, contextual and linguistic scholarship; and that scholarly users of TCP and other texts might profitably reflect on (and contribute to) the mechanisms through which their materials are sourced.
The Text Creation Partnership is not only revolutionizing research and teaching in early modern studies, but also shaping the methodology and relevance of more traditional bibliographical projects such as critical editions. One of the most ambitious projects to draw upon the resources presented by the TCP is the Hakluyt Project. This paper will explore the Hakluyt Project’s use of TCP, our reasons for choosing these resources over other options, and some of the problems we have had to solve.
The Hakluyt Project is producing a critical edition of Richard Hakluyt’s The Principal Navigations, Voyages, Traffiques, and Discoveries of the English Nation (second edition, 1598-1600), the most important collection of English travel writing ever published, covering European activity and ambition from the New World to Muscovy, the Levant, Persia, the East Indies and Africa. The work was originally published in three massive folio volumes (approx. 1.76 million words); the modern critical edition is under contract with Oxford University Press in 14 volumes, under the general editorship of Prof. Daniel Carey (National University of Ireland, Galway) and Prof. Claire Jowitt (Nottingham Trent University).
Producing such a large critical edition, with 25 participating editors, has only become manageable with the advent of modern communications technology (email, video conferencing, etc.). Our starting point is the TCP text of the Huntington Library copy of The Principal Navigations, which we treat as a “raw” text. This “raw” text will be corrected against the PDF images of the Huntington Library copy available through Early English Books Online. A large, international collaborative project such as ours needs a common reference point that can be transmitted electronically, and the resources provided by EEBO and TCP meet that need.
However, our two central aims in producing this edition also highlight the limitations of the material available through the TCP. The first of these aims is to produce an authoritative text of The Principal Navigations, yet the TCP version does not include an important text cancelled from the 1598 edition of volume one (the so-called “Cadiz leaves”), which raises important questions of copy-text. The second is to provide annotations and references to guide the reader through Hakluyt’s dense, lengthy and often difficult text. In using TCP and EEBO more generally, it is important to understand the nature and reliability of the material they make available. The Hakluyt edition aspires to a level of accuracy and contextualisation which TCP is not trying to match; nonetheless, TCP has provided invaluable material for the practical process of producing such a large and difficult critical edition. This paper ends with a call for debate and discussion in the academic community about the character and origin of the material available through TCP, the creative possibilities it raises, and the metadata presented alongside it.
The proposed paper forms part of a wider study into the material production and contemporary reception of writing on witchcraft in early modern England. Assessing the reception of printed demonology has involved comparative study of the citation and use of both English and Continental writing on witchcraft, both within the discourse and without. This research would not have been possible without the opportunity for text-search opened up by EEBO-TCP. Full-text searches were used to build up a database of citations, the results of which offer a reassessment of the relative importance of particular works on witchcraft during the period. The research also makes a contribution to our wider understanding of reading habits during the period. This paper will present the results of this research, alongside discussion of the benefits and difficulties of using EEBO-TCP to obtain the results.
Public revenges are for the most part fortunate; as that for the death of Caesar; for the death of Pertinax; for the death of Henry the Third of France; and many more. But in private revenges, it is not so.
Francis Bacon, Of Revenge
This conference interrogates how EEBO has transformed twenty-first-century approaches to early modern studies. With this overarching question in mind, and following on from the previous two papers on this panel, our talk explores the wider pedagogical and research implications of using Lancaster University’s tool CQPweb to stimulate innovative ideas about late fifteenth- and sixteenth-century literary texts. In sum, we suggest ways in which a cultural materialist critical practice can be nuanced by Paul Rayson and Stephen Pumfrey’s ground-breaking project.
The first half of the paper addresses undergraduate teaching. As the quotation from Francis Bacon at the head of this abstract suggests, an important topic for many courses is the discursive texture of early modern England’s concepts of justice and revenge. Using Shakespeare’s First Folio (1623) and the syllabi of Lancaster’s Level 3 year-long programme of study of the author’s writing, Liz Oakley-Brown explores some of the possibilities for presenting third-year students with the ideological issues at stake while simultaneously facilitating close textual analysis.
In the second half of the 20-minute discussion, Alison Findlay considers the ways in which EEBO and CQPweb have allowed her to take up and develop her understanding of gender and revenge originally published in Feminist Perspectives in Renaissance Drama (1999) and more recently examined in her keynote talk at last week’s conference at Bristol University, Female fury and the masculine spirit of vengeance (5-6 September 2012).
This paper shows how the phrase “new philosophy” went through a semantic shift during the seventeenth century, and demonstrates how the searchable texts of Early English Books Online can be used to explore such changes. By tracing the trajectory of the phrase through the EEBO-TCP corpus, it shows that “new philosophy” went from being a derogatory term for Copernicans at the start of the seventeenth century to being the esteemed name for the new paradigm of natural philosophy by the eighteenth century.
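Tracing a phrase through the corpus usually means counting its occurrences per period and normalising by the amount of text in that period, since the volume of surviving print varies from decade to decade. A minimal Python sketch, assuming each text is available as a hypothetical `(year, body)` pair; it matches the exact modern spelling only, so a form such as “new philosophie” would be missed unless spelling is normalised first.

```python
import re
from collections import defaultdict


def phrase_by_decade(texts, phrase):
    """texts: iterable of (year, body) pairs.
    Returns {decade: (raw_hits, hits_per_million_words)}."""
    words = defaultdict(int)
    hits = defaultdict(int)
    target = phrase.lower().split()
    n = len(target)
    for year, body in texts:
        decade = year // 10 * 10
        toks = [t.lower() for t in re.findall(r"[A-Za-z]+", body)]
        words[decade] += len(toks)
        # Slide a window of len(target) tokens over the text.
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == target:
                hits[decade] += 1
    return {
        d: (hits[d], hits[d] * 1_000_000 / words[d])
        for d in sorted(words)
        if words[d] > 0
    }


sample = [
    (1605, "the new philosophy calls all in doubt"),
    (1608, "old learning"),
    (1665, "the New Philosophy and the experimental philosophy"),
]
print(phrase_by_decade(sample, "new philosophy"))
```

The sample sentences are invented; the point is that a single hit in a small decade and a single hit in a large one yield very different normalised rates.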
Looking at the semantic field of the term, and comparing it with the discourses in which it was embedded, highlights the semantic shift. New philosophy was connected to the debate between the ancients and the moderns and was involved in the discourses of natural philosophy; it was the combination of both discourses that created the climate in which new philosophy was conceptualised. The first section of the paper will reconstruct the discursive trends in the corpus regarding novelty, the modern and attitudes to the past, detailing the linguistic patterns over the century. It will give the frequencies of certain key words related to the ancients and moderns, providing a survey of the linguistic contours that the following sections will use as a point of departure for closer analysis.
With this broad outline of the seventeenth-century discursive trends surrounding novelty, the modern and the ancients in place, I will elucidate the dynamics of the use of new philosophy. Using occurrences of the phrase in the EEBO-TCP corpus, I will flesh out how the dialogue about new philosophy developed, and in particular relate the rise in use of the phrase to discursive changes relating to natural philosophy in the corpus. New philosophy emerged, and was re-appropriated as a positive term, as a philosophy based on the ancients became problematic and undesirable. The failure of scholastic philosophy resulted in a fragmentation of philosophy and a plethora of rival philosophies to replace it; new philosophy became salient as a label that distinguished it from the now unpopular ‘old philosophy.’ As a result, we can see how semantic shifts are embedded in the linguistic structure of the time. The significance of the semantic shift in the phrase ‘new philosophy’, and of the change in the discourse of philosophy, is that it is potentially representative of a paradigm shift within natural philosophy. This paper demonstrates how EEBO can be used to provide insight into linguistic changes, and in so doing it highlights a discernible change from scholastic philosophy to a rational experimental philosophy that could be used to defend the idea of a Kuhnian paradigm shift against revisionist accounts.
The availability of EEBO-TCP texts enables research-based teaching, and indeed research discoveries by undergraduates. Textlab, a course which is part of the Vertically Integrated Projects (VIP) initiative at The University of Strathclyde, integrates students from all levels and seeks to foster collaboration between students and faculty from a variety of disciplines and to promote research-led teaching on Early Modern literature. Students work in teams, testing newly developed software programmes to identify specific linguistic features of literary texts. Textlab is related to the international “Visualizing English Print from 1470-1800” project (funded by the Mellon Foundation), which involves collaboration between The University of Strathclyde, UW Madison, and the Folger Shakespeare Library in Washington, DC. This project’s goal is to trace how computer-aided analysis of texts can help us track the emergence of linguistic and genre forms in the Early Modern period of the English language.
In the first iteration of Textlab, students used tools developed by members of the Visualizing English Print project and others to investigate Shakespeare’s language. They were immediately capable of analyzing Shakespeare and made genuine discoveries about his texts. For example, one would not expect she to be an especially important word in The Two Gentlemen of Verona, but Jessica Wagstaff, a student enrolled in the course, points out that “language in The Two Gentlemen of Verona focuses heavily on the discourse of women as object[s] of attainment”, adding that the feminine pronoun she is far more likely to appear in The Two Gentlemen of Verona, but only as the object of the male characters: women in this play are discussed but do not play an active role. Jamie-Leigh Green, another student, found that “love” rarely appears in The Tempest despite the play’s categorization as a “romance”, and went on to discover that Shakespeare’s late plays are only loosely connected by this generic label, which does not correspond to our modern conception of “romance”.
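Claims such as “she is far more likely to appear” in one play rest on comparing a word’s frequency in a target text with its frequency in a reference set. Log-likelihood is one widely used keyness statistic in corpus linguistics; we do not know which measure the Textlab tools actually use, so the Python sketch below is an illustration of the general technique, not of their software.

```python
import math


def log_likelihood(a, b, c, d):
    """Keyness of one word.
    a, b: frequency of the word in the target and reference texts;
    c, d: total number of words in the target and reference texts."""
    # Expected frequencies if the word were spread evenly over both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def overused(a, b, c, d):
    """True if the word is relatively more frequent in the target text."""
    return a / c > b / d


print(round(log_likelihood(20, 10, 1000, 1000), 2))  # 3.4
```

A score above 3.84 corresponds to p < 0.05 with one degree of freedom; the counts in the example are arbitrary and fall just short of that threshold. Log-likelihood only says that two frequencies differ, so `overused` is needed to tell which text uses the word more.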
Naturally, the next step is to enter the larger world of early Anglophone print: the period is host to types of linguistic variation that span the rise of several major generic forms, Atlantic exploration and colonization, and orthographic standardization. EEBO is poised to become the leading repository of searchable Early Modern records. One of the driving forces behind Visualizing English Print is the idea that, through the digital analysis of non-canonical texts, larger patterns of linguistic practice can be identified and explored.
In this presentation we outline the structure of Textlab and suggest ways in which the EEBO-TCP initiative can be integrated into our existing structure, as well as possibilities for further research. We also suggest ways in which English Studies and Computer Science students can be involved in large-scale digital humanities projects such as Visualizing English Print. This presentation will be suitable for anyone interested in how such digital text analysis software can be integrated into teaching about language and literature, and in how major international research projects can be integrated into teaching (and vice versa).
For some time now bibliographical and textual matters have been an important part of more mainstream, traditional literary scholarship, rather than, as once was the case, a separate discipline; and yet when we teach the Renaissance we tend to do so using modern editions. Today’s students are the “Norton generation”: exposed to a version of literary history where difference is flattened out, and all texts look the same. Four years ago I devised an undergraduate special option, open to final-year students, designed to introduce them to the complexities of the early modern text and the diverse practices that produced it. Using the resources available on EEBO, students select a short text or texts for editing and produce a critical edition that conforms to modern editorial conventions. This paper outlines the rationale of the course, how it works, and the kinds of issues that arise when we use EEBO in the classroom. I will draw on case studies produced by University of Reading students to highlight some of the theoretical and practical matters that arise. Particular attention will be paid to what has proven to be a central issue for students, namely how and when (and when not) to modernise spelling, and to how the practicalities of using EEBO in teaching extend the range and scope of what undergraduate students understand by the term “the Renaissance”.
One of my happiest minor accomplishments after taking up my first job was to convince my faculty to invest in EEBO-TCP; since then, in part owing to its considerable expense and my role in sinking our ever-fewer dollars into it, I have made concerted efforts to integrate the database, in innovative ways, into every course I have designed and taught. I have, among other things, used this resource to:
- supplement or replace traditional textbooks;
- study “core samples” of print culture from particular years;
- compare editions of the “same” text or title across decades;
- provide rough copy-text for student-made editions;
- help students inform each other about contemporary contexts for syllabus texts;
- highlight the meaning of materiality in textual studies;
- compare digital mediations with “the thing itself” in our rare book room; and
- demonstrate the potential and pitfalls of so-called “full-text” searching.
Based on this variety of experiences I would like to offer conference delegates a modest assessment of what does and does not appear to work optimally with EEBO-TCP, in its current incarnation, in various pedagogical contexts. I will offer suggestions, in part based on my students’ comments and critiques, of which features might improve its role in the classroom and outside it. My primary aim, however, will be to share the nature and success of specific assignments and other pedagogical strategies for bringing out the best work with EEBO-TCP from students who can be remarkably, if selectively, technophobic (although they wrangle their smart-phones just fine) and who are often poorly prepared for working responsibly with the textual artifacts that EEBO offers or for understanding the degree to which those artifacts are mediated and transformed — inevitably, both for better and for worse — through their digital representation.
I would also like to speak more broadly to my own distinctly mixed experiences as an early adopter and sometimes beleaguered champion of the database in a discipline more wary of change, technology, and large-scale (as opposed to piece-meal) expense than I used to know. I will therefore address the push-back I have experienced with respect to what one critic identified as “EEBO-driven” research and teaching. With this phrase in mind, I will contest the notion that the database can ever be the one in the driver’s seat; rather, EEBO is a vehicle that permits those willing to climb aboard to explore remote textual and historical places, and to reach and move across them at paces unprecedented before the adoption of this database, particularly by institutions and scholars who were formerly resource-poor with respect to rare books and other modes of access to early printed cultural remains.
The efficiency of search engines is based on the principle that the information sought can be retrieved by “looking for words” conveying the information. This amounts to taking for granted that words are always written in the same way. This view, which is well adapted to texts produced in contemporary periods of language history, is not suited to texts produced during the French Renaissance, and what is true for early French is true for early English too.
It is therefore necessary to adapt search engines based on word-form identification if they are to provide the service expected of them. Several strategies can be envisaged, and the purpose of this paper is to focus on one which draws on linguistic expertise.
A Java program was developed which first transforms a list of words into an extended list of forms, using a rule set based on linguistic knowledge of morphology and spelling history. The program must then locate, in a given text, the different old-spelling forms that correspond to the requested form. Hence one can identify two “phases”.
(1) Generating the extended form of the request: at this step, the program generates three files. Two of them give summary information about the process (how often each rule has been used) and the end result (how many forms were generated). The third contains, for each word, the list of generated forms together with the rules used to produce them.
(2) Finding the right form within a text: once the extended request has been calculated, the real test is to identify all the variants actually attested in the text. This is the second phase of our program. Its output is an HTML file in which each identified variant is highlighted (or set in bold). Moreover, each form is linked to a bubble showing the rules used to derive the variant.
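The two phases can be sketched in a few dozen lines. VariaLog itself is written in Java with a linguistically derived rule set; the Python below, its three English spelling rules, and its rule names are hypothetical stand-ins chosen to show the mechanism: phase 1 expands a requested form and records which rules produced each variant (plus the summary counts), and phase 2 marks attested variants in HTML, with a `title` attribute standing in for the pop-up bubble.

```python
import html
import re
from collections import Counter
from itertools import combinations

# Hypothetical rules: (regex pattern, replacement, rule name).
# Each rule rewrites every match in the word.
RULES = [
    (r"(?<=[a-z])v(?=[a-z])", "u", "medial v/u"),  # love -> loue
    (r"i", "y", "i/y"),                             # kind -> kynd
    (r"(?<!e)$", "e", "final -e"),                  # kind -> kinde
]


def generate(word):
    """Phase 1: map each generated form to the rules that produced it.
    Combinations are tried in order of size, so each form keeps the
    smallest combination that first produced it."""
    forms = {word: ()}
    for k in range(1, len(RULES) + 1):
        for combo in combinations(RULES, k):
            form = word
            for pattern, repl, _ in combo:
                form = re.sub(pattern, repl, form)
            forms.setdefault(form, tuple(name for _, _, name in combo))
    return forms


def rule_stats(extended):
    """Summary in the spirit of the two summary files:
    how often each rule was used, and how many forms were generated."""
    usage = Counter(name for forms in extended.values()
                    for rules in forms.values() for name in rules)
    return usage, sum(len(forms) for forms in extended.values())


def highlight(text, forms):
    """Phase 2: bold every attested variant in an HTML rendering of `text`."""
    parts = re.split(r"([A-Za-z]+)", text)  # odd indices hold the words
    out = []
    for i, part in enumerate(parts):
        key = part.lower()
        if i % 2 == 1 and key in forms:
            rules = ", ".join(forms[key]) or "requested form"
            out.append(f'<b title="{html.escape(rules)}">{html.escape(part)}</b>')
        else:
            out.append(html.escape(part))
    return "".join(out)


extended = {w: generate(w) for w in ["kind", "love"]}
print(sorted(extended["kind"]))  # ['kind', 'kinde', 'kynd', 'kynde']
print(highlight("A kynde heart, and loue", {**extended["kind"], **extended["love"]}))
```

Generating forms blindly over-produces (not every variant is attested), which is why the second phase, checking against the actual text, is where the rule set earns its keep.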
The present paper aims to describe the tool, which was first developed in the particular context of the Virtual Humanistic Library, and then to show how it can be adapted to early English.
Phil Burns and I are engaged in a project that will create a linguistically annotated corpus of the EEBO-TCP texts. We envisage this as an important first step towards something that one might call English Epochs Electronically or, more simply, a Book of English, which will consist of
- a large, growing, collaboratively curated, public-domain corpus
- of written English since its earliest modern form
- with full bibliographical detail
- and light but consistent structural and linguistic annotation
Non-linguists have little patience with the jargon of that discipline and feel some sympathy with the charge that Shakespeare’s peasant rebel Jack Cade brought against the Lord Say:
It will be proved to thy face that thou hast men about thee that usually talk of a noun and a verb, and such abominable words as no Christian ear can endure to hear.
2 Henry 6, 4.7.35-39
But light linguistic annotation is best thought of as word-level metadata that introduces some rudiments of readerly knowledge in a form that a machine can process. Brian Athey at the University of Michigan observed that “Agile data integration is an engine that drives discovery.” In text-centric disciplines, such agility is achieved by corpora whose texts are surrounded with metadata at the top level of bibliography, the mid level of document structure, and the bottom level of individual words. If the levels of this triple-decker structure of metadata can be queried separately or in combination, you have an excellent foundation for the digitally assisted analysis of many historical, philological, rhetorical, and thematic questions.
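The “triple-decker” idea can be made concrete with three nested record types and one query function that accepts constraints at any level. This Python sketch is an assumption-laden illustration, not the project’s data model: the class names, the `"verse"`/`"prose"` division labels and the one-letter POS tags are hypothetical (MorphAdorner uses its own, much richer tagset).

```python
from dataclasses import dataclass


@dataclass
class Token:            # bottom level: individual words
    word: str           # original spelling
    lemma: str
    pos: str            # hypothetical simplified tag, not MorphAdorner's tagset


@dataclass
class Division:         # mid level: document structure
    kind: str           # e.g. "verse" or "prose"
    tokens: list


@dataclass
class Document:         # top level: bibliography
    title: str
    year: int
    divisions: list


def query(docs, year_range=None, div_kind=None, lemma=None, pos=None):
    """Yield (title, word) for tokens that satisfy any combination of
    bibliographic, structural and word-level constraints."""
    for doc in docs:
        if year_range and not (year_range[0] <= doc.year <= year_range[1]):
            continue
        for div in doc.divisions:
            if div_kind and div.kind != div_kind:
                continue
            for tok in div.tokens:
                if lemma and tok.lemma != lemma:
                    continue
                if pos and tok.pos != pos:
                    continue
                yield doc.title, tok.word


doc = Document("Sample A", 1600, [
    Division("verse", [Token("loue", "love", "n"), Token("loued", "love", "v")]),
    Division("prose", [Token("Loue", "love", "v")]),
])
print(list(query([doc], div_kind="verse", lemma="love", pos="v")))
# [('Sample A', 'loued')]
```

Because the lemma sits on the token, the query finds "loued" without the user having to anticipate its old spelling; the bibliographic and structural filters narrow the search independently.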
The project will involve improving MorphAdorner and the training data associated with it. (A proposal for partial funding is currently under consideration by the Mellon Foundation.) MorphAdorner is a broadly based Natural Language Processing (NLP) tool suite that was developed in the context of the WordHoard and MONK projects and has been used for a variety of texts from the early 1500s to the present day. Its chief comparative advantage, however, lies in its treatment of Early Modern data. The improvements in MorphAdorner will focus on
- tokenization and the intersecting problems of identifying abbreviations and establishing sentence boundaries
- detecting and tagging non-English text regions
- procedures for partial or iterative annotation of heterogeneous text regions where these are explicitly marked in XML encoding
- fully TEI-compatible output options
- automatic detection and correction of incompletely or incorrectly transcribed words
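To illustrate why abbreviations and sentence boundaries are intersecting problems, here is a minimal Python sketch (not MorphAdorner's code): a full stop closes a sentence only if the token is not on an abbreviation list and the next token is capitalized. The abbreviation set and the function name `split_sentences` are hypothetical, and the capitalization rule is an assumption that early modern printing often violates, which is exactly why such a crude rule needs refinement on EEBO texts.

```python
# Hypothetical abbreviation list; a real tokenizer would need a far larger,
# period-specific lexicon.
ABBREVIATIONS = {"viz.", "sc.", "mr.", "dr.", "st.", "cap.", "fol.", "&c."}


def split_sentences(text):
    """Split whitespace-tokenized text into sentences.

    A token ending in '.', '?' or '!' closes a sentence only if it is not a
    known abbreviation and the next token starts with a capital letter (or
    there is no next token).
    """
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        next_tok = tokens[i + 1] if i + 1 < len(tokens) else ""
        ends_with_stop = tok.endswith((".", "?", "!"))
        is_abbrev = tok.lower() in ABBREVIATIONS
        next_is_capital = next_tok[:1].isupper()
        if ends_with_stop and not is_abbrev and (next_is_capital or not next_tok):
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without a final stop
        sentences.append(" ".join(current))
    return sentences


print(split_sentences("He came to Oxford, viz. the Vniversity. Then he left."))
# ['He came to Oxford, viz. the Vniversity.', 'Then he left.']
```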
Work on the training data will pursue divide-and-conquer strategies that attend to the heterogeneity of the corpus and of many individual texts in it. We will create a crude time/genre grid of the corpus, extract samples from its cells, “MorphAdorn” those samples, review them for systematic errors, adjust MorphAdorner routines, and iterate the process. In the final run, different segments of the EEBO corpus will be annotated on the basis of different training data, and parts of many texts will be annotated separately.
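A crude time/genre grid with per-cell sampling might look something like the following Python sketch. The record format `(id, year, genre)`, the 25-year period width, and the name `grid_sample` are assumptions for illustration, not part of the project's stated design; a fixed random seed keeps the samples reproducible across iterations.

```python
import random
from collections import defaultdict


def grid_sample(texts, per_cell, period_years=25, seed=0):
    """Group (text_id, year, genre) records into time/genre cells and
    draw up to per_cell text ids at random from each cell."""
    cells = defaultdict(list)
    for text_id, year, genre in texts:
        period_start = year - year % period_years
        cells[(period_start, genre)].append(text_id)
    rng = random.Random(seed)  # seeded so each review round is repeatable
    return {
        cell: rng.sample(ids, min(per_cell, len(ids)))
        for cell, ids in sorted(cells.items())
    }


records = [("A1", 1590, "sermon"), ("A2", 1595, "sermon"),
           ("A3", 1610, "play"), ("A4", 1642, "sermon")]
print(grid_sample(records, per_cell=1))
```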
EEBO is often viewed as a searchable digital archive of microfilmed page images of early printed books. Alongside these images are the TCP XML-TEI transcriptions of at least one edition of the texts on the site. It is these files that have the potential to change EEBO from a relatively closed archive into an open resource. Recent digitising projects have begun to use the TCP files as the basis for digital editions, adding further tagging and new levels of user functionality. This paper will launch the Verse Miscellanies Online website and, using this project as a case study, will show what can be done with TCP files in the production of scholarly editions.
Our paper uses the powerful corpus linguistic tool CQPweb to interrogate the searchable texts of EEBO phase 1 and investigate how discourses of liberty changed from 1473 to 1700. We show how liberty was transformed from an overwhelmingly religious term (liberty from sin, libertine behaviour, etc.) into a more secular one denoting the liberty and rights of the individual. In doing so we exemplify the extraordinary new research questions, and answers, that EEBO-TCP has made possible.
The research we will present arises from a unique collaboration between a computational linguist (Rayson) and an intellectual historian (Pumfrey), which has led to even wider interdisciplinary collaborations at Lancaster University, as this panel shows. It builds on the combined research methods developed in their first published paper, “Experiments in Early Modern English” (Literary and Linguistic Computing [forthcoming, 2012]), which analysed the emergence of the scientific concept of experiment. That paper, and the one to be presented here, demonstrate how important collaboration between the corpus specialist and the traditional humanistic reader of EEBO texts is, both for framing research questions and for generating meaningful answers.
Building on the methods presented in the first paper in this panel (Baron and Hardie), we will describe the advanced methods from computational linguistics that can be applied to the EEBO-TCP texts when viewed as a corpus. For thirty years, corpus linguists have used large collections of machine-readable texts to study language. They have collected representative samples of written text and transcribed spoken language and compiled them into corpora for the analysis of linguistic features and language varieties. Typically, grammatical annotation (part-of-speech tagging) has been applied to assist in the study of lexical and grammatical features. In this paper, we present a further layer of linguistic annotation, at the level of word meaning. Semantic tagging assigns each word or phrase in the corpus a label that locates it within a semantic field taxonomy. Once the whole corpus has been analysed in this way, one can search for concepts rather than simply for words and phrases. Our research in this area is preliminary, since the automatic semantic tagger was trained on modern data and its tagset was derived from late twentieth-century dictionary building. Both may therefore need backdating to improve their applicability to EEBO-TCP. One aim of the larger research enterprise to which this panel belongs is to discover just how applicable such tools are in a historical setting.
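The payoff of word-level semantic labels is that a query can target a concept rather than a single spelling. The toy Python sketch below uses an invented lexicon and invented field labels (not those of the tagger or taxonomy described above) to show how a search for one field retrieves variant spellings such as "libertie" and "freedome" together.

```python
# Invented lexicon mapping spellings to invented semantic-field labels.
LEXICON = {
    "liberty": "FREEDOM", "libertie": "FREEDOM",
    "freedom": "FREEDOM", "freedome": "FREEDOM",
    "sin": "RELIGION", "grace": "RELIGION",
    "rights": "LAW",
}


def tag(tokens):
    """Attach a semantic-field label to each token (UNKNOWN if unlisted)."""
    return [(tok, LEXICON.get(tok.lower(), "UNKNOWN")) for tok in tokens]


def find_concept(tagged, field):
    """Return the positions of all tokens tagged with the given field."""
    return [i for i, (_, label) in enumerate(tagged) if label == field]


tagged = tag("The libertie of the subject and his freedome".split())
print(find_concept(tagged, "FREEDOM"))  # [1, 7]
```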
Concerning the changing discourses of liberty, our results show a rapid transformation from the religious to the secular in the decades following the outbreak of the English Civil War. This is the period when Thomas Hobbes and the young John Locke were framing new political theories based on their analyses of human nature and human rights. Combined with simultaneously secularising concepts of experiment, philosophy, etc., our results have great significance for big questions about the construction of modernity, and even “Enlightenment” in late seventeenth-century England. Our work shows that when the EEBO-TCP texts are properly interrogated as a corpus they have the potential to transform our understanding of the history of early modern England itself.
This paper addresses some of the practical problems of working with the underlying digital files prepared by the Text Creation Partnership using tools developed for standard TEI.
The 40,000 files of the TCP EEBO collection have been built up over the last decade using the markup technology of the 1990s, namely SGML and variations on the third edition of the Text Encoding Initiative Guidelines, to a gradually increasing standard of consistency. Delivery of the texts through the conventional website works well, but problems arise if we want to take advantage of some of the tools now commonly used to process digital files, particularly those based on the current TEI recommendations. This involves transforming the SGML markup to XML, and then to the latest edition of the TEI (P5). We will investigate some of the problems involved in this type of conversion, such as:
- changes needed to the TEI Guidelines themselves to cover textual phenomena identified in EEBO that cannot adequately be described by current TEI recommendations
- decisions needed to map some of the variants adopted by TCP back onto canonical TEI P5 markup
- testing whether the conversion has lost any content along the way
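One simple way to test for lost content, sketched here in Python, is to compare the whitespace-normalized character data of the source and the converted file. This assumes both already parse as XML (i.e. the SGML has been converted to XML first); it catches dropped text but not lost or altered markup, and the element names in the example are illustrative only.

```python
import xml.etree.ElementTree as ET


def text_content(xml_string):
    """All character data in the document, with whitespace runs collapsed."""
    root = ET.fromstring(xml_string)
    return " ".join("".join(root.itertext()).split())


def same_text(source_xml, converted_xml):
    """True if the conversion kept exactly the same textual content."""
    return text_content(source_xml) == text_content(converted_xml)


source = "<TEXT><P>To the <HI>Reader</HI>.</P></TEXT>"
converted = ('<text xmlns="http://www.tei-c.org/ns/1.0">'
             '<p>To the <hi rend="italic">Reader</hi>.</p></text>')
lossy = '<text xmlns="http://www.tei-c.org/ns/1.0"><p>To the .</p></text>'
print(same_text(source, converted), same_text(source, lossy))  # True False
```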
We will present some of the software we have developed for the TCP conversion, and show how it can be delivered in a production environment. The exercise of transformation gives an interesting opportunity to examine some of the encoding of TCP texts, analyze the range of textual phenomena that are recorded, and predict which structures will be amenable to discovery by future scholars.
The 40,000-text TCP corpus also provides a good test of general TEI tools. In this paper we describe some of these tools and the results we found when using them on TCP texts. As a case study we examine the generation of ebook editions (ePub format) of the TCP texts from the converted TEI. The results of such conversions will be assessed for their usefulness to contemporary readers and for any failures in representing the intellectual content of the original text.
This paper will discuss the motivations and challenges of creating a database of poetic form based on EEBO-TCP texts. We are currently completing a pilot version of a database that will eventually contain information on the genres, metrical structures and rhyme schemes of all poetry in early modern printed texts. In this resource, users may search by a variety of criteria, such as poetic form by name (“ode”, “ballad”), metre (“iambic tetrameter”), or rhyme scheme (“abcb”), and then refine the results by time period, author, place of publication, or language. They may also enter information about a poem, such as its rhyme scheme, and view similar or identical forms in the database.
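Matching a user's rhyme scheme against the database requires treating "abcb" and "xyzy" as the same pattern. The Python sketch below assumes schemes are stored as letter strings and relabels letters in order of first appearance; the sample records and the positional mismatch count used to define "similar" forms are hypothetical illustrations, not the project's method.

```python
def canonical_scheme(scheme):
    """Relabel rhyme letters in order of first appearance: 'xyzy' -> 'abcb'."""
    mapping = {}
    out = []
    for ch in scheme.lower():
        if ch not in mapping:
            mapping[ch] = chr(ord("a") + len(mapping))
        out.append(mapping[ch])
    return "".join(out)


# Hypothetical records: poem title -> rhyme scheme as encoded.
POEMS = {"Ballad A": "abcb", "Quatrain B": "abab",
         "Stanza C": "xyzy", "Couplets D": "aabb"}


def find_forms(query, max_diff=0):
    """Titles whose canonical scheme has the query's length and differs
    from the query in at most max_diff positions (0 = identical)."""
    q = canonical_scheme(query)
    hits = []
    for title, scheme in POEMS.items():
        s = canonical_scheme(scheme)
        if len(s) == len(q) and sum(a != b for a, b in zip(q, s)) <= max_diff:
            hits.append(title)
    return hits


print(find_forms("abcb"))              # ['Ballad A', 'Stanza C']
print(find_forms("abcb", max_diff=1))  # ['Ballad A', 'Quatrain B', 'Stanza C']
```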
Our strategy is to create an interface that will interrogate marked-up EEBO-TCP texts as users present queries. We employ a Text Encoding Initiative (TEI) P5-compliant scheme to enrich the relevant encoded texts, using a base tag set for encoding rhyme scheme and metrical information, along with a rhyme element to support simple analysis of rhyming words. The primary function of these tags is to encode the conventional metrical or rhyming structure within which a poet is working. By making this information available, the database will enable users to ask the kinds of questions about poetic form that we currently ask about words and phrases in EEBO-TCP texts: what the origins of a given form are, how its structure and use change over time, and whether a form relates to others and how that relationship develops over time.
A database of poetic form offers an exciting new direction for research and teaching using EEBO-TCP. Online resources using EEBO-TCP texts have enabled students and scholars to conduct literary historical research in novel and exciting ways, allowing users to discover obscure publications and pursue new lines of enquiry through keyword-searchable full texts. The TCP has helped to revolutionize the way we study words and phrases, but our current online resources based on EEBO-TCP are not designed to support sophisticated analysis of the formal properties of early modern texts. As scholars begin to reassess the relationship between formal and historical study, we need electronic resources that will enable us to study form in a historically-rooted way, stimulating new questions about parallels and differences across literary periods.
This paper will outline the kinds of data this resource could provide, and the kinds of questions we would want to ask of it. In addition, the paper will address some of the theoretical and practical challenges involved in encoding formal features in EEBO-TCP texts, and will suggest ways in which this process might be conducted on a larger scale in the future. Finally, the paper will include screenshots, and perhaps a brief demonstration of the pilot database, in order to explore how such a resource might be used in future teaching and research.
In a recent seminar at Queen’s University Belfast, an eminent scholar mentioned his discomfort with full-text searches of EEBO: “It always feels like cheating”. One year earlier, I probably would have agreed. In the early stages of my dissertation, I was frequently, but somewhat sheepishly, conducting full-text searches – grateful to EEBO-TCP for making such investigations possible, but worried that such an approach was somehow not worthy. Now, in the final months of my PhD, I have learned to stop worrying about the legitimacy of full-text searching, and have come to believe that EEBO-TCP offers particular opportunities for considering the confluence of humanist rhetoric and the rise of “cheap print”.
In this paper, I will briefly review how full-text searching led me to question some of the heuristics that have shaped recent research in my dissertation’s area of interest. Far from providing a methodological shortcut, full-text searching of EEBO consistently presented scholarly challenges, such as beckoning me to cross perceived boundaries of genre, period, or disciplinarity. Although I resisted these seeming distractions at first, following those leads eventually allowed me to discover a previously unrecognized connection between the seventeenth-century rhetoric of colonial ‘plantation’ and a particular theme that recurs in Protestant pamphleteering from the previous century.
Recognizing a layer of meaning in early modern colonial texts that has gone unnoticed in modern criticism was very valuable for my dissertation, of course. On a more general level, however, tracing plantation’s appearances in EEBO also led me to see the word’s rise to prominence in the Jacobean era as a product of humanist rhetoric at its apex and “cheap print” in its nativity. Joad Raymond has argued that scholars should avoid the anachronism of applying either the theory or the terminology of the Habermasian “public sphere” to the early modern era. He has suggested, instead, that we pay closer attention to the terminology that early modern writers themselves used to discuss discourse (see, for example, his introduction to Cheap Print in Britain and Ireland to 1660). Building on Raymond’s explorations of early modern approaches to the word popular, this paper will suggest that we also listen to the nuances of early modern construction, and consider how humanist rhetoricity and cheap print may have combined to shape what might be thought of as the “popular construction” of various keywords in early modern English. Among its other advantages for teaching and research, this model, unlike the Habermasian ideal, seems to have much in common with the workings of contemporary discourse. Thus, my paper will suggest that EEBO-TCP offers today’s scholars and students a valuable historical “laboratory” for exploring a more general phenomenon: how meaning is made when rhetoric is cast broad via new technologies.
There are at least 550 early modern plays for which there survives some evidence, but not a full playscript. Long neglected, these lost plays have recently become the focus of considerable scholarly interest. EEBO-TCP is a powerful and elegant tool for establishing new factual information about them.
This paper arises from my forthcoming book, Digital Humanities and the Lost Drama of Early Modern England: Ten Case Studies (Ashgate, 2013), which makes heavy use of EEBO-TCP. In this paper, I first briefly discuss problems and opportunities I have encountered, including theoretical questions, such as the assessment of negative evidence; and techniques, such as methods of proximity searching. Then, I offer a case study.
Henslowe’s Diary records that, in January 1600, Thomas Dekker was at work on a play entitled Truth’s Supplication to Candlelight. Previous scholarship has tended to assume that this was an allegorical history about the reign of Queen Elizabeth, which survives in modified form as The Whore of Babylon. This conjecture, in turn, feeds into our sense of the whole dramatic culture of the years 1599-1601, that period of particularly crucial interest, not least for studies of Shakespeare. This paper uses EEBO-TCP to examine the lost play’s title. In the process, it also uncovers a seam of surprising evidence about the early reception of Thomas Nashe. EEBO-TCP seems to show that Truth’s Supplication to Candlelight was in all likelihood not an allegorical history play, but rather a satirical comedy set at night.
The Early English Books Online-Text Creation Partnership (EEBO-TCP) has now been working continuously for more than 12 years to produce accurate encoded text versions of the books represented as facsimile page images in Early English Books Online. To date, we’ve produced more than 45,000 texts, working toward the goal of producing one edition of each English-language work represented in EEBO — around 70,000 works in all.
This presentation will review EEBO-TCP’s unique organizational structure and business model, describing the challenges and benefits of working closely with a commercial publisher and with our many supporting partner libraries. In addition, I will give an overview of the EEBO-TCP production workflow: where do the texts come from, where do they go (and where might they go), and what does it all cost?
In the dozen years that EEBO-TCP has been working steadily through the EEBO corpus, there has been significant development in the available tools, support, and opportunities for working with electronic text — many of which we will learn about throughout the conference. In other words, the landscape in which EEBO-TCP was born has changed dramatically, leading to new challenges and opportunities for our work. This update will conclude by describing some of these, with the aim of opening a discussion that will continue, in some form or another, throughout the conference.
How might we understand Spenser’s ‘Letter of the Authors’ as influencing readers’ responses to the 1590 Faerie Queene? While the question is time-honored, the insight provided by new research methods sheds surprising light. Not only do EEBO text images illuminate the ‘Letter’s’ role within the overall volume, but TCP searches also reveal the document’s further resonance.
As discussed by Andrew Zurcher (2005-6), publication of the 1590 Faerie Queene was an event of substantial scope, size, and complexity. Yet as Michael Murrin (1997) has noted, the poem’s publisher, William Ponsonby, “kept” it “in quarto volumes” and “issued it in installments.” These choices suggest a “broad, mixed audience” as envisioned for the work and raise questions of the poem’s reception in circles beyond the courtly. As has been widely noted, the three-book 1590 Faerie Queene opts to place nearly all of its dedicatory materials, including the ‘Letter’, at its end.
While this unusual feature may reflect exigencies of the printing process, any new content revealed in the concluding ‘Letter’ would have also influenced further readings of the main text. Notably, the ‘Letter’ foregrounds Redcrosse’s “clownishe” origins, counterpointing his eventual rise to prominence through repeated, if problematic, actions supported by grace. Understanding Book I as echoing Aristotle’s concern, in the Nicomachean Ethics, with repeated, habitual action as shaping and perfecting the “morall vertues” casts intriguing light on both the overall poetic structure described in Spenser’s ‘Letter’ and the terms he offers for the poem’s interpretation.
Historically, critics have generally explored the “twelve priuate morall vertues” that “Aristotle hath deuised” to ask which classical quality might correlate most closely with Book I’s “Holyness.” Yet this focus overlooks a deeper structural connection. Attending to the terms in which moral virtue is formed in Aristotle calls forward the repeated, doubled actions also found at The Faerie Queene’s heart. Viewing Book I’s two main figures (one a noble youth “by Timon thoroughly instructed,” the other a “rusticke” initially raised “in ploughmans state”) as engaged in parallel actions highlights their analogous, if unequally developed, capacities while underscoring the vital role played by proper education in the formation and realized character of each.
While Italian commentaries would have affirmed Aristotelian links among virtue, repeated action, and personal qualities to the university-educated Spenser, similar dynamics of “vertuous” fashioning would have been familiar to the far greater number who had experienced the grammar school’s rigors, as attested to by broadly circulated English educational treatises such as Thomas Elyot’s 1531 The Boke Named the Governour and Roger Ascham’s 1570 The Scholemaster.
In light of the ‘Letter’s’ emphasis upon Redcrosse’s apparently humble origins, the moment when his underlying nobility is “red aright” gains both poignancy and power. As Murrin’s work suggests, Book I’s trajectory of knightly development marries a growing popular taste for continental romance with the educational, cultural, and social aspirations of England’s expanding commercial class. Accepting the ‘Letter’s’ invitation to read Book I anew suggests the text’s possible appeal to broader audiences while drawing attention to further choices made in the expanded poem’s future.
The use of EEBO-TCP materials for research is familiar, but what may be less familiar to users of the online resource is the construction of new resources based on the underlying files. These files use a variant of the recommendations of the Text Encoding Initiative to encode the textual phenomena present in the early printed books. InfoDev (and the computing services in Oxford generally) have been involved in a number of projects that build bigger and better resources on the underlying EEBO-TCP files. One of the first steps is conversion of the EEBO-TCP SGML to modern TEI P5 XML, but that is only the start. Projects such as The Holinshed Project, the Electronic Database of Poetic Form, Verse Miscellanies Online, and Great Writers Inspire have all benefited from EEBO-TCP materials. This poster will look at common methodologies and technologies across these projects in an attempt to catalogue the lessons learned and codify the technological processes used in exploiting the richness of EEBO-TCP materials.
This poster examines the technical background to the Stationers’ Register Online (SRO) project based at the University of Oxford.
The pilot SRO project received institutional funding from the Lyell Research Fund to transcribe Arber’s and Eyre and Rivington’s editions of the book-entry Register of the Stationers’ Company (1554 to 1640, and 1640 to 1708, respectively).
The Stationers’ Register is arguably the most important primary source for the study of the history of the book in Britain other than the books themselves. The Register was the primary means through which ownership of texts was asserted, disputed, regulated and monitored from c.1577 until 1924, and survives intact in two series now held in Stationers’ Hall and at the National Archives.
As part of the preparation for the digitisation of the earliest volumes of the Register by a keying company, the project created a byte-reduced encoding schema for concise digitization without loss of intellectual content. The motivation was that the keying company charges per kilobyte of output: while XML output was desired, reducing file size by around 40% yielded significant savings. The transformation stylesheets that convert the transcriptions from the reduced encoding created for the project into “pure” TEI P5 XML also included automatic recognition and transformation of dates and of fees in Roman numerals.
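As an illustration of what recognizing a fee involves, here is a Python sketch (not the project's stylesheets) that reads fees such as "ijs vjd" as shillings and pence and totals them in pence, allowing the early modern use of a final "j" for "i". The regular expression, the restriction to shillings ("s") and pence ("d"), and the function names are assumptions made for the example.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral, treating early modern final 'j' as 'i'."""
    digits = [ROMAN_VALUES[ch] for ch in numeral.lower().replace("j", "i")]
    total = 0
    for i, value in enumerate(digits):
        # Subtractive notation: a smaller value before a larger one counts negative.
        if i + 1 < len(digits) and value < digits[i + 1]:
            total -= value
        else:
            total += value
    return total


# Assumed pattern: a numeral followed by 's' (shillings) or 'd' (pence).
FEE_RE = re.compile(r"\b([ivxlcj]+)\s*([sd])\b", re.IGNORECASE)
PENCE_PER_UNIT = {"s": 12, "d": 1}


def fee_in_pence(fee):
    """Total value of a fee string such as 'ijs vjd', in pence."""
    return sum(roman_to_int(num) * PENCE_PER_UNIT[unit.lower()]
               for num, unit in FEE_RE.findall(fee))


print(fee_in_pence("ijs vjd"))  # 30
```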
The poster will examine the creation of this schema and the benefits and challenges of converting basic presentational markup to richer semantic encoding. Additional technical topics under consideration for this poster are the management of encoding guidelines for the keying company, and the quality assurance and further editorial input (such as the disambiguation of names) applied to the data it returns. The poster will also include an overview of the data’s potential uses: enriching bibliographical databases, such as the English Short-Title Catalogue, and book-historical databases, such as the London Book Trades.
The National Library of Wales has been responsible for editing approximately 200 texts for EEBO-TCP, and many other English texts of Welsh interest. From the earliest printed book in the Welsh language in 1546, a prayer book entitled Yny lhyvyr hwnn (‘In this book’), to the popular Almanacs of Thomas Jones in the seventeenth century, EEBO-TCP is a vital resource for those pursuing studies of the Welsh language, its literature, the invaluable translations of religious material into Welsh, and many other disciplines of the Early Modern period. This poster presents some of the key Welsh publications from the period, along with some of the issues faced when editing Welsh language texts.
The Folger Digital Folio of Renaissance Drama for the 21st Century (F21) is an ambitious project. Its ultimate goal is the creation of interoperable digital editions of some 500 plays by William Shakespeare’s contemporaries written or performed between 1576 and 1642. At least 350 of these plays are part of the Folger Shakespeare Library’s collection. The richness of this collection, and the possibilities these works offer for teaching both Renaissance drama and the techniques of digital transcription, make it an ideal home for such a project.
Nearly 400 of the in-scope plays have been digitally transcribed as part of the Text Creation Partnership (TCP). Perhaps the most interesting aspect of this project is that we intend to have undergraduates from a number of partner institutions correct and upgrade — on the basis of comparison with page images — the textual encoding and transcriptions. The project pilots a model of large-scale crowd-sourcing with undergraduates that might be scaled up and repeated in other humanities projects, while also promoting a new mode of scholarship at liberal arts institutions and producing the high-quality interoperable collection of plays required for new modes of machine-assisted scholarship. “Interoperable” here means that these editions can be used as the basis for automated, algorithm-driven, corpus-based queries and comparisons that make generalizations across the full collection of texts. The F21 Project aims to provide a powerful, sharable digital version of early modern drama to researchers — so that other forms of critical analysis can be undertaken on this corpus — while allowing undergraduates to claim some pride of ownership in the creation of individual editions.
This poster introduces the SECT: Sustaining the EEBO-TCP Corpus in Transition project, co-host of the EEBO-TCP 2012 conference.
SECT is based at the Bodleian Libraries, and funded by JISC under their Digital Preservation and Curation strand. It runs for 18 months, from February 2012 to July 2013.
SECT is carrying out an investigation into the sustainability of the EEBO-TCP corpus and aims to develop strategies to secure a sustainable future for the collection. Partnering with the Oxford Internet Institute (OII), the project will carry out a benchmarking study and subsequent analysis of EEBO-TCP.
Throughout the conference, we will seek to gather views and opinions of delegates, expert users of EEBO-TCP’s texts, on the present and future of the corpus.