An Introduction
H. Joachim Neuhaus
1. Textual information systems
The design for the Shakespeare Database project is part of an
ongoing research programme to develop integrated textual
information systems. Instead of textual information systems we
could also speak of philological information systems in
order to stress a certain continuity of scholarly methods and
goals in dealing with texts, even though the methodological
potential is much greater and will certainly cause far-reaching
changes in the field. No doubt, in the future there will still be
a legitimate interest in literary texts, which pursues the single
case, the crux, and many philologists may continue in the
Notes and Queries tradition. But there will be an electronic
docuverse to integrate and transport these efforts.
Such new information systems go beyond traditional electronic
text retrieval and take full advantage of the more recent tools of
database management, expert systems, and the technical possibilities of
hypermedia applications. The Shakespeare
texts are sufficiently complex objects to make simple
demo-solutions quite impractical. But they are still manageable in
terms of size and storage space so as to be published on a single
compact disc (CD-ROM). And the Shakespeare texts have probably
been more thoroughly studied, annotated and edited than any other
modern text. There is a vast amount of factual information and
critical assessment for knowledge-based systems to be built upon.
Another incentive for developing such integrated information
systems has been a growing dissatisfaction both with
conventional full-text retrieval software and with standard database
query languages. Such query languages are often much too
complicated as a database interface, not only for ordinary users but also
for experienced users who use such systems as heuristic devices.
Full-text retrieval, even if enhanced by wildcard options, context
specifications, and proximity parameters in the sense of regular
expressions, is much too weak a tool for serious linguistic or
literary information retrieval. At best it may be used for getting
clues, and it requires professional background knowledge to be
employed to advantage. It should not be surprising that the impact
of full-text retrieval on modern editing and literary
criticism has so far been only marginal. There has not yet been an
electronic revolution in editing comparable to the revolution brought about by the
methods and results of the New Bibliography some decades ago. For
many authors, Shakespeare among them, we are still using
commentaries, grammars, and dictionaries first compiled and edited
in the 19th century as standard reference works. Quite often these
reference works, such as E. A. Abbott's A Shakespearian Grammar
(first published in 1869), are nowadays part of an electronic bundle of
disparate sources published as a fashionable CD-ROM product.
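The weakness of wildcard retrieval can be made concrete with a minimal sketch in Python. It is an illustration only and not part of the Shakespeare Database: the function name wildcard_search is hypothetical, and the sample consists of two lines from Sonnet 138 in modernized spelling. A prefix wildcard query is translated into a regular expression and run over the text.

```python
import re

# Two lines from Sonnet 138, modernized spelling (sample chosen for illustration).
SAMPLE = (
    "Therefore I lie with her, and she with me,\n"
    "And in our faults by lies we flattered be."
)


def wildcard_search(text, prefix):
    """Return every word form starting with `prefix`, i.e. a 'prefix*' query."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    return pattern.findall(text)


hits = wildcard_search(SAMPLE, "lie")
print(hits)
```

The query returns equiform strings and nothing more. Whether a hit is a verb or a noun, and which sense of lie is in play, is left entirely to the background knowledge of the user.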
In spite of numerous published concordances and dozens of
electronic retrieval packages, the electronic text still plays a
conspicuously secondary role. There is no published electronic
Shakespeare edition that tries to demonstrate the potential of
the new philology; all publications are based on prior
conventional editions published in book format.
The problem with textual databases is less prominent simply
because there are as yet so few examples. The word database itself
is still widely used in the sense of a machine-readable textual
archive and not in the established technical sense of an entity-relationship
structure. The main disadvantage of these systems in
textual applications seems to be that at present the
database designer is probably the only really successful database
user. This is because database queries in
conventional systems presuppose a complete understanding of the
database architecture, the database entities, their attributes,
and the database relations. At a time when for many
philologists and literary critics a word is still a word, it seems
a bit frivolous to presuppose a structural knowledge of
linguistic lemmatization relations versus type-token relations, or
phrasal units versus morphological units, to give just two minor
examples of possible database entities. In dealing with
Shakespeare there is still some confusion in this matter. The
popular BBC television series The Story of English reinforced
a cliché by stating "Shakespeare had one of the largest
vocabularies of any English writer, some 30,000 words."
(McCrum, R. (1986) 102. London: Faber). A prominent
Shakespearean, Stanley Wells, mentions Spevack's Concordance under
his entry "Vocabulary" and writes: "Spevack's
Concordance lists 29,066 different words in Shakespeare's works,
and 884,647 words altogether." (Wells, S. (rev. edn. 1985),
Shakespeare, An Illustrated Dictionary, 185. Oxford: University Press).
"Different words" here probably means text-types (i.e.
sets of equiform text-tokens). Shakespeare's vocabulary in the
sense of a lexical entry (lemma) is noticeably smaller, fewer than
20,000 lemmata according to our Shakespeare Database statistics.
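The three senses of "word" at issue here can be separated in a few lines of Python. This is a toy sketch and not the procedure behind the Shakespeare Database statistics: the tokenizer is deliberately naive, and the lemma table LEMMA_OF is a hypothetical hand-made mapping that covers only the sample line from Hamlet.

```python
import re

LINE = "To be, or not to be, that is the question"

# Hypothetical lemma table: maps an inflected form to its lexical entry.
# Forms not listed are assumed to be their own lemma.
LEMMA_OF = {"is": "be"}


def tokenize(text):
    """Split a text into lower-cased word tokens (naive, for illustration)."""
    return re.findall(r"[a-z']+", text.lower())


tokens = tokenize(LINE)                          # every running word
types = set(tokens)                              # sets of equiform tokens
lemmata = {LEMMA_OF.get(t, t) for t in types}    # lexical entries

print(len(tokens), len(types), len(lemmata))
```

Even for a single line the three counts differ, which is why a figure such as "29,066 different words" is meaningless until the counted entity is named. The sketch also simplifies in one important respect: it treats lemmatization as a function of the type alone, whereas equiform tokens may belong to different lemmata.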
Prior to the use of textual database architectures, such basic
questions as the size of an author's vocabulary did not have
reliable answers. The database concept introduces a new level of
rigour, consistency, and completeness into textual philology,
since it presupposes clear definitions of all entities and
relations and at the same time enforces these definitions for each
and every case. The Shakespeare Database has built such a
consistent information structure and is able to answer
conventional questions as well as new kinds of questions that
traditionally could not be answered. This is the true new
potential for the editorial and critical enterprise.
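How a database enforces its definitions for each and every case can be sketched with Python's built-in sqlite3 module. The schema is hypothetical and far simpler than any real textual database; the table and column names are assumptions and do not describe the actual Shakespeare Database. In particular, it assigns each word form exactly one lemma and thereby ignores homographs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # let SQLite enforce the relations
conn.executescript("""
CREATE TABLE lemma (
    lemma_id INTEGER PRIMARY KEY,
    headword TEXT NOT NULL UNIQUE
);
CREATE TABLE word_form (
    form_id  INTEGER PRIMARY KEY,
    spelling TEXT NOT NULL UNIQUE,
    lemma_id INTEGER NOT NULL REFERENCES lemma(lemma_id)
);
CREATE TABLE token (
    position INTEGER PRIMARY KEY,
    form_id  INTEGER NOT NULL REFERENCES word_form(form_id)
);
""")

# Hypothetical sample data: one line of text and a hand-made lemma table.
LEMMA_OF = {"is": "be"}
words = "to be or not to be that is the question".split()

for form in dict.fromkeys(words):  # each distinct form once, in text order
    head = LEMMA_OF.get(form, form)
    conn.execute("INSERT OR IGNORE INTO lemma (headword) VALUES (?)", (head,))
    conn.execute(
        "INSERT INTO word_form (spelling, lemma_id) "
        "SELECT ?, lemma_id FROM lemma WHERE headword = ?",
        (form, head),
    )
for position, form in enumerate(words, start=1):
    conn.execute(
        "INSERT INTO token (position, form_id) "
        "SELECT ?, form_id FROM word_form WHERE spelling = ?",
        (position, form),
    )


def count(table):
    """Number of rows in one of the three entity tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


counts = {table: count(table) for table in ("token", "word_form", "lemma")}

# A token that points to no known word form violates the definitions.
try:
    conn.execute("INSERT INTO token (position, form_id) VALUES (99, 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(counts, rejected)
```

Because every token must refer to a word form and every word form to a lemma, a question about vocabulary size becomes a count over a named entity, and an entry that does not fit the definitions is refused rather than silently stored.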
The tremendous success of database technology in recent years
can be found in applications outside of textual studies where
there is an obvious or clearly defined internal structure. An
accounting system, a subscription system for a journal, an
inventory system, or a component system for a construction plant:
all these database systems have a clearly defined structure, their
typical users have straightforward queries, and there is a routine
profile of database transactions. This is generally not yet the
case for textual databases. But in contrast to full-text retrieval
systems, the dissatisfaction with database systems is clearly not
due to inherent limitations of the database concept itself. It is
the user interface which is currently much too ineffectual and
unsatisfactory for these applications.
Shakespeare Database Project,
Westfälische Wilhelms-Universität
Münster, Germany
Contact: Shakespeare@uni-muenster.de with questions or comments regarding this page.
All rights reserved. Do not copy or redistribute in any form.
Shakespeare Database © 1989-2005 H. Joachim Neuhaus