hadoop-common-commits mailing list archives

From whe...@apache.org
Subject [35/50] [abbrv] hadoop git commit: [partial-ns] Import snappy in hdfsdb.
Date Tue, 05 Jan 2016 19:52:35 GMT
http://git-wip-us.apache.org/repos/asf/hadoop/blob/cb5ba73b/hadoop-hdfs-project/hadoop-hdfsdb/src/main/native/snappy/testdata/lcet10.txt
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfsdb/src/main/native/snappy/testdata/lcet10.txt b/hadoop-hdfs-project/hadoop-hdfsdb/src/main/native/snappy/testdata/lcet10.txt
new file mode 100644
index 0000000..26b187d
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfsdb/src/main/native/snappy/testdata/lcet10.txt
@@ -0,0 +1,7519 @@
+
+
+The Project Gutenberg Etext of LOC WORKSHOP ON ELECTRONIC TEXTS
+
+
+
+
+                      WORKSHOP ON ELECTRONIC TEXTS
+
+                               PROCEEDINGS
+
+
+
+                          Edited by James Daly
+
+
+
+
+
+
+
+                             9-10 June 1992
+
+
+                           Library of Congress
+                            Washington, D.C.
+
+
+
+    Supported by a Grant from the David and Lucile Packard Foundation
+
+
+               ***   ***   ***   ******   ***   ***   ***
+
+
+                            TABLE OF CONTENTS
+
+
+Acknowledgements
+
+Introduction
+
+Proceedings
+   Welcome
+      Prosser Gifford and Carl Fleischhauer
+
+   Session I.  Content in a New Form:  Who Will Use It and What Will They Do?
+      James Daly (Moderator)
+      Avra Michelson, Overview
+      Susan H. Veccia, User Evaluation
+      Joanne Freeman, Beyond the Scholar
+         Discussion
+
+   Session II.  Show and Tell
+      Jacqueline Hess (Moderator)
+      Elli Mylonas, Perseus Project
+         Discussion
+      Eric M. Calaluca, Patrologia Latina Database
+      Carl Fleischhauer and Ricky Erway, American Memory
+         Discussion
+      Dorothy Twohig, The Papers of George Washington
+         Discussion
+      Maria L. Lebron, The Online Journal of Current Clinical Trials
+         Discussion
+      Lynne K. Personius, Cornell mathematics books
+         Discussion
+
+   Session III.  Distribution, Networks, and Networking:  
+                 Options for Dissemination
+      Robert G. Zich (Moderator)
+      Clifford A. Lynch
+         Discussion
+      Howard Besser
+         Discussion
+      Ronald L. Larsen
+      Edwin B. Brownrigg
+         Discussion
+
+   Session IV.  Image Capture, Text Capture, Overview of Text and
+                Image Storage Formats
+         William L. Hooton (Moderator)
+      A) Principal Methods for Image Capture of Text:  
+            direct scanning, use of microform
+         Anne R. Kenney
+         Pamela Q.J. Andre
+         Judith A. Zidar
+         Donald J. Waters
+            Discussion
+      B) Special Problems:  bound volumes, conservation,
+                            reproducing printed halftones
+         George Thoma
+         Carl Fleischhauer
+            Discussion
+      C) Image Standards and Implications for Preservation
+         Jean Baronas
+         Patricia Battin
+            Discussion
+      D) Text Conversion:  OCR vs. rekeying, standards of accuracy
+                           and use of imperfect texts, service bureaus
+         Michael Lesk
+         Ricky Erway
+         Judith A. Zidar
+            Discussion
+
+   Session V.  Approaches to Preparing Electronic Texts
+      Susan Hockey (Moderator)
+      Stuart Weibel
+         Discussion
+      C.M. Sperberg-McQueen
+         Discussion
+      Eric M. Calaluca
+         Discussion
+
+   Session VI.  Copyright Issues
+      Marybeth Peters
+
+   Session VII.  Conclusion
+      Prosser Gifford (Moderator)
+      General discussion
+
+Appendix I:  Program
+
+Appendix II:  Abstracts
+
+Appendix III:  Directory of Participants
+
+
+               ***   ***   ***   ******   ***   ***   ***
+
+
+                            Acknowledgements
+
+I would like to thank Carl Fleischhauer and Prosser Gifford for the
+opportunity to learn about areas of human activity unknown to me a scant
+ten months ago, and the David and Lucile Packard Foundation for
+supporting that opportunity.  The help given by others is acknowledged on
+a separate page.
+
+                                                          19 October 1992
+
+
+               ***   ***   ***   ******   ***   ***   ***
+
+
+                              INTRODUCTION
+
+The Workshop on Electronic Texts (1) drew together representatives of
+various projects and interest groups to compare ideas, beliefs,
+experiences, and, in particular, methods of placing and presenting
+historical textual materials in computerized form.  Most attendees gained
+much in insight and outlook from the event.  But the assembly did not
+form a new nation, or, to put it another way, the diversity of projects
+and interests was too great to draw the representatives into a cohesive,
+action-oriented body.(2)
+
+Everyone attending the Workshop shared an interest in preserving and
+providing access to historical texts.  But within this broad field the
+attendees represented a variety of formal, informal, figurative, and
+literal groups, with many individuals belonging to more than one.  These
+groups may be defined roughly according to the following topics or
+activities:
+
+* Imaging
+* Searchable coded texts
+* National and international computer networks
+* CD-ROM production and dissemination
+* Methods and technology for converting older paper materials into
+electronic form
+* Study of the use of digital materials by scholars and others
+
+This summary is arranged thematically and does not follow the actual
+sequence of presentations.
+
+NOTES:
+     (1)  In this document, the phrase electronic text is used to mean
+     any computerized reproduction or version of a document, book,
+     article, or manuscript (including images), and not merely a machine-
+     readable or machine-searchable text.
+
+     (2)  The Workshop was held at the Library of Congress on 9-10 June
+     1992, with funding from the David and Lucile Packard Foundation. 
+     The document that follows represents a summary of the presentations
+     made at the Workshop and was compiled by James DALY.  This
+     introduction was written by DALY and Carl FLEISCHHAUER.
+
+
+PRESERVATION AND IMAGING
+
+Preservation, as that term is used by archivists,(3) was most explicitly
+discussed in the context of imaging.  Anne KENNEY and Lynne PERSONIUS
+explained how the concept of a faithful copy and the user-friendliness of
+the traditional book have guided their project at Cornell University.(4) 
+Although interested in computerized dissemination, participants in the
+Cornell project are creating digital image sets of older books in the
+public domain as a source for a fresh paper facsimile or, in a future
+phase, microfilm.  The books returned to the library shelves are
+high-quality and useful replacements on acid-free paper that should last
+a long time.  To date, the Cornell project has placed little or no
+emphasis on creating searchable texts; one would not be surprised to find
+that the project participants view such texts as new editions, and thus
+not as faithful reproductions. 
+
+In her talk on preservation, Patricia BATTIN struck an ecumenical and
+flexible note as she endorsed the creation and dissemination of a variety
+of types of digital copies.  Do not be too narrow in defining what counts
+as a preservation element, BATTIN counseled; for the present, at least,
+digital copies made with preservation in mind cannot be as narrowly
+standardized as, say, microfilm copies with the same objective.  Setting
+standards precipitously can inhibit creativity, but delay can result in
+chaos, she advised.
+
+In part, BATTIN's position reflected the unsettled nature of image-format
+standards, and attendees could hear echoes of this unsettledness in the
+comments of various speakers.  For example, Jean BARONAS reviewed the
+status of several formal standards moving through committees of experts;
+and Clifford LYNCH encouraged the use of a new guideline for transmitting
+document images on Internet.  Testimony from participants in the National
+Agricultural Library's (NAL) Text Digitization Program and LC's American
+Memory project highlighted some of the challenges to the actual creation
+or interchange of images, including difficulties in converting
+preservation microfilm to digital form.  Donald WATERS reported on the
+progress of a master plan for a project at Yale University to convert
+books on microfilm to digital image sets, Project Open Book (POB).
+
+The Workshop offered rather less of an imaging practicum than planned,
+but "how-to" hints emerged at various points, for example, throughout
+KENNEY's presentation and in the discussion of arcana such as
+thresholding and dithering offered by George THOMA and FLEISCHHAUER.
+
+NOTES:
+     (3)  Although there is a sense in which any reproductions of
+     historical materials preserve the human record, specialists in the
+     field have developed particular guidelines for the creation of
+     acceptable preservation copies.
+
+     (4)  Titles and affiliations of presenters are given at the
+     beginning of their respective talks and in the Directory of
+     Participants (Appendix III).
+
+
+THE MACHINE-READABLE TEXT:  MARKUP AND USE
+
+The sections of the Workshop that dealt with machine-readable text tended
+to be more concerned with access and use than with preservation, at least
+in the narrow technical sense.  Michael SPERBERG-McQUEEN made a forceful
+presentation on the Text Encoding Initiative's (TEI) implementation of
+the Standard Generalized Markup Language (SGML).  His ideas were echoed
+by Susan HOCKEY, Elli MYLONAS, and Stuart WEIBEL.  While the
+presentations made by the TEI advocates contained no practicum, their
+discussion focused on the value of the finished product, what the
+European Community calls reusability, but what may also be termed
+durability.  They argued that marking up--that is, coding--a text in a
+well-conceived way will permit it to be moved from one computer
+environment to another, as well as to be used by various users.  Two
+kinds of markup were distinguished:  1) procedural markup, which
+describes the features of a text (e.g., dots on a page), and 2)
+descriptive markup, which describes the structure or elements of a
+document (e.g., chapters, paragraphs, and front matter).
+
+The TEI proponents emphasized the importance of texts to scholarship. 
+They explained how heavily coded (and thus analyzed and annotated) texts
+can underlie research, play a role in scholarly communication, and
+facilitate classroom teaching.  SPERBERG-McQUEEN reminded listeners that
+a written or printed item (e.g., a particular edition of a book) is
+merely a representation of the abstraction we call a text.  To concern
+ourselves with faithfully reproducing a printed instance of the text,
+SPERBERG-McQUEEN argued, is to concern ourselves with the representation
+of a representation ("images as simulacra for the text").  The TEI proponents'
+interest in images tends to focus on corollary materials for use in teaching,
+for example, photographs of the Acropolis to accompany a Greek text.
+
+By the end of the Workshop, SPERBERG-McQUEEN confessed to having been
+converted to a limited extent to the view that electronic images
+constitute a promising alternative to microfilming; indeed, an
+alternative probably superior to microfilming.  But he was not convinced
+that electronic images constitute a serious attempt to represent text in
+electronic form.  HOCKEY and MYLONAS also conceded that their experience
+at the Pierce Symposium the previous week at Georgetown University and
+the present conference at the Library of Congress had compelled them to
+reevaluate their perspective on the usefulness of text as images. 
+Attendees could see that the text and image advocates were in
+constructive tension, so to say.
+
+Three nonTEI presentations described approaches to preparing
+machine-readable text that are less rigorous and thus less expensive.  In
+the case of the Papers of George Washington, Dorothy TWOHIG explained
+that the digital version will provide a not-quite-perfect rendering of
+the transcribed text--some 135,000 documents, available for research
+during the decades while the perfect or print version is completed. 
+Members of the American Memory team and the staff of NAL's Text
+Digitization Program (see below) also outlined a middle ground concerning
+searchable texts.  In the case of American Memory, contractors produce
+texts with about 99-percent accuracy that serve as "browse" or
+"reference" versions of written or printed originals.  End users who need
+faithful copies or perfect renditions must refer to accompanying sets of
+digital facsimile images or consult copies of the originals in a nearby
+library or archive.  American Memory staff argued that the high cost of
+producing 100-percent accurate copies would prevent LC from offering
+access to large parts of its collections.
+
+
+THE MACHINE-READABLE TEXT:  METHODS OF CONVERSION
+
+Although the Workshop did not include a systematic examination of the
+methods for converting texts from paper (or from facsimile images) into
+machine-readable form, nevertheless, various speakers touched upon this
+matter.  For example, WEIBEL reported that OCLC has experimented with a
+merging of multiple optical character recognition systems that will
+reduce errors from an unacceptable rate of 5 characters out of every
+1,000 to an unacceptable rate of 2 characters out of every 1,000.
+
+Pamela ANDRE presented an overview of NAL's Text Digitization Program and
+Judith ZIDAR discussed the technical details.  ZIDAR explained how NAL
+purchased hardware and software capable of performing optical character
+recognition (OCR) and text conversion and used its own staff to convert
+texts.  The process, ZIDAR said, required extensive editing and project
+staff found themselves considering alternatives, including rekeying
+and/or creating abstracts or summaries of texts.  NAL reckoned costs at
+$7 per page.  By way of contrast, Ricky ERWAY explained that American
+Memory had decided from the start to contract out conversion to external
+service bureaus.  The criteria used to select these contractors were cost
+and quality of results, as opposed to methods of conversion.  ERWAY noted
+that historical documents or books often do not lend themselves to OCR. 
+Bound materials represent a special problem.  In her experience, quality
+control--inspecting incoming materials, counting errors in samples--posed
+the most time-consuming aspect of contracting out conversion.  ERWAY
+reckoned American Memory's costs at $4 per page, but cautioned that fewer
+cost-elements had been included than in NAL's figure.
+
+
+OPTIONS FOR DISSEMINATION
+
+The topic of dissemination proper emerged at various points during the
+Workshop.  At the session devoted to national and international computer
+networks, LYNCH, Howard BESSER, Ronald LARSEN, and Edwin BROWNRIGG
+highlighted the virtues of Internet today and of the network that will
+evolve from Internet.  Listeners could discern in these narratives a
+vision of an information democracy in which millions of citizens freely
+find and use what they need.  LYNCH noted that a lack of standards
+inhibits disseminating multimedia on the network, a topic also discussed
+by BESSER.  LARSEN addressed the issues of network scalability and
+modularity and commented upon the difficulty of anticipating the effects
+of growth in orders of magnitude.  BROWNRIGG talked about the ability of
+packet radio to provide certain links in a network without the need for
+wiring.  However, the presenters also called attention to the
+shortcomings and incongruities of present-day computer networks.  For
+example:  1) Network use is growing dramatically, but much network
+traffic consists of personal communication (E-mail).  2) Large bodies of
+information are available, but a user's ability to search across their
+entirety is limited.  3) There are significant resources for science and
+technology, but few network sources provide content in the humanities. 
+4) Machine-readable texts are commonplace, but the capability of the
+system to deal with images (let alone other media formats) lags behind. 
+A glimpse of a multimedia future for networks, however, was provided by
+Maria LEBRON in her overview of the Online Journal of Current Clinical
+Trials (OJCCT), and the process of scholarly publishing on-line.   
+
+The contrasting form of the CD-ROM disk was never systematically
+analyzed, but attendees could glean an impression from several of the
+show-and-tell presentations.  The Perseus and American Memory examples
+demonstrated recently published disks, while the descriptions of the
+IBYCUS version of the Papers of George Washington and Chadwyck-Healey's
+Patrologia Latina Database (PLD) told of disks to come.  According to
+Eric CALALUCA, PLD's principal focus has been on converting Jacques-Paul
+Migne's definitive collection of Latin texts to machine-readable form. 
+Although everyone could share the network advocates' enthusiasm for an
+on-line future, the possibility of rolling up one's sleeves for a session
+with a CD-ROM containing both textual materials and a powerful retrieval
+engine made the disk seem an appealing vessel indeed.  The overall
+discussion suggested that the transition from CD-ROM to on-line networked
+access may prove far slower and more difficult than has been anticipated.
+
+
+WHO ARE THE USERS AND WHAT DO THEY DO?
+
+Although concerned with the technicalities of production, the Workshop
+never lost sight of the purposes and uses of electronic versions of
+textual materials.  As noted above, those interested in imaging discussed
+the problematical matter of digital preservation, while the TEI proponents
+described how machine-readable texts can be used in research.  This latter
+topic received thorough treatment in the paper read by Avra MICHELSON.
+She placed the phenomenon of electronic texts within the context of
+broader trends in information technology and scholarly communication.
+
+Among other things, MICHELSON described on-line conferences that
+represent a vigorous and important intellectual forum for certain
+disciplines.  Internet now carries more than 700 conferences, with about
+80 percent of these devoted to topics in the social sciences and the
+humanities.  Other scholars use on-line networks for "distance learning." 
+Meanwhile, there has been a tremendous growth in end-user computing;
+professors today are less likely than their predecessors to ask the
+campus computer center to process their data.  Electronic texts are one
+key to these sophisticated applications, MICHELSON reported, and more and
+more scholars in the humanities now work in an on-line environment. 
+Toward the end of the Workshop, Michael LESK presented a corollary to
+MICHELSON's talk, reporting the results of an experiment that compared
+the work of one group of chemistry students using traditional printed
+texts and two groups using electronic sources.  The experiment
+demonstrated that in the event one does not know what to read, one needs
+the electronic systems; the electronic systems hold no advantage at the
+moment if one knows what to read, but neither do they impose a penalty.
+
+DALY provided an anecdotal account of the revolutionizing impact of the
+new technology on his previous methods of research in the field of classics.
+His account, by extrapolation, served to illustrate in part the arguments
+made by MICHELSON concerning the positive effects of the sudden and radical
+transformation being wrought in the ways scholars work.
+
+Susan VECCIA and Joanne FREEMAN delineated the use of electronic
+materials outside the university.  The most interesting aspect of their
+use, FREEMAN said, could be seen as a paradox:  teachers in elementary
+and secondary schools requested access to primary source materials but,
+at the same time, found that "primariness" itself made these materials
+difficult for their students to use.
+
+
+OTHER TOPICS
+
+Marybeth PETERS reviewed copyright law in the United States and offered
+advice during a lively discussion of this subject.  But uncertainty
+remains concerning the price of copyright in a digital medium, because a
+solution remains to be worked out concerning management and synthesis of
+copyrighted and out-of-copyright pieces of a database.
+
+As moderator of the final session of the Workshop, Prosser GIFFORD directed
+discussion to future courses of action and the potential role of LC in
+advancing them.  Among the recommendations that emerged were the following:
+
+     * Workshop participants should 1) begin to think about working
+     with image material, but structure and digitize it in such a
+     way that at a later stage it can be interpreted into text, and
+     2) find a common way to build text and images together so that
+     they can be used jointly at some stage in the future, with
+     appropriate network support, because that is how users will want
+     to access these materials.  The Library might encourage attempts
+     to bring together people who are working on texts and images.
+
+     * A network version of American Memory should be developed or
+     consideration should be given to making the data in it
+     available to people interested in doing network multimedia. 
+     Given the current dearth of digital data that is appealing and
+     unencumbered by extremely complex rights problems, developing a
+     network version of American Memory could do much to help make
+     network multimedia a reality.
+
+     * Concerning the thorny issue of electronic deposit, LC should
+     initiate a catalytic process in terms of distributed
+     responsibility, that is, bring together the distributed
+     organizations and set up a study group to look at all the
+     issues related to electronic deposit and see where we as a
+     nation should move.  For example, LC might attempt to persuade
+     one major library in each state to deal with its state
+     equivalent publisher, which might produce a cooperative project
+     that would be equitably distributed around the country, and one
+     in which LC would be dealing with a minimal number of publishers
+     and minimal copyright problems.  LC must also deal with the
+     concept of on-line publishing, determining, among other things,
+     how serials such as OJCCT might be deposited for copyright.
+
+     * Since a number of projects are planning to carry out
+     preservation by creating digital images that will end up in
+     on-line or near-line storage at some institution, LC might play
+     a helpful role, at least in the near term, by accelerating efforts
+     to catalog that information into the Research Libraries Information
+     Network (RLIN) and then into OCLC, so that it would be accessible.
+     This would reduce the possibility of multiple institutions digitizing
+     the same work. 
+
+
+CONCLUSION
+
+The Workshop was valuable because it brought together partisans from
+various groups and provided an occasion to compare goals and methods. 
+The more committed partisans frequently communicate with others in their
+groups, but less often across group boundaries.  The Workshop was also
+valuable to attendees--including those involved with American Memory--who
+came less committed to particular approaches or concepts.  These
+attendees learned a great deal, and plan to select and employ elements of
+imaging, text-coding, and networked distribution that suit their
+respective projects and purposes.
+
+Still, reality rears its ugly head:  no breakthrough has been achieved. 
+On the imaging side, one confronts a proliferation of competing
+data-interchange standards and a lack of consensus on the role of digital
+facsimiles in preservation.  In the realm of machine-readable texts, one
+encounters a reasonably mature standard but methodological difficulties
+and high costs.  These latter problems, of course, represent a special
+impediment to the desire, as it is sometimes expressed in the popular
+press, "to put the [contents of the] Library of Congress on line."  In
+the words of one participant, there was "no solution to the economic
+problems--the projects that are out there are surviving, but it is going
+to be a lot of work to transform the information industry, and so far the
+investment to do that is not forthcoming" (LESK, per litteras).
+
+
+               ***   ***   ***   ******   ***   ***   ***
+
+
+                               PROCEEDINGS
+
+
+WELCOME
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+GIFFORD * Origin of Workshop in current Librarian's desire to make LC's
+collections more widely available * Desiderata arising from the prospect
+of greater interconnectedness *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+After welcoming participants on behalf of the Library of Congress,
+American Memory (AM), and the National Demonstration Lab, Prosser
+GIFFORD, director for scholarly programs, Library of Congress, located
+the origin of the Workshop on Electronic Texts in a conversation he had
+had considerably more than a year ago with Carl FLEISCHHAUER concerning
+some of the issues faced by AM.  On the assumption that numerous other
+people were asking the same questions, the decision was made to bring
+together as many of these people as possible to ask the same questions
+together.  In a deeper sense, GIFFORD said, the origin of the Workshop
+lay in the desire of the current Librarian of Congress, James H. 
+Billington, to make the collections of the Library, especially those
+offering unique or unusual testimony on aspects of the American
+experience, available to a much wider circle of users than those few
+people who can come to Washington to use them.  This meant that the
+emphasis of AM, from the outset, has been on archival collections of the
+basic material, and on making these collections themselves available,
+rather than selected or heavily edited products.
+
+From AM's emphasis followed the questions with which the Workshop began: 
+who will use these materials, and in what form will they wish to use
+them.  But an even larger issue deserving mention, in GIFFORD's view, was
+the phenomenal growth in Internet connectivity.  He expressed the hope
+that the prospect of greater interconnectedness than ever before would
+lead to:  1) much more cooperative and mutually supportive endeavors; 2)
+development of systems of shared and distributed responsibilities to
+avoid duplication and to ensure accuracy and preservation of unique
+materials; and 3) agreement on the necessary standards and development of
+the appropriate directories and indices to make navigation
+straightforward among the varied resources that are, and increasingly
+will be, available.  In this connection, GIFFORD requested that
+participants reflect from the outset upon the sorts of outcomes they
+thought the Workshop might have.  Did those present constitute a group
+with sufficient common interests to propose a next step or next steps,
+and if so, what might those be?  They would return to these questions the
+following afternoon.
+
+                                 ******
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+FLEISCHHAUER * Core of Workshop concerns preparation and production of
+materials * Special challenge in conversion of textual materials *
+Quality versus quantity * Do the several groups represented share common
+interests? *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Carl FLEISCHHAUER, coordinator, American Memory, Library of Congress,
+emphasized that he would attempt to represent the people who perform some
+of the work of converting or preparing materials and that the core of
+the Workshop had to do with preparation and production.  FLEISCHHAUER
+then drew a distinction between the long term, when many things would be
+available and connected in the ways that GIFFORD described, and the short
+term, in which AM not only has wrestled with the issue of what is the
+best course to pursue but also has faced a variety of technical
+challenges.
+
+FLEISCHHAUER remarked on AM's endeavors to deal with a wide range of library
+formats, such as motion picture collections, sound-recording collections,
+and pictorial collections of various sorts, especially collections of
+photographs.  In the course of these efforts, AM kept coming back to
+textual materials--manuscripts or rare printed matter, bound materials,
+etc.  Text posed the greatest conversion challenge of all.  Thus, the
+genesis of the Workshop, which reflects the problems faced by AM.  These
+problems include physical problems.  For example, those in the library
+and archive business deal with collections made up of fragile and rare
+manuscript items, bound materials, especially the notoriously brittle
+bound materials of the late nineteenth century.  These are precious
+cultural artifacts, however, as well as interesting sources of
+information, and LC desires to retain and conserve them.  AM needs to
+handle things without damaging them.  Guillotining a book to run it
+through a sheet feeder must be avoided at all costs.
+
+Beyond physical problems, issues pertaining to quality arose.  For
+example, the desire to provide users with a searchable text is affected
+by the question of acceptable level of accuracy.  One hundred percent
+accuracy is tremendously expensive.  On the other hand, the output of
+optical character recognition (OCR) can be tremendously inaccurate. 
+Although AM has attempted to find a middle ground, uncertainty persists
+as to whether or not it has discovered the right solution.
+
+Questions of quality arose concerning images as well.  FLEISCHHAUER
+contrasted the extremely high level of quality of the digital images in
+the Cornell Xerox Project with AM's efforts to provide a browse-quality
+or access-quality image, as opposed to an archival or preservation image. 
+FLEISCHHAUER therefore welcomed the opportunity to compare notes.
+
+FLEISCHHAUER observed in passing that conversations he had had about
+networks have begun to signal that for various forms of media a
+determination may be made that there is a browse-quality item, or a
+distribution-and-access-quality item that may coexist in some systems
+with a higher quality archival item that would be inconvenient to send
+through the network because of its size.  FLEISCHHAUER referred, of
+course, to images more than to searchable text.
+
+As AM considered those questions, several conceptual issues arose:  ought
+AM occasionally to reproduce materials entirely through an image set, at
+other times, entirely through a text set, and in some cases, a mix? 
+There probably would be times when the historical authenticity of an
+artifact would require that its image be used.  An image might be
+desirable as a recourse for users if one could not provide 100-percent
+accurate text.  Again, AM wondered, as a practical matter, if a
+distinction could be drawn for rare printed matter that might exist
+in multiple collections--that is, in ten or fifteen libraries.  In such
+cases, the need for perfect reproduction would be less than for unique
+items.  Implicit in his remarks, FLEISCHHAUER conceded, was the admission
+that AM has been tilting strongly towards quantity and drawing back a
+little from perfect quality.  That is, it seemed to AM that society would
+be better served if more things were distributed by LC--even if they were
+not quite perfect--than if fewer things, perfectly represented, were
+distributed.  This was stated as a proposition to be tested, with
+responses to be gathered from users.
+
+In thinking about issues related to reproduction of materials and seeing
+other people engaged in parallel activities, AM deemed it useful to
+convene a conference.  Hence, the Workshop.  FLEISCHHAUER thereupon
+surveyed the several groups represented:  1) the world of images (image
+users and image makers); 2) the world of text and scholarship and, within
+this group, those concerned with language--FLEISCHHAUER confessed to finding
+delightful irony in the fact that some of the most advanced thinkers on
+computerized texts are those dealing with ancient Greek and Roman materials;
+3) the network world; and 4) the general world of library science, which
+includes people interested in preservation and cataloging.
+
+FLEISCHHAUER concluded his remarks with special thanks to the David and
+Lucile Packard Foundation for its support of the meeting, the American
+Memory group, the Office for Scholarly Programs, the National
+Demonstration Lab, and the Office of Special Events.  He expressed the
+hope that David Woodley Packard might be able to attend, noting that
+Packard's work and the work of the foundation had sponsored a number of
+projects in the text area.
+
+                                 ******
+
+SESSION I.  CONTENT IN A NEW FORM:  WHO WILL USE IT AND WHAT WILL THEY DO?
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+DALY * Acknowledgements * A new Latin authors disk *  Effects of the new
+technology on previous methods of research *       
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Serving as moderator, James DALY acknowledged the generosity of all the
+presenters for giving of their time, counsel, and patience in planning
+the Workshop, as well as of members of the American Memory project and
+other Library of Congress staff, and the David and Lucile Packard
+Foundation and its executive director, Colburn S. Wilbur.
+
+DALY then recounted his visit in March to the Center for Electronic Texts
+in the Humanities (CETH) and the Department of Classics at Rutgers
+University, where an old friend, Lowell Edmunds, introduced him to the
+department's IBYCUS scholarly personal computer, and, in particular, the
+new Latin CD-ROM, containing, among other things, almost all classical
+Latin literary texts through A.D. 200.  Packard Humanities Institute
+(PHI), Los Altos, California, released this disk late in 1991, with a
+nominal triennial licensing fee.
+
+Playing with the disk for an hour or so at Rutgers brought home to DALY
+at once the revolutionizing impact of the new technology on his previous
+methods of research.  Had this disk been available two or three years
+earlier, DALY contended, when he was engaged in preparing a commentary on
+Book 10 of Virgil's Aeneid for Cambridge University Press, he would not
+have required a forty-eight-square-foot table on which to spread the
+numerous, most frequently consulted items, including some ten or twelve
+concordances to key Latin authors, an almost equal number of lexica to
+authors who lacked concordances, and where either lexica or concordances
+were lacking, numerous editions of authors antedating and postdating Virgil.
+
+Nor, when checking each of the average six to seven words contained in
+the Virgilian hexameter for its usage elsewhere in Virgil's works or
+other Latin authors, would DALY have had to maintain the laborious
+mechanical process of flipping through these concordances, lexica, and
+editions each time.  Nor would he have had to frequent as often the
+Milton S. Eisenhower Library at the Johns Hopkins University to consult
+the Thesaurus Linguae Latinae.  Instead of devoting countless hours, or
+the bulk of his research time, to gathering data concerning Virgil's use
+of words, DALY--now freed by PHI's Latin authors disk from the
+tyrannical, yet in some ways paradoxically happy scholarly drudgery--
+would have been able to devote that same bulk of time to analyzing and
+interpreting Virgilian verbal usage.
+
+Citing Theodore Brunner, Gregory Crane, Elli MYLONAS, and Avra MICHELSON,
+DALY argued that this reversal in his style of work, made possible by the
+new technology, would perhaps have resulted in better, more productive
+research.  Indeed, even in the course of his browsing the Latin authors
+disk at Rutgers, its powerful search, retrieval, and highlighting
+capabilities suggested to him several new avenues of research into
+Virgil's use of sound effects.  This anecdotal account, DALY maintained,
+may serve to illustrate in part the sudden and radical transformation
+being wrought in the ways scholars work.
+
+                                 ******
+
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+MICHELSON * Elements related to scholarship and technology * Electronic
+texts within the context of broader trends within information technology
+and scholarly communication * Evaluation of the prospects for the use of
+electronic texts * Relationship of electronic texts to processes of
+scholarly communication in humanities research * New exchange formats
+created by scholars * Projects initiated to increase scholarly access to
+converted text * Trend toward making electronic resources available
+through research and education networks * Changes taking place in
+scholarly communication among humanities scholars * Network-mediated
+scholarship transforming traditional scholarly practices * Key
+information technology trends affecting the conduct of scholarly
+communication over the next decade * The trend toward end-user computing
+* The trend toward greater connectivity * Effects of these trends * Key
+transformations taking place * Summary of principal arguments *
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Avra MICHELSON, Archival Research and Evaluation Staff, National Archives
+and Records Administration (NARA), argued that establishing who will use
+electronic texts and what they will use them for involves a consideration
+of both information technology and scholarship trends.  This
+consideration includes several elements related to scholarship and
+technology:  1) the key trends in information technology that are most
+relevant to scholarship; 2) the key trends in the use of currently
+available technology by scholars in the nonscientific community; and 3)
+the relationship between these two very distinct but interrelated trends. 
+The investment in understanding this relationship being made by
+information providers, technologists, and public policy developers, as
+well as by scholars themselves, seems to be pervasive and growing,
+MICHELSON contended.  She drew on collaborative work with Jeff Rothenberg
+on the scholarly use of technology.
+
+MICHELSON sought to place the phenomenon of electronic texts within the
+context of broader trends within information technology and scholarly
+communication.  She argued that electronic texts are of most use to
+researchers to the extent that the researchers' working context (i.e.,
+their relevant bibliographic sources, collegial feedback, analytic tools,
+notes, drafts, etc.), along with their field's primary and secondary
+sources, also is accessible in electronic form and can be integrated in
+ways that are unique to the on-line environment.
+
+Evaluation of the prospects for the use of electronic texts includes two
+elements:  1) an examination of the ways in which researchers currently
+are using electronic texts along with other electronic resources, and 2)
+an analysis of key information technology trends that are affecting the
+long-term conduct of scholarly communication.  MICHELSON limited her
+discussion of the use of electronic texts to the practices of humanists
+and noted that the scientific community was outside the panel's overview.
+
+MICHELSON examined the nature of the current relationship of electronic
+texts in particular, and electronic resources in general, to what she
+maintained were, essentially, five processes of scholarly communication
+in humanities research.  Researchers 1) identify sources, 2) communicate
+with their colleagues, 3) interpret and analyze data, 4) disseminate
+their research findings, and 5) prepare curricula to instruct the next
+generation of scholars and students.  This examination would produce a
+clearer understanding of the synergy among these five processes that
+fuels the tendency of the use of electronic resources for one process to
+stimulate their use for other processes of scholarly communication.
+
+For the first process of scholarly communication, the identification of
+sources, MICHELSON remarked on the opportunity scholars now enjoy to
+supplement traditional word-of-mouth searches for sources among their
+colleagues with new forms of electronic searching.  So, for example,
+instead of having to visit the library, researchers are able to explore
+descriptions of holdings in their offices.  Furthermore, if their own
+institutions' holdings prove insufficient, scholars can access more than
+200 major American library catalogues over Internet, including the
+universities of California, Michigan, Pennsylvania, and Wisconsin. 
+Direct access to the bibliographic databases offers intellectual
+empowerment to scholars by presenting a comprehensive means of browsing
+through libraries from their homes and offices at their convenience.
+
+The second process of scholarly communication involves exchange among
+scholars.  Beyond the most common methods of communication, scholars are
+using E-mail and a variety of new electronic communications formats
+derived from it for further academic interchange.  E-mail exchanges are
+growing at an astonishing rate, reportedly 15 percent a month.  They
+currently constitute approximately half the traffic on research and
+education networks.  Moreover, the global spread of E-mail has been so
+rapid that it is now possible for American scholars to use it to
+communicate with colleagues in close to 140 other countries.
+
+Other new exchange formats created by scholars and operating on Internet
+include more than 700 conferences, with about 80 percent of these devoted
+to topics in the social sciences and humanities.  The rate of growth of
+these scholarly electronic conferences also is astonishing.  From 1990 to
+1991, 200 new conferences were identified on Internet.  From October 1991
+to June 1992, an additional 150 conferences in the social sciences and
+humanities were added to this directory of listings.  Scholars have
+established conferences in virtually every field, within every different
+discipline.  For example, there are currently close to 600 active social
+science and humanities conferences on topics such as art and
+architecture, ethnomusicology, folklore, Japanese culture, medical
+education, and gifted and talented education.  The appeal to scholars of
+communicating through these conferences is that, unlike any other medium,
+electronic conferences today provide a forum for global communication
+with peers at the front end of the research process.
+
+Interpretation and analysis of sources constitutes the third process of
+scholarly communication that MICHELSON discussed in terms of texts and
+textual resources.  The methods used to analyze sources fall somewhere on
+a continuum from quantitative analysis to qualitative analysis. 
+Typically, evidence is culled and evaluated using methods drawn from both
+ends of this continuum.  At one end, quantitative analysis involves the
+use of mathematical processes such as a count of frequencies and
+distributions of occurrences or, on a higher level, regression analysis. 
+At the other end of the continuum, qualitative analysis typically
+involves nonmathematical processes oriented toward language
+interpretation or the building of theory.  Aspects of this work involve
+the processing--either manual or computational--of large and sometimes
+massive amounts of textual sources, although the use of nontextual
+sources as evidence, such as photographs, sound recordings, film footage,
+and artifacts, is significant as well.
+
+Scholars have discovered that many of the methods of interpretation and
+analysis that are related to both quantitative and qualitative methods
+are processes that can be performed by computers.  For example, computers
+can count.  They can count brush strokes used in a Rembrandt painting or
+perform regression analysis for understanding cause and effect.  By means
+of advanced technologies, computers can recognize patterns, analyze text,
+and model concepts.  Furthermore, computers can complete these processes
+faster, with more sources, and with greater precision than scholars who
+must rely on manual interpretation of data.  But if scholars are to use
+computers for these processes, source materials must be in a form
+amenable to computer-assisted analysis.  For this reason many scholars,
+once they have identified the sources that are key to their research, are
+converting them to machine-readable form.  Thus, a representative example
+of the numerous textual conversion projects organized by scholars around
+the world in recent years to support computational text analysis is the
+TLG, the Thesaurus Linguae Graecae.  This project is devoted to
+converting the extant ancient texts of classical Greece.  (Editor's note: 
+according to the TLG Newsletter of May 1992, TLG was in use in thirty-two
+different countries.  This figure updates MICHELSON's previous count by one.)
+
+The scholars performing these conversions have been asked to recognize
+that the electronic sources they are converting for one use possess value
+for other research purposes as well.  As a result, during the past few
+years, humanities scholars have initiated a number of projects to
+increase scholarly access to converted text.  So, for example, the Text
+Encoding Initiative (TEI), about which more is said later in the program,
+was established as an effort by scholars to determine standard elements
+and methods for encoding machine-readable text for electronic exchange. 
+In a second effort to facilitate the sharing of converted text, scholars
+have created a new institution, the Center for Electronic Texts in the
+Humanities (CETH).  The center estimates that there are 8,000 series of
+source texts in the humanities that have been converted to
+machine-readable form worldwide.  CETH is undertaking an international
+search for converted text in the humanities, compiling it into an
+electronic library, and preparing bibliographic descriptions of the
+sources for the Research Libraries Information Network's (RLIN)
+machine-readable data file.  The library profession has begun to initiate
+large conversion projects as well, such as American Memory.
+
+While scholars have been making converted text available to one another,
+typically on disk or on CD-ROM, the clear trend is toward making these
+resources available through research and education networks.  Thus, the
+American and French Research on the Treasury of the French Language
+(ARTFL) and the Dante Project are already available on Internet. 
+MICHELSON summarized this section on interpretation and analysis by
+noting that:  1) increasing numbers of humanities scholars in the library
+community are recognizing the importance to the advancement of
+scholarship of retrospective conversion of source materials in the arts
+and humanities; and 2) there is a growing realization that making the
+sources available on research and education networks maximizes their
+usefulness for the analysis performed by humanities scholars.
+
+The fourth process of scholarly communication is dissemination of
+research findings, that is, publication.  Scholars are using existing
+research and education networks to engineer a new type of publication: 
+scholarly-controlled journals that are electronically produced and
+disseminated.  Although such journals are still emerging as a
+communication format, their number has grown, from approximately twelve
+to thirty-six during the past year (July 1991 to June 1992).  Most of
+these electronic scholarly journals are devoted to topics in the
+humanities.  As with network conferences, scholarly enthusiasm for these
+electronic journals stems from the medium's unique ability to advance
+scholarship in a way that no other medium can do by supporting global
+feedback and interchange, practically in real time, early in the research
+process.  Beyond scholarly journals, MICHELSON remarked on the delivery of
+commercial full-text products, such as articles in professional journals,
+newsletters, magazines, wire services, and reference sources.  These are
+being delivered via on-line local library catalogues, especially through
+CD-ROMs.  Furthermore, according to MICHELSON, there is general optimism
+that the copyright and fees issues impeding the delivery of full text on
+existing research and education networks soon will be resolved.
+
+The final process of scholarly communication is curriculum development
+and instruction, and this involves the use of computer information
+technologies in two areas.  The first is the development of
+computer-oriented instructional tools, which includes simulations,
+multimedia applications, and computer tools that are used to assist in
+the analysis of sources in the classroom, etc.  The Perseus Project, a
+database that provides a multimedia curriculum on classical Greek
+civilization, is a good example of the way in which entire curricula are
+being recast using information technologies.  It is anticipated that the
+current difficulty in exchanging computer-based instructional software
+electronically, which in turn makes it difficult for one scholar
+to build upon the work of others, will be resolved before too long. 
+Stand-alone curricular applications that involve electronic text will be
+sharable through networks, reinforcing their significance as intellectual
+products as well as instructional tools.
+
+The second aspect of electronic learning involves the use of research and
+education networks for distance education programs.  Such programs
+interactively link teachers with students in geographically scattered
+locations and rely on the availability of electronic instructional
+resources.  Distance education programs are gaining wide appeal among
+state departments of education because of their demonstrated capacity to
+bring advanced specialized course work and an array of experts to many
+classrooms.  A recent report found that at least 32 states operated at
+least one statewide network for education in 1991, with networks under
+development in many of the remaining states.
+
+MICHELSON summarized this section by noting two striking changes taking
+place in scholarly communication among humanities scholars.  First is the
+extent to which electronic text in particular, and electronic resources
+in general, are being infused into each of the five processes described
+above.  As mentioned earlier, there is a certain synergy at work here. 
+The use of electronic resources for one process tends to stimulate their
+use for other processes, because the chief course of movement is toward a
+comprehensive on-line working context for humanities scholars that
+includes on-line availability of key bibliographies, scholarly feedback,
+sources, analytical tools, and publications.  MICHELSON noted further
+that the movement toward a comprehensive on-line working context for
+humanities scholars is not new.  In fact, it has been underway for more
+than forty years in the humanities, since Father Roberto Busa began
+developing an electronic concordance of the works of Saint Thomas Aquinas
+in 1949.  What we are witnessing today, MICHELSON contended, is not the
+beginning of this on-line transition but, for at least some humanities
+scholars, the turning point in the transition from a print to an
+electronic working context.  Coinciding with the on-line transition, the
+second striking change is the extent to which research and education
+networks are becoming the new medium of scholarly communication.  The
+existing Internet and the pending National Research and Education Network
+(NREN) represent the new meeting ground where scholars are going for
+bibliographic information, scholarly dialogue and feedback, the most
+current publications in their field, and high-level educational
+offerings.  Traditional scholarly practices are undergoing tremendous
+transformations as a result of the emergence and growing prominence of
+what is called network-mediated scholarship.
+
+MICHELSON next turned to the second element of the framework she proposed
+at the outset of her talk for evaluating the prospects for electronic
+text, namely the key information technology trends affecting the conduct
+of scholarly communication over the next decade:  1) end-user computing
+and 2) connectivity.
+
+End-user computing means that the person touching the keyboard, or
+performing computations, is the same as the person who initiates or
+consumes the computation.  The emergence of personal computers, along
+with a host of other forces, such as ubiquitous computing, advances in
+interface design, and the on-line transition, is prompting the consumers
+of computation to do their own computing, and is thus rendering obsolete
+the traditional distinction between end users and ultimate users.
+
+The trend toward end-user computing is significant to consideration of
+the prospects for electronic texts because it means that researchers are
+becoming more adept at doing their own computations and, thus, more
+competent in the use of electronic media.  As researchers bypass programmer
+intermediaries, computation is becoming central to the researcher's
+thought process.  This direct involvement in computing is changing the
+researcher's perspective on the nature of research itself, that is, the
+kinds of questions that can be posed, the analytical methodologies that
+can be used, the types and amount of sources that are appropriate for
+analyses, and the form in which findings are presented.  The trend toward
+end-user computing means that, increasingly, electronic media and
+computation are being infused into all processes of humanities
+scholarship, inspiring remarkable transformations in scholarly
+communication.
+
+The trend toward greater connectivity suggests that researchers are using
+computation increasingly in network environments.  Connectivity is
+important to scholarship because it erases the distance that separates
+students from teachers and scholars from their colleagues, while allowing
+users to access remote databases, share information in many different
+media, connect to their working context wherever they are, and
+collaborate in all phases of research.
+
+The combination of the trend toward end-user computing and the trend
+toward connectivity suggests that the scholarly use of electronic
+resources, already evident among some researchers, will soon become an
+established feature of scholarship.  The effects of these trends, along
+with ongoing changes in scholarly practices, point to a future in which
+humanities researchers will use computation and electronic communication
+to help them formulate ideas, access sources, perform research,
+collaborate with colleagues, seek peer review, publish and disseminate
+results, and engage in many other professional and educational activities.
+
+In summary, MICHELSON emphasized four points:  1) A portion of humanities
+scholars already consider electronic texts the preferred format for
+analysis and dissemination.  2) Scholars are using these electronic
+texts, in conjunction with other electronic resources, in all the
+processes of scholarly communication.  3) The humanities scholars'
+working context is in the process of changing from print technology to
+electronic technology, in many ways mirroring transformations that have
+occurred or are occurring within the scientific community.  4) These
+changes are occurring in conjunction with the development of a new
+communication medium:  research and education networks that are
+characterized by their capacity to advance scholarship in a wholly unique
+way.
+
+MICHELSON also reiterated her three principal arguments:  1) Electronic
+texts are best understood in terms of the relationship to other
+electronic resources and the growing prominence of network-mediated
+scholarship.  2) The prospects for electronic texts lie in their capacity
+to be integrated into the on-line network of electronic resources that
+comprise the new working context for scholars.  3) Retrospective conversion
+of portions of the scholarly record should be a key strategy as information
+providers respond to changes in scholarly communication practices.
+
+                                 ******
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+VECCIA * AM's evaluation project and public users of electronic resources
+* AM and its design * Site selection and evaluating the Macintosh
+implementation of AM * Characteristics of the six public libraries
+selected * Characteristics of AM's users in these libraries * Principal
+ways AM is being used *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Susan VECCIA, team leader, and Joanne FREEMAN, associate coordinator,
+American Memory, Library of Congress, gave a joint presentation.  First,
+by way of introduction, VECCIA explained her and FREEMAN's roles in
+American Memory (AM).  Serving principally as an observer, VECCIA has
+assisted with the evaluation project of AM, placing AM collections in a
+variety of different sites around the country and helping to organize and
+implement that project.  FREEMAN has been an associate coordinator of AM
+and has been involved principally with the interpretative materials,
+preparing some of the electronic exhibits and printed historical
+information that accompanies AM and that is requested by users.  VECCIA
+and FREEMAN shared anecdotal observations concerning AM with public users
+of electronic resources.  Notwithstanding a fairly structured evaluation
+in progress, both VECCIA and FREEMAN chose not to report on specifics in
+terms of numbers, etc., because they felt it was too early in the
+evaluation project to do so.
+
+AM is an electronic archive of primary source materials from the Library
+of Congress, selected collections representing a variety of formats--
+photographs, graphic arts, recorded sound, motion pictures, broadsides,
+and soon, pamphlets and books.  In terms of the design of this system,
+the interpretative exhibits have been kept separate from the primary
+resources, with good reason.  Accompanying this collection are printed
+documentation and user guides, as well as guides that FREEMAN prepared for
+teachers so that they may begin using the content of the system at once.
+
+VECCIA described the evaluation project before talking about the public
+users of AM, limiting her remarks to public libraries, because FREEMAN
+would talk more specifically about schools from kindergarten to twelfth
+grade (K-12).  Having started in spring 1991, the evaluation currently
+involves testing of the Macintosh implementation of AM.  Since the
+primary goal of this evaluation is to determine the most appropriate
+audience or audiences for AM, very different sites were selected.  This
+makes evaluation difficult because of the varying degrees of technology
+literacy among the sites.  AM is situated in forty-four locations, of
+which six are public libraries and sixteen are schools.  Represented
+among the schools are elementary, junior high, and high schools.
+District offices also are involved in the evaluation, which will
+conclude in summer 1993.
+
+VECCIA focused the remainder of her talk on the six public libraries, one
+of which doubles as a state library.  They represent a range of
+geographic areas and a range of demographic characteristics.  For
+example, three are located in urban settings, two in rural settings, and
+one in a suburban setting.  A range of technical expertise is to be found
+among these facilities as well.  For example, one is an "Apple library of
+the future," while two others are rural one-room libraries--in one, AM
+sits at the front desk next to a tractor manual.
+
+All public libraries have been extremely enthusiastic, supportive, and
+appreciative of the work that AM has been doing.  VECCIA characterized
+various users:  Most users in public libraries describe themselves as
+general readers; of the students who use AM in the public libraries,
+those in fourth grade and above seem most interested.  Public libraries
+in rural sites tend to attract retired people, who have been highly
+receptive to AM.  Users tend to fall into two additional categories: 
+people interested in the content and historical connotations of these
+primary resources, and those fascinated by the technology.  The format
+receiving the most comments has been motion pictures.  The adult users in
+public libraries are more comfortable with IBM computers, whereas young
+people seem comfortable with either IBM or Macintosh, although most of
+them seem to come from a Macintosh background.  This same tendency is
+found in the schools.
+
+What kinds of things do users do with AM?  In a public library there are
+two main ways in which AM is being used:  as an individual learning
+tool, and as a leisure activity.  Adult learning was one area that VECCIA
+would highlight as a possible application for a tool such as AM.  She
+described a patron of a rural public library who comes in every day on
+his lunch hour and literally reads AM, methodically going through the
+collection image by image.  At the end of his hour he makes an electronic
+bookmark, puts it in his pocket, and returns to work.  The next day he
+comes in and resumes where he left off.  Interestingly, this man had
+never been in the library before he used AM.  In another small, rural
+library, the coordinator reports that AM is a popular activity for some
+of the older, retired people in the community, who ordinarily would not
+use "those things"--computers.  Another example of adult learning in
+public libraries is book groups, one of which, in particular, is using AM
+as part of its reading on industrialization, integration, and urbanization
+in the early 1900s.
+
+One library reports that a family is using AM to help educate their
+children.  In another instance, individuals from a local museum came in
+to use AM to prepare an exhibit on toys of the past.  These two examples
+emphasize the mission of the public library as a cultural institution,
+reaching out to people who do not have the resources available to those
+who live in a metropolitan area or have access to a major library.
+One rural library reports that junior high school students in large
+numbers came in one afternoon to use AM for entertainment.  A number of
+public libraries reported great interest among postcard collectors in the
+Detroit collection, which was essentially a collection of images used on
+postcards around the turn of the century.  Train buffs are similarly
+interested because that was a time of great interest in railroading. 
+People, it was found, relate to things that they know of firsthand.  For
+example, in both rural public libraries where AM was made available,
+observers reported that the older people with personal remembrances of
+the turn of the century were gravitating to the Detroit collection. 
+These examples served to underscore MICHELSON's observation regarding the
+integration of electronic tools and ideas--that people learn best when
+the material relates to something they know.
+
+VECCIA made the final point that in many cases AM serves as a
+public-relations tool for the public libraries that are testing it.  In
+one case, AM is being used as a vehicle to secure additional funding for
+the library.  In another case, AM has served as an inspiration to the
+staff of a major local public library in the South to think about ways to
+make its own collection of photographs more accessible to the public.
+
+                                  ******
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+FREEMAN * AM and archival electronic resources in a school environment *
+Questions concerning context * Questions concerning the electronic format
+itself * Computer anxiety * Access and availability of the system *
+Hardware * Strengths gained through the use of archival resources in
+schools *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Reiterating an observation made by VECCIA, that AM is an archival
+resource made up of primary materials with very little interpretation,
+FREEMAN stated that the project has attempted to bridge the gap between
+these bare primary materials and a school environment, and in that cause
+has created guided introductions to AM collections.  Loud demand from the
+educational community, chiefly from teachers working with the upper
+grades of elementary school through high school, greeted the announcement
+that AM would be tested around the country.
+
+FREEMAN reported not only on what was learned about AM in a school
+environment, but also on several universal questions that were raised
+concerning archival electronic resources in schools.  She discussed
+several strengths of this type of material in a school environment as
+opposed to a highly structured resource that offers a limited number of
+paths to follow.
+
+FREEMAN first raised several questions about using AM in a school
+environment.  There is often some difficulty in developing a sense of
+what the system contains.  Many students sit down at a computer resource
+and assume that, because AM comes from the Library of Congress, all of
+American history is now at their fingertips.  As a result of that sort of
+mistaken judgment, some students are known to conclude that AM contains
+nothing of use to them when they look for one or two things and do not
+find them.  It is difficult to discover that middle ground where one has
+a sense of what the system contains.  Some students grope toward the idea
+of an archive, a new idea to them, since they have not previously
+experienced what it means to have access to a vast body of somewhat
+random information.
+
+Other questions raised by FREEMAN concerned the electronic format itself. 
+For instance, in a school environment it is often difficult both for
+teachers and students to gain a sense of what it is they are viewing. 
+They understand that it is a visual image, but they do not necessarily
+know that it is a postcard from the turn of the century, a panoramic
+photograph, or even machine-readable text of an eighteenth-century
+broadside, a twentieth-century printed book, or a nineteenth-century
+diary.  That distinction is often difficult for people in a school
+environment to grasp.  Because of that, it occasionally becomes difficult
+to draw conclusions from what one is viewing.
+
+FREEMAN also noted the obvious fear of the computer, which constitutes a
+difficulty in using an electronic resource.  Though students in general
+did not suffer from this anxiety, several older students feared that they
+were computer-illiterate, an assumption that became self-fulfilling when
+they searched for something but failed to find it.  FREEMAN said she
+believed that some teachers also fear computer resources, because they
+believe they lack complete control.  FREEMAN related the example of
+teachers shooing away students because it was not their time to use the
+system.  This was a case in which the situation had to be extremely
+structured so that the teachers would not feel that they had lost their
+grasp on what the system contained.
+
+A final question raised by FREEMAN concerned access and availability of
+the system.  She noted the occasional existence of a gap in communication
+between school librarians and teachers.  Often AM sits in a school
+library and the librarian is the person responsible for monitoring the
+system.  Teachers do not always take into their world new library
+resources about which the librarian is excited.  Indeed, at the sites
+where AM had been used most effectively within a library, the librarian
+was required to go to specific teachers and instruct them in its use.  As
+a result, several AM sites will have in-service sessions over a summer,
+in the hope that perhaps, with a more individualized link, teachers will
+be more likely to use the resource.
+
+A related issue in the school context concerned the number of
+workstations available at any one location.  Centralization of equipment
+at the district level, with teachers invited to download things and walk
+away with them, proved unsuccessful because the hours these offices were
+open were also school hours.
+
+Another issue was hardware.  As VECCIA observed, a range of sites exists,
+some technologically advanced and others essentially acquiring their
+first computer for the primary purpose of using it in conjunction with
+AM's testing.  Users at technologically sophisticated sites want even
+more sophisticated hardware, so that they can perform even more
+sophisticated tasks with the materials in AM.  But once they acquire a
+newer piece of hardware, they must learn how to use that also; at an
+unsophisticated site it takes an extremely long time simply to become
+accustomed to the computer, not to mention the program offered with the
+computer.  All of these small issues raise one large question, namely,
+are systems like AM truly rewarding in a school environment, or do they
+simply act as innovative toys that do little more than spark interest?
+
+FREEMAN contended that the evaluation project has revealed several strengths
+that were gained through the use of archival resources in schools, including:
+
+     * Psychic rewards from using AM as a vast, rich database, with
+     teachers assigning various projects to students--oral presentations,
+     written reports, a documentary, a turn-of-the-century newspaper--
+     projects that start with the materials in AM but are completed using
+     other resources; AM thus is used as a research tool in conjunction
+     with other electronic resources, as well as with books and items in
+     the library where the system is set up.
+
+     * Students are acquiring computer literacy in a humanities context.
+
+     * This sort of system is overcoming the isolation between disciplines
+     that often exists in schools.  For example, many English teachers are
+     requiring their students to write papers on historical topics
+     represented in AM.  Numerous teachers have reported that their
+     students are learning critical thinking skills using the system.
+
+     * On a broader level, AM is introducing primary materials, not only
+     to students but also to teachers, in an environment where often
+     simply none exist--an exciting thing for the students because it
+     helps them learn to conduct research, to interpret, and to draw
+     their own conclusions.  In learning to conduct research and what it
+     means, students are motivated to seek knowledge.  That relates to
+     another positive outcome--a high level of personal involvement of
+     students with the materials in this system and greater motivation to
+     conduct their own research and draw their own conclusions.
+
+     * Perhaps the most ironic strength of these kinds of archival
+     electronic resources is that many of the teachers AM interviewed
+     were desperate, it is no exaggeration to say, not only for primary
+     materials but for unstructured primary materials.  These would, they
+     thought, foster personally motivated research, exploration, and
+     excitement in their students.  Indeed, these materials have done
+     just that.  Ironically, however, this lack of structure produces
+     some of the confusion to which the newness of these kinds of
+     resources may also contribute.  The key to effective use of archival
+     products in a school environment is a clear, effective introduction
+     to the system and to what it contains. 
+
+                                 ******
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+DISCUSSION * Nothing known, quantitatively, about the number of
+humanities scholars who must see the original versus those who would
+settle for an edited transcript, or about the ways in which humanities
+scholars are using information technology * Firm conclusions concerning
+the manner and extent of the use of supporting materials in print
+provided by AM to await completion of evaluative study * A listener's
+reflections on additional applications of electronic texts * Role of
+electronic resources in teaching elementary research skills to students *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+During the discussion that followed the presentations by MICHELSON,
+VECCIA, and FREEMAN, additional points emerged.
+
+LESK asked if MICHELSON could give any quantitative estimate of the
+number of humanities scholars who must see or want to see the original,
+or the best possible version of the material, versus those who typically
+would settle for an edited transcript.  While unable to provide a figure,
+she offered her impressions as an archivist who has done some reference
+work and has discussed this issue with other archivists who perform
+reference, that those who use archives and those who use primary sources
+for what would be considered very high-level scholarly research, as
+opposed to, say, undergraduate papers, were few in number, especially
+given the public interest in using primary sources to conduct
+genealogical or avocational research and the kind of professional
+research done by people in private industry or the federal government. 
+More important in MICHELSON's view was that, quantitatively, nothing is
+known about the ways in which, for example, humanities scholars are using
+information technology.  No studies exist to offer guidance in creating
+strategies.  The most recent study was conducted in 1985 by the American
+Council of Learned Societies (ACLS), and what it showed was that 50
+percent of humanities scholars at that time were using computers.  That
+constitutes the extent of our knowledge.
+
+Concerning AM's strategy for orienting people toward the scope of
+electronic resources, FREEMAN could offer no hard conclusions at this
+point, because she and her colleagues were still waiting to see,
+particularly in the schools, what has been made of their efforts.  Within
+the system, however, AM has provided what are called electronic
+exhibits--such as introductions to time periods and materials--and these are
+intended to offer a student user a sense of what a broadside is and what
+it might tell her or him.  But FREEMAN conceded that the project staff
+would have to talk with students next year, after teachers have had a
+summer to use the materials, and attempt to discover what the students
+were learning from the materials.  In addition, FREEMAN described
+supporting materials in print provided by AM at the request of local
+teachers during a meeting held at LC.  These included time lines,
+bibliographies, and other materials that could be reproduced on a
+photocopier in a classroom.  Teachers could walk away with and use these,
+and in this way gain a better understanding of the contents.  But again,
+reaching firm conclusions concerning the manner and extent of their use
+would have to wait until next year.
+
+As to the changes she saw occurring at the National Archives and Records
+Administration (NARA) as a result of the increasing emphasis on
+technology in scholarly research, MICHELSON stated that NARA at this
+point was absorbing the report by her and Jeff Rothenberg addressing
+strategies for the archival profession in general, although not for the
+National Archives specifically.  NARA is just beginning to establish its
+role and what it can do.  In terms of changes and initiatives that NARA
+can take, no clear response could be given at this time.
+
+GREENFIELD remarked on two trends mentioned in the session.  Reflecting on
+DALY's opening comments on how he could have used a Latin collection of
+text in an electronic form, he said that at first he thought most scholars
+would be unwilling to do that.  But as he thought of that in terms of the
+original meaning of research--that is, having already mastered these texts,
+researching them for critical and comparative purposes--for the first time,
+the electronic format made a lot of sense.  GREENFIELD could envision
+growing numbers of scholars learning the new technologies for that very
+aspect of their scholarship and for convenience's sake.
+
+Listening to VECCIA and FREEMAN, GREENFIELD thought of an additional
+application of electronic texts.  He realized that AM could be used as a
+guide to lead someone to original sources.  Students cannot be expected
+to have mastered these sources, things they have never known about
+before.  Thus, AM is leading them, in theory, to a vast body of
+information and giving them a superficial overview of it, enabling them
+to select parts of it.  GREENFIELD asked if any evidence exists that this
+resource will indeed teach the new user, the K-12 students, how to do
+research.  Scholars already know how to do research and are applying
+these new tools.  But he wondered why students would go beyond picking
+out things that were most exciting to them.
+
+FREEMAN conceded the correctness of GREENFIELD's observation as applied
+to a school environment.  The risk is that a student would sit down at a
+system, play with it, find some things of interest, and then walk away. 
+But in the relatively controlled situation of a school library, much will
+depend on the instructions a teacher or a librarian gives a student.  She
+viewed the situation not as one of fine-tuning research skills but of
+involving students at a personal level in understanding and researching
+things.  Given the guidance one can receive at school, it then becomes
+possible to teach elementary research skills to students, which in fact
+one particular librarian said she was teaching her fifth graders. 
+FREEMAN concluded that introducing the idea of following one's own path
+of inquiry, which is essentially what research entails, involves more
+than teaching specific skills.  To these comments VECCIA added the
+observation that the individual teacher and the use of a creative
+resource, rather than AM itself, seemed to make the key difference.
+Some schools and some teachers are making excellent use of the nature
+of critical thinking and teaching skills, she said.
+
+Concurring with these remarks, DALY closed the session with the thought that
+the more that producers produced for teachers and for scholars to use with
+their students, the more successful their electronic products would prove.
+
+                                 ******
+
+SESSION II.  SHOW AND TELL
+
+Jacqueline HESS, director, National Demonstration Laboratory, served as
+moderator of the "show-and-tell" session.  She noted that a
+question-and-answer period would follow each presentation.
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+MYLONAS * Overview and content of Perseus * Perseus' primary materials
+exist in a system-independent, archival form * A concession * Textual
+aspects of Perseus * Tools to use with the Greek text * Prepared indices
+and full-text searches in Perseus * English-Greek word search leads to
+close study of words and concepts * Navigating Perseus by tracing down
+indices * Using the iconography to perform research *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Elli MYLONAS, managing editor, Perseus Project, Harvard University, first
+gave an overview of Perseus, a large, collaborative effort based at
+Harvard University but with contributors and collaborators located at
+numerous universities and colleges in the United States (e.g., Bowdoin,
+Maryland, Pomona, Chicago, Virginia).  Funded primarily by the
+Annenberg/CPB Project, with additional funding from Apple, Harvard, and
+the Packard Humanities Institute, among others, Perseus is a multimedia,
+hypertextual database for teaching and research on classical Greek
+civilization, which was released in February 1992 in version 1.0 and
+distributed by Yale University Press.
+
+Consisting entirely of primary materials, Perseus includes ancient Greek
+texts and translations of those texts; catalog entries--that is, museum
+catalog entries, not library catalog entries--on vases, sites, coins,
+sculpture, and archaeological objects; maps; and a dictionary, among
+other sources.  The objects for which catalog entries exist are
+accompanied by thousands of color images, which constitute a major
+feature of the database.  Perseus contains
+approximately 30 megabytes of text, an amount that will double in
+subsequent versions.  In addition to these primary materials, the Perseus
+Project has been building tools for using them, making access and
+navigation easier, the goal being to build part of the electronic
+environment discussed earlier in the morning in which students or
+scholars can work with their sources.
+
+The demonstration of Perseus will show only a fraction of the real work
+that has gone into it, because the project had to face the dilemma of
+what to enter when putting something into machine-readable form:  should
+one aim for very high quality or make concessions in order to get the
+material in?  Since Perseus decided to opt for very high quality, all of
+its primary materials exist in a system-independent--insofar as it is
+possible to be system-independent--archival form.  Deciding what that
+archival form would be and attaining it required much work and thought. 
+For example, all the texts are marked up in SGML, which will be made
+compatible with the guidelines of the Text Encoding Initiative (TEI) when
+they are issued.
+
+Drawings are PostScript files, not meeting international standards, but
+at least designed to go across platforms.  Images, or rather the real
+archival forms, consist of the best available slides, which are being
+digitized.  Much of the catalog material exists in database form--a form
+that the average user could use, manipulate, and display on a personal
+computer, but only at great cost.  Thus, this is where the concession
+comes in:  All of this rich, well-marked-up information is stripped of
+much of its content; the images are converted into bit-maps and the text
+into small formatted chunks.  All this information can then be imported
+into HyperCard and run on a mid-range Macintosh, which is what Perseus
+users have.  This fact has made it possible for Perseus to attain wide
+use fairly rapidly.  Without those archival forms the HyperCard version
+being demonstrated could not be made easily, and the project could not
+have the potential to move to other forms and machines and software as
+they appear, none of which information is in Perseus on the CD.
+
+Of the numerous multimedia aspects of Perseus, MYLONAS focused on the
+textual.  Part of what makes Perseus such a pleasure to use, MYLONAS
+said, is this effort at seamless integration and the ability to move
+around both visual and textual material.  Perseus also made the decision
+not to attempt to interpret its material any more than one interprets by
+selecting.  But, MYLONAS emphasized, Perseus is not courseware:  No
+syllabus exists.  There is no effort to define how one teaches a topic
+using Perseus, although the project may eventually collect papers by
+people who have used it to teach.  Rather, Perseus aims to provide
+primary material in a kind of electronic library, an electronic sandbox,
+so to say, in which students and scholars who are working on this
+material can explore by themselves.  With that, MYLONAS demonstrated
+Perseus, beginning with the Perseus gateway, the first thing one sees
+upon opening Perseus--an effort in part to solve the contextualizing
+problem--which tells the user what the system contains.
+
+MYLONAS demonstrated only a very small portion, beginning with primary
+texts and running off the CD-ROM.  Having selected Aeschylus' Prometheus
+Bound, which was viewable in Greek and English pretty much in the same
+segments together, MYLONAS demonstrated tools to use with the Greek text,
+something not possible with a book:  looking up the dictionary entry form
+of an unfamiliar word in Greek after subjecting it to Perseus'
+morphological analysis for all the texts.  After finding out about a
+word, a user may then decide to see if it is used anywhere else in Greek. 
+Because vast amounts of indexing support all of the primary material, one
+can find out where else all forms of a particular Greek word appear--
+often not a trivial matter because Greek is highly inflected.  Further,
+since the story of Prometheus has to do with the origins of sacrifice, a
+user may wish to study and explore sacrifice in Greek literature; by
+typing sacrifice into a small window, a user goes to the English-Greek
+word list--something one cannot do without the computer (Perseus has
+indexed the definitions of its dictionary)--where the string sacrifice
+appears in the definitions of sixty-five words.  One may then find out
+where any of those words is used in the work(s) of a particular author. 
+The English definitions are not lemmatized.
+
+All of the indices driving this kind of usage were originally devised for
+speed, MYLONAS observed; in other words, all that kind of information--
+all forms of all words, where they exist, the dictionary form they belong
+to--was collected into databases, which will expedite searching.  Then
+it was discovered that one can do things searching in these databases
+that could not be done searching in the full texts.  Thus, although there
+are full-text searches in Perseus, much of the work is done behind the
+scenes, using prepared indices.  Re the indexing that is done behind the
+scenes, MYLONAS pointed out that without the SGML forms of the text, it
+could not be done effectively.  Much of this indexing is based on the
+structures that are made explicit by the SGML tagging.
+
+It was found that one of the things many of Perseus' non-Greek-reading
+users do is start from the dictionary and then move into the close study
+of words and concepts via this kind of English-Greek word search, by which
+means they might select a concept.  This exercise has been assigned to
+students in core courses at Harvard--to study a concept by looking for the
+English word in the dictionary, finding the Greek words, and then finding
+the words in the Greek but, of course, reading across in the English.
+That tells them a great deal about what a translation means as well.
+
+Should one also wish to see images that have to do with sacrifice, that
+person would go to the object key word search, which allows one to
+perform a similar kind of index retrieval on the database of
+archaeological objects.  Without words, pictures are useless; Perseus has
+not reached the point where it can do much with images that are not
+cataloged.  Thus, although it is possible in Perseus with text and images
+to navigate by knowing where one wants to end up--for example, a
+red-figure vase from the Boston Museum of Fine Arts--one can perform this
+kind of navigation very easily by tracing down indices.  MYLONAS
+illustrated several generic scenes of sacrifice on vases.  The features
+demonstrated derived from Perseus 1.0; version 2.0 will implement even
+better means of retrieval.
+
+MYLONAS closed by looking at one of the pictures and noting again that
+one can do a great deal of research using the iconography as well as the
+texts.  For instance, students in a core course at Harvard this year were
+highly interested in Greek concepts of foreigners and representations of
+non-Greeks.  So they performed a great deal of research, both with texts
+(e.g., Herodotus) and with iconography on vases and coins, on how the
+Greeks portrayed non-Greeks.  At the same time, art historians who study
+iconography were also interested, and were able to use this material.
+
+                                 ******
+
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+DISCUSSION * Indexing and searchability of all English words in Perseus *
+Several features of Perseus 1.0 * Several levels of customization
+possible * Perseus used for general education * Perseus' effects on
+education * Contextual information in Perseus * Main challenge and
+emphasis of Perseus *
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Several points emerged in the discussion that followed MYLONAS's presentation.
+
+Although MYLONAS had not demonstrated Perseus' ability to cross-search
+documents, she confirmed that all English words in Perseus are indexed
+and can be searched.  So, for example, sacrifice could have been searched
+in all texts, the historical essay, and all the catalog entries with
+their descriptions--in short, in all of Perseus.
+
+Boolean logic is not in Perseus 1.0 but will

<TRUNCATED>
