ETANA:
Electronic Tools and Ancient Near Eastern Archives


ETANA Project Description

Draft: January 2, 2001

ETANA is a cooperative venture of a consortium of scholarly societies and universities to develop and maintain a comprehensive Internet site for the study of the ancient Near East. Academic, library and technical staff of the partner organizations will collaborate to share intellectual and technical resources in the development of the project. The founding partners are:

ETANA will include the permanent archiving, dissemination and generation of both front- and back-end stages of scholarly knowledge such as archeological excavation reports, editions of ancient and modern texts, high definition images of ancient documents, core early monographs, dictionaries, journals, and reports in the public domain, a portal to ANE web resources, an electronic commons where scholars in the field can share data and images, and eventually an electronic publishing effort for "born digital" publications. ETANA will also collect and/or develop software required for the production of the Internet site in core areas identified by the planning committees and outlined herewith. Vanderbilt's library will serve as the host technical site and grant administrator.

  1. The Archeologist's Tool Kit (ATK) (year one and continuing)

    Archeologists working throughout the Near East create field reports of excavations, some in hard-copy that are eventually digitized. These become the basis for seasonal and final reports that often take years to be published, at considerable expense that typically strains the project's very limited staff and financial resources.

    Working with field archeologists who direct projects in the Near East and elsewhere, ETANA will develop both standards and software programs for capturing original data electronically in the field. Such data include database and spreadsheet records, photos, drawings, videos, audio records, statistical programs, CAD, GIS, and GPS information, and other types that may be specific to individual projects. ETANA software programs will also be designed so that the original field data can migrate through the stages of research to dissemination, thereby eliminating the need to re-code data. This will expedite processes and shorten the time between fieldwork and dissemination of archeological results. In order to accomplish these storage and dissemination goals, ETANA will create a repository and archive for archeological data and select appropriate database environments that will interface with its software programs for posting field data and tools for data retrieval and analysis. The software programs will allow the repository to be searched and queried via the World Wide Web and will enable the reports to be disseminated in various electronic media including the Internet.
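
    As a rough illustration of capturing data once and migrating it without re-coding, the sketch below defines a single field record and round-trips it through a plain-text exchange format. The record fields (site, locus, coordinates, attachments), the sample values, and the choice of JSON are assumptions made only for this example; the actual ATK standards are to be defined with the participating archeologists.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class FieldRecord:
    """One hypothetical field observation; every field name is illustrative."""
    site: str
    locus: str
    description: str
    latitude: float
    longitude: float
    recorded_on: str  # ISO 8601 date, e.g. "2001-06-15"
    attachments: List[str] = field(default_factory=list)  # photo/drawing file names


def export_records(records: List[FieldRecord]) -> str:
    """Serialize records to JSON so later stages can reuse them unchanged."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


def import_records(payload: str) -> List[FieldRecord]:
    """Rebuild records from the exported JSON without manual re-keying."""
    return [FieldRecord(**item) for item in json.loads(payload)]


if __name__ == "__main__":
    sample = FieldRecord(
        site="Example Tell",  # placeholder site name
        locus="L-101",
        description="Pottery sherd, rim fragment",
        latitude=31.9,
        longitude=35.6,
        recorded_on="2001-06-15",
        attachments=["photo_0001.tif"],
    )
    print(export_records([sample]))
```

    The point of the round trip is that the record written in the field is the same record read at the analysis and dissemination stages.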

    As a departure point, ETANA will invite existing archeological projects to share information and data that are ready to migrate to a standardized system. ETANA member Case Western Reserve University (CWRU) will participate in this process by offering developments from its digital archive to ETANA as a pilot project. One of CWRU's pilot projects is an archeological site, Tell Nimrim, with an electronic version, Virtual Nimrim (VN), soon to be running on UNIX and Oracle. ETANA expects this to be only one of several projects that it will use to develop and test standards that will have wide acceptance among ANE scholars and possibly in the broader study of archeology.

    The infrastructure to support the ATK will be developed jointly by CWRU and Vanderbilt and hosted at Vanderbilt. The long-term objective of ETANA is for the ATK to become a powerful tool for collecting field data according to standards that make cross-site comparison possible, plus a searchable Internet database of archeological data for the entire field of ancient Near Eastern archeology. In addition, ETANA will become the dissemination source not only for these projects, but also for others that may wish to participate in only a portion of ETANA services, such as the dissemination phase. An example is the digital electronic reports produced under the auspices of the American Schools of Oriental Research (ASOR) and administered by its Committee on Archeological Policy (CAP) and Committee on Publications (COP). This learned society comprises archeologists, anthropologists, historians, and others working throughout the Near East and Mediterranean. ASOR will also be valuable for soliciting projects, establishing peer review panels, vetting proposals, and assuring that reports are accessible and disseminated to the scholarly community and general public.

  2. The Scholar's Commons (year one project)

    Many fields are currently developing preprint archives that allow scholars to communicate rapidly with their peers by posting drafts or near-final copies to accessible and generally free servers. Submissions to these preprint archives are not reviewed, edited, or in any way monitored. It is felt that providing a preprint archive in ancient Near Eastern studies is not desirable due to the nature of publication in this discipline. Instead, there is an expressed need for a common space where scholars can share a wide variety of data, images, maps, texts, etc., and make them accessible to their peers as raw material for scholarship. ETANA and Vanderbilt will provide such a space for use by scholars.

  3. ABZU the Portal (year one project)

    ABZU, hosted by the library of the Oriental Institute, is the current portal of choice for those working in ancient Near Eastern studies. ETANA will work with ABZU to form a coherent whole between ABZU, the portal, and ETANA, a content site. A more powerful database and server architecture will be developed and installed to make ABZU more robust and easier to maintain.

    The current ABZU site consists of a matrix of static Web pages. The site is rich in content, but unwieldy to maintain. We expect to extract all the content from these Web pages to populate a database of resources relevant to the field of ANE. A Web interface will be created to this database to provide scholars easy access to these resources through a variety of browse and search options. Another Web interface will be developed to allow scholars in the field to add new content to the ABZU database. This interface would also include features that allow an authorized editor to review submissions before they become part of the active database.
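
    The submit, review, and search workflow described above can be sketched with a small relational table. This is a minimal illustration only: the table layout, the column names, the two status values, and the use of SQLite are assumptions for the example, not the architecture that will be chosen for ABZU.

```python
import sqlite3
from typing import List, Tuple


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) resources table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS resources (
               id      INTEGER PRIMARY KEY,
               title   TEXT NOT NULL,
               url     TEXT NOT NULL,
               subject TEXT NOT NULL,
               status  TEXT NOT NULL DEFAULT 'pending'
                       CHECK (status IN ('pending', 'active'))
           )"""
    )
    return conn


def submit(conn: sqlite3.Connection, title: str, url: str, subject: str) -> int:
    """A scholar proposes a resource; it stays 'pending' until reviewed."""
    cur = conn.execute(
        "INSERT INTO resources (title, url, subject) VALUES (?, ?, ?)",
        (title, url, subject),
    )
    conn.commit()
    return cur.lastrowid


def approve(conn: sqlite3.Connection, resource_id: int) -> None:
    """An authorized editor moves a submission into the active database."""
    conn.execute("UPDATE resources SET status = 'active' WHERE id = ?", (resource_id,))
    conn.commit()


def search(conn: sqlite3.Connection, keyword: str) -> List[Tuple[str, str]]:
    """Keyword search over title and subject; only approved entries are returned."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT title, url FROM resources "
        "WHERE status = 'active' AND (title LIKE ? OR subject LIKE ?) "
        "ORDER BY title",
        (pattern, pattern),
    )
    return rows.fetchall()
```

    Keeping the review state in the same table as the content means the public search interface and the editor's queue differ only in the status they filter on.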

  4. Early Core Text Digital Conversion (year one and continuing)

    In a discipline that is both archeological and text-based, attention to previously published texts in the discipline is appropriate. In ancient Near Eastern studies, many of the earliest publications still have significant value for scholarly research. Unfortunately, most of these publications are held by only a handful of libraries, severely limiting their use for research and teaching. In our discussions, ANE scholars in the planning group determined that digital access to these volumes would be of significant benefit. The result will be that a scholar or student anywhere in the world will be able to access a substantial body of significant ANE resources that never before existed as a coherent whole. Benefits to scholarship and pedagogy are anticipated as new approaches to text research emerge.

    The vision of this portion of the project is to make available a sufficiently large number of texts to make the ETANA resources transformative for teaching and research. Certainly not all of the texts published prior to the 20th century qualify as "core" texts from the perspective of ANE scholars. For this reason, all texts selected will be approved by a review committee appointed by the ETANA board. The committee will work from bibliographic lists arranged by general geographical/historical subject areas, covering archaeology, environment, population, social institutions, history, economy, technology, religion, language and writing. (** See note at end of this section.)

    In the judgment of these participants, the ability to view the publications online was more significant than the ability to search them by full-text keyword. But further investigation of the costs of OCR conversion showed that complete and thorough treatment of the texts would be less expensive than first thought (see budget figures below). Therefore, we propose to include full OCR conversion and proofing in our model. We are investigating the appropriate best-practice software for both the scanning/OCR work and the public web delivery mechanism.

    We propose to contract for the scanning of a large number of selected titles in the public domain (author's death date prior to 1926, or edited works published prior to 1901) in a high-resolution format currently accepted as a standard for "digital preservation" to be hosted on the ETANA site at Vanderbilt. These high-resolution images would be stored for the long-term, held for a future day in which full-text conversion may become less expensive. In addition, the images would contain appropriate structural, administrative, and technical metadata to aid in their long-term viability as a preservation medium. Best practice standards will be followed for quality control. We propose to create high-resolution images in TIFF format of sufficient quality and resolution to preserve this content digitally and to store these digital image files on media that will be refreshed at appropriate intervals to ensure permanent accessibility. We will also convert these high-resolution images to a standard format that can be easily read and retrieved over the Internet. Along with the images will be searchable ASCII text, provided through OCR technology as mentioned above.
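
    The public-domain criterion stated above (author's death date prior to 1926, or edited works published prior to 1901) can be expressed as a simple pre-filter for building candidate lists. The sketch below is illustrative only: the record fields and the sample titles are hypothetical, and such a filter would not replace the review committee's judgment or a proper rights check.

```python
from dataclasses import dataclass
from typing import Optional

AUTHOR_DEATH_CUTOFF = 1926  # author's death date must be prior to this year
EDITED_WORK_CUTOFF = 1901   # edited works must be published prior to this year


@dataclass
class Title:
    """Minimal bibliographic record; field names are assumptions for this sketch."""
    name: str
    author_death_year: Optional[int] = None
    is_edited_work: bool = False
    publication_year: Optional[int] = None


def is_candidate(title: Title) -> bool:
    """Return True if the title meets either stated public-domain criterion."""
    by_author = (
        title.author_death_year is not None
        and title.author_death_year < AUTHOR_DEATH_CUTOFF
    )
    by_edition = (
        title.is_edited_work
        and title.publication_year is not None
        and title.publication_year < EDITED_WORK_CUTOFF
    )
    return by_author or by_edition


if __name__ == "__main__":
    samples = [
        Title("Hypothetical monograph A", author_death_year=1910),
        Title("Hypothetical edited volume B", is_edited_work=True, publication_year=1905),
    ]
    for t in samples:
        print(t.name, is_candidate(t))
```

    Titles with missing dates are treated as non-candidates, so incomplete records are held back for manual checking instead of being scanned by default.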

    Acquiring these texts for scanning will be managed through lending arrangements with ETANA partner institutions. Texts not available from partner institutions will be acquired through Inter-Library Loan, as the texts will be scanned intact. Particularly fragile pieces will need approval from owning libraries; the grant-supported nature of this project should ease that process of approval.

    Estimating cost per volume

    In 1999, the University of Michigan was asked to establish digital conversion costs including scanning, OCR conversion/proofing, encoding/proofing, and initial online storage. These costs were reviewed by Cornell and the University of Virginia, and all three institutions were in close agreement on the validity of the figures. (See: Moving Theory Into Practice: Digital Imaging for Libraries and Archives, Anne R. Kenney, Oya Y. Rieger, eds. Mountain View CA: Research Libraries Group, 2000.)

    The total costs for a 300-page book ranged from $391.60-$538.60 (per page price at $1.40-$1.89). If we account for inflation, the complexity of ANE texts (ancient languages), and our need to scan the texts intact, our proposal will select the higher figure of $540.

    Cataloging costs for these texts were omitted from the Michigan/Cornell/Virginia figures. Based upon the Cornell University Department of Preservation's application of the RLG Worksheet for Estimating Digital Reformatting Costs, we estimate the cost of cataloging modification of existing OCLC records to be $10 per volume. We will add an additional $10 per volume to accommodate Dublin Core metadata entry as well, bringing the full cataloging costs to $20 per volume.

    Costs not included in above estimates

    The total cost per volume (300 pages) estimate for the ETANA Early Core Texts Digital Conversion project component is $560, or approximately $1.87 per page.

    Estimating the number of volumes

    Based upon an exhaustive search of OCLC for public domain titles on the focused area of ancient Iraq, we were able to identify approximately 50 titles that would be included in a title selection list. Based upon that review, we estimate (broadly) that 1000 titles are possible candidates for conversion. We expect that the selection panel of scholars can reduce that number, if funds are not available for the complete list conversion.

    Total cost of Early Core Text conversion

    1000 titles in final selection list @ $560 per volume = $560,000
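
    The arithmetic behind these figures can be checked with a few lines of code. The constants below are taken from the estimates in this section; the function names, and the helper for fitting a selection list to a smaller budget, are illustrative additions and not part of the proposal's budget method.

```python
PAGES_PER_VOLUME = 300    # reference volume size used in the estimates
CONVERSION_COST = 540     # scanning, OCR, encoding, initial storage (higher figure)
CATALOGING_COST = 10      # modification of existing OCLC records, per volume
DUBLIN_CORE_COST = 10     # Dublin Core metadata entry, per volume
ESTIMATED_TITLES = 1000   # broad estimate of candidate titles


def cost_per_volume() -> int:
    """Conversion plus full cataloging cost for one 300-page volume."""
    return CONVERSION_COST + CATALOGING_COST + DUBLIN_CORE_COST


def cost_per_page() -> float:
    """Per-page cost implied by the per-volume estimate."""
    return cost_per_volume() / PAGES_PER_VOLUME


def total_cost(titles: int = ESTIMATED_TITLES) -> int:
    """Total conversion cost for a selection list of the given size."""
    return titles * cost_per_volume()


def titles_affordable(budget: int) -> int:
    """Hypothetical helper: how many volumes a reduced budget would cover."""
    return budget // cost_per_volume()


if __name__ == "__main__":
    print(f"Per volume: ${cost_per_volume()}")
    print(f"Per page:   ${cost_per_page():.2f}")
    print(f"Total:      ${total_cost():,}")
```

    The last helper reflects the expectation that the selection panel can shorten the list if funds do not cover the complete conversion.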

    ** NOTE: Geographic/historical areas of coverage will include the following (scholars will certainly modify this list as the selection process proceeds):

    • Anatolia: Hittite Empire, Urartu, Phrygia
    • Iran: Media, Persian Empire, Achaemenid Dynasty, Parthia
    • Egypt: Early Dynastic Period, Old Kingdom, 1st Intermediate Period, Middle Kingdom, 2nd Intermediate Period, New Kingdom, Amarna Period, Hyksos, 3rd Intermediate Period
    • Mesopotamia: Sumer and Akkad, Sargonic Dynasty, Ur III Dynasty, Assyria, Babylon, Mari, Nuzi, Uruk, Murashu Archives
    • Palestine: Ancient Israel, Syria, Ugarit, Amorite Culture, Canaanite Culture, Nubia

  5. High Definition Images (year one and continuing)

    The Inscriptofact Project sponsored by the Library at the University of Southern California holds 250,000 high definition images of clay tablets, papyri, and other ancient written artifacts. The scanning, metadata creation, and sophisticated retrieval software are currently under development. ETANA plans to create permanent links to images held by the Inscriptofact Project from images used in previously published materials and from new publications published by the ETANA member societies. For instance, some volumes digitized in the Early Core Text Digital Conversion project might be enhanced by linking from an image in that publication to the corresponding image in the Inscriptofact database.

    The high definition images as conceived in their project are of a size requiring Internet 2 speeds for reasonable downloading. The likely interface with ETANA would focus on development of an architecture that would allow for permanent URLs to lower resolution images extracted from their database.

  6. Born Digital Electronic Publications (year two and continuing)

    Because of the worldwide membership of the various organizations in ETANA, fostering digital publications will allow for the rapid dissemination of, and access to, ANE literature and data in all regions of the world. Merging the current print publications of these societies into a single searchable database would allow for more comprehensive access. This is especially true when access to traditional publications is enhanced by access to high definition images and archeological data.

    The member scholarly societies of ETANA can all benefit from developing an infrastructure whereby electronic editions archived by ETANA could also serve as the basis for print publication. This infrastructure will provide typeset-quality page image printing at a cost that is at least as economical as the current print run technology. The ETANA archive could provide a "dark archive" of print-ready editions for "one off" sale by association publishers, while providing an "open" archive of texts in an XML markup edition on the web.

  7. Training and Developmental Efforts (on-going)

    ETANA will provide funding for representatives to attend meetings among ANE scholars on topics relevant to the development of ETANA. Representatives will participate in conversations among archeologists to develop standards and commonly acceptable structures for the collection of archeological field data and electronic publication of research reports. Other conversations already underway in early stages, such as defining standards for the expansion of UNICODE to non-alphabetic languages relevant to ANE studies, will also be supported. ETANA will host training sessions at professional meetings of ANE scholars on the use of digital materials in their teaching and research.

  8. Archival Preservation

    ETANA will apply the latest acceptable standards of custodial and managerial responsibility for the electronic site/archive through migration, mirror sites, and other developing technological solutions. CWRU will provide a primary mirror site for storage and archiving of all digital data in the ETANA collection. Archived data in this site will include uncompressed files of the highest resolution available. Uncompressed TIFF images are the current format of choice for visual materials; for text, 600 dpi bitonal images. This site would be regularly and frequently backed up and refreshed through an automated system. Emulation and/or migration to newer generations of technology must be accompanied by both manual checks and checks by accurate software proven to detect errors. Technical staff at CWRU will assist with the development of such software or with the modification and enhancement of existing verification software. As the preservation of digital data is still the subject of much controversy, CWRU will document experience with preservation strategies in terms of cost, scalability, and technical feasibility with the goal of gaining a comprehensive view of the economics of long-term preservation of scholarly materials in electronic formats.
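
    One common way to implement automated checks of this kind is to record a checksum for every archived file and compare against it after each backup, refresh, or migration. The sketch below is a minimal illustration under stated assumptions: the choice of SHA-256, the manifest layout, and the function names are hypothetical and do not describe the verification software CWRU will develop or adopt.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks to handle large images."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare files under root against a manifest; return a list of problems."""
    problems = []
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file():
            problems.append(f"missing: {relative_path}")
        elif checksum(candidate) != expected:
            problems.append(f"changed: {relative_path}")
    return problems
```

    A manifest built at the primary site can be verified against the mirror after each refresh; an empty result means every listed file is present and unchanged, while files added since the manifest was built are not reported.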

  9. Standards and Access

    The resources developed by the ETANA project will be constructed according to the latest standards for interoperability, such as those of the OpenArchives (archive metadata) and W3C (data exchange/preservation and web access). Significant portions of those resources, such as the ATK, the Scholar's Commons, and the Core Texts segments of ETANA, will be freely open to scholars. Publisher agreements will govern access to materials held in the Born Digital portion of the project.

Still to come: