The GO Model   

 

The GO Model is a web-based, database-driven collection management system developed by the BSCIT group in the Berkeley Natural History Museums. Its name refers to the rapid development time and fast web access enabled by its design. It is currently used for digital image and document collections, specimen collections, and a variety of natural history research projects. A list of systems that currently use the GO Model is below.

The GO Model uses open source software such as Apache and MySQL, and the source code for GO Model systems is available, including the paleontology database used by the University of California Museum of Paleontology. Please contact BSCIT for more information.

For a schematic diagram showing the overall design of the GO Model, see GO Model: Systems, Applications & Services (PDF).

Technology
The GO Model was designed to be web-based, and it is optimized for web-based queries and updates. All access is via a web browser. The web interfaces, while flexible and powerful, are intentionally kept as simple as possible in order to support the widest range of platforms and browsers. The GO Model supports public browsing and queries as well as password-protected administrative functions such as data entry and editing, reporting, label printing, and inventory tracking. Because the system is web-based, it can be used anywhere a web browser and an Internet connection are available. This allows for distributed data entry, including in the field, and for sharing data, tools, and resources with collaborators. The system consists of four main components: 1) a simple data model, 2) an application layer, 3) a web server, and 4) a relational database.
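The split between public queries and password-protected administrative functions can be sketched as a simple authorization check. This is a minimal illustration in Python, not GO Model code (the GO Model is written mainly in Perl). The action names, the `Request` type, and the `authorize` function are hypothetical, and the sketch assumes the password check itself (for example, HTTP authentication handled by the web server) has already set the `authenticated` flag.

```python
from dataclasses import dataclass

# Hypothetical action names; the GO Model's actual set of functions may differ.
PUBLIC_ACTIONS = {"browse", "query"}
ADMIN_ACTIONS = {"enter", "edit", "report", "print_labels", "inventory"}


@dataclass
class Request:
    action: str
    authenticated: bool = False  # assumed to be set by an earlier password check


def authorize(req: Request) -> bool:
    """Allow public actions for everyone and administrative actions only after login."""
    if req.action in PUBLIC_ACTIONS:
        return True
    if req.action in ADMIN_ACTIONS:
        return req.authenticated
    return False  # reject anything unrecognized
```

Rejecting unknown actions by default keeps a new administrative function from becoming public by accident if someone forgets to list it.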

The Data Model
In order to provide the fastest possible response on the web, we have found that the database structures for storing data, also called the data model or the schema, must be as simple as possible. The more complex a data structure becomes, and the more it is normalized (i.e., the data is divided among multiple tables, with repeated values factored out into separate tables linked by keys), the worse performance becomes as the database grows, because queries must join more tables to reassemble each record. Some systems with complex data structures get around this problem by caching data in simpler structures, by providing pre-selected "views" of the data, or by restricting the types of queries and the fields that can be queried. The GO Model instead "flattens" the data structure as much as possible, and normalizes only when doing so improves response time and/or simplifies ongoing data maintenance.
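The difference between a flattened and a normalized design can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the table and column names and the sample records are made up, and SQLite stands in for whatever database a real system uses. It does not measure performance; it only shows that the flattened table answers a query from a single table, while the normalized version needs two joins to reassemble the same records.

```python
import sqlite3

# Made-up sample records: (id, genus, species, county, collector).
RECORDS = [
    (1, "Batrachoseps", "attenuatus", "Alameda", "collector A"),
    (2, "Batrachoseps", "attenuatus", "Marin", "collector B"),
    (3, "Taricha", "torosa", "Alameda", "collector A"),
]

conn = sqlite3.connect(":memory:")

# Flattened design: everything in one wide table.
conn.execute("CREATE TABLE specimen_flat (id INTEGER PRIMARY KEY, genus TEXT,"
             " species TEXT, county TEXT, collector TEXT)")
conn.executemany("INSERT INTO specimen_flat VALUES (?, ?, ?, ?, ?)", RECORDS)

# Normalized design: repeated taxon and county values factored into lookup tables.
conn.execute("CREATE TABLE taxon (id INTEGER PRIMARY KEY, genus TEXT, species TEXT)")
conn.execute("CREATE TABLE locality (id INTEGER PRIMARY KEY, county TEXT)")
conn.execute("CREATE TABLE specimen_norm (id INTEGER PRIMARY KEY,"
             " taxon_id INTEGER REFERENCES taxon(id),"
             " locality_id INTEGER REFERENCES locality(id), collector TEXT)")
taxa, places = {}, {}
for sid, genus, species, county, collector in RECORDS:
    # Assign a new key the first time each distinct value is seen.
    tid = taxa.setdefault((genus, species), len(taxa) + 1)
    lid = places.setdefault(county, len(places) + 1)
    conn.execute("INSERT OR IGNORE INTO taxon VALUES (?, ?, ?)", (tid, genus, species))
    conn.execute("INSERT OR IGNORE INTO locality VALUES (?, ?)", (lid, county))
    conn.execute("INSERT INTO specimen_norm VALUES (?, ?, ?, ?)",
                 (sid, tid, lid, collector))


def flat_query(genus, county):
    # One table, no joins.
    sql = "SELECT id FROM specimen_flat WHERE genus = ? AND county = ? ORDER BY id"
    return [row[0] for row in conn.execute(sql, (genus, county))]


def normalized_query(genus, county):
    # Same question, but two joins are needed to reassemble each record.
    sql = ("SELECT s.id FROM specimen_norm AS s"
           " JOIN taxon AS t ON s.taxon_id = t.id"
           " JOIN locality AS l ON s.locality_id = l.id"
           " WHERE t.genus = ? AND l.county = ? ORDER BY s.id")
    return [row[0] for row in conn.execute(sql, (genus, county))]
```

Both functions return the same answer; the normalized schema stores each taxon and county only once, at the cost of extra joins on every query.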

This strategy not only minimizes response time but also makes the database more adaptable, because a simpler structure is easier to maintain and modify. Data models are very dynamic, especially in a research setting, so ease of modification is critical.

Each system that uses the GO Model has its own customized schema to accommodate application-specific data. In addition, most GO Model systems share data structures and data for commonly used information, such as taxonomy and geographic names.
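Sharing a common table across systems can be sketched the same way. In this hypothetical example, an image system and a specimen system each keep their own table with their own columns, but both refer to one shared taxonomy table, so a correction made there is seen by both. The table names, columns, and records are illustrative assumptions, not the GO Model's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table shared by every system (hypothetical layout).
conn.execute("CREATE TABLE shared_taxon (id INTEGER PRIMARY KEY, name TEXT)")

# System-specific tables: each system keeps its own columns.
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, taxon_id INTEGER,"
             " photographer TEXT, file_name TEXT)")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, taxon_id INTEGER,"
             " catalog_no TEXT, collector TEXT)")

# The shared name is entered once, with a typo, and both systems point to it.
conn.execute("INSERT INTO shared_taxon VALUES (1, 'Taricha torsa')")
conn.execute("INSERT INTO photo VALUES (10, 1, 'photographer A', 'img0001.jpg')")
conn.execute("INSERT INTO specimen VALUES (20, 1, 'CAT-0001', 'collector B')")


def names_in(system_table):
    """Return the taxon names referenced by one system's records."""
    if system_table not in ("photo", "specimen"):  # table names cannot be parameterized
        raise ValueError(system_table)
    sql = (f"SELECT t.name FROM {system_table} AS x"
           " JOIN shared_taxon AS t ON x.taxon_id = t.id")
    return [row[0] for row in conn.execute(sql)]


# Fixing the typo once in the shared table corrects it for both systems.
conn.execute("UPDATE shared_taxon SET name = ? WHERE id = ?", ("Taricha torosa", 1))
```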

These PDF diagrams illustrate the data models for some of the systems that use the GO Model:

The Application Layer
The application layer (often called the "middleware" or the "business logic") is a suite of shared libraries and utilities that process queries, display results, create forms and interfaces, validate and process data, and perform many other functions. The GO Model application layer is written primarily in Perl, although Java, JavaScript, and C are also used for specialized applications such as mapping and image processing. All applications are run by the web server as CGI programs from the cgi-bin directory. The application layer consists of utilities and applications that all GO Model systems share, as well as customized modules for specific systems and projects. Most web pages that users see are generated dynamically by these Perl scripts. This design allows core functionality and commonly used utilities to be shared among many different types of collections, while also supporting the fine-grained customization needed for individual projects. When a new system is designed using the GO Model, only its unique attributes and customizations need to be developed, resulting in rapid development and a fast start-up time. Because the system is entirely web-based, it easily accommodates multi-directional interactions with other web-based services and resources. Some examples of this are listed below.
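The split between shared core code and per-system customization might look like the following Python sketch. It is hypothetical: the GO Model's real libraries are Perl, and `render_results`, `SYSTEM_CONFIG`, and `results_page` are names invented for illustration. The shared function renders any result set as an HTML table, and each system supplies only its own column list and labels.

```python
from html import escape


def render_results(rows, columns, labels):
    """Shared core utility: render result rows (dicts) as an HTML table."""
    header = "".join(f"<th>{escape(labels.get(c, c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row.get(c, '')))}</td>" for c in columns) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{header}</tr>{body}</table>"


# Per-system customization: each collection defines only what is unique to it.
SYSTEM_CONFIG = {
    "photos": {"columns": ["taxon", "photographer"], "labels": {"taxon": "Subject"}},
    "specimens": {"columns": ["catalog_no", "taxon"], "labels": {"catalog_no": "Catalog No."}},
}


def results_page(system, rows):
    """Combine the shared renderer with one system's configuration."""
    cfg = SYSTEM_CONFIG[system]
    return render_results(rows, cfg["columns"], cfg["labels"])
```

Adding a new collection in this scheme means adding one entry to the configuration rather than writing a new renderer, which mirrors the rapid start-up described above.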

The Web Server
The GO Model systems currently use the Apache web server on Red Hat Linux. However, no special httpd configuration is required, so any web server that supports CGI programming and can sustain a fairly high request rate would suffice. The web server managed by BSCIT runs on the same computer as the MySQL database server, although this is more for convenience than necessity. The BSCIT web server also hosts the websites of most of the museums in the Berkeley Natural History Museums, as well as a number of other software systems and research projects. As of Fall 2007, the BSCIT web server processed around 150,000 unique database queries per day and served more than 1.5 million files a day, mostly JPG images (see Database Query Statistics for details).
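Because any CGI-capable web server will do, the contract between server and application is just the CGI interface: the server passes the request in environment variables such as `QUERY_STRING`, and the program writes headers, a blank line, and the page body to standard output. The sketch below shows that contract in Python; it is not GO Model code, the `taxon` parameter is a hypothetical example, and a real script would query the database instead of echoing the parameter. It uses `urllib.parse` rather than Python's old `cgi` module, which was removed in Python 3.13.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle(environ):
    """Build a complete CGI response from the CGI environment variables."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    taxon = params.get("taxon", [""])[0]  # hypothetical query parameter
    body = f"<html><body><p>Results for: {escape(taxon)}</p></body></html>"
    # A CGI response is header lines, a blank line, then the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # The web server runs this program once per request and reads its stdout.
    sys.stdout.write(handle(os.environ))
```

Keeping the response logic in `handle`, separate from the process environment, makes the script easy to test without a web server.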

The Database
The main function of the GO Model database is to store data and indexes and to support basic SQL statements such as SELECT, UPDATE, and INSERT. Nearly all other functionality is pushed up to the application layer. Because of this design, the GO Model does not rely on a particular database vendor. Currently most GO Model collections use MySQL, but the GO Model was originally developed for Postgres, and it also ran for several years on Informix. Any database that supports core SQL functionality can be used. This flexibility allows GO Model systems to adapt easily to new technologies as they become available, or to accommodate administrative requirements for a particular database vendor.
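Restricting the database to core SQL means the application layer can talk to it through a thin, vendor-neutral interface. The following Python sketch uses the standard DB-API 2.0 pattern (connection, cursor, `execute`, `commit`), with SQLite standing in for the real server. The `Store` class and its methods are hypothetical. One real difference between drivers is the parameter placeholder style, which the sketch handles by mapping the driver's `paramstyle` (`qmark` for `sqlite3`, typically `format` or `pyformat` for MySQL and PostgreSQL drivers) to the matching marker.

```python
import sqlite3

# Placeholder syntax for the DB-API paramstyles this sketch supports.
PLACEHOLDERS = {"qmark": "?", "format": "%s", "pyformat": "%s"}


class Store:
    """Hypothetical data-access layer that issues only core SQL statements.

    Table and column names are interpolated into the SQL, so they must come
    from trusted application code, never from user input; values are always
    passed as query parameters.
    """

    def __init__(self, conn, paramstyle):
        self.conn = conn
        self.p = PLACEHOLDERS[paramstyle]

    def insert(self, table, row):
        cols = ", ".join(row)
        marks = ", ".join([self.p] * len(row))
        cur = self.conn.cursor()
        cur.execute(f"INSERT INTO {table} ({cols}) VALUES ({marks})", tuple(row.values()))
        self.conn.commit()

    def update(self, table, key, key_value, changes):
        sets = ", ".join(f"{col} = {self.p}" for col in changes)
        cur = self.conn.cursor()
        cur.execute(f"UPDATE {table} SET {sets} WHERE {key} = {self.p}",
                    (*changes.values(), key_value))
        self.conn.commit()

    def select(self, table, key, key_value):
        cur = self.conn.cursor()
        cur.execute(f"SELECT * FROM {table} WHERE {key} = {self.p}", (key_value,))
        return cur.fetchall()


# Usage with SQLite; another driver's connection and paramstyle could be passed the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image (id INTEGER PRIMARY KEY, title TEXT)")
store = Store(conn, sqlite3.paramstyle)
store.insert("image", {"id": 1, "title": "salamander"})
store.update("image", "id", 1, {"title": "slender salamander"})
```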


Systems that Use the GO Model

  • AmphibiaWeb: A resource on global amphibian declines; includes species accounts, photos, maps, and dynamically generated content from other resources for more than 6,000 species worldwide.
  • Berkeley BioKeys: A tool for non-scientists to identify California plants and animals.
  • CalPhotos: An image database of more than 150,000 photos of plants, animals, and other natural history subjects; includes image upload and annotation, and provides web-based services to other resources.
  • Digital MVZ Field Notebooks: Scanned field notebooks from the Museum of Vertebrate Zoology, dating back to the early 20th century.
  • DocuBase: A digital document management system for journal articles, reprints, and the like. Includes scanning, OCR, and databasing of archival documents, as well as support for multiple digital formats.
  • Ecological Flora Database: A database of ecological characteristics of California plants, including life history, phenology, morphology, and other traits.
  • Essig Specimen Database: The collection management system for specimens in the Essig Museum of Entomology.
  • UCMP Specimen Database: The collection management system for specimens in the University of California Museum of Paleontology.
  • Moorea Biocode: A "field application" system that allows scientists to enter collecting events, specimen records, and photos in the field, run queries and generate reports, upload data for DNA processing, and more.

History
The GO Model had its earliest roots in a Computer Science research project in 1993-94 within the Database Research Group at UC Berkeley, which also developed Postgres. A new image database was needed to support research on content-based retrieval, and its schema was to be based on one already in use for digital documents. This image database grew, became web-based with the explosion of the World Wide Web, and by 1998 was called CalPhotos and was part of the NSF-sponsored Digital Libraries Initiative. See History of CalPhotos for more information. The Digital Library Project at Berkeley continued to develop CalPhotos and used it as the basis for other systems, such as a document database (now called DocuBase), a California plant observation database (now called Calflora.org), and a number of specimen databases for collections within the Berkeley Natural History Museums (BNHM). In 2005, the Digital Library Project ended, and some of its software engineers moved to the BNHM to continue supporting these systems and to develop new systems using the GO Model.


BSCIT   University of California, Berkeley    Berkeley Natural History Museums
Copyright © 1995-2011 UC Regents. All Rights Reserved.
Last updated: Oct 14, 2008