The GO Model is a web-based, database-driven collection
management system developed by the BSCIT group in
the Berkeley Natural History Museums.
Its name refers to the rapid development time and
fast web access enabled by its design. It is currently being used for
digital image and document collections, specimen
collections, and for a variety of
natural history research projects. A list of systems that currently use
the GO Model is below.
The GO Model uses open-source
software such as Apache and MySQL, and the
source code for GO Model systems is available.
Click here for the paleontology
database used by the University of California Museum of Paleontology.
Please contact BSCIT for more information.
For a schematic diagram showing the
overall design of the GO Model, see
GO Model: Systems, Applications & Services (PDF).
Technology
The GO Model was designed to be web-based, and it is
optimized for web-based queries and updates. All access
is via a web browser.
These interfaces, while flexible and powerful,
are intentionally kept as
simple as possible in
order to support the widest range of platforms and browsers.
The GO Model supports public browsing and queries as well as
password-protected administrative functionality such as data entry and editing,
reporting, label printing, and inventory tracking.
Because the system is web-based, it can be used
anywhere a web browser and the Internet are available.
This allows for distributed data entry, including in the field,
and sharing of data, tools and resources with collaborators.
The system consists of four main components:
1) a simple data model,
2) an application layer,
3) a web server, and 4) a relational database.
The Data Model
In order to provide the fastest possible response on the web,
we have found that the database structures
for storing data, also called the data model or the schema,
must be constructed using as simple a design as possible.
The more complex a data structure becomes,
and the more it is normalized (i.e., the data is divided up among
multiple tables by aggregating common fields), the worse the
performance becomes as the size of the database grows.
Some systems with complex data structures
get around this problem by caching data in simpler structures,
or by providing pre-selected "views"
of the data, or by placing restrictions on the types of queries and the
fields that can be queried. The GO Model instead "flattens" out the data
structure as much as possible, and normalizes only when it is
advantageous in terms of response time and/or ongoing data maintenance.
This strategy not only minimizes response time but also makes the
database more adaptable to change, because a simpler structure
is easier to maintain and modify. Data models are very dynamic,
especially in a research setting, so ease of modification
is critical.
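To make the trade-off concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table and column names (specimen_flat, specimen, taxon, locality) and the sample row are hypothetical and are not taken from any GO Model schema; SQLite stands in for MySQL only so that the example is self-contained. It answers the same question against a flattened table with a single-table query and against a normalized layout that needs two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Flattened layout: one row carries everything a typical query needs.
    CREATE TABLE specimen_flat (
        id INTEGER PRIMARY KEY,
        genus TEXT, species TEXT, county TEXT, state TEXT
    );
    CREATE INDEX idx_flat_genus ON specimen_flat (genus);

    -- Normalized layout: shared fields are factored out into lookup tables.
    CREATE TABLE taxon (id INTEGER PRIMARY KEY, genus TEXT, species TEXT);
    CREATE TABLE locality (id INTEGER PRIMARY KEY, county TEXT, state TEXT);
    CREATE TABLE specimen (
        id INTEGER PRIMARY KEY,
        taxon_id INTEGER REFERENCES taxon (id),
        locality_id INTEGER REFERENCES locality (id)
    );
""")

# Illustrative sample data, stored once in each layout.
conn.execute(
    "INSERT INTO specimen_flat VALUES (1, 'Rana', 'draytonii', 'Marin', 'California')"
)
conn.execute("INSERT INTO taxon VALUES (1, 'Rana', 'draytonii')")
conn.execute("INSERT INTO locality VALUES (1, 'Marin', 'California')")
conn.execute("INSERT INTO specimen VALUES (1, 1, 1)")

# Flat: a single indexed lookup on one table.
FLAT_QUERY = "SELECT id, genus, species, county FROM specimen_flat WHERE genus = ?"

# Normalized: the same answer requires joining three tables.
NORMALIZED_QUERY = """
    SELECT s.id, t.genus, t.species, l.county
    FROM specimen AS s
    JOIN taxon AS t ON t.id = s.taxon_id
    JOIN locality AS l ON l.id = s.locality_id
    WHERE t.genus = ?
"""

flat_rows = conn.execute(FLAT_QUERY, ("Rana",)).fetchall()
normalized_rows = conn.execute(NORMALIZED_QUERY, ("Rana",)).fetchall()
```

Both queries return the same row. The flat version avoids the joins at the cost of repeating taxon and locality values in every specimen row, which is why normalizing can still pay off where ongoing data maintenance matters more than response time.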
Each system that uses the GO Model has its own customized schema in order to
accommodate application-specific data. In addition, most GO Model systems share
data structures and content for commonly used information such as taxonomy
and geographic names.
These PDF diagrams illustrate the
data models for some of the systems that use the GO Model:
The Application Layer
The application layer (often called the "middleware" or the
"business logic") is a suite of shared libraries and utilities
that process queries, display results,
create forms and interfaces, validate and process data,
and perform many other functions. The GO Model application
layer is written primarily in Perl, although
Java, JavaScript, and C are also used
for specialized applications
such as mapping and image processing. All applications are run by the web server
out of cgi-bin. The application layer consists of utilities and applications that
all GO Model systems share, as well as customized modules for
specific systems and projects. Most web pages that users see
are generated dynamically by these Perl scripts.
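As a rough illustration of how a CGI-style script might turn a query string into a dynamically generated page, here is a sketch in Python. The GO Model's actual scripts are written in Perl and are not reproduced here; the function names (render_results, handle_request), the photo table, and the taxon parameter are all hypothetical.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def render_results(rows):
    """Build a plain HTML table; values are escaped before being embedded."""
    cells = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(str(photo_id)), html.escape(name)
        )
        for photo_id, name in rows
    )
    return "<html><body><table>{}</table></body></html>".format(cells)


def handle_request(query_string, conn):
    """Parse the query string, run a parameterized SELECT, return a response."""
    params = parse_qs(query_string)
    taxon = params.get("taxon", [""])[0]
    rows = conn.execute(
        "SELECT id, taxon FROM photo WHERE taxon = ? ORDER BY id", (taxon,)
    ).fetchall()
    # A CGI response is a header block, a blank line, then the body.
    return "Content-Type: text/html\r\n\r\n" + render_results(rows)


# Hypothetical sample data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE photo (id INTEGER PRIMARY KEY, taxon TEXT)")
conn.executemany(
    "INSERT INTO photo VALUES (?, ?)",
    [(1, "Rana draytonii"), (2, "Quercus agrifolia")],
)

page = handle_request("taxon=Rana+draytonii", conn)
```

Under a real CGI server the query string would come from the QUERY_STRING environment variable and the response would be written to standard output; here it is passed in and returned so the sketch can be run directly.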
This design allows core functionality and commonly-used utilities
to be shared among
many different types of collections,
while also supporting the fine-grained customization needed for
individual projects. When a new system is designed using the GO Model,
only its unique attributes and customizations need to be developed,
resulting in rapid development and a fast start-up time.
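One way to picture the split between shared utilities and per-system customization is a generic search routine that takes a small per-system configuration. This is a sketch under assumptions, not the GO Model's Perl code: SystemConfig, build_search, and the table and field names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemConfig:
    """Per-system customization: only what differs between collections."""
    table: str
    searchable: tuple


def build_search(config, criteria):
    """Shared utility: turn user criteria into a parameterized SELECT.

    Only fields whitelisted in the config are accepted, so field names
    interpolated into the SQL never come straight from user input.
    """
    unknown = set(criteria) - set(config.searchable)
    if unknown:
        raise ValueError("unsupported search fields: {}".format(sorted(unknown)))
    fields = sorted(criteria)
    where = " AND ".join("{} = ?".format(field) for field in fields)
    sql = "SELECT * FROM {}".format(config.table)
    if where:
        sql += " WHERE " + where
    return sql, [criteria[field] for field in fields]


# Two hypothetical systems reuse the same routine with different configs.
photos = SystemConfig(table="photo", searchable=("taxon", "photographer"))
specimens = SystemConfig(table="specimen", searchable=("genus", "county"))

photo_sql, photo_args = build_search(photos, {"taxon": "Rana draytonii"})
specimen_sql, specimen_args = build_search(
    specimens, {"genus": "Rana", "county": "Marin"}
)
```

In this arrangement a new collection only has to supply its own configuration; the query-building code is written once and shared.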
Because the system is entirely web-based, it easily accommodates
multi-directional interactions with other web-based services and resources.
Some examples of this are listed below.
The Web Server
GO Model systems currently use the Apache web server on Red Hat Linux.
However, no special httpd configuration is required, so any web server that
supports CGI programming and has a fairly high response rate would suffice.
The web server managed by BSCIT runs on the same computer
as the MySQL database server, although this is more for convenience
than necessity. The BSCIT web server
also serves the home websites of most of the
museums in the Berkeley Natural History Museums
and hosts a number of other software systems and research projects.
As of Fall 2007, the BSCIT web server
processes around 150,000 unique database queries per
day and serves more than 1.5 million files a day, mostly JPG images (see Database Query Statistics for details).
The Database
The main function of the GO Model database is to
store data and indexes and to support basic SQL statements such as SELECT,
UPDATE, and INSERT. Nearly all other functionality is pushed up to the application layer.
Because of this design,
the GO Model does not rely on a particular database vendor.
Currently most GO Model collections use MySQL, but
the GO Model was originally developed for Postgres,
and it also ran for several years on Informix. Any database
that supports core SQL functionality can be used.
This flexibility allows GO Model systems to easily adapt to new
technologies as they become available, or to accommodate
administrative needs for a particular database vendor.
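The idea of keeping the database to core SQL and pushing everything else into the application can be sketched as follows. Python's DB-API is used here as a stand-in for the Perl application layer; the function names, the specimen table, and the validation rules (including the year bounds) are hypothetical, and SQLite is used only so the example runs without a server. The sketch makes one portability wrinkle explicit: DB-API drivers differ in placeholder style (`?` for sqlite3, `%s` for common MySQL and PostgreSQL drivers), so the placeholder is passed as a parameter.

```python
import sqlite3


def validate_specimen(record):
    """Validation lives in application code, not in triggers or procedures."""
    if not record.get("genus"):
        raise ValueError("genus is required")
    year = record.get("year_collected")
    # Arbitrary illustrative bounds, not a rule from any real system.
    if year is not None and not 1700 <= year <= 2100:
        raise ValueError("implausible collection year: {}".format(year))


def insert_specimen(conn, record, placeholder="?"):
    """Store a validated record using nothing but a plain INSERT."""
    validate_specimen(record)
    sql = (
        "INSERT INTO specimen (genus, year_collected) VALUES ({0}, {0})"
    ).format(placeholder)
    cur = conn.cursor()
    cur.execute(sql, (record["genus"], record.get("year_collected")))
    conn.commit()


def count_by_genus(conn, genus, placeholder="?"):
    """Count matching rows using nothing but a plain SELECT."""
    sql = "SELECT COUNT(*) FROM specimen WHERE genus = {}".format(placeholder)
    cur = conn.cursor()
    cur.execute(sql, (genus,))
    return cur.fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (genus VARCHAR(64), year_collected INTEGER)")
insert_specimen(conn, {"genus": "Rana", "year_collected": 1998})
total = count_by_genus(conn, "Rana")
```

Because the functions rely only on a connection object and basic statements, switching database vendors would in principle mean changing the connection and the placeholder, not the application logic.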
Systems that Use the GO Model
- AmphibiaWeb
A resource on global amphibian decline that
includes species accounts, photos, maps, and
dynamically generated content from other resources for
more than 6,000 species worldwide.
- Berkeley BioKeys
A tool for non-scientists to identify California plants and animals.
- CalPhotos
An image database of more than 150,000 photos of plants, animals, and other
natural history subjects; includes image upload and annotation,
and provides web-based services to other resources.
- Digital MVZ Field Notebooks
Scanned field notebooks dating back to the early 20th century from the
Museum of Vertebrate Zoology.
- DocuBase
A digital document management system for journal articles, reprints, and the
like. Includes scanning, OCR and databasing of archival documents
as well as support for multiple digital formats.
- Ecological Flora Database
A database of ecological characteristics of California plants including life history, phenology, morphology and other traits.
- Essig Specimen Database
The collection management system for specimens in the Essig Museum
of Entomology.
- UCMP Specimen Database
The collection management system for specimens in the University of
California Museum of Paleontology.
- Moorea Biocode
A "field application" system that allows scientists to input
collecting events, specimen records and photos in the field, query
and issue reports, upload data for DNA processing, and more.
History
The GO Model had its earliest roots in a Computer Science research project
in 1993-94 within the Database Research Group
at UC Berkeley, which also developed Postgres. A new image database was
needed to support research on content-based retrieval, and the
schema was to be based on one already in use for digital documents.
This image database grew, became a web-based database with the
explosion of the World Wide Web, and by 1998 was called CalPhotos and
was a part of the NSF-sponsored Digital Libraries Initiative. See
History of CalPhotos
for more information about this. The Digital Library Project at Berkeley
continued to develop CalPhotos, and used it as the basis for other systems
such as a document database (now called DocuBase), a California
Plants observation database (now called Calflora.org) and
a number of specimen databases for collections within the Berkeley Natural
History Museums (BNHM). In 2005, the Digital Library Project ended, and some of
its software engineers moved to the BNHM to continue supporting these
systems and to develop new systems using the GO Model.