Technical Information
Data
- Documents follow the Text Encoding Initiative P5 standard.
- Relationships follow the RDF standard, using our own schema.
Technologies
- Framework: Ruby on Rails 4
- Search and Browse functionality: Apache Solr
- RDF Querying: Apache Jena Fuseki
- Data Querying and Transformation: XSLT Scripts (powered by Saxon 9 HE)
- Data Indexing: Ruby Scripts
- Visualizations: InfoVis Toolkit
Process
Data Creation
First, all documents were photographed. Because the photographs were not intended to serve an archival purpose, they were not taken according to any specification.
Each document was then encoded from its photographs, including in-depth encoding of people, places, outcomes, relations, and more.
Separately, a CSV file was created to record the relationships that can be inferred from the documents. This process is not automated because of the complexity of the relationships and the way they are recorded.
To prepare the data for the site, a script goes through each TEI file and creates a static HTML view. A separate script creates XML suitable for ingest into Solr, and a Ruby script posts that XML to Solr. Much of the linking between documents happens during this process. For instance, a case document marks its related case, so the people mentioned in the case document are added to the case file.
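The linking and posting steps described above might look roughly like the following Ruby sketch. It is an illustration only, not the project's actual scripts: the field names (`id`, `people`, `related_case`), the identifiers, and the Solr URL with its `oscys` core name are all assumptions, and the hashes stand in for values that would be extracted from the TEI files.

```ruby
require 'net/http'
require 'uri'

# Assumed Solr update endpoint; the real host and core name are not documented here.
SOLR_UPDATE_URL = 'http://localhost:8983/solr/oscys/update?commit=true'

# Escape the XML special characters in a field name or value.
def xml_escape(text)
  text.to_s
      .gsub('&', '&amp;')
      .gsub('<', '&lt;')
      .gsub('>', '&gt;')
      .gsub('"', '&quot;')
end

# Linking step: copy the people named in each case document onto the case it
# points to. The field names are hypothetical.
def link_people_to_cases(cases, documents)
  documents.each do |doc|
    related = cases[doc[:related_case]]
    next unless related

    related[:people] = (related[:people] + doc[:people]).uniq
  end
  cases
end

# Build a Solr <add> message. Array values become repeated (multi-valued) fields.
def solr_add_xml(docs)
  doc_xml = docs.map do |doc|
    fields = doc.flat_map do |name, value|
      Array(value).map do |v|
        %(<field name="#{xml_escape(name)}">#{xml_escape(v)}</field>)
      end
    end
    "<doc>#{fields.join}</doc>"
  end
  "<add>#{doc_xml.join}</add>"
end

# Prepare (but do not send) the HTTP POST that carries the XML to Solr.
def solr_update_request(xml, url = SOLR_UPDATE_URL)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request['Content-Type'] = 'text/xml; charset=utf-8'
  request.body = xml
  [uri, request]
end

# Send the request; only call this against a running Solr instance.
def post_to_solr(xml, url = SOLR_UPDATE_URL)
  uri, request = solr_update_request(xml, url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end
```

Doing the linking before the XML is built means the index already holds the combined list of people for each case, so the website would not need to join documents and cases at request time.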
To create the RDF for querying, a script is run on the CSV file that specifies the relationships. The script also pulls data from a TEI personography and produces a triple file. Separately, the metadata specialist creates an RDF file that describes and constrains the data in the triple file.
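As a rough illustration of the CSV-to-triples step, the Ruby sketch below turns relationship rows into N-Triples lines and adds a label for each person. The column names (`subject`, `relationship`, `object`), the `example.org` namespaces, and the sample identifiers and names are all invented for the example. The `names` hash stands in for data that would be pulled from the TEI personography. The project's own schema and script may differ.

```ruby
require 'csv'

# Hypothetical namespaces; the project's own schema URIs are not given here.
PERSON_NS  = 'http://example.org/oscys/person/'
SCHEMA_NS  = 'http://example.org/oscys/schema#'
RDFS_LABEL = 'http://www.w3.org/2000/01/rdf-schema#label'

# One relationship row becomes one subject-predicate-object triple.
def relationship_triple(row)
  "<#{PERSON_NS}#{row['subject']}> <#{SCHEMA_NS}#{row['relationship']}> " \
    "<#{PERSON_NS}#{row['object']}> ."
end

# A person's name becomes an rdfs:label literal, with quotes and backslashes escaped.
def label_triple(id, name)
  escaped = name.gsub(/["\\]/) { |char| "\\#{char}" }
  %(<#{PERSON_NS}#{id}> <#{RDFS_LABEL}> "#{escaped}" .)
end

# csv_text: relationship rows. names: person id => name, standing in for
# values that would be read from the TEI personography.
def csv_to_ntriples(csv_text, names)
  rows = CSV.parse(csv_text, headers: true).map(&:to_h)
  triples = rows.map { |row| relationship_triple(row) }
  ids = rows.flat_map { |row| [row['subject'], row['object']] }.uniq
  labels = ids.select { |id| names.key?(id) }
              .map { |id| label_triple(id, names[id]) }
  triples + labels
end

# Invented sample data for illustration.
csv_text = <<~ROWS
  subject,relationship,object
  per.000001,spouseOf,per.000002
ROWS
names = { 'per.000001' => 'Ann Example', 'per.000002' => 'John Example' }
puts csv_to_ntriples(csv_text, names)
```

N-Triples is used here only because it is simple to write one line at a time; the source does not say which RDF serialization the triple file uses.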
Website Creation
The first version of the OSCYS website was built with Apache Cocoon. The current version of the site is built with Ruby on Rails.
After the data is created, it is indexed into Solr. The Rails website calls on Solr for most data operations, such as generating listings for the browse pages, retrieving metadata for the document pages, and retrieving most of the information for case pages.
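A browse listing backed by Solr could be requested along the lines of the Ruby sketch below. The endpoint, the `oscys` core name, and the `type` and `title` fields are assumptions for illustration; only the standard Solr query parameters (`q`, `fq`, `sort`, `start`, `rows`, `wt`) and the `response` / `numFound` / `docs` keys of Solr's JSON response are taken as given.

```ruby
require 'json'
require 'uri'

# Assumed select endpoint; the real host and core name are not documented here.
SOLR_SELECT_URL = 'http://localhost:8983/solr/oscys/select'

# Build the URL for one page of a browse listing, filtered by a
# hypothetical "type" field and sorted by a hypothetical "title" field.
def browse_query_url(type, page: 1, per_page: 20)
  params = {
    'q'     => '*:*',
    'fq'    => "type:#{type}",
    'sort'  => 'title asc',
    'start' => (page - 1) * per_page,
    'rows'  => per_page,
    'wt'    => 'json'
  }
  "#{SOLR_SELECT_URL}?#{URI.encode_www_form(params)}"
end

# Pull the total hit count and the documents out of a Solr JSON response body.
def parse_browse_response(json_text)
  response = JSON.parse(json_text).fetch('response')
  { total: response['numFound'], docs: response['docs'] }
end
```

In a Rails application this kind of logic would more likely sit behind a Solr client library or a small service class; plain URL building is shown here only to keep the sketch self-contained.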
To populate the relationships on people pages and in the visualizations, the site queries Apache Jena Fuseki. Fuseki returns JSON, which is then reformatted as needed to work with the InfoVis Toolkit.
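The reformatting step could be sketched in Ruby as follows. The input shape (`results` / `bindings`, with a `value` per variable) is the standard SPARQL JSON results format, and the output uses the node shape seen in InfoVis Toolkit graph examples (`id`, `name`, `data`, `adjacencies` with `nodeTo`). The variable names `person`, `personName`, `relation`, `related`, and `relatedName` are hypothetical, since the project's actual queries are not shown.

```ruby
require 'json'

# Convert SPARQL JSON results into an array of graph nodes.
# Each binding is assumed to describe one relationship between two people.
def sparql_json_to_jit(sparql_json)
  bindings = JSON.parse(sparql_json).dig('results', 'bindings') || []
  nodes = {}

  bindings.each do |binding|
    source = binding.dig('person', 'value')
    target = binding.dig('related', 'value')
    next unless source && target

    # Create each node once, falling back to the URI when no name is bound.
    [[source, binding.dig('personName', 'value')],
     [target, binding.dig('relatedName', 'value')]].each do |id, name|
      nodes[id] ||= { 'id' => id, 'name' => name || id, 'data' => {}, 'adjacencies' => [] }
    end

    # Record the edge on the source node, keeping the relationship type as edge data.
    nodes[source]['adjacencies'] << {
      'nodeTo' => target,
      'data'   => { 'relationship' => binding.dig('relation', 'value') }
    }
  end

  nodes.values
end
```

Calling `JSON.generate(sparql_json_to_jit(body))` on a response body would then give a string that could be handed to the visualization code in the browser.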
HTML views of the TEI files are generated in advance and served as needed from the file system.