Science in the Making

Digirati are working with The Royal Society to produce a pilot for The Royal Society Journal Collection: Science in the Making, a new platform for archival items related to 350 years of published material in the Philosophical Transactions, the oldest continuously published science journal in the world. This new archive will include manuscripts, marginalia, illustrations, and correspondence that so far have only been available to people visiting the Society. The platform has a wide potential audience, from academics to schoolchildren investigating the emergence of modern science through the source material in the Society’s archives.

This blog will report on our progress as we work together to build the pilot platform. We’ll be combining two open standards – IIIF and Web Annotations – to present the archive material and enable contribution by archivists and the public alike. We’ll be surfacing the content in the Omeka S platform, and using some of the components of the DLCS to provide a standards-based back end for annotation, search, image and text services. This combination of technologies allows for rapid progress and flexibility as we research, experiment, develop and test the evolving platform.

The Royal Society have already identified many case studies and types of user story; for the pilot we will be concentrating on three case studies that will best test the evolving platform:

  • A scientific history of colours, from Newton to Maxwell [1664–1860].
  • Thomas Henry Huxley: author, communicator, referee, editor and secretary.
  • The Scientific Advantages of Antarctic Expedition [1887–1913].

 


Data Model and API

July 2017; updated Nov 8 2017

After some productive UX work, we need to start making things happen in software. This means we need to think about how that work informs a data model. We must address the requirements that emerged from the UX workshops as well as The Royal Society’s stated aims for the site.

The model doesn’t have to be 100% right from the start. We need to see how people use the site, and how developers use the API (that means us too). That’s the point of the pilot. But we need to start with a model that works for exploration by end users, for web editorial activity, and for the addition of new knowledge by both the public and The Royal Society. We also need to keep in mind how the model accommodates other library and archive content, from The Royal Society and elsewhere. The pilot content is from the Royal Society’s Journal Collection archive, but an important aim is to incorporate objects from other collections internal and external to The Royal Society.

Requirements from workshops

The work so far suggests that the model and API need to support user experiences that include:

Navigation of content via aggregating pages

  • Things with the same topic (keyword, subject heading); can be from user-generated tags as well as existing descriptive metadata
  • Things in the same date range, or place
  • People (and other agents) associated with an object; and from people, more things

Navigation of content through extensive linking

  • Huxley is the referee of this manuscript, Rainey its author, and it became the published journal article at this DOI…
  • Links allow users to follow threads through webs of relationships between content and agents

Visualisations

  • Timelines
  • Maps
  • Graphs of relationships between objects and people (or more generally, agents) – the Web of Discourse

Contribution of new content by end users

  • Transcriptions of individual views (distinct from a whole item; a page of a letter)
  • Tags (to start with, assigning subject headings from LCSH)
  • Comments, narrative

 

web-of-discourse
The web of discourse – the model supports the projection of results into a graph, here showing correspondence for items tagged with the topic “Colour”

Who are the users of the model?

Who benefits from the model? Who are its users, and where and how do they use it? It helps to think of three categories (there are more, and subtler, distinctions).

Users unaware of the model – people interacting with the content, on the web. These users might sense the influence of the model in the design, information architecture and user experience of the site, but have no need for a formal understanding or description of it.

Aware users – editorial and other content creation or management activities conducted by Royal Society staff happen in the context of the model, but the abstract model itself is meaningless until expressed in software that staff use – which means editorial or curatorial user experience. We’re not just transforming source metadata direct to the web; we have a CMS as well. The source metadata is augmented and enhanced by editorial processes and user contributions in that CMS. Some content has no augmentation; some has extensive additional content created around it.

We don’t want the choice of CMS to dictate a conceptual data model, but if that conceptual data model has no usable alignment with a content scheme in a CMS it won’t work – it may be beautiful, but you have to be able to build content with it: define content types, and make it easy for editors to create new instances of content and link them together. The editorial user sees the model expressed as content management processes. The model finds expression in the content types and fields of the CMS. The CMS appearance of the model doesn’t have to be identical to the published description of the model. Content workflow, organisation and other practical considerations alter the expression of the model in a CMS.

Developers who might explore the APIs or even read some documentation about the model and its expression in APIs. Developers means us, as makers of the site for the first type of user. If the model doesn’t work for us then why should anyone else be expected to use it! Developers also means other people building things we haven’t thought of.

Activities, Roles and Agents

Our first decision is about our approach to modelling the relationship between the archival items and people.

Archive item RR/5/75 is a referee report by Lord Kelvin on Dynamical theory of the electromagnetic field by James Clerk Maxwell, which is item PT/72/7. At least two people are involved with this item. Correspondence has authors and recipients. Submitted material can have multiple referees. Photographs have subjects and photographers. People have roles that describe their relationship to archive items.

So far we have the following roles:

roles

Kelvin and Maxwell are agents that have some role in the lifecycle of the object. Rather than choosing some relationships as direct properties of an object, we adopt an event-driven approach where the relationship between objects and agents is indirect; it goes via an Activity, which is where a person and a role come together in relation to an object. If we are lucky, we might have information on when and/or where the activity took place:

RS Data Model - activity

This approach supports the kind of narrative, navigation and visualisations suggested by the UX work. It is important that it works with sparse data, however. A rich web of multiple roles involved with an object is great, but at the other end of the scale we have objects with just one person connected via one role, with no time or place information. This still needs to drive a good user experience.
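As a sketch, the indirect Activity shape described above might look like this in code. The names Activity, Agent, Role and Unit come from the post; the fields and classes here are illustrative assumptions, not the project’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    name: str

@dataclass
class Role:
    label: str  # e.g. "author", "referee", "recipient", "photographer"

@dataclass
class Activity:
    unit_id: str                  # the archive item, e.g. "PT/72/7"
    agent: Agent
    role: Role
    date: Optional[str] = None    # when the activity took place, if known
    place: Optional[str] = None   # where it took place, if known

# The relationship between object and agent goes via an Activity:
# Kelvin refereeing Maxwell's paper
refereeing = Activity("PT/72/7", Agent("Lord Kelvin"), Role("referee"))

# Sparse data still works: one person, one role, no time or place
authorship = Activity("PT/72/7", Agent("James Clerk Maxwell"), Role("author"))
```

Because time and place are optional, the same shape serves both the rich web of roles and the minimal single-agent record.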

Editing the model for the web

All the CMS screen shots in this post are from Omeka S, where the archival information is enriched with editorial content. The class names are not the same as in our conceptual model – Unit is ArchiveItem, Agent is Person, Activity is PublicationAction – we’re feeling our way from the specific to the general. The content management experience is important too, otherwise there won’t be much for the end users to look at. Editors don’t write triples to a triple store, they edit content in the context of a content model.

Items:

items

An item:

oak-bush-legs

The expression of activity as managed content:

linked-res

An activity (PublicationAction), here in edit mode:

publication-action

The PublicationAction item in Omeka S links to a Role item and an ArchiveItem item (the Activity has a Role and a Unit). We don’t have the time or the place for this activity; those fields are blank.

Aboutness, Ofness and Onness: how we use IIIF

One important feature of our approach is a strong opinion about the role of the IIIF Presentation API. It’s far more than a standardised way of delivering pictures of the manuscript pages, drawings and photographs. The IIIF Presentation API is our digital surrogate for the materiality of the object. IIIF makes this representation a two-way street: other people can tell us things about the object in IIIF-space. IIIF is for presenting the object and all its content (e.g., images and text transcription), and it establishes the shared space in which site users (and anyone else on the web) make assertions (e.g., add content of or about the object).

We think there is a clear division between the Royal-Society-specific data model for a given item (the main subject of this post), and the standardised IIIF representation of the item, its digital surrogate. That’s what people are looking at. It’s clear which model is responsible for what. The bespoke data model provides the aboutness, IIIF provides the ofness and onness. I’m using the terms aboutness and ofness in a particular way that needs clarification:

Content about the object tells you who wrote the letter, when they did it, what size it is, that it’s about physics, and optics in particular, and theory of colour. This is descriptive metadata. It’s what most APIs in cultural heritage are about.

Content of the object, in our terms, is the detailed presentation of a digital surrogate, in shared annotation space. It’s for painting pixels on the screen to show manuscript pages, it’s about the page-level or line-level or word-level transcription of manuscript material, it’s about any other content or data that can play a role in presenting the materiality of the object. In IIIF terms, this means annotations with the painting motivation; that is, annotations that render content in the shared canvas space established by the IIIF representation.

Content on the object is every other type of annotation, which includes commentary, notes, links to articles, blog posts, translations, editorial content, narrative description. As the representation of the physical object in shared digital space, the IIIF manifest is the integration hub for content. The IIIF canvases are the most obvious carriers of content, but annotations can associate any IIIF resource with content – manifests, ranges, canvases. Painting annotations (ofness) only target canvases; other annotations (onness) can target anything in IIIF.
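The ofness/onness distinction can be sketched as two Web Annotations, here built as plain Python dicts. The `painting` motivation is IIIF’s; `commenting` is from the W3C vocabulary; all IDs, targets and body values are invented for illustration:

```python
# "Ofness": a painting annotation renders content into canvas space –
# here, a transcription line painted onto a region of a canvas.
painting_anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "painting",
    "body": {
        "type": "TextualBody",
        "value": "My Lord, I have read the enclosed paper…",
        "format": "text/plain",
    },
    # painting annotations target a canvas (here, a region of one)
    "target": "https://example.org/iiif/rr-5-75/canvas/1#xywh=0,0,1200,400",
}

# "Onness": a commenting annotation carries content about the object,
# and can target any IIIF resource – here, the whole manifest.
commenting_anno = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "commenting",
    "body": {"type": "TextualBody", "value": "A note about this report."},
    "target": "https://example.org/iiif/rr-5-75/manifest",
}
```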

RS Data Model - Page 1

There might well be an overlap between the content linked from the descriptive metadata about the unit, and the IIIF resources. Some information is available in both representations, but with a different purpose. The identification of subject is a link to a topic page; this subject is a property of the unit, and a tagging or classifying annotation on the manifest.

All three (about, of and on) are required to build a rich content environment. A traditional metadata API offers little for the representation of the object it describes. It’s information about the object without the material itself – the library card, not the book on the shelf. With IIIF, we can load our virtual equivalents of Post-It notes, interpretative essays and commentary onto the IIIF object, the carrier of that content in shared web space, addressing it and its parts as precisely as we need.

Allocating content to the IIIF representation makes a lot of modelling head-scratching go away. For example, transcription could range from the single word of a ship name in a photograph of the Discovery (NAE/1/15) to the full text of a printed work. This is some of the content of the object that we wish to make available. The ofness model (the IIIF Presentation API) gives us the means of associating whatever content culture and context deem important for an object with an abstract representation of the space it occupies.

This approach, crucially, puts all the information you have that can possibly be considered of or on into an interoperable domain, where it can work with other content and with other software, such as annotation servers and clients.

Relationship to other models

The starting material for Science in the Making is the Society’s Journal Collection, which is archival material conforming to ISAD(G). The archive hierarchy and its finding aids are completely unused in the launch of the pilot, so the model does not address “part of” relationships except at the item-image level (or manifest-canvas in IIIF terms). This is a conscious omission; we’d like to see more real-world use before addressing membership relations to other intellectual entities. We don’t need it straight away because in the pilot, navigation is by topic and other aggregations rather than by archival structure. Whether at the API level or at the user interface level, navigation by both mechanisms is interesting and important to explore.

Roles and Activities could be mapped from relators in MARC21, if we need to integrate library material into the model. There may or may not be enough information to give temporal or spatial coverage to an individual activity, even if more than one can be generated from the record. This is fine, the activities do not require temporal or spatial information to participate in networks of discovery.
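As an illustration of such a mapping, here are a few real MARC relator codes paired with Role labels used in this post. The pairing itself is a sketch, an assumption rather than a published crosswalk:

```python
# Real MARC relator codes on the left; Role labels from this post on the right.
MARC_RELATOR_TO_ROLE = {
    "aut": "author",        # Author
    "rcp": "recipient",     # Addressee
    "pht": "photographer",  # Photographer
    "rev": "referee",       # Reviewer – the closest relator to "referee"
}

def role_for_relator(code: str) -> str:
    """Map a MARC relator code to a model Role label; fall back to a
    generic label (an assumption) when no mapping is known."""
    return MARC_RELATOR_TO_ROLE.get(code, "contributor")
```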

The event-driven approach to describing the material echoes the CIDOC-CRM, but there is no attempt to use any of it. The use cases for exploration and presentation of the resources suggest an event-driven approach in the data model; it falls out naturally from the UX. This means the model gives access to the processes (in time and space) through which the content came to be. In a static representation (a typical bibliographic description), this information may be present but hard to get at. Museums, in contrast, are more used to describing objects as dynamic processes, with lifecycles and provenance.

The Europeana Data Model allows for an object-centric description and an event-centric description, even for the same object. There are elements of this approach here.

It would be an interesting exercise to describe the history of an archival unit using ActivityStreams, a W3C Recommendation from the Social Web Working Group. ActivityStreams has many applications beyond modelling social interactions on Facebook; the interactions between humans that make up the lifecycle of an intellectual object are social interactions, and the model could work well for this.
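For example, the Kelvin–Maxwell refereeing event might be expressed like this. The vocabulary terms (Activity, actor, object, result, Person, Document) are real ActivityStreams 2.0; the IDs and the exact shape are illustrative:

```python
# One lifecycle event of an archival unit as an ActivityStreams 2.0 activity.
review_event = {
    "@context": "https://www.w3.org/ns/activitystreams",
    "type": "Activity",
    "summary": "Lord Kelvin refereed Maxwell's paper",
    "actor": {"type": "Person", "name": "Lord Kelvin"},
    "object": {
        "type": "Document",
        "id": "https://example.org/items/PT/72/7",
        "name": "Dynamical theory of the electromagnetic field",
    },
    "result": {
        "type": "Document",
        "id": "https://example.org/items/RR/5/75",
        "name": "Referee report by Lord Kelvin",
    },
}
```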

Acknowledgements

The recasting of the terms aboutness and ofness in IIIF terms was inspired by this comment: https://github.com/IIIF/api/issues/1224#issuecomment-324713788. I have added “on-ness”.

Web pages and viewers, meet things with content!

Digitised objects are not all the same size. Users want different things from them. What to do?

Descriptions of things live in library, archive and museum catalogues. Sometimes, each thing in the catalogue has its own page on the web.

Things get digitised. Now the institution has a digital version of the thing where there was once just a catalogue record. Now there might be a lot of content to show for each digitised thing in the catalogue. At least one image, possibly thousands. There could be transcriptions of the text in those images, and all sorts of other related content produced by digitisation. There could be content contributed only because the thing is now digitised. There can be huge variation in the amount of content associated with a thing, or the amount of content that might later accumulate for it over time.

Catalogue records are generally about the same size as each other, and any variation in the size of a catalogue record doesn’t really matter anyway: if there’s more metadata, it’s just a longer web page. But the things themselves are not all the same size. One thing might be a scrap of paper, another a 36-volume encyclopaedia. One thing might be a single-page letter, another the contents of a large archival box.

What was once a user experience challenge about how to present similar-looking things (catalogue records, API endpoints, blobs of data) now becomes a challenge about how to represent wildly different things (representations of objects and their associated content).

Consider these examples of catalogue records and their digital representations, from the Royal Society and elsewhere.

1 – Stomachs

stomachs.jpg

The physical artifact is a pencil drawing of the stomachs of a turtle and a shark side by side. Two smaller drawings have been glued on top, one of a frog’s stomach and one of a snake’s stomach. The three drawings are separately created things, and have each been given their own identifiers in the archival hierarchy – there are three separate records, PT/73/1/20, PT/73/1/21, and PT/73/1/22.

There is only one image, because there was only one thing to photograph: the sheet of paper on which the turtle and shark stomachs were drawn and the other two attached. If we presented these three archive records as distinct resources on the web, each presentation would show the same image. [1] We could crop the image differently in the three cases, but this would hurt the user’s experience of looking at the object. In a viewer, you’d want to see the whole piece of paper, just as you would if your interest in a drawing of a snake’s stomach led you to the archive in person and you had the sheet in front of you on the table. Although there are three catalogue records, they are all about different parts of one real-world object.

2 – A letter

Referee’s report by Charles Robert Darwin on a paper by [William Benjamin] Carpenter, [‘Researches on the Foraminifera’]

darwin-letter-1.jpg darwin-letter-2.jpg darwin-letter-3.jpg darwin-letter-4.jpg

This is a four-page letter, and a transcript is available. It makes sense to represent this object as a single reading experience in a viewer; it corresponds naturally to a human concept of a thing. [2]

3 – A book

This example is A Tour in Wales, first published in 1778, here shown in the National Library of Wales: https://viewer.library.wales/4692237. The viewer owns most of the real estate of the web page, to avoid confusion between information conveyed about the object on the page (none in this case) and information conveyed within the viewer (all of it).

pennant_uv.jpg

Other content and metadata that isn’t rendered by the viewer would have to live elsewhere, on a different page. [3]

4 – Sets of photographs

Sometimes a catalogue record describes more than one photograph. In this example from the Wellcome Library, one archival record (and hence one web page with one viewer on it) comprises 73 assorted photographs and diagrams.

photographs.jpg

https://wellcomelibrary.org/item/b2005127x

The record-level item is a set of sometimes quite distinct things. A user can share a deep link to a particularly interesting photograph, but there’s no concept of that photograph as an independently addressable thing in its own right (at least, not for the user, not at the level of web pages). That photograph doesn’t have its own web page where additional content about it could live. Perhaps this photograph should have its own page, whereas any one of millions of book pages presented in the same way should not. But as far as the presentation goes it is exactly the same kind of thing as a book page: an image in a sequence rendered by a viewer. [4]

In The Royal Society pilot project each photograph generally has its own catalogue identifier and therefore (in our current thinking) its own item page. One of the case studies we will be looking at is Polar Exploration. For each of these photographs we have a whole web page, as long as we like, to say what we want about it and present its relationship with the rest of the world.

The different decisions about cataloguing the photographs in these two examples (which may have as much to do with time constraints and reasonable workload as with institutional practice) determine the prominence of an individual photograph on the web – its identity as a thing of interest in its own right – in a way that was irrelevant for physical visitors to the library requesting to look at the material in person. When there is one web page per catalogue entry, the user experience is quite different for photographs catalogued as singletons and photographs catalogued as members of a set. And therefore the significance of the object on the web is different in the two cases.

5 – A manuscript from the Newton Papers

newton-snowflakes.jpg

This is one page from Newton’s early drafts of the Principia. The archival item is a 1560 page manuscript.

The volume of content that each image in the manuscript could carry is huge. Some of the Royal Society material is like this. There is a lot to say about each image, a lot of links inbound and outbound, a lot of commentary, a lot of tagging. The user experience must satisfy the needs of users who want to look at the image content, and the needs of those who want the wealth of surrounding information. Each image of the manuscript has the user interface requirements of a web page.[5]

https://cudl.lib.cam.ac.uk/view/MS-ADD-03958/4

What is the focus of attention?

If something isn’t digitised, a visitor’s gaze can only alight on text on web pages that look pretty much the same regardless of the thing they describe. But if the thing is digitised, there are all sorts of places to look. There may be thousands of distinct views to see, or texts to read. When you can see objects, they look different from each other.

The user experience is usually addressed by putting the object in a box – a viewer – and including the viewer on a web page. This might be the catalogue page, or something directly linked from it. This can work very well, especially in large digitised collections and for book-like things, because the transition from web pages to enclosed navigation in a viewer makes sense for the object in that context. On the object’s page, the user explores the visible content of the object within its box. This allows the web page in which the viewer sits to delegate any potentially complex interior structure or content of an object to the viewer application. The page is for item-level descriptive metadata and content; beyond that, the user’s focus of attention stays in the viewer when interacting with the object. There’s a clear boundary between the concerns of the hosting web page and the concerns of a viewer-of-objects. If the hosting web site is a library catalogue (or a web site that behaves in a catalogue-like way), then this separation of concerns works: the site’s purpose is to help the user find the object, so now let her look at it. The web page is about the thing; how a user interacts with the constituent parts of the thing is an interaction between the user and the viewer application, not an interaction between the user and the item page.

What about user experiences that are not catalogue-like? It’s not always just about finding the object THEN having a look at it. The institution might have a lot more to say about some objects than others. And the same object can have a different focus of attention in different contexts. A digitised version of Newton’s Principia or the manuscript of Middlemarch may be served well by a book-reading UI in a catalogue-like user interface, but have a different presentation entirely in an interface made for close reading, annotation or study of that object, where every image has a story to tell, or a single paragraph could be the subject of an essay.

There’s quite a lot going on on any one page of Newton’s notes and drafts for the Principia (https://cudl.lib.cam.ac.uk/view/MS-ADD-03965/9) or in notebooks:

newton-snowflakes

The amount of content, and its variety, suggests a web page for every image if we want to provide transcripts and commentary, and ask users to contribute too. There’s simply too much going on to expect a generic book-reading interface to do what we want. We’re going to need to craft something extra.

In the Science in the Making project, we have archival items (catalogue records), and we think there will be a page for each item. We’re not, at this stage, required to handle book-like material in the same user interface. We want to show a lot of content for some of the things in the collection. Some of that content is about the item, and some of it is specific to individual item images. A transcription of a page of a letter, tags of people in a page, commentary on figures – these are examples of content that belongs to a particular image. We think we have enough of this content, and enough variation in user requirements, that each canvas also needs its own web page, its own place on the web that is about that image and carries some content about it. But that means competition for the user’s attention in the information architecture of the site. Are we introducing a tension between a page for the object and a page for each of its images? They have different requirements.

The item page carries content about the object. Its title, its description, its author, and links to other people associated with it through various roles. The item has links to other items. The item usually has descriptive metadata because the item was the thing that was catalogued.

A child page for each image gives us room for transcripts, direct tags on photographs or drawings or mentions of entities in text, commentary, explanation, notes and other annotated content. It’s a different kind of page.

Here are some wireframes for an item page for a multi image object, a letter:

item-multiple.jpg

This page has object level information, the letter’s relationships to people and other objects. But in some treatments it also has the image of the first page visible, and a transcription of that first page. In those treatments the first page of the letter is favoured; we need additional UI to get to other image pages.

What happens on a child page? Is there any object level information there? Just the image-specific content?

The tension between these two different types of page is most clear when there is only one image for an item. Is that going to lead to confusion? For the user, what’s the difference between these two pages? They both appear to be about the same drawing, but in different ways. Why am I seeing two different pages for the same thing? If there is one image, do we merge the functionality of an object page and an image page? Would that also lead to confusion?

What happens when we don’t have any content (other than an image) for a page? Especially if it’s the only page and we separate out object functionality on one page and image functionality on another?

Is there anything wrong with offering multiple ways of viewing an object? To have an enclosed viewer if the user wants to just read through the letter, but provide the complexities and potential of item page and image pages for closer study and exploration? If we’re describing the objects using open standards then it’s easy for us to offer multiple ways of viewing the material, but is that desirable? If so, do we offer all the options all the time, for any object, or do we favour particular presentations for particular types of material? We don’t know what the user’s intention is or their relationship with the material. What to one user is just another page of a book is to another the foundation of a thesis.

The pilot project will allow us to put some of these ideas to the test.

The focus of attention is different…

  • The focus of attention is different for different kinds of object.
  • The focus of attention is different for the same object in different contexts.
  • The focus of attention is different for different people.
  • The focus of attention is different for the same object in the same context at different times, depending on how much additional content is available.

In the examples earlier, A Tour in Wales happens to be one of eight volumes, and all of them are part of a crowdsourcing project where they are presented in a website with one page per image. It’s the same resource driving the user interface in both cases [6]. We’re providing a different environment to view it in, with very different functionality, for viewing and contributing complex additional structured content. The Royal Society content feels more like this, because of the amount of functionality available per image. In the Pennant example the difference between the page for the volume and the page for each image is obvious – the volume page has thumbnails linking to the image pages.

So, we have some competing UX requirements to balance in producing a template driven presentation of the Science in the Making archive material, even though we don’t need to tackle things like books and newspapers. These tensions come from the different size of the material, and the different amount of content. What other issues are there?

Apart from the tension between image and item pages there is another potential problem with presenting archive material outside of the archival context. We’re taking records that were created as members of a hierarchy, and reusing them in a site where navigation is driven by topics. We’re taking items out of a tree and expecting them to work well in a web. Or rather, we’re taking some items out of a tree – the pilot material is not the whole archive. We think that this will work for the pilot material, but that assumption is one that needs testing with user research as the pilot develops and starts to be used.

The four “focus of attention” assertions made above are hypotheses for user testing once the pilot is ready. Possible outcomes are that a single set of templates might not be enough to cover all these cases, or that we might need to compromise the UX in one direction. User types range widely, from a layperson who happens to be a bit curious about the subject to a seasoned researcher. A single UX/UI approach to very different information needs might be possible, but might be achieved only after a few cycles of design and user testing.

Footnotes – IIIF implementation

[1] In IIIF terms, there could be one manifest for each of these three archival records, and each manifest would contain one canvas annotated by the same image. We could use IIIF fragment selectors to crop the image differently in the three cases, but this would hurt the user’s experience of looking at the object. Although there are three catalogue records, there is really just one object, and so just one manifest. Here, three catalogue records might all point to different parts of the same IIIF manifest, because the IIIF manifest is modelling the physical object: the sheet of drawing paper.
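A minimal sketch of that pattern, with invented URLs and coordinates: each catalogue record resolves to a region of the one shared canvas via an xywh fragment, rather than to a separate image.

```python
# One canvas models the physical sheet; three records target regions of it.
CANVAS = "https://example.org/iiif/stomachs/canvas/1"

record_regions = {
    "PT/73/1/20": f"{CANVAS}#xywh=0,0,2000,2800",     # turtle & shark sheet
    "PT/73/1/21": f"{CANVAS}#xywh=150,200,600,700",   # frog stomach overlay
    "PT/73/1/22": f"{CANVAS}#xywh=1100,200,600,700",  # snake stomach overlay
}
```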

[2] The letter is naturally one IIIF manifest, one manifest corresponding to the one catalogue record.

[3] The book is naturally modelled as a IIIF manifest, a single reading experience.

[4] A 300 page book and a 73-image set of photographs are both described by IIIF manifests and rendered in a viewer. IIIF gives us extra presentation metadata to help convey the bookness of the first object – enough information about pagination to ensure a reading experience is an accurate representation of the physical object in a viewer that observes IIIF viewing hints. But the decision about the individuality of an image within a sequence is an information architecture and user experience decision, made in two steps: when modelling the objects in IIIF, and when deciding how to render them to the user.

[5] In IIIF terms, each of the canvases of each manuscript of the Newton Papers could be the carrier for content linked via the mechanism of annotation. Transcription, commentary, tags, citations, links outbound and inbound, descriptions – these are all content that a web page for a single image could convey to the user, and also content that we might want to capture from users on that page.
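As a sketch of what “content linked via the mechanism of annotation” could look like, here is a W3C Web Annotation attaching a transcription to one canvas. The identifiers and text are hypothetical; tags, comments and citations would follow the same shape with a different `motivation` and body.

```python
import json

# Sketch of a W3C Web Annotation model annotation attaching a
# transcription to a single manuscript canvas. All IDs are made up.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "https://example.org/annotations/1",
    "type": "Annotation",
    "motivation": "supplementing",   # transcription supplements the image
    "body": {
        "type": "TextualBody",
        "value": "A transcription of the manuscript page...",
        "format": "text/plain",
    },
    "target": "https://example.org/iiif/newton/canvas/12",
}

print(json.dumps(annotation, indent=2))
```

Annotations we capture from users on the page would be stored in the same form, which is what lets one annotation server serve both curated and crowd-sourced content.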

[6] It’s exactly the same IIIF manifest that drives the Universal Viewer on the main site; the Library only needs to publish it once.

Second UX Workshop, 27 April

This time, the project team assembled at Digirati London HQ. Using the ideas and sketches we produced in the initial workshop, Diego Lago (our head of UX) had prepared some wireframes to help us explore elements of the user interface. As always, the wireframes are not polished designs, but devices to drive thinking about what things appear on a page and where they go.

The home page is a busy mixture of content with many calls-to-action, and we had a think about what should be curated, what should be completely dynamic, and whether some content is a mixture of both.

home

The team wanted the content at the top to be wholly curated, but some aggregations could be a mixture, with favoured material more likely to appear but always a random element, to encourage discovery of new things.

The problem of titles

Existing archive material rarely comes with snappy, web-ready descriptions that entice users. Clickbait was not foremost in the archivists’ minds when the record was created. Some archive titles are very long and might not reveal much in the first few words; others are succinct, but aren’t necessarily a complete description when taken out of their context in an archival hierarchy. It turns out that all the material we are looking at for the pilot has reasonably clear descriptions at the item level, such as “The knoll and slopes of Mount Terror from sea ice”. But they are still sometimes long: “P. A. M. Dirac’s second referee report on D. R. Bates, A. Fundaminsky, H. S. W. Massey and J. W. Leech’s paper ‘Excitation and Ionization of Atoms by Electron Impact-The Born and Oppenheimer Approximations'”. In this example, the clickbait that triggers you might be the subject (the Born-Oppenheimer approximation), but it appears right at the end. If the user interface seeks balance by truncating long titles with an ellipsis, the user might not have enough of a reason to click on the truncated link.
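To make the problem concrete, here is a naive word-boundary truncation rule of the kind a UI template might apply. The length limit is an arbitrary assumption for illustration; the Dirac example shows how the interesting part of the title is exactly what gets cut.

```python
# Sketch: naive truncation of long archival titles at a word boundary,
# the kind of rule a listing template might apply. The 60-character
# limit is an arbitrary assumption.
def truncate(title: str, limit: int = 60) -> str:
    if len(title) <= limit:
        return title
    cut = title[:limit].rsplit(" ", 1)[0]
    return cut + "…"

long_title = ("P. A. M. Dirac's second referee report on D. R. Bates, "
              "A. Fundaminsky, H. S. W. Massey and J. W. Leech's paper "
              "'Excitation and Ionization of Atoms by Electron Impact'")
print(truncate(long_title))  # the Born-Oppenheimer hook never appears
```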

We talked a bit about the possibility of editorial intervention – associating a more web-friendly title with an item when manually selecting it as a feature. This has its attractions, although it does introduce more workflow. You would still get a mixture of long and short titles when the UI is partly generated dynamically. There may also be a concern over the integrity of the title.

For the pilot we decided that we would probably stick to the archival description and see how it goes.

Topics

As mentioned in the previous workshop post, topics are for the most part entirely machine-generated pages (or rather, they are generated by aggregating existing metadata, annotations and content).

We took an example of a topic as entity – a person, Thomas Henry Huxley:

topic

We talked through what would be machine generated and what would be curated. Huxley will be one of our three case studies – enhanced topic pages to which we can add anything we like beyond the default aggregations that all topic pages get. For the most part, though, we used Huxley as a stand-in for any topic page.

We talked about the extent to which external sources can provide content, for example the image and text coming from Wikipedia. This topic features a timeline, and we do have some information that could be used to automatically populate it. But it would be variable across different people pages; for some people we would have an interesting timeline, for others, a sparse one.

People are connected to archive material through a role, and this topic page uses that information to generate links to content under the headings “Huxley the author” and “Huxley the referee”. As in other areas of this project, the workshop suggested many interesting things that could be done with the content which are worth exploring and discussing – while keeping an eye on the time and budget for the pilot!

It’s good that we all appreciate that it is a pilot, and that we’re not going to get the right answers straight away. The pilot is just the first iteration of the platform, so we have to be selective about what we decide to develop, reach a decision quickly and build it fast to see what happens.

When we develop the Huxley topic page as a case study, we can add in some more visualisations of correspondence, we can add editorial content more suitable to the page’s role in the platform, and other content and functionality – but we’ll get to that a bit later!

The item

Now to the most difficult part. How to present the archival item?

item

We had a really interesting discussion around this, and it raised user experience challenges that affect presentation of digitised material and its associated content across many cultural heritage projects. It concerns the focus of the user’s attention when looking at archive (and in fact, any) digitised material. I go into detail in another blog post [LINK TO FOLLOW].

The Science in the Making project has a special focus – it presents the particular archive material associated with the published Philosophical Transactions of the Royal Society, and we verified that in all cases for the material in the pilot, there will be published material on the journal platform that the archival material should link to. This will often be a journal article, but it can be other published material such as editorials and letters. The user can always go from the archival material to the journal article on the Highwire platform – but how do they get back? We can do our best to drive traffic from the archive material to the journal platform, but what does the user journey look like for someone exploring across both resources?

Next steps

Now we have some work to do. We’re using the Omeka S collection management system for the pilot, along with modules we’ve developed for other projects.

  • Put all the archival images in a hosted instance of the DLCS platform, so they have IIIF Image API endpoints
  • Build just enough of a model in Omeka S to produce first implementations of the wireframes
  • Generate IIIF manifests for each of the archival items, and create Omeka items from them
  • Translate the metadata we do have into annotations to be stored in our annotation server
  • Index everything in our IIIF-aware search server

First off, let’s get all the items visible, which means IIIF.
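Once the images are in the DLCS, each one becomes addressable through the IIIF Image API URL pattern `{region}/{size}/{rotation}/{quality}.{format}`. A rough sketch, with a hypothetical base URL and identifier:

```python
# Sketch: constructing IIIF Image API URLs for DLCS-hosted images.
# The base URL and identifier are hypothetical.
def image_url(base, identifier, region="full", size="full",
              rotation=0, quality="default", fmt="jpg"):
    """Follow the IIIF Image API {region}/{size}/{rotation}/{quality}.{format} pattern."""
    return f"{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{fmt}"

# A thumbnail request: "!200,200" means best-fit within a 200x200 box.
thumb = image_url("https://dlcs.example.org/iiif-img", "mount-terror-01",
                  size="!200,200")
print(thumb)
```

Because the pattern is a standard, the same endpoints serve deep-zoom tiles in the Universal Viewer and thumbnails in Omeka S listings without any extra derivative generation on our side.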

First UX Workshop

We started the project formally on 21 April with a UX workshop run by Diego Lago, Digirati’s head of User Experience.

We’re not starting from scratch. The archival items are already described in the Society’s CALM archive management system, and we have access to metadata that links an archival item with the content that was published in a journal. If we have some drawings, or a manuscript, or a referee’s letter concerning publication, we know what article or other editorial in a published journal it relates to, so we can link to it.

We also know something of people, and their roles in relation to individual items. We know that Person A was a referee, Person B was an engraver. We have some keywords for each item, and sometimes know a location associated with each item. We want topic-driven navigation for serendipitous discovery, and it is clear that concepts such as Person or Keyword will be expressed as topics (or entities) with a page for every Person, Place or Keyword that we know about. But we don’t know a great deal more about what our topics are yet.

We have seen examples of the archive material we have to work with, and we have a goal that the platform we build should enrich that material, accruing more content and metadata over time from the actions of curators and users. We know that the platform goal is to showcase the archival material associated with journal articles, for a mixed audience.

Everyone brings their existing mental model of what we think we are building. Some of us are not well acquainted with the content, and that model is hazy. Others know the content in great detail. We don’t have a shared understanding of the domain model – or we have different conceptions of it. The first task of the day was to start working on that shared understanding.

Working alone together

The workshop alternated between exercises that we each did on our own, and group discussion.

We started by each writing down all the elements we could think of that would go into making the site. Every scrap of content that could be labelled, or any concept associated with content that might have a bearing on what the site shows and how it shows it. Things that may become pages and links.

post-its

The results of this exercise are a mixture of the concrete and the abstract, the class and the instance. Where one of us writes “Person” another writes “Author, Illustrator, Photographer, Witness, Professor, Student”. Common concepts emerge:

Person, Place, Subject, Role, Comment, Manuscript, Marginalia, Correspondence, Image, Archival Unit, Journal Issue, Journal Article, Topic, External web site, Referee’s reports, Date, Institution, Meeting, Front matter, Formula, Letter, Recipient, Sender…

We then spent time talking through the content concepts, to turn them into a first attempt at a model.

1-DSCF3273

We began to discuss the platform’s viewpoint. What takes centre stage in the model? What is the site about? It needs to present the digitised archive material, but in doing that what (if anything) does it reflect of the existing archival hierarchy? You can already search CALM and see results (but no digitised content). We decided that the platform is not an alternative front-end to CALM with added digitised content. The archival hierarchy may not be helpful to users when the archive is only partially digitised, and probably shouldn’t play any significant role in the platform. Having said that, there may be times when text from the archival description is worth showing in the platform.

“Journal Article” and “Journal” appeared as key concepts in the domain model, but they are not content we will show directly. We link to them on an external site. A link to a journal article from the site doesn’t necessarily guarantee a journey back.

Person emerged strongly as a coordinating concept, more so than other classes of topic. Unlike other topics, people have specific types of relationship to content in the form of Roles that are often different over time. Huxley is the author of one manuscript and the referee of another.

An archival item has relationships to topics. An archival item has one or more images. It also has other content – transcripts, commentary, other annotations, and will invite more from users. The individual image content of an item also has some of these relationships. The entire archival item may be connected to a person by authorship, but one image within it might be connected to a person because it depicts a particular Arctic explorer. Some items have relationships to other items, especially for correspondence.
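The relationships in that paragraph can be sketched as a handful of classes. The names and fields below are our working assumptions from the whiteboard, not a finalised schema:

```python
from dataclasses import dataclass, field

# Sketch of the emerging domain model; class and field names are
# assumptions from the workshop, not a finalised schema.
@dataclass
class Topic:
    label: str                       # a Person, Place or Keyword

@dataclass
class RoleLink:
    person: Topic
    role: str                        # "author", "referee", "engraver", ...

@dataclass
class Image:
    iiif_canvas: str
    topics: list[Topic] = field(default_factory=list)   # e.g. depicts a person

@dataclass
class ArchivalItem:
    title: str
    images: list[Image] = field(default_factory=list)
    roles: list[RoleLink] = field(default_factory=list)
    related_items: list["ArchivalItem"] = field(default_factory=list)

# The item is connected to Huxley by authorship of the whole item...
huxley = Topic("Thomas Henry Huxley")
item = ArchivalItem("Referee report", roles=[RoleLink(huxley, "referee")])
# ...while an individual image can carry its own topic links.
item.images.append(Image("canvas/1", topics=[Topic("Arctic explorer")]))
```

Note that both the item and its individual images can carry topic relationships, which is exactly the distinction between “this item was written by X” and “this image depicts Y”.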

model

Another conversation was around case studies. These are the special “hero” content identified by the Society as important areas for the site to promote in the pilot. But what is a case study? Is it a special kind of content? Or is it just a regular topic page that has been embellished, customised, extended with extra content or functionality? Can any topic page become a case study just by devoting some extra editorial or development attention to it? Is an item page ever a case study?

The conclusion we came to was that (for now at least) any topic page is a potential case study; it’s not a special, separate kind of entity. And case studies are always topic pages, never item pages.

By lunchtime we had reached a model that we felt was a useful starting point for the platform. We were getting hungry!

After lunch we did another exercise, divided into three parts. We each chose a particular page or aspect of the site from one of:

  • The presentation of an archival item
  • A topic page
  • The platform home page

We then had 20 minutes to write notes on how we would implement a user interface. Diego asked us to consider points such as “every landing page is a home page” and “almost all things are really collections of other things”.

6-DSCF3279

Then we had a frenetic interlude in which we each folded a piece of A4 paper three times to make 8 panels. Diego gave us 8 minutes to fill these panels with UI ideas for our chosen area, looking at whole pages, or particular details of the interface. This is a divergent design activity known as Crazy 8s that aims to explore alternative solutions to a problem in a very short period of time.

Having warmed up our sketching muscles, we then spent a further 30 minutes individually producing a more detailed treatment, the solution sketch:

3-DSCF3303
4-DSCF3304

We then had a review of everyone’s contribution in detail, by each having a good look at all the sketches. Before asking the authors of the sketches to reveal themselves, Diego gave his interpretation of each of the designs, so that the author could hear someone else’s understanding of the presented user interface.

It’s quite tricky to keep your identity secret while someone else is talking about your work, but we mostly succeeded. Once we had done all this for all the designs, we had a further session talking through them all as a group and making sure we understood them.

Then, the voting:

5-DSCF3309

Each of us had 24 stickers that we could allocate to any of the designs, or to particular features of a design, as we saw fit. It was important to do this after the discussion rather than before.

We can now take the domain model and the expression of that domain model in the most popular UI treatments and create some wireframes. Part 2 of the workshop will be on Thursday, when we will get together again to look at the wireframes and see if they work for everyone.

The images

We want to get the digitised archival material available in IIIF form as soon as possible, so that it is available as we develop the model and user interface further. But that’s the subject of another post…