Marta Cichoń

LIBRARY DATA VISIBILITY AND RE-USE: POSSIBILITIES EMERGING FROM THE NATIONAL LIBRARY DESCRIPTORS PROJECT

Introduction

As the competitive advantage gained by increasing data visibility through services such as Google or Bing becomes more widely recognised, libraries, as service and data providers, face challenges similar to those of other organisations that want to benefit from big data trends. Within these trends, broadening access to data and enabling the integration of information from multiple data sources are commonly understood necessities. Researchers now usually want to take advantage of aggregated resources, which allow them to work (and often to obtain research results) far more efficiently, so the requirement to organise library data as semantically related datasets is becoming ever more pressing. Achieving this goal requires aggregating and combining data from different sources. The problem many libraries face is that the integration of their data is prevented by legacy systems and incompatible standards and formats. Like many other libraries, the National Library of Poland currently stores its data in library-specific formats such as MARC 21, which are generally understood by the library community and by nobody outside it; this makes the data difficult to reuse with other data stores accessible through the Web.

The National Library Descriptors project, launched at the “National Library Descriptors” Conference on April 20–21, 2015, aims first and foremost to provide better data access by creating additional access points within the original National Library dataset of the Bibliographic Database. This is to be achieved by improving data granularity and segmentation within the dataset, which is stored in the MARC 21 format, through the use of additional MARC 21 properties in bibliographic records that have so far been unused (or were even unavailable within the format), such as “Audience Characteristics” (MARC 21 field 385), “Creator/Contributor Characteristics” (MARC 21 field 386) or “Time Period of Creation” (MARC 21 field 388). Similarly, to create additional access points to the authority data, a set of additional attributes has been defined based on fields added to the MARC 21 format by the Library of Congress, such as “Associated Place” (MARC 21 field 370), “Field of Activity” (MARC 21 field 372), “Associated Group” (MARC 21 field 373), “Occupation” (MARC 21 field 374), or “Gender” (MARC 21 field 375).
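To make the idea of attribute-based access points more concrete, the following minimal Python sketch groups the values of the authority fields named above by their labels. It is an illustration only: the record is a simplified list of (tag, value) pairs rather than parsed MARC 21 data, and the function name `access_points` and the sample record are hypothetical and do not come from the project.

```python
from collections import defaultdict

# Field tags and labels exactly as named in the text above.
AUTHORITY_ATTRIBUTES = {
    "370": "Associated Place",
    "372": "Field of Activity",
    "373": "Associated Group",
    "374": "Occupation",
    "375": "Gender",
}


def access_points(record):
    """Group the values of the attribute fields by their label.

    `record` is a simplified list of (tag, value) pairs. This is an
    assumption made for illustration; real MARC 21 records carry
    indicators and subfields that are ignored here.
    """
    points = defaultdict(list)
    for tag, value in record:
        label = AUTHORITY_ATTRIBUTES.get(tag)
        if label is not None:
            points[label].append(value)
    return dict(points)


# Invented authority record, used only to show the shape of the result.
sample_record = [
    ("100", "Kowalska, Anna"),
    ("370", "Warszawa"),
    ("372", "Literature"),
    ("374", "Writer"),
    ("374", "Translator"),
]

if __name__ == "__main__":
    for label, values in access_points(sample_record).items():
        print(label + ": " + ", ".join(values))
```

In this sketch each label becomes a separate search facet, so a repeated field such as “Occupation” simply yields several values under the same access point.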

Looking ahead, the project involves not only providing and maintaining these additional access points within a library catalogue, but also using the newly added attributes as additional relations between the various entities stored in the database (such as Personal Names, Organisations, Geographic Names and Publications). Such relations were previously unavailable because authority-controlled access points were lacking. Alongside this, work is under way to develop a Linked Data model corresponding to the National Library’s set of bibliographic data and to publish this data as an open RDF dataset. It is currently expected that the additional relations could in future make it easier to combine the National Library dataset with other datasets available on the Web, because they can be mapped to properties and classes defined in commonly used Semantic Web ontologies relatively easily compared with data expressed in the information retrieval language currently used by the National Library. They may also provide additional context for a possible implementation of named-entity recognition (NER) tools that could be applied to the digitised content of the National Digital Library. For the same reason, the National Library Descriptors project involves simplifying the controlled vocabulary used as subject headings in the bibliographic database. It is understood that the transition to an effectively integrated dataset requires accessible data structures and modelling of the data within the RDF schema in order to enable interlinking with other data stores. The National Library Descriptors project is expected to provide a gradual approach toward that result.
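As a rough illustration of how such attributes might be mapped to properties from a commonly used vocabulary, the sketch below serialises a few attribute values as N-Triples statements. The choice of schema.org properties, the `example.org` subject URI, the use of plain literals as values, and the function names are all assumptions made for this example; the text does not state which ontologies or properties the project's Linked Data model actually uses.

```python
SCHEMA = "https://schema.org/"

# Illustrative mapping only: these schema.org properties are an assumption,
# not the mapping adopted by the National Library.
TAG_TO_PROPERTY = {
    "372": SCHEMA + "knowsAbout",     # Field of Activity
    "373": SCHEMA + "memberOf",       # Associated Group
    "374": SCHEMA + "hasOccupation",  # Occupation
    "375": SCHEMA + "gender",         # Gender
}


def escape_literal(value):
    """Escape backslashes, double quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(subject_uri, record):
    """Turn (tag, value) pairs into N-Triples lines with plain string literals.

    Tags without an entry in TAG_TO_PROPERTY are skipped.
    """
    lines = []
    for tag, value in record:
        prop = TAG_TO_PROPERTY.get(tag)
        if prop is None:
            continue
        lines.append('<%s> <%s> "%s" .' % (subject_uri, prop, escape_literal(value)))
    return lines


if __name__ == "__main__":
    # Invented record and placeholder URI, for illustration only.
    record = [("100", "Kowalska, Anna"), ("374", "Writer"), ("375", "female")]
    for line in to_ntriples("http://example.org/person/1", record):
        print(line)
```

A fuller model would more likely link to URIs for occupations, groups and places rather than emit string literals, since shared identifiers are what allow interlinking with other datasets.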