Marta Cichoń
Library Data Visibility and Re-use: Possibilities Emerging from the National Library Descriptors Project

One method of re-using data is to rearrange and restructure it as if it were being designed from scratch, so that the data can later be extended, which in turn makes it suitable for further, multi-directional re-use.8 This approach was adopted during the design phase of the National Library Descriptors project. How exactly it might influence the re-use of the corresponding data is explained further below.

As modern science continues its exponential growth in complexity and scope, collaboration among scientists at different institutions, in different subareas and across scientific disciplines is becoming increasingly important. Researchers working at one level of analysis may need to find and explore results from another level, from another part of their field, or from a completely different scientific area.9 One of the difficulties still to be overcome in providing better access to these results is legacy systems with incompatible standards and formats, which often prevent the integration of data. The implementation of successive technologies over the decades has scattered the metadata of libraries, archives and museums across multiple databases, spreadsheets and even unstructured word-processing documents. For reasons of business continuity, legacy and newly introduced technologies often coexist in parallel, and even where a superseded technology has been abandoned completely, relics of the former tool can often be found in the content migrated to the new application.10

It has long been generally acknowledged that scientists are becoming increasingly reliant on Internet resources to support their research. When searching for a domain-specific website or a paper on a particular topic, Web search engines do a phenomenal job of sorting through billions of possibilities and identifying potentially useful results. The Web has thus become indispensable both for supporting traditional communication within various knowledge disciplines and for serving the needs of scientists within their own disciplinary boundaries.11 However, there is still the “Invisible Web”, otherwise known as the “Deep Web”: a term coined for searchable databases whose contents are hard or impossible to find through prominent search engines and directories. While these items remain outside the scope of traditional Web search tools, they still reside “on” the Web. The Invisible Web consists largely of records in databases, and most library databases are among them. The dramatic term “invisible” is meant to underscore that there is more to the Web than a MetaCrawler search might reveal.12 The more these compartments of the Web are linked to existing open datasets in order to facilitate access to them, the less invisible the resources become, and the richer and more productive our experience of using the Web for research.
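To make the idea of such linking concrete, the sketch below (an illustration only, not part of the National Library Descriptors project) uses the Python rdflib library to express a single, hypothetical authority record as RDF and to attach owl:sameAs links pointing to entries for the same entity in external open datasets; all identifiers shown are invented for the example.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

SCHEMA = Namespace("http://schema.org/")

g = Graph()
g.bind("schema", SCHEMA)
g.bind("owl", OWL)

# A hypothetical local URI for one authority record in a library database.
person = URIRef("http://example.org/authority/12345")

g.add((person, RDF.type, SCHEMA.Person))
g.add((person, RDFS.label, Literal("Example Author, 1900-1980")))

# owl:sameAs links tie the local record to descriptions of the same entity
# in open datasets; the target URIs below are placeholders, not real identifiers.
g.add((person, OWL.sameAs, URIRef("http://viaf.org/viaf/000000000")))
g.add((person, OWL.sameAs, URIRef("http://www.wikidata.org/entity/Q00000000")))

# Serialise the graph as Turtle so it can be published, crawled and queried.
print(g.serialize(format="turtle"))
```

Published in this form, a record can be harvested, queried with SPARQL and merged automatically with any other data that uses the same external identifiers.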
In order to link data across the Web, we need to be able to interconnect data across independent islands. The word “islands” is used here to emphasise that each information system is modelled for its own particular needs and application domain, with the result that systems cannot shake hands with one another in an automated manner. It is, of course, easy to embed in a collection database a link pointing to the record of a similar object held by another institution. But this requires knowing how to access the other institution’s database and which fields it uses to describe the object; and once the record to which the resource can be linked has been found, the URL has to be embedded manually in the record of the original database. These actions cannot reasonably be performed for all of a library’s collection items. Therefore, there is a need to think about how the linking process can be automated.13 Libraries have amassed an enormous amount of machine-readable data about their own collections, both physical and electronic, over the past 50 years. However, this data is currently held in proprietary formats understood only by the library community and is not easily reusable with other data stores or across the Web.14
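By way of illustration only, one possible building block of such automation is reconciliation: taking a heading from a local record and asking an open dataset for candidate matches, so that a URI can be attached programmatically rather than by hand. The sketch below assumes the publicly documented wbsearchentities endpoint of the Wikidata API and a made-up example heading; a real workflow would add matching rules, human review and error handling.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def candidate_uris(heading, language="en"):
    """Return candidate Wikidata URIs for a heading taken from a local record.

    Only a sketch of the reconciliation step; it does not decide which
    candidate (if any) is the correct match.
    """
    params = urllib.parse.urlencode({
        "action": "wbsearchentities",
        "search": heading,
        "language": language,
        "type": "item",
        "format": "json",
    })
    request = urllib.request.Request(
        f"{WIKIDATA_API}?{params}",
        headers={"User-Agent": "reconciliation-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [
        ("http://www.wikidata.org/entity/" + hit["id"],
         hit.get("label", ""),
         hit.get("description", ""))
        for hit in data.get("search", [])
    ]


if __name__ == "__main__":
    # A made-up heading standing in for a value extracted from a catalogue record.
    for uri, label, description in candidate_uris("Fryderyk Chopin"):
        print(uri, label, description)
```

Matches confirmed in this way could then be recorded as links of the kind shown in the previous sketch, replacing the manual embedding of URLs described above.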

The rise of the Web obliged libraries and other culture-curating institutions to increase the pace of their work on standardising metadata schemes and controlled vocabularies, work that had begun with the adoption of databases for cataloguing and indexing in the 1970s and 1980s. At the same time, budget cuts and fast-growing collections are obliging information providers to explore automated methods of providing access to resources, simply because libraries are now expected to extract and deliver more value from the metadata patrimony they have been building up over decades.


8  V. Mayer-Schönberger, K. Cukier, op. cit., p. 146.
9  J. Hendler, ‘Science and the Semantic Web’, Science, 24/01/2003, vol. 299, issue 5606, pp. 520–521, http://science.sciencemag.org/content/299/5606/520.full [access: 23/03/2016].
10  S. van Hooland, R. Verborgh, Linked Data for Libraries, Archives and Museums: How to clean, link and publish your metadata, London 2014, pp. 1–2.
11  J. Hendler, op. cit., p. 520.
12  K. R. Diaz, ‘The Invisible Web: Navigating the Web outside Traditional Search Engines’, Reference & User Services Quarterly, vol. 40, no. 2, 2000, pp. 131–134, http://kb.osu.edu/dspace/handle/1811/44703 [access: 23/03/2016].
13  S. van Hooland, R. Verborgh, op. cit., pp. 49–50.
14  M. Teets, M. Goldner, ‘Libraries’ Role in Curating and Exposing Big Data’, Future Internet, vol. 5 (3), 2013, pp. 429–438, www.mdpi.com/1999-5903/5/3/429 [access: 23/03/2016].