Marta Cichoń
Library Data Visibility and Re-use: Possibilities Emerging from the National Library Descriptors Project

CURRENT DATA TRENDS

Management of information and knowledge has been transformed in recent decades. Alongside the shift towards digital management of information, movements adopting and advocating open approaches to sharing these digital resources have emerged. It now seems commonly understood that all groups of information consumers are interested in the instrumental value of open access to data, but open data is just one aspect of the “data revolutions” taking place today.1 Internet datasets are now created on a large scale, often in non-standard ways, and the development of IT infrastructure allows them to be expanded almost without limit. Most of the data collected in this way are open and widely available, making them invaluable sources of information for further processing and analysis.2

Precise processing of large datasets is considered a major challenge in information management and data analysis, as well as in related areas. Throughout the greater part of human history, only small portions of data could be analysed, for lack of tools that would allow information to be acquired, organised, stored and processed effectively. Despite an enormous shift in the approach to data collection and distribution, many legacy practices persist, rooted in institutional structures that assume access to information is limited.3

While access to data still needs to be broadened, realising the full benefits of this information requires combining data from different sources, often from organisations that have no history of sharing data at scale. In the era of “Big Data”, the challenge lies not only in processing large-scale data sources, but also in shifting the paradigm of data collection to allow multiple types of data to be acquired and integrated from various, often very different, sources. Connecting diverse data sources to achieve synergies that produce new information, and consequently new knowledge, is already a recognised challenge for the immediate future. Linking “traditional” data sources, such as public and research data, with newer sources, such as web services, offers a unique opportunity to explore social and cultural behaviours and newly emerging phenomena in depth. While the benefits of such synergy are most apparent in the social sciences, they are attainable in most other areas of research as well. Nevertheless, to take full advantage of prospectively linked data sources, some difficulties still need to be overcome.4
Even though resources remain limited, profiting from all the available data is already reasonable and viable in a growing number of domains where it has not been feasible before. Often, the latent value of information can be discovered only by linking one dataset with another, as already mentioned, even when the two seem entirely unrelated at first sight. Such an approach allows innovative solutions to be created by blending data in novel ways.5
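The point can be illustrated with a small, purely hypothetical sketch: two datasets, circulation records and a catalogue of subject descriptors, are uninformative in isolation, yet once joined on a shared record identifier they yield information (loans per subject) that exists in neither source alone. All identifiers and data below are invented for illustration.

```python
from collections import Counter

# Source 1 (hypothetical): circulation data, i.e. which records were borrowed.
loans = [
    {"record_id": "b001", "year": 2015},
    {"record_id": "b002", "year": 2015},
    {"record_id": "b001", "year": 2016},
]

# Source 2 (hypothetical): bibliographic data, one subject descriptor per record.
catalogue = {
    "b001": "history",
    "b002": "linguistics",
}

# Linking the two sources: loans counted per subject, a result that
# cannot be derived from either dataset on its own.
loans_per_subject = Counter(catalogue[loan["record_id"]] for loan in loans)
print(loans_per_subject)
```

The same join, scaled up to real catalogue identifiers and real usage logs, is exactly the kind of blending of divergent datasets described above.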

Researchers are nowadays overwhelmed by vast amounts of information. This information can come from many distributed sources, and in many cases it is far beyond what any individual can deal with alone. As a result, there is an increasing demand for automated and semi-automated systems that sort through and assimilate this informational excess, allowing further re-use by machines or people. The high-level goal is to create an assimilator that would act as an intermediary between humans and information. The assimilator would receive a query from a human and then gather information from all relevant sources, sifting through it as accurately as possible. Such a system would combine everything that bears usefully on what the human wants to know and provide a coherent answer corresponding to their intent.6 Currently available technology allows databases to be accessed through specialised Web interfaces, but more and more researchers want to use collections as a whole, mining and organising the information in alternative ways.7

In the present world of data, the sum of information is more valuable than any of its parts, and the same rule applies to linked datasets. Internet users are already familiar with mashup web services: web sites that combine information from two or more sources in an innovative way, often presenting the data visually, which makes it even more accessible to the broader public.
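A toy example in the mashup spirit, built entirely on invented data, might combine bibliographic records with a separate gazetteer-style source of place coordinates to produce points that a map widget could display:

```python
# Source 1 (hypothetical): bibliographic records with publication places.
publications = [
    {"title": "Pan Tadeusz", "place": "Paris"},
    {"title": "Quo Vadis", "place": "Warsaw"},
]

# Source 2 (hypothetical): a gazetteer mapping place names to coordinates.
coordinates = {
    "Paris": (48.86, 2.35),
    "Warsaw": (52.23, 21.01),
}

# The merged view holds information that neither source carries alone:
# each title paired with a plottable location.
map_points = [
    {"title": p["title"],
     "lat": coordinates[p["place"]][0],
     "lon": coordinates[p["place"]][1]}
    for p in publications
    if p["place"] in coordinates
]
print(map_points)
```

Replacing the invented sources with a real catalogue API and a real gazetteer would turn this sketch into the kind of visual mashup described above.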


1  T. Davies, D. Edwards, ‘Emerging Implications of Open and Linked Data for Knowledge Sharing in Development’, IDS Bulletin, vol. 43 (5), 2012, pp. 117–127, http://dx.doi.org/10.1111/j.1759-5436.2012.00372.x [access: 23/03/2016].
2  Internet: publiczne bazy danych i Big data [Internet: public databases and Big data], ed. G. Szpor, Warszawa 2014, p. 52.
3  V. Mayer-Schönberger, K. Cukier, Big data: rewolucja, która zmieni nasze myślenie, pracę i życie [Big data: a revolution that will transform how we live, work and think], Warszawa 2014, p. 36.
4  Internet: publiczne bazy danych i Big data, op. cit., p. 55.

5  V. Mayer-Schönberger, K. Cukier, op. cit., pp. 55, 144.
6  H. Haidarian Shahri, On the Foundations of Data Interoperability and Semantic Search on the Web, dissertation, University of Maryland, 2011, p. 9, http://hdl.handle.net/1903/11798 [access: 23/03/2016].
7  L. Johnston, ‘Digital Collections as Big Data’, Digital Preservation Meeting, Library of Congress, 2012, www.digitalpreservation.gov/meetings/documents/ndiipp12/BigData_Johnston_DP12.pdf [access: 23/03/2016].