AI and libraries series: the age of artificially intelligent search

Libraries themselves are well-positioned to conceive, implement, and use new systems for the production of knowledge.


Photo: LightFieldStudios/Envato Elements.

For much of the second half of the twentieth century, libraries were pioneers in the adoption of computerised search. As my colleagues described in our report Custodians and Midwives, developed in partnership with the National Library of Australia, even in the early days when search functionalities themselves were rudimentary, libraries sought their own implementations of electronic organisation and retrieval as ways to navigate existing collection catalogues. Experiments to computerise manual card catalogue systems began as early as 1954. New information management and classification techniques emerged as the result of libraries’ demands for cataloguing, acquisitions, and circulation to be interlocking systems. By the 1960s, electronic library records were coming ‘on-line’, becoming reviewable in close to real time.

Libraries have certainly benefited from the ways computing helped rationalise and render more efficient their indexing and storage. Old collections of card catalogues gave way gradually but irrevocably to electronic databases over the course of post-war computing development. Accessing books or other print material on the shelves no longer demanded manually pulling out small drawers arranged by author, title, and subject before flipping through decks of index cards until information on the right item was located. The new process promised faster and more precise retrieval. One British library reported improvement in the accuracy of author information from 75 per cent to over 90 per cent as they moved to an automated system. The mid-century library was on its way to becoming what librarian Verner W. Clapp described as ‘an organisation seeking control over flows of information’. Librarians would in turn become experts on the mechanics of how items of the collection flowed through processes of storage, preservation, and retrieval. From this perspective, Clapp argued, a library collection is best understood as data points and library staff as engineers of incoming and outgoing data flows.

Yet, in many ways, these new electronic search and management methods sacrificed serendipity for speed and accident for accuracy. The laboured process of finding books manually had meant chancing across some unrelated or unexpected work that could go on to inspire a more unusual perspective on things. With the efficiency of electronic retrieval, the exact or close matching of the search term to the information in the database substantially reduced the likelihood of these ‘fortuitous’ bibliographic encounters. The room for unstructured discovery shrank to getting lost among the shelves. As early as 1945, Vannevar Bush, then dreaming of the forerunner of today’s desktop computer, had pointed to the “artificiality” of indexing as an obstacle to finding meaningful insights in large information stores. For Bush, early twentieth-century indexing worked on principles of specificity and granularity, ordering items by strings of subclasses. In contrast, he suggested, the human mind works more associatively, drawing connections within a network of different ideas. Thus, an alternative approach to indexing would be to create “associative trails” as a means of organising information, a possible next step in the development of the ‘search engine’.

More than half a century on, computerised search itself has undergone its own transformation. If public-interest library institutions were at the forefront of early search, now private for-profit corporations are the advance guard. The World Wide Web, with its petabytes of data, is radically less bounded than any modest library curation. In returning results at scale, predictive search engines now deploy artificial intelligence, specifically large language models (LLMs). As an experience, search now feels far less rigid than slow batch processing through catalogue databases. The remit of these new algorithms is semantic search, a term that echoes Bush’s deeper metaphor that machines might one day be equivalent to brains. AI search promises to ‘understand’, in the way that humans organically do, the implied, contextual, adjacent, possible, and unintended meanings of what we say to each other as well as to our computers and databases.

What prospects, then, do these changes hold for libraries? Together, the move towards statistical inference in search and the ‘digitisation’ of books themselves form the sea-change with which libraries must now contend. Machine-actionable collections (MAC), rather than merely indexes of those collections, are now the objects over which artificially intelligent search is set to crawl. Libraries are investing in three meta-types of MAC: digital or digitised collection items; metadata; and newly created data arising out of augmenting or analysing the first two data types. MAC today consists of both structured and unstructured datasets generated by OCR and machine transcription as text, images, maps, and metadata. What management of these new dynamic systems of storage and retrieval, preserving and lending, navigation and curation might be required?

In a speculative vignette developed in the Custodians and Midwives report, we proposed that future search capabilities might themselves be programmed to generate serendipitous encounters for library users. If, in 2020, the line between search and recommendation is already disappearing, it is perhaps not a significant leap in the imagination to suggest the arrival of ‘anticipation engines’ that ‘pre-empt rather than respond to inquiries’ in 2040. Anticipation engines, scanning readers’ body language and facial expressions and assessing them as data nodes and statistical preferences, might serve up titles that readers were not looking for, opening up new and unexpected encounters and experiences. These experimental, non-commercially driven models of search would be the mainstay of public-interest institutions working to safeguard common access to and open production of knowledge.

Libraries themselves, lacking the deep pockets of industry and of state, are nevertheless well-positioned to conceive, implement, and use these new systems for the production of knowledge. Already, GLAM institutions are hosting fellowship programmes for researchers focused on their collections, and more are sharing their prototypes and exemplars of computational analysis (including topic modelling, machine-learning and visualisation tools) with the public. Standards, formats, and protocols for open data and data sharing are being agreed and implemented across the sector. The prospect of artificially intelligent search, with its statistically powered ‘understanding’ of which words are more likely to follow one another, has the potential to support sense-making, rather than mere cross-referencing, across ‘messy’ datasets. Libraries, acting in the public interest, might even commission LLM search engines that introduce an element of mathematical randomness into their outputs, so as to reintroduce into the browsing experience the serendipitous discovery otherwise disabled by rigid organisational systems.
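One way to picture that element of mathematical randomness is a result list that is sampled rather than strictly sorted. The Python sketch below is purely illustrative and is not drawn from the Custodians and Midwives report or from any existing library system: the function name `serendipitous_ranking`, the `temperature` parameter, and the idea that each catalogue item arrives with a numeric relevance score are all assumptions made for the example. It draws results one at a time, weighting each remaining item by its score, so that highly relevant items usually appear but less obvious ones occasionally surface.

```python
import math
import random


def serendipitous_ranking(scored_items, k=5, temperature=1.0, rng=None):
    """Pick k items without replacement, favouring higher relevance scores.

    scored_items: iterable of (item, score) pairs (hypothetical search output).
    temperature:  0 gives a plain top-k ranking; larger values flatten the
                  odds so that lower-scored items are chosen more often.
    rng:          optional random.Random instance, for reproducible runs.
    """
    if rng is None:
        rng = random.Random()
    pool = list(scored_items)
    picked = []
    while pool and len(picked) < k:
        if temperature <= 0:
            # No randomness: always take the best remaining item.
            idx = max(range(len(pool)), key=lambda i: pool[i][1])
        else:
            # Softmax-style weights; subtracting the top score keeps exp() stable.
            top = max(score for _, score in pool)
            weights = [math.exp((score - top) / temperature) for _, score in pool]
            idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picked.append(pool.pop(idx)[0])
    return picked


# Placeholder catalogue entries with made-up relevance scores.
catalogue = [("item-A", 0.9), ("item-B", 0.7), ("item-C", 0.5),
             ("item-D", 0.3), ("item-E", 0.1)]

strict = serendipitous_ranking(catalogue, k=3, temperature=0)
browse = serendipitous_ranking(catalogue, k=3, temperature=1.0)
```

With `temperature=0` the function behaves like a conventional ranked search; raising it trades some precision for the chance encounters described above. How much randomness suits a given collection would be a curatorial decision rather than a technical one.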

