RESAW 2025: Report from UK Web Archive Colleagues
Introduction
The RESAW (Research Infrastructure for the Study of Archived Web) 2025 conference took place at the University of Siegen in Germany. It was organized by the Collaborative Research Centre 1187 “Media of Cooperation” at the University of Siegen in cooperation with the Centre for Contemporary and Digital History (C²DH) at the University of Luxembourg.
This was a special conference, as the organisers of this, past and future conferences gave a special presentation (it included cake and balloons) to mark ten years since the first RESAW conference was held in Aarhus, Denmark. They all paid tribute to Niels Brügger from Aarhus University, who founded RESAW and helped develop the RESAW community.
The conference theme, “The Datafied Web”, was explored from a historical perspective. The call for papers stated that “we would like to explore the historical roots, trends, and trajectories that shaped the data-driven paradigm in web development and to examine the genealogies of the datafied and metrified web”. The opening panel discussion aimed to define what is meant by “the datafied web”.
UK Web Archive colleagues from Bodleian Libraries, the British Library and the National Library of Scotland attended the conference. There was a packed programme with a variety of presentation formats and workshops that shared best practices and innovative projects in the world of web archiving. In this blog post they report highlights of their conference experience.
Reflections
Helena Byrne - Curator of Web Archives - British Library
I was part of the panel called Web archives practices along with colleagues from the Portuguese and Belgian web archives. My presentation, “Lessons learnt from preparing collections as data: the UK Web Archive experience”, gave an overview of the project, which ran from October 2022 to November 2024, to develop a framework for publishing UK Web Archive curated collections as data.
There were so many great presentations and panels at this conference that it is hard to pick just one highlight. The opening panel discussion defining “the datafied web” raised lots of interesting points. In this panel Anne Helmond made the important point that “while the front-end of the web has changed dramatically, the back-end has undergone a deeper transformation” and that the study of the web requires a mix of methodologies and resources. Another session that stood out was the panel on Past Metrics, which reminded us of the visitor counters that used to be popular on early websites. This was especially poignant because, just a few days before this presentation, I had received an enquiry about a website. When I used the Memento Time Travel search function to check whether any other web archives held a copy of it, I found one copy from its earlier years. This version had a prominent visitor counter and evoked a nostalgic response, as I realised I hadn’t seen one for many years and had forgotten about this feature.
Beatrice Cannelli - Curatorial and Policy Research Officer (Algorithmic Archive Project) - Bodleian Libraries
At this year’s RESAW conference, my colleague Pierre Marshall and I organised a workshop titled “Towards an ‘Algorithmic Archive’: Developing Collaborative Approaches to Persistent Social and Algorithmic Data Services for Researchers”. The workshop brought together diverse perspectives from practitioners and researchers working with social media data, fostering discussions about the development of sustainable strategies for collecting data from social media platforms. The workshop was a valuable opportunity to gather insights for the Algorithmic Archive project, particularly regarding issues and expectations related to short- and long-term access to social media data.
Among the many engaging sessions, I found the one on “the challenges of archival practices” particularly interesting. Using the case of the web archive at Aix-Marseille University, the panellists underscored the importance of encouraging critical engagement with issues researchers face, such as data ethics, data surveillance and archival responsibility, especially when dealing with potentially sensitive web archived data. Similarly, the panel on “Data Regimes” reflected on the complexity of data stewardship, where open data policies often clash with ethical concerns, especially when dealing with sensitive content like social media data. This often leaves researchers and librarians to navigate these grey areas without clear guidance, raising questions about reuse and long-term preservation.
Pierre Marshall - Technical Research Officer (Algorithmic Archive Project) - Bodleian Libraries
Vasco Rato gave an overview of arquivo.pt’s API. Arquivo.pt runs a CDX(J) server, and about half of the traffic to the archive comes from the API. Rato mentioned that sometimes people _ask_ for WARCs, but what they really want is just the text or media content of a page. It would be a better user experience to provide text or image search directly through the API. The CDX(J) server also helps anyone wanting to page through the archive without downloading the whole thing. Most researchers don't have the capacity to store and process 1.5PB of WARC files.
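To illustrate the point about working from an index rather than from the WARCs themselves, here is a minimal sketch in Python of reading CDXJ index lines and keeping only successful HTML captures. It assumes the common CDXJ layout of a sort-friendly URL key, a 14-digit timestamp, and a JSON block per line. The sample lines, file names and the helper names `parse_cdxj_line` and `html_captures` are hypothetical and are not taken from Arquivo.pt's API or from Rato's talk.

```python
import json
from typing import Iterable, Iterator


def parse_cdxj_line(line: str) -> dict:
    """Split one CDXJ line into URL key, timestamp and JSON fields."""
    # Only the first two spaces are separators; the JSON block may contain spaces.
    urlkey, timestamp, payload = line.strip().split(" ", 2)
    record = json.loads(payload)
    record["urlkey"] = urlkey
    record["timestamp"] = timestamp
    return record


def html_captures(lines: Iterable[str]) -> Iterator[dict]:
    """Yield records for captures that returned HTTP 200 with an HTML MIME type."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = parse_cdxj_line(line)
        is_ok = record.get("status") == "200"
        is_html = record.get("mime", "").startswith("text/html")
        if is_ok and is_html:
            yield record


# Hypothetical index lines, invented for illustration only.
SAMPLE_INDEX = [
    'pt,example)/ 20100304050607 {"url": "http://example.pt/", "mime": "text/html", '
    '"status": "200", "filename": "sample-1.warc.gz", "offset": "1024", "length": "2048"}',
    'pt,example)/logo.png 20100304050610 {"url": "http://example.pt/logo.png", '
    '"mime": "image/png", "status": "200", "filename": "sample-1.warc.gz", '
    '"offset": "3072", "length": "512"}',
    'pt,example)/old 20110101000000 {"url": "http://example.pt/old", "mime": "text/html", '
    '"status": "404", "filename": "sample-2.warc.gz", "offset": "0", "length": "300"}',
]

if __name__ == "__main__":
    for capture in html_captures(SAMPLE_INDEX):
        print(capture["timestamp"], capture["url"], capture["filename"])
```

Because each index line carries the file name, offset and length of a capture, a researcher can in principle select the handful of records they need from the index and leave the rest of the archive where it is.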
Helge Holzmann of the Internet Archive ran a workshop on the Archives Research Compute Hub (ARCH) service. Holzmann talked us through a series of recipes for the ArchiveSpark library, intended to make it easier for researchers to run data-centric queries against items in the Internet Archive. Besides the content of the workshop, I appreciated Holzmann's use of 2000s-era retro web graphics to illustrate his presentation. We are all here for the datafied web, but beyond the data I'm happy to celebrate the art of the early web.
The BnF also presented their Skyblogs collection, including work on parsing the page markup (back) into a data model for analysis across the corpus.
The common theme I took from these sessions is that there's a lot to learn from making large web datasets usefully available to academics. Hopefully next year Beatrice and I will be back with some examples of what internet researchers could do with our planned social media archive.
Andrea Kocsis - Chancellor’s Fellow in Humanities Informatics, University of Edinburgh / The National Librarian’s Fellow in Digital Scholarship 2024-25, The National Library of Scotland
I was glad to present our work on web archive engagement with Leontien Talboom, where we discussed how to support not only traditional readers and computational users, but also the digitally curious who often fall between categories. I also shared a glimpse into the creative process behind Digital Ghosts, the web archive exhibition I’m currently developing with artist Dorsey Kaufmann and the National Library of Scotland, which will take place in November at Inspace in Edinburgh.
One of the talks that stayed with me was Ian Milligan’s reflection on the ethical challenges of crowdsourced digital archives in the context of 9/11. I plan to bring this ethical dilemma of accessibility, metadata, and data protection into my teaching next year in Future Libraries and Archives at the Edinburgh Futures Institute. The most inspiring talk for me, though, was Nanna Bonde Thylstrup’s keynote on data loss. Her interdisciplinary framing - drawing equally from humanities, sociology, and STEM - challenged the usual discourse of data loss as an evolutionary narrative and instead reframed it as a question of digital politics and infrastructure. Overall, RESAW was inspiring both intellectually and as a generous, thoughtful community of dedicated netpreservers.
Conclusion
Attending the RESAW conference is a great opportunity to exchange ideas, learn about innovative research projects, and foster collaborations in the field of web archive studies. The UK Web Archive colleagues contributed through presentations, a workshop and active participation in other sessions. Participating in conferences in this way supports the recognition and reuse of the UK Web Archive collections as a significant resource in the wider academic discourse on web archiving. We look forward to participating in the next edition of the conference, which will take place in June 2027 at the University of Groningen, hosted by the Centre for Media and Journalism Studies and the Centre for Digital Humanities. The theme for 2027 is “Engaging Public Internet Histories: New Ways of Telling the Story of & with the Web”. So keep an eye out in 2026 for the call for papers for the seventh RESAW conference.