UK Web Archive blog

Introduction

News and views from the British Library’s web archiving team and guests. Posts about the public UK Web Archive, and since April 2013, about web archiving as part of non-print legal deposit. Editor-in-chief: Jason Webber.

20 June 2025

RESAW 2025: Report from UK Web Archive Colleagues

RESAW 2025 Conference Banner

Introduction

The RESAW (Research Infrastructure for the Study of Archived Web) 2025 conference took place at the University of Siegen in Germany. It was organized by the Collaborative Research Centre 1187 “Media of Cooperation” at the University of Siegen in cooperation with the Centre for Contemporary and Digital History (C²DH) at the University of Luxembourg.

This was a special conference, as the organisers of this, past and future conferences gave a special presentation (it included cake and balloons) to mark ten years since the first RESAW conference was held in Aarhus, Denmark. They all paid tribute to Niels Brügger from Aarhus University, who founded RESAW and helped develop the RESAW community.

The conference theme, “The Datafied Web”, was explored from a historical perspective. The call for papers stated that “we would like to explore the historical roots, trends, and trajectories that shaped the data-driven paradigm in web development and to examine the genealogies of the datafied and metrified web”. The opening panel discussion aimed to define what is meant by “the datafied web”.

UK Web Archive colleagues from Bodleian Libraries, the British Library and National Library of Scotland attended the conference. There was a packed programme with a variety of presentation forms and workshops that shared best practices and innovative projects in the world of web archiving. In this blog post they report highlights of their conference experience.

Reflections

Helena Byrne - Curator of Web Archives - British Library 

I was part of the panel called Web archives practices along with colleagues from the Portuguese and Belgian web archives. My presentation, Lessons learnt from preparing collections as data: the UK Web Archive experience, gave an overview of the project that ran from October 2022 to November 2024 to develop a framework for publishing UK Web Archive curated collections as data.

There were so many great presentations and panels at this conference that it is hard to pick just one highlight. The opening panel discussion defining “the datafied web” raised lots of interesting points. In this panel Anne Helmond made the important point that “while the front-end of the web has changed dramatically, the back-end has undergone a deeper transformation” and that the study of the web requires a mix of methodologies and resources. Another session that stood out was the panel on Past Metrics. We were reminded in this session about the visitor counters that used to be popular on early versions of websites. This was especially poignant as, just a few days before this presentation, I had received an enquiry about a website. When I used the Memento Time Travel search function to check whether any other web archives held a copy of it, I found one copy from its earlier years. This version had a prominent visitor counter and evoked a nostalgic response, as I realised I hadn’t seen one for many years and had forgotten about this feature.

Beatrice Cannelli - Curatorial and Policy Research Officer (Algorithmic Archive Project) - Bodleian Libraries

At this year’s RESAW conference, my colleague Pierre Marshall and I organised a workshop titled “Towards an ‘Algorithmic Archive’: Developing Collaborative Approaches to Persistent Social and Algorithmic Data Services for Researchers”. The workshop brought together diverse perspectives from practitioners and researchers working with social media data, fostering discussions regarding the development of sustainable strategies to collect content from social media platforms. The workshop was a valuable opportunity to gather insights for the Algorithmic Archive project, particularly regarding issues and expectations related to short- and long-term access to social media data.

Among the many engaging sessions, I found the one on “the challenges of archival practices” particularly interesting. Using the case of the web archive at Aix-Marseille University, the panellists underscored the importance of encouraging critical engagement with issues researchers face, such as data ethics, data surveillance and archival responsibility, especially when dealing with potentially sensitive web archived data. Similarly, the panel on “Data Regimes” reflected on the complexity of data stewardship, where open data policies often clash with ethical concerns, especially when dealing with sensitive content like social media data. This often leaves researchers and librarians to navigate these grey areas without clear guidance, raising questions about reuse and long-term preservation.

Pierre Marshall - Technical Research Officer (Algorithmic Archive Project) - Bodleian Libraries

Vasco Rato gave an overview of arquivo.pt’s API. Arquivo.pt runs a CDX(J) server, and about half of the traffic to the archive comes from the API. Rato mentioned that sometimes people _ask_ for WARCs, but what they really want is just the text or media content of a page. It would be a better user experience to provide text or image search directly through the API. The CDX(J) server also helps anyone wanting to page through the archive without downloading the whole thing. Most researchers don't have the capacity to store and process 1.5PB of WARC files.
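To give a rough sense of why a CDX(J) index lets someone page through an archive without downloading WARCs, here is a minimal sketch in Python. It assumes the commonly used CDXJ line layout (a sort key, a 14-digit timestamp, then a JSON block); the sample lines, field names and WARC filenames are invented for illustration and are not real arquivo.pt records, and the sketch does not call the arquivo.pt API itself.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class CdxjRecord(NamedTuple):
    urlkey: str     # sort-friendly URL key, e.g. "pt,arquivo)/"
    timestamp: str  # capture time as YYYYMMDDhhmmss
    fields: dict    # JSON block: url, mime, status, WARC filename, offset...


def parse_cdxj(lines: Iterable[str]) -> Iterator[CdxjRecord]:
    """Yield one record per non-empty CDXJ line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the first two spaces only: the JSON block contains spaces.
        urlkey, timestamp, payload = line.split(" ", 2)
        yield CdxjRecord(urlkey, timestamp, json.loads(payload))


# Hypothetical sample index lines (assumed layout, made-up values).
SAMPLE = [
    'pt,arquivo)/ 20100101120000 {"url": "http://arquivo.pt/", '
    '"mime": "text/html", "status": "200", '
    '"filename": "example-0001.warc.gz", "offset": "1024"}',
    'pt,arquivo)/logo.png 20100101120005 {"url": "http://arquivo.pt/logo.png", '
    '"mime": "image/png", "status": "200", '
    '"filename": "example-0001.warc.gz", "offset": "20480"}',
]

records = list(parse_cdxj(SAMPLE))

# Pick out only the HTML captures, using the index alone.
html_urls = [r.fields["url"] for r in records if r.fields.get("mime") == "text/html"]
print(html_urls)
```

Each index line is small and points at a position inside a WARC file, so a researcher can filter by URL, date or media type first and only then request the handful of captures they actually need.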

Helge Holzmann of the Internet Archive ran a workshop on the Archives Research Compute Hub (ARCH) service. Holzmann talked us through a series of recipes for the ArchiveSpark library, intended to make it easier for researchers to run data-centric queries against items in the Internet Archive. Besides the content of the workshop, I appreciated Holzmann's use of 2000s-era retro web graphics to illustrate his presentation. We are all here for the datafied web, but beyond the data I'm happy to celebrate the art of the early web.

The BnF also presented their Skyblogs collection, including work on parsing the page markup (back) into a data model for analysis across the corpus.

The common theme I took from these sessions is that there's a lot to learn from making large web datasets usefully available to academics. Hopefully next year Beatrice and I will be back with some examples of what internet researchers could do with our planned social media archive.

Andrea Kocsis - Chancellor’s Fellow in Humanities Informatics, University of Edinburgh / The National Librarian’s Fellow in Digital Scholarship 2024-25, The National Library of Scotland

I was glad to present our work on web archive engagement with Leontien Talboom, where we discussed how to support not only traditional readers and computational users, but also the digitally curious who often fall between categories. I also shared a glimpse into the creative process behind Digital Ghosts, the web archive exhibition I’m currently developing with artist Dorsey Kaufmann and the National Library of Scotland, which will take place in November at Inspace in Edinburgh.

One of the talks that stayed with me was Ian Milligan’s reflection on the ethical challenges of crowdsourced digital archives in the context of 9/11. I plan to bring this ethical dilemma of accessibility, metadata, and data protection into my teaching next year in Future Libraries and Archives at the Edinburgh Futures Institute. The most inspiring talk for me, though, was Nanna Bonde Thylstrup’s keynote on data loss. Her interdisciplinary framing - drawing equally from humanities, sociology, and STEM - challenged the usual discourse of data loss as an evolutionary narrative and instead reframed it as a question of digital politics and infrastructure. Overall, RESAW was inspiring both intellectually and as a generous, thoughtful community of dedicated netpreservers.

Conclusion

Attending the RESAW conference is a great opportunity to exchange ideas, learn about innovative research projects, and foster collaborations in the field of web archive studies. The UK Web Archive colleagues contributed significantly through presentations and active participation in other sessions. Participation at conferences in this manner supports the recognition and reuse of the UK Web Archive collections as a significant resource in the wider academic discourse on web archiving. We look forward to participating in the next edition of the conference, which will take place in June 2027 at the University of Groningen, hosted by the Centre for Media and Journalism Studies & Centre for Digital Humanities. The theme for 2027 is “Engaging Public Internet Histories: New Ways of Telling the Story of & with the Web”. So keep an eye out in 2026 for the call for papers for the seventh RESAW conference.

12 May 2025

IIPC Web Archiving Conference 2025: Report from UK Web Archive Colleagues

 

IIPC GA & WAC 2025 Banner

Introduction

This year’s IIPC General Assembly and Web Archiving Conference took place at the National Library of Norway in Oslo.

Many UK Web Archive colleagues from Bodleian Libraries, the British Library, Cambridge University Library and National Library of Scotland attended the Web Archiving Conference both as delegates and presenters. There was a packed programme with a variety of presentation forms and workshops that shared best practices and innovative projects in the world of web archiving. In this blog post they report highlights of their conference experience.

Reflections

Leontien Talboom – Technical Analyst - Cambridge University Libraries

This was my third time attending the WAC conference, but my first time visiting Oslo. It was great to reconnect with colleagues and hear about the range of projects currently happening across the community.

I found the update from Chris Royds and Tom Storrar on the UKGWA particularly interesting, especially their work on using Retrieval-Augmented Generation (RAG) to take into account takedown policy processes. The Poster Slam session also provided a good overview of the diverse work taking place in the field.

Together with Andrea Kocsis, I presented some of our recent work on improving access to web archives for different types of users, including readers, the digitally curious, and data users. This builds on previous work, and it was useful to share it in the context of web archives. We’ve also recently published an article on this, which is available here.

Overall, it was a valuable experience, and I appreciated the chance to hear from others and share some of our own work.

Andrea Kocsis - Fellow - National Library of Scotland

Our work covered how user research segmentation in web archives can reshape the way we engage with digital collections. Our talk focused on the power of metadata to create more intuitive and accessible experiences for different audiences. For digital researchers, we highlighted the potential of datasheets for datasets via the case study of the Archive of Tomorrow project, while for the digitally curious, we suggested using Jupyter notebooks with pre-processed enhanced metadata to make exploration easier, introducing the outcomes of The National Librarian’s Research Fellowship in Digital Scholarship 2024-25. For the general reader, we discussed the role of storytelling in turning web archives into something more than just data or collection. We also had the exciting opportunity to announce the “Digital Ghosts - Exploring Scotland’s Heritage on the Web” exhibition we are curating in November 2025 in Edinburgh, bringing together tactile artwork and Scottish web heritage in a fresh, dynamic way. The discussions we had about building inclusive, user-focused web archives were energising and reaffirmed how essential accessibility is for the future of these collections.

Eilidh MacGlone - Web Archivist - National Library of Scotland

The IIPC General Assembly and the conference in Oslo were an opportunity to think again about how the National Library of Scotland contributes to the consortium and the benefit we gain from our membership. IIPC, some of whose events are open to the public, is a key international membership body for web archiving, which is a key collecting area for us. Asking questions of the people who maintain tools I use (and recommend to the public!) is something I really value, along with the ability to meet and make plans for better services (watch this space!). A high point was being in the audience for the talk by Dr Andrea Kocsis, who was the Librarian’s Scholar this year. She presented work to enhance data originally created by my Collections and Research colleague, Trevor Thomson, aiming to help researchers discover content at scale within the legal deposit environment. I am excited to experience the exhibition, which will physically express some of what we collect, with the artist Dorsey Bromwell Kaufmann at the Being Human Festival held in Edinburgh later this year.

Beatrice Cannelli - Curatorial and Policy Research Officer - Bodleian Libraries

This was my second time attending the IIPC WAC Conference, and once again, it was a fantastic opportunity to connect with colleagues from around the world and gain insights into current developments in the field.

At this year’s conference, I had the pleasure of participating as a speaker in the panel titled Beyond Preservation: Engaging Audiences and Researchers with Web Archives, organised by Eveline Vlassenroot, Peter Mechant, Friedel Geeraert, and Christina Vandendyck. Together with my fellow panellists—Cui Cui, Andrea Kocsis, and Anders Klindt Myrvoll—we explored how web archives can better engage with a broad range of users. Through case studies and collaborative initiatives, we highlighted effective ways in which archives are fostering connections with researchers, communities, and the wider public. The panel sparked valuable discussion on how web archives can enable innovative research methodologies and promote greater public involvement.

Given my particular interest in social media archiving, it is no surprise that one of the sessions that I particularly enjoyed was Curating Social Media. This session offered a rich overview of projects and initiatives in this area, featuring presentations from the British Library, the National Library of Singapore, the National Library and National Archives of Luxembourg, and the National Archives of the Netherlands. I left the session inspired by the diversity of approaches and full of new ideas and perspectives, many of which will certainly be considered in the context of the Algorithmic Archive project I’m currently working on at the Bodleian Libraries.

Gil Hoggarth - Web Archive Technical Lead - British Library

After an earlier potential weather warning, the Oslo conference was held in the National Library of Norway's main building, with both nice weather and a warm welcome! It was great to hear the presentations, short talks and general conversation from the Web Archiving community on a wide range of topics - and to catch up with our previous Technical Lead, Andy Jackson. The progress made (or at least in development) by numerous institutions was impressive, from the ever-present quality assurance investigations and technical workshops, to new approaches and new large-scale projects - including the host's Building a Research Infrastructure for the Norwegian Web Archive programme. I presented an overview of the impact of the cyber-attack on the British Library and prompted people to consider such an awful event as likely to change an institution's culture as well as its technology. The event ended with a thought-provoking insight into how web data can be used by AI to identify public debate in online forums.

Caylin Smith - Head of Digital Preservation - Cambridge University Libraries  

This WAC marked my third time attending the conference, and it has continued to deliver valuable contributions to the web archiving discipline. I’m part of the Digital Preservation Coalition’s Carbon Footprint Task Group, so I attended the talks in the Sustainability session. All of the speakers provided helpful guidance and resources for how to take a sustainable approach to capturing online content and providing access. At CUL, my colleagues and I are factoring the carbon footprint for digital services into the new services we’re setting up for the libraries’ digital collections. The Curating Social Media session was full of useful lessons learned for archiving social media accounts, including those of government officials and the general public.

Cui Cui - PhD Researcher/ Customer services and Circulation Librarian - University of Sheffield/ Bodleian Libraries University of Oxford

I have been working on participatory web archiving practices for over 5 years as a part-time research student, and attending the IIPC conference always marks a milestone in my research journey. This year was particularly important as I shared the preliminary findings from interviews with web archivists, researchers and community members. I felt honoured to be invited to a discussion panel to exchange ideas with colleagues and the audience, from which I learned so much about archivists’ aspirations and practices. Receiving feedback and listening to practical challenges shared by field experts was incredibly valuable and encouraging. Although I am not a web archivist myself, I could genuinely feel a sense of belonging within the community! I returned feeling inspired and energised, with fresh perspectives and renewed motivation to continue my journey, despite sometimes feeling I have taken too long to complete my research!

The conference featured numerous high-quality presentations, which I believe are valuable to other professionals. Some practices were innovative and highlighted unique web archiving practices that could also be applicable to other fields of the library and archive professions. The closing keynote, Quantifying Complexity: Using Web Data to Decode Online Public Debate, showcased how web data can be essential in understanding public discourse. It also addressed how marginalised communities could be “silenced” in online debates. The web sphere is a complex space, as I pointed out in my presentation, and it brings another layer of challenge when web archivists work toward a more diverse and representative collection development policy.

Helena Byrne - Curator of Web Archives - British Library

This year I presented a summary of the National Olympic and Paralympic Committees as well as the 2024 Summer Olympic and Paralympic Games collections at the IIPC General Assembly. The General Assembly was on Tuesday 8th April and the main conference was held on Wednesday 9th and Thursday 10th. On day one of the conference I co-facilitated a workshop on Web Archive Collections as Data. This workshop is part of a series of workshops to gather insights into what support is needed to be able to apply the Glam Labs Collections as Data Checklist to web archive content. The first of these workshops was held at DHNB 2025.

As always there were so many good presentations at the conference and lots of corridor conversations that could lead to future collaborative projects. I chaired the Lightning Talk Session #3. This was a great mix of projects ranging from evaluating web archive workflows to addressing English language bias in tools. The last presentation in this session was “What you see no one saw”. This project aims to capture the diversity of web experiences, particularly in relation to web-based advertisements. It is really important that web archives can reflect the diversity of experiences that different people have on the web. However, the project is funded by IMLS and had its funding withdrawn in the recent restructure of government funding in the US, so it will be interesting to see how it can progress.

Nicola Bingham - Lead Curator of Web Archives - British Library 

Last month, I had the pleasure of attending my 11th IIPC Web Archiving Conference, hosted this year by the National Library of Norway in Oslo. This was my first time in Norway—and what a fantastic setting it was for such a dynamic and engaging event.

This year’s conference was particularly meaningful for me as I chaired my final session as co-chair of the IIPC’s Content Development Group (CDG), a role I’ve held since 2018. It’s been an incredibly rewarding experience, and although I’m stepping down from the position, I’ll still be involved—after all, no one really retires from the CDG! The group is in excellent hands, with Shereen Tay (National Library of Singapore), Anaïs Crinière-Boizet (Bibliothèque nationale de France), and Melissa Wertheimer (Library of Congress) taking the reins as co-chairs.

I also had the opportunity to present alongside our British Library colleague Jennie Grimshaw in a session titled Innovative Web Archiving Amid Crisis: Leveraging Browsertrix and Hybrid Working Models to Capture the UK General Election 2024. We shared our experience of using a hybrid model to archive the general election, which marked a milestone as it was the first time we used the Browsertrix tool to capture social media content.

The conference was, as always, a space of learning, collaboration, and inspiration. I’m grateful for the opportunity to contribute, to reflect on my time with the CDG, and to look ahead to the evolving landscape of web archiving.

Conclusion

The IIPC General Assembly and Web Archiving Conference 2025 met the high standards set at previous conferences. It is a great opportunity to exchange ideas, learn about innovative projects, and foster collaborations in the field of web archiving. The UK Web Archive colleagues contributed significantly through presentations and active participation. 

08 May 2025

Marking 80 Years: Documenting VE and VJ Day Commemoration in the UK Web Archive

By Nicola Bingham, Lead Curator of Web Archives, British Library

Home page of the ve-vjday80.gov.uk website

This year marks a significant national milestone: the 80th anniversary of the end of the Second World War. With Victory in Europe (VE) Day falling on 8th May and Victory over Japan (VJ) Day on 15th August, commemorations are planned across the UK to honour the conclusion of a conflict that reshaped the world.

To document this anniversary, the UK Web Archive is curating a special collection titled "VE / VJ Day 80", which will record how people and communities across the UK are commemorating the end of WWII, from national ceremonies to local grassroots events.

Collection Scope

This curated collection focuses on UK-based websites documenting commemorative events, public activities, and community involvement related to VE/VJ Day 80. Rather than a detailed historical retrospective, the collection aims to reflect contemporary responses and engagement with this anniversary.

Key Aspects of UK Commemorations

The collection includes a wide variety of commemorative themes and activities such as:

· National Events: Organised by groups like the Royal British Legion, including parades and memorials.

· Local Celebrations: Street parties, community gatherings, and regional events.

· Church Services: Remembrance services held nationwide.

· Beacon Lighting: Symbolic ceremonies at dusk.

· Remembrance Readings: Recitals of "The Tribute" and similar dedications.

· Veteran Involvement: Honouring the voices and presence of those who served.

· Contrasting voices or critical perspectives of the commemorations.

Why We Are Archiving This

By collecting these websites now, we’re creating a rich and enduring resource for future researchers, historians, educators, and the general public. This collection will preserve not only official narratives but also grassroots and personal perspectives, reflecting the diversity of the UK’s commemorative landscape.

One recent example of how the UK Web Archive supports research is the work of Dr Liam Markey, whose blog post, published earlier this week, describes how he has used archived web content.

Between 2018 and 2023, Liam completed a PhD at the University of Liverpool in collaboration with the British Library, examining how remembrance practices in Britain, particularly the concept of military victimhood, shape national identity and reflect militaristic thinking. His work highlights the value of digital resources like the UK Web Archive in documenting contemporary remembrance culture.

How You Can Contribute

We welcome nominations of websites, blogs, and social media accounts that reflect VE/VJ Day 80 commemorations and perspectives.

Are you organising a public or community event?

Are you sharing your thoughts or experiences online?

If so, we’d love to hear from you.

Please email your suggestions to: [email protected] 

Although the UK Web Archive website is currently offline, our team is actively capturing web content using remotely hosted systems, ensuring this material is preserved for the future.

Here are a few examples of sites already being archived:

Royal British Legion – Remembering the End of WWII (https://www.britishlegion.org.uk/getinvolved/events/remembranceevents/rememberingtheendofthesecondworldwar)

VE Day 80 Community Events (https://www.veday80.org.uk/)

VE/VJ Day 80 (https://ve-vjday80.gov.uk/)

English Cathedrals – VE Day Services (https://www.englishcathedrals.co.uk/latestnews/veday808thmay2025asharedmomentofcelebration/)

Breckland Council – Remembrance Grants & Readings (https://www.breckland.gov.uk/article/24080/VEVJDay80AnniversaryGrants)

Royal Navy – WWII Veterans’ Stories (https://www.royalnavy.mod.uk/news/2025/january/06/20250106ww2veteransurgedtocomeforwardtomark80thanniversary)

Beacon Lighting Guide (Glinton Parish Council) (https://glintonpc.gov.uk/wpcontent/uploads/2024/07/VEDay80AnniversaryGuidev19.pdf)

VE Day Blog Posts from the British Library

This is one of multiple blog posts being published across the British Library blogs this week:

UK Web Archive: https://blogs.bl.uk/webarchive/2025/05/digital-memory-and-the-militarised-past.html 

European Studies: https://blogs.bl.uk/european/2025/05/remembering-sacrifice-celebrating-freedom.html

Newsroom: https://blogs.bl.uk/thenewsroom/2025/04/ve-day-in-the-news.html 

Social Science: https://blogs.bl.uk/socialscience/2025/05/ve-day-voices-from-history-.html 

Untold Lives: https://blogs.bl.uk/untoldlives/2025/04/children-in-war-time.html 
