This is the first version of a larger project that I’ve been working on in my spare time: a dataset of the guests, songs, books, and luxuries on the long-running BBC radio program Desert Island Discs (https://www.bbc.co.uk/programmes/b006qnmr, https://en.wikipedia.org/wiki/Desert_Island_Discs). I began with data gathered by web scraping with Python for the Guardian’s Datablog in November 2010 (https://www.theguardian.com/tv-and-radio/datablog/2010/nov/11/desert-island-discs-radio4).
However, I knew that this version of the dataset was incomplete and contained errors. Besides wanting to develop a more complete version (encompassing as much information as possible from the eight decades that Desert Island Discs has been on the air), I wanted to add information that would make it possible to ask other sorts of questions, for example about the gender balance of the program and the different roles of the people who have been invited onto the show.
To that end, I have modified the original Guardian spreadsheet by adding columns for gender, role, and composer, as well as performer. To fill in these columns, I drew on information from the BBC Desert Island Discs website. For “role” in particular, I drew on the short description used in the program’s archive by date (https://www.bbc.co.uk/programmes/b006qnmr/broadcasts/2001/12), chiefly because that description (“Sue Lawley’s castaway is TV chef Jamie Oliver”) is what I have heard used on BBC Radio 4 to announce and promote the program. The label is therefore an indication of what the BBC has seen as a legible and meaningful identity for each guest. I see this as a dataset about the BBC program and the choices its creators have made. It is a better representation of the BBC and its choices than it is a complete representation of any of the guests themselves.
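As a rough illustration of how a “role” label might be pulled out of an archive description like the one quoted above, here is a minimal Python sketch. It is not the process actually used to build the dataset: the function name `extract_role`, the regular expression, and the assumption that the guest’s name is already known from the spreadsheet are all hypothetical, and the pattern is modeled only on the single example description given here.

```python
import re

# Hypothetical pattern, based only on the one example description quoted
# above: "<host>'s castaway is <role> <guest name>". Real archive
# descriptions may be worded differently and would need other patterns.
CASTAWAY_RE = re.compile(r"^(?P<host>.+?)[’']s castaway is (?P<rest>.+?)\.?$")


def extract_role(description, guest_name):
    """Return the role text that precedes the guest's name, or None.

    Assumes the guest's name is already known (for example, from the
    spreadsheet) and appears at the end of the description.
    """
    if not guest_name:
        return None
    match = CASTAWAY_RE.match(description.strip())
    if match is None:
        return None  # description does not follow the assumed wording
    rest = match.group("rest")
    if not rest.endswith(guest_name):
        return None  # name not found where expected; check by hand
    role = rest[: -len(guest_name)].strip()
    return role or None


print(extract_role("Sue Lawley’s castaway is TV chef Jamie Oliver", "Jamie Oliver"))
```

Any description that does not fit the assumed wording returns `None`, which would flag that row for manual review rather than guessing at a label.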