Using Dimensionality Reduction and Tag Parameter Spaces to Study Historical Change in a Large Document Archive
- Author(s):
- Tim Hitchcock, William J Turkel
- Date:
- 2021
- Group(s):
- CSDH-SCHN 2021: Making the Network
- Subject(s):
- History, Databases, Machine learning, Crime, Punishment
- Item Type:
- Presentation
- Meeting Title:
- CSDH/SCHN Conference 2021
- Meeting Org.:
- Canadian Society for Digital Humanities (CSDH/SCHN)
- Meeting Loc.:
- Remote, hosted from Edmonton, AB
- Meeting Date:
- May 30 – June 3, 2021
- Tag(s):
- Old Bailey Online, Text linguistics, Historical databases, Representation, Crime and punishment
- Permanent URL:
- http://dx.doi.org/10.17613/6vva-km27
- Abstract:
- In this presentation we discuss one approach to studying historical change in a large document archive, The Old Bailey Proceedings Online. In addition to the texts themselves, we are working with two kinds of representation. The first is a set of XML tags that were added to the trial accounts when the digital archive was created. Since these tags were drawn from small finite sets, we can think of them as dimensions that can be used to categorize each trial in a tag parameter space. The second is a dimension reduction technique, Stable Random Projections (Schmidt 2018). Each SRP is a small sketch, or fingerprint, of a given trial, and each trial can be located in a space of SRPs. We are using SRPs in conjunction with the parameter space created by the XML tags to assess the representativeness of trials in particular periods of time and to identify outliers and anomalies. As Schmidt showed in his own examples, clusters in SRP space occur at a variety of scales, and can often be mapped onto classifications that are meaningful to human observers (e.g., as represented by the XML tags).
- Status:
- Published
- Last Updated:
- 2 years ago
- License:
- All Rights Reserved
Downloads
Item Name: hitchcock-turkel-csdh-2021.pdf