Institute of Historical Research Digital History Postgraduate Seminar: Information Extraction and Entity Linkage in Historical Crime Records (2020)
Abstract: This research seeks to develop methodology for parsing en masse a subset of the British Library Newspapers collection of digitised newspapers: crime reports from nineteenth-century London. The goal of this research is to corroborate and augment the existing information in the Digital Panopticon by using newspaper reports of police court hearings to shed light on the criminal justice processes that took place before a case made it to the Old Bailey, giving historians structured access to a valuable additional source of crime data. This presentation will cover the project’s progress so far, from initial named entity recognition and entity linkage experimentation to the research currently being carried out to mitigate some of the pitfalls of these processes.
Undergraduate Dissertation: “Triple Scoring: Scoring and ranking the truth of factual triples” (2018)
Abstract: In many cases, what we consider a fact essentially boils down to “x is y”. Facts such as “the sky is blue”, “space is cold”, and “the universe is huge” all follow the same format: subject-relation-object. This is advantageous, as it allows us to represent facts in a structured form that a computer can parse. This project aims to create a model that assigns a numeric truth score to a type-like relation triple. These triples take the form (subject, relation, object), and each represents a fact. In this project, a triple scoring model is designed for the WSDM Cup 2017 Triple Scoring task. The model utilises large corpora and applies natural language processing and information retrieval techniques to corroborate and rank facts, achieving 8th place out of a pool of 21 solutions.
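To illustrate the (subject, relation, object) representation, here is a minimal Python sketch of scoring and ranking triples against a text corpus. It is not the dissertation's model: the `Triple` type, the `score_triple` and `rank_triples` functions, the co-occurrence heuristic, and the example sentences are all hypothetical, chosen only to show the general idea of corroborating a triple with corpus evidence.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """A fact in (subject, relation, object) form."""
    subject: str
    relation: str
    obj: str


def score_triple(triple: Triple, corpus: List[str]) -> float:
    """Hypothetical heuristic: the fraction of sentences mentioning the
    subject that also mention the object. Returns a score in [0, 1]."""
    subject = triple.subject.lower()
    obj = triple.obj.lower()
    # Keep only sentences that mention the subject at all.
    mentions = [s.lower() for s in corpus if subject in s.lower()]
    if not mentions:
        return 0.0
    supporting = sum(1 for s in mentions if obj in s)
    return supporting / len(mentions)


def rank_triples(triples: List[Triple], corpus: List[str]) -> List[Triple]:
    """Order triples from best to least supported by the corpus."""
    return sorted(triples, key=lambda t: score_triple(t, corpus), reverse=True)


# Illustrative corpus and candidate triples (assumed data, for demonstration only).
corpus = [
    "Marie Curie was a physicist.",
    "Marie Curie, physicist and chemist, won two Nobel Prizes.",
    "Marie Curie was born in Warsaw.",
]
candidates = [
    Triple("Marie Curie", "profession", "chemist"),
    Triple("Marie Curie", "profession", "physicist"),
]
ranked = rank_triples(candidates, corpus)
```

A real system would replace the substring matching with proper entity recognition and retrieval over a much larger corpus; the sketch only shows how a numeric score allows competing triples to be ranked.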