Paper 1: Stylometric Techniques for Multiple Author Clustering
Abstract: In 1598-99, the printer William Jaggard named Shakespeare as the sole author of The Passionate Pilgrim, even though Jaggard included a number of non-Shakespearian poems in the volume. In a neurolinguistics approach to authorship identification, a four-feature technique, RPAS, is used to convert the 21 poems in The Passionate Pilgrim into a multi-dimensional vector. Three complementary analytical techniques are applied to cluster the data and reduce single-technique bias, before an alternative method, seriation, is used to measure the distances between clusters and test the strength of the connections. The multivariate techniques are found to be robust and able to allocate nine of the 12 unknown poems to Shakespeare. The authorship of one of the Barnfield poems is questioned, and the analysis indicates that others are collaborations or works of yet-to-be-acknowledged poets. It is possible that as many as 15 poems were Shakespeare’s and that at least five poets were not acknowledged.
Keywords: Authorship Identification; Principal Component Analysis; Linear Discriminant Analysis; Vector Space Method; Seriation