Identification of changes in literary writing style using machine learning

Authors

  • Germán Ríos-Toledo, Centro Nacional de Investigación y Desarrollo Tecnológico - CENIDET (México)
  • Noé Alejandro Castro-Sánchez, Centro Nacional de Investigación y Desarrollo Tecnológico - CENIDET (México)
  • Grigori Sidorov, Instituto Politécnico Nacional - IPN (México)
  • Juan-Pablo Posadas-Durán, Instituto Politécnico Nacional - IPN (México)

DOI:

https://doi.org/10.7764/onomazein.46.04

Keywords:

detection of style changes over time, n-grams, syntactic n-grams, vector space model, style change, machine learning

Abstract

This research aims to identify changes over time in the writing style of seven English-language novelists. For each author, the novels were ordered by publication date and grouped into three stages, called initial, intermediate and final, each containing three novels. The publication dates of novels in two consecutive stages are separated by at least two years. To detect changes in writing style over time, a supervised machine learning approach is proposed. Vector space models were built from the usage frequencies of n-grams of different types and lengths, and Principal Component Analysis (PCA) was used to select n-grams. The task was treated as a classification problem, with Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Logistic Regression (LG) and LIBLINEAR as classifiers. Accuracy was the metric used to evaluate the learning algorithms. The research showed significant changes for five of the authors, with average accuracy between 70% and 80% across the different types of n-grams.
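As an illustration of the vector space step only, the following minimal sketch builds relative-frequency vectors from word n-grams. It is not the authors' implementation: the whitespace tokenization, the function names `word_ngrams` and `build_vector_space`, and the toy sentences standing in for novels are all assumptions made for the example. The abstract mentions several n-gram types (including syntactic n-grams), and the PCA and classification stages are not shown here.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams of a text as a list of tuples.

    Assumption: lowercase whitespace tokenization, chosen for brevity.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vector_space(texts, n):
    """Map each text to a vector of relative n-gram frequencies.

    All vectors share one column order: the sorted vocabulary of every
    n-gram observed in any of the texts.
    """
    counts = [Counter(word_ngrams(t, n)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against texts shorter than n
        vectors.append([c[g] / total for g in vocabulary])
    return vocabulary, vectors


# Hypothetical toy texts standing in for novels, labelled by stage.
samples = [
    ("initial", "the sea was calm and the sky was clear"),
    ("final", "the sea was dark and the wind was cold"),
]
labels = [stage for stage, _ in samples]
vocabulary, vectors = build_vector_space([text for _, text in samples], n=2)
```

Each row of `vectors` could then be passed, together with its stage label, to a dimensionality-reduction step and a classifier of the kind listed in the abstract.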

Author Biographies

Germán Ríos-Toledo, Centro Nacional de Investigación y Desarrollo Tecnológico - CENIDET (México)

Tecnológico Nacional de México/Centro Nacional de Investigación y Desarrollo Tecnológico (CENIDET), México.

Noé Alejandro Castro-Sánchez, Centro Nacional de Investigación y Desarrollo Tecnológico - CENIDET (México)

Tecnológico Nacional de México/Centro Nacional de Investigación y Desarrollo Tecnológico (CENIDET), México.

Grigori Sidorov, Instituto Politécnico Nacional - IPN (México)

Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), México.

Juan-Pablo Posadas-Durán, Instituto Politécnico Nacional - IPN (México)

Escuela Superior de Ingeniería Mecánica y Eléctrica, Unidad Zacatenco (ESIME Zacatenco), Instituto Politécnico Nacional (IPN), México.

Published

2019-12-31

How to Cite

Ríos-Toledo, G., Castro-Sánchez, N. A., Sidorov, G., & Posadas-Durán, J.-P. (2019). Identification of changes in literary writing style using machine learning. Onomázein, (46), 102–128. https://doi.org/10.7764/onomazein.46.04

Section

Articles
