Authorship Attribution Using Principal Component Analysis and Nearest Neighbor Rule for Neural Networks

Mehmet Can

doi:10.21533/scjournal.v1i2.59


Abstract

Feature extraction is a common problem in statistical pattern recognition. It refers to a process whereby a data space is transformed into a feature space that, in theory, has exactly the same dimension as the original data space. However, the transformation is designed so that the data set can be represented by a reduced number of "effective" features while retaining most of the intrinsic information content of the data; in other words, the data set undergoes a dimensionality reduction. Principal component analysis (PCA) is one such process. In this paper, data collected by counting selected syntactic characteristics in about a thousand paragraphs of each sample book are subjected to principal component analysis. For comparison, the original data are also processed. Competitive neural networks that use the principal components identify the authors of the texts with higher success than those using the original data. The process was repeated on another group of authors, and similar results were obtained.
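The pipeline described in the abstract (project per-paragraph feature counts onto the leading principal components, then assign each paragraph to an author by a nearest-neighbor rule) can be illustrated with a minimal NumPy sketch. It is not the author's code. It assumes synthetic, hypothetical data standing in for the syntactic counts, and it substitutes a nearest-prototype classifier for the paper's competitive neural network: each author has one prototype (the class mean), and the closest prototype "wins", much like the winning unit of a competitive layer. The function names `pca_fit`, `pca_transform`, `fit_prototypes`, and `predict_nearest` are hypothetical.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean, the top-k principal directions, and explained-variance ratios."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, Vt[:k], explained

def pca_transform(X, mean, components):
    """Project centered samples onto the retained principal directions."""
    return (X - mean) @ components.T

def fit_prototypes(Z, y):
    """One prototype per author (the class mean): a stand-in for a competitive unit's weights."""
    labels = np.unique(y)
    protos = np.stack([Z[y == c].mean(axis=0) for c in labels])
    return labels, protos

def predict_nearest(Z, labels, protos):
    """Assign each sample to the author whose prototype is nearest in Euclidean distance."""
    d = np.linalg.norm(Z[:, None, :] - protos[None, :, :], axis=2)
    return labels[np.argmin(d, axis=1)]

# Hypothetical data: rows are paragraphs, columns are counts of syntactic markers.
rng = np.random.default_rng(0)
n_authors, n_per_author, n_features = 3, 200, 10
centers = rng.normal(0.0, 3.0, size=(n_authors, n_features))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(n_per_author, n_features)) for c in centers])
y = np.repeat(np.arange(n_authors), n_per_author)

# Random train/test split of the paragraphs.
idx = rng.permutation(len(y))
train, test = idx[:450], idx[450:]

# Classification on principal components (PCA fitted on training data only).
mean, comps, explained = pca_fit(X[train], k=3)
labels, protos = fit_prototypes(pca_transform(X[train], mean, comps), y[train])
pred_pca = predict_nearest(pca_transform(X[test], mean, comps), labels, protos)
acc_pca = np.mean(pred_pca == y[test])

# Comparison: the same rule applied to the original (untransformed) features.
labels_raw, protos_raw = fit_prototypes(X[train], y[train])
acc_raw = np.mean(predict_nearest(X[test], labels_raw, protos_raw) == y[test])

print(f"variance kept by 3 PCs: {explained[:3].sum():.2f}")
print(f"accuracy on PCs: {acc_pca:.2f}, on original features: {acc_raw:.2f}")
```

On this well-separated synthetic data both variants should score highly; the sketch only shows the mechanics and says nothing about the relative performance reported in the paper, which depends on the real syntactic features.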





This work is licensed under a Creative Commons Attribution 4.0 International License


Southeast Europe Journal of Soft Computing
