
APPROACHES TO THE CLASSIFICATION OF COMPLEX SYSTEMS: WORDS, TEXTS, AND MORE
Andrij Rovenchak
Ivan Franko National University of Lviv, Faculty of Physics, Ukraine
We will start with some introductory information about notions of quantitative linguistics, such as the rank-frequency dependence, Zipf's law [1], frequency spectra [2], etc. Similarities between the distribution of words in texts and level occupation in quantum ensembles hint at a superficial analogy with statistical physics [3]. Based on this physical analogy, we will be able to define various parameters for texts, including ''temperature'', ''chemical potential'', entropy, and some others. The calculated parameters make it possible to classify texts, which serve as an example of complex systems; moreover, they are perhaps the easiest complex systems to collect and analyze. In particular, a correlation is observed between the level of language analyticity and the analog of temperature. From such relations, even certain observations regarding the evolution of languages can be made [4].

Similar approaches can be developed to study, for instance, genomes, owing to the well-known linguistic analogy [5]. We will consider certain nucleotide sequences in mitochondrial DNA [6] and demonstrate their possible application as an auxiliary tool for the comparative analysis of families and genera [7].

Finally, we will discuss entropy as one of the parameters that can be easily computed from rank-frequency dependences [8]. While entropy serves as a discriminating parameter in some problems of classification of complex systems, it can be given a proper interpretation only in a limited class of problems. Its overall role and significance remain an open issue.
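As a minimal illustration of the quantities mentioned above, the rank-frequency dependence and the Shannon entropy of a tokenized text can be sketched as follows (this is an assumption-laden toy example, not the authors' actual procedure; the whitespace tokenization and the choice of base-2 logarithm are simplifications made here for illustration):

```python
from collections import Counter
import math

def rank_frequency(tokens):
    """Return word frequencies sorted in decreasing order,
    i.e., the frequency at rank 1, 2, 3, ..."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)

def shannon_entropy(freqs):
    """Shannon entropy (in bits) of the relative frequency
    distribution obtained from the rank-frequency list."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs)

# Toy text; a real analysis would use a full corpus and proper tokenization.
tokens = "the quick brown fox jumps over the lazy dog the fox".split()
freqs = rank_frequency(tokens)
H = shannon_entropy(freqs)
```

Under Zipf's law, the frequency at rank r decays roughly as a power law, f(r) ~ r^(-a) with a close to 1 for natural-language texts, which is why the sorted list above is the natural starting point for fitting such dependences.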
References:
[1] I. I. Popescu, G. Altmann, P. Grzybek, B. D. Jayaram, R. Köhler, V. Krupa, J. Macutek, R. Pustet, L. Uhlirova, M. N. Vidya, Word Frequency Studies (Berlin, New York: Mouton de Gruyter, 2009).