n-grams
Research on N-Grams in Information Retrieval
http://www.cs.umbc.edu/ngram/
Using Statistical Properties of Text to Create Metadata
http://www.computer.org/conferences/meta96/crowder/onefile.html
Marc Damashek. Gauging Similarity with N-Grams: Language-Independent Categorization of Text. Science, Vol. 267, pp. 843-848, 10 February 1995.
National Security Agency: Information Sorting and Retrieval by Language or Topic.
This entry describes Marc Damashek's n-gram algorithm.