Hermetic Word Frequency Counter Advanced Version
What is a Word?

This program is intended mainly for counting words in natural-language text such as English, German or French, whether the text is in ANSI text files (including HTML and XML files) or in Word .docx files. The word ‘word’ usually means a word in a natural language, but in this software it has an extended meaning: a word is any sequence of contiguous, displayable characters. These characters may be letters (in any European language) and, optionally, numerals or other displayable characters such as a hyphen.

Clicking on the 'Settings' button brings up the Settings panel, the upper half of which looks like this:

[Image: Settings panel, upper half]

A word may (optionally) also include underscores (_), colons (:), periods (.), forward and backward slashes (/\), @-signs, ampersands (&), commas, and opening and closing parentheses, plus up to five user-specified characters (such as currency signs and asterisks). Because periods and @-signs may (if allowed in the Settings) occur within a word, email addresses can be counted as words. Likewise, allowing colons, forward slashes, hyphens, underscores and periods in a word makes it possible to count URLs.

The following characters are not admissible in words: plus signs (+), semicolons (;), double quotes ("), the tilde (~), and left and right angle brackets (<>).
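As an illustration only, the character rules above can be sketched in Python. This is not the program's actual implementation: the function name `split_candidates`, its parameters, and the use of `str.isalpha()` to stand in for "letters in any European language" are all assumptions. The sketch also assumes the inadmissible characters stay excluded even if a user lists them among the extra allowed characters, and it does not apply the restrictions on which characters may start or end a word.

```python
from collections import Counter

# Characters the manual lists as not admissible in words. Assumption: they stay
# excluded even if a user lists them among the extra allowed characters.
FORBIDDEN = set('+;"~<>')


def split_candidates(text: str, allowed_extra: set, allow_digits: bool = False) -> list:
    """Split text into maximal runs of admissible characters (hypothetical helper).

    Letters (approximated by str.isalpha) are always admissible; digits only if
    allow_digits is True; any other character only if it appears in
    allowed_extra and is not forbidden. Every other character, including
    whitespace, ends the current word.
    """
    words, current = [], []
    for ch in text:
        admissible = (
            ch.isalpha()
            or (allow_digits and ch.isdigit())
            or (ch in allowed_extra and ch not in FORBIDDEN)
        )
        if admissible:
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


text = "Write to info@example.com or see info@example.com; thanks"
# With periods and @-signs allowed, each email address is a single word.
print(Counter(split_candidates(text, {".", "@"}))["info@example.com"])  # prints 2
# With no extra characters allowed, the address breaks into three words.
print(split_candidates("info@example.com", set()))  # ['info', 'example', 'com']
```

Treating every non-admissible character as a separator is one simple way to get "contiguous" words; the real program may tokenize differently.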

There are some restrictions on which characters may occur at the start or end of a word. A word may begin or end with any alphabetic character, or with any admissible non-alphabetic character that is allowed in the Settings, except an apostrophe, a period, a comma, a hyphen or a parenthesis.
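The start-and-end restriction can be sketched the same way. The manual does not say whether a token whose first or last character is disallowed gets trimmed or discarded; the hypothetical `trim_edges` helper below assumes the offending characters are stripped, and it considers only the ASCII apostrophe.

```python
# Non-alphabetic characters that may appear inside a word (if allowed in the
# Settings) but may not begin or end one: apostrophe, period, comma, hyphen
# and both parentheses.
NO_EDGE = set("'.,-()")


def trim_edges(token: str) -> str:
    """Strip disallowed characters from both ends of a token (hypothetical helper).

    Returns an empty string if nothing is left after trimming.
    """
    start, end = 0, len(token)
    while start < end and token[start] in NO_EDGE:
        start += 1
    while end > start and token[end - 1] in NO_EDGE:
        end -= 1
    return token[start:end]


print(trim_edges("example.com."))  # example.com
print(trim_edges("(hello),"))      # hello
print(trim_edges("don't"))         # don't (the inner apostrophe is kept)
```

Only the ends of a token are examined, so interior commas, hyphens and parentheses survive untouched.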

Because this program allows 'words' to contain commas, hyphens and parentheses (and, optionally, numerals), the names of chemical compounds can be treated as single words, e.g., 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde.
