Hermetic Word Frequency Counter
What is a Word?

The word 'word' usually means a word in a natural language such as English or German, but in this software it has an extended meaning: a word is a sequence of characters bounded by spaces. It is therefore necessary to specify exactly which characters are admissible in words.

This program is intended mainly for counting words in natural language text and in documents containing natural language text plus markup such as found in HTML and XML files. Thus there are some restrictions on which characters are admissible in words, and (for some characters) whether they may occur at the start or end of a word.

The following characters are not admissible in words: plus signs (+), semicolons (;), double quotes (") and left and right angle-brackets (<>). In the Advanced Version the tilde (~) is also not permitted.

A word may begin or end with any alphabetic character, or with any admissible non-alphabetic character that is allowed in the Settings window, other than an apostrophe or a period. In the Advanced Version a word also may not begin or end with a comma or a parenthesis.
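The rules above can be sketched as a small check. This is only an illustration in Python, not the program's own code: the function name is_admissible_word and the parameter allowed_non_alpha (standing in for the choices made in the Settings window) are hypothetical, and treating digits as non-alphabetic characters that must be explicitly allowed is an assumption of the sketch.

```python
# Characters the manual lists as never admissible in a word.
FORBIDDEN = set('+;"<>')
FORBIDDEN_ADVANCED = FORBIDDEN | {"~"}

# Characters that may not begin or end a word.
NO_EDGE = set("'.")
NO_EDGE_ADVANCED = NO_EDGE | set(",()")


def is_admissible_word(token, allowed_non_alpha, advanced=False):
    """Return True if token satisfies the rules described above.

    allowed_non_alpha is a hypothetical stand-in for the Settings window:
    the set of non-alphabetic characters the user has chosen to permit.
    """
    # A word is bounded by spaces, so it cannot be empty or contain whitespace.
    if not token or any(ch.isspace() for ch in token):
        return False
    forbidden = FORBIDDEN_ADVANCED if advanced else FORBIDDEN
    no_edge = NO_EDGE_ADVANCED if advanced else NO_EDGE
    for ch in token:
        if ch in forbidden:
            return False
        # Assumption: any non-alphabetic character must be explicitly allowed.
        if not ch.isalpha() and ch not in allowed_non_alpha:
            return False
    return token[0] not in no_edge and token[-1] not in no_edge
```

For example, with the apostrophe allowed, is_admissible_word("don't", {"'"}) is True, while is_admissible_word("'tis", {"'"}) is False because the apostrophe is at the start of the word.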

If allowed in the Settings window, periods and @-signs may occur within a word, so that email addresses can be counted as words. Likewise, allowing colons, forward slashes, hyphens, underscores and periods within words lets you count URLs.
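To see which non-alphabetic characters a given email address or URL relies on, a short Python sketch such as the following may help. The function name required_characters and the sample address and URL are made up for illustration; the program itself does not expose such a function.

```python
def required_characters(token):
    """Return the set of non-alphabetic characters that appear in token."""
    return {ch for ch in token if not ch.isalpha()}


# Hypothetical sample values, chosen only for illustration.
email = "jane.doe@example.com"
url = "https://www.example.com/user_guide/word-count"

print(sorted(required_characters(email)))  # ['.', '@']
print(sorted(required_characters(url)))    # ['-', '.', '/', ':', '_']
```

Each character reported would need to be allowed for the whole address or URL to be counted as one word.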

Because the Advanced Version allows commas and parentheses within words, the names of chemical compounds can be treated as single words, e.g., 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde. (For more details on this possibility see here.)
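The effect can be illustrated with a rough Python comparison. It is a sketch, not the program's algorithm: splitting only on spaces keeps the compound name whole, whereas also splitting on commas and parentheses breaks it into fragments. The sample sentence is invented.

```python
import re

name = "2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde"
text = f"The sample contained {name} in trace amounts."

# Splitting on whitespace only: the compound name survives as one token.
by_space = text.split()

# Splitting on whitespace, commas and parentheses: the name is fragmented.
by_punct = [t for t in re.split(r"[\s,()]+", text) if t]

print(name in by_space)  # True
print(name in by_punct)  # False
```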
