Hermetic Word Frequency Counter

What is a Word?
The word 'word' usually means a word in a natural language such as English or German, but in this software it has an extended meaning. A word is a sequence of characters bounded by spaces, so it is necessary to specify exactly which characters are admissible in words.
This program is intended mainly for counting words in natural language text and in documents containing natural language text (including HTML and XML files).
The following characters are not admissible in words: plus signs (+), semicolons (;), double quotes ("), and left and right angle brackets (< and >). In the Advanced Version the tilde (~) is also not permitted.
- In the basic (non-Advanced) version a word is any sequence of characters consisting of letters from a European language plus (optionally) numerals, hyphens and apostrophes.
- In the Advanced Version a word may (optionally) also include underscores (_), colons (:), periods (.), forward and backward slashes (/ and \), @-signs (@), ampersands (&), commas (,) and opening and closing parentheses, plus up to five user-specified characters (such as currency signs and asterisks).
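The basic-version rule can be illustrated with a short Python sketch. It is not the program's own code: the function name chars_admissible_basic and its three switches are hypothetical stand-ins for the options in the Settings window, and Python's str.isalpha() accepts letters from any script, which is a simplification of "letters from a European language".

```python
# Characters the text lists as not admissible in any word.
NEVER_ADMISSIBLE = set('+;"<>')


def chars_admissible_basic(token, allow_digits=True, allow_hyphens=True,
                           allow_apostrophes=True):
    """Return True if every character of token fits the basic-version rule.

    Hypothetical helper; the three switches mimic optional settings.
    """
    optional = set()
    if allow_hyphens:
        optional.add("-")
    if allow_apostrophes:
        optional.add("'")

    for ch in token:
        if ch in NEVER_ADMISSIBLE:
            return False
        if ch.isalpha():
            # Assumption: any Unicode letter stands in for a "European" letter.
            continue
        if ch in "0123456789":
            if allow_digits:
                continue
            return False
        if ch not in optional:
            return False
    # An empty token is not treated as a word.
    return bool(token)
```

Under these assumptions, chars_admissible_basic("well-known") returns True and chars_admissible_basic("a+b") returns False. The sketch checks characters only; it does not apply the rule about how a word may begin or end.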
A word may begin or end with any alphabetic character, or with any admissible non-alphabetic character that is allowed in the Settings window, except for an apostrophe or a period. In the Advanced Version a word also may not begin or end with a comma or a parenthesis.
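The rule about the first and last character can be sketched in Python as follows. The names used here (has_valid_edges, the advanced flag) are illustrative and not taken from the program. The text does not say what the program does with a token that breaks the rule, so this sketch only reports whether the rule is met.

```python
# Characters that may not start or end a word, per the rule above.
BASIC_EDGE_FORBIDDEN = {"'", "."}
ADVANCED_EDGE_FORBIDDEN = BASIC_EDGE_FORBIDDEN | {",", "(", ")"}


def has_valid_edges(token, advanced=False):
    """Return True if token neither begins nor ends with a forbidden character.

    Hypothetical helper; only the first and last characters are inspected.
    """
    if not token:
        return False
    forbidden = ADVANCED_EDGE_FORBIDDEN if advanced else BASIC_EDGE_FORBIDDEN
    return token[0] not in forbidden and token[-1] not in forbidden
```

For example, has_valid_edges("don't") is True because the apostrophe is inside the word, while has_valid_edges("'tis") and has_valid_edges("end.") are both False.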
Periods and @-signs may (if allowed in the Settings window) occur within a word, which lets you count email addresses. Likewise, allowing colons, forward slashes, hyphens, underscores and periods in a word lets you count URLs.
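To see why these settings matter, here is a minimal Python sketch in which the set of extra characters is passed in explicitly. The function is_single_word and its extra_allowed parameter are hypothetical, and str.isalnum() is used as a rough stand-in for "letters and numerals".

```python
def is_single_word(token, extra_allowed=""):
    """Return True if token consists only of letters, digits, hyphens,
    apostrophes and the characters listed in extra_allowed.

    Hypothetical helper; extra_allowed mimics characters enabled in Settings.
    """
    allowed = set("-'") | set(extra_allowed)
    return bool(token) and all(ch.isalnum() or ch in allowed for ch in token)


# With periods and @-signs enabled, an email address is one word.
print(is_single_word("info@example.com", extra_allowed=".@"))
# Without them, the same token is rejected.
print(is_single_word("info@example.com"))
# Colons, slashes, underscores and periods let a simple URL through.
print(is_single_word("http://www.example.com/a_b/index.html",
                     extra_allowed=":/_."))
```

In this sketch a URL containing any other character, such as a question mark, would still be rejected.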
Because the Advanced Version allows commas and parentheses within words, the names of chemical compounds can be treated as single words, e.g., 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde. (For more details on this possibility see here.)
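The following Python sketch puts the pieces together for the chemical-name case. It is an illustration under stated assumptions rather than the program's actual method: the text is split on whitespace, forbidden characters are stripped from both ends of each token (the stripping is an assumption), and every Advanced Version character is treated as enabled. The names count_words, ADVANCED_EXTRAS and EDGE_FORBIDDEN are hypothetical.

```python
from collections import Counter

# Assumption: all optional Advanced Version characters are switched on.
ADVANCED_EXTRAS = set("-'_:./\\@&,()")
# Characters that may not begin or end a word in the Advanced Version.
EDGE_FORBIDDEN = "'.,()"


def count_words(text):
    """Count space-bounded tokens that qualify as words in this sketch."""
    counts = Counter()
    for token in text.split():
        # Assumption: offending edge characters are stripped, not rejected.
        token = token.strip(EDGE_FORBIDDEN)
        if token and all(ch.isalnum() or ch in ADVANCED_EXTRAS for ch in token):
            counts[token] += 1
    return counts


sample = ("The aldehyde 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde, "
          "unlike benzaldehyde, is substituted.")
for word, n in count_words(sample).items():
    print(n, word)
```

Here the compound name is counted once as a single word, separately from the plain word benzaldehyde, because the commas and parentheses inside it are admissible while the trailing comma is stripped.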