Hermetic Word Frequency Counter
What is a Word?
The term 'word' usually means a word in a natural language such as English or German, but for this software it has an extended meaning. A word is a sequence of characters bounded by spaces, but it is necessary to specify which characters exactly are admissible in words.
- In the basic version a word is any sequence of characters consisting of letters from a European language plus (optionally) hyphens (-), underscores (_), colons (:), periods (.), apostrophes ('), forward and backward slashes (/\), @-signs and numerals.
- In the Advanced Version a word may (optionally) in addition to these also include ampersands (&), commas and opening and closing parentheses, plus up to five user-specified characters (such as currency signs and asterisks).
The following characters are not admissible in words: plus signs (+), semicolons (;), double quotes (") and left and right angle-brackets (<>). (And, of course, a word cannot include a space.)
A word may begin or end with any alphabetic character, or with any admissible non-alphabetic character (provided that character is allowed in the Set Parameters window), except for a hyphen, an apostrophe, a period or a colon (and, in the Advanced Version, a comma or a parenthesis).
Periods and @-signs may (if allowed in the Set Parameters window) occur within a word, thus enabling the counting of email addresses. Allowing colons, forward slashes, hyphens, underscores and periods in a word enables the counting of URLs.
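The rules above can be illustrated with a minimal Python sketch. It is not the program's own code: the names BASIC_EXTRAS, NO_EDGE and is_word are hypothetical, all optional characters of the basic version are assumed to be switched on, and Python's str.isalpha() is used as a rough stand-in for "letters from a European language".

```python
import string

# Optional non-alphabetic characters of the basic version
# (assumption: all of them are enabled in the Set Parameters window).
BASIC_EXTRAS = set("-_:.'/\\@") | set(string.digits)

# Characters that may occur inside a word but not at its start or end.
NO_EDGE = set("-'.:")


def is_word(token, extras=BASIC_EXTRAS, no_edge=NO_EDGE):
    """Return True if token is a word under the basic-version rules."""
    if not token:
        return False
    # A word may not begin or end with a hyphen, apostrophe, period or colon.
    if token[0] in no_edge or token[-1] in no_edge:
        return False
    # Every character must be a letter or an admissible extra character.
    # str.isalpha() accepts letters of any script, which is broader than
    # "European languages"; it is only an approximation.
    return all(ch.isalpha() or ch in extras for ch in token)
```

With these settings is_word("user@example.com") and is_word("http://example.com/a_b") both return True, while is_word("a+b") and is_word("end.") return False: the first contains a plus sign, and the second ends with a period.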
The fact that the Advanced Version allows words with commas and parentheses means that chemical compounds can be treated as words, e.g.: 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde. (For more details on this possibility see here.)
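The effect of the extra Advanced Version characters can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: all names (advanced_chars, is_word, count_words and the constants) are hypothetical, every optional character is assumed to be enabled, and a token that breaks the rules is simply skipped, since how the program itself treats such a token is not described here.

```python
import string
from collections import Counter

# Basic-version characters, all assumed enabled.
BASIC = set("-_:.'/\\@") | set(string.digits)
BASIC_NO_EDGE = set("-'.:")

# The Advanced Version also forbids commas and parentheses at the edges.
ADVANCED_NO_EDGE = BASIC_NO_EDGE | set(",()")


def advanced_chars(user_chars=""):
    """Admissible extras in the Advanced Version, plus up to five user characters."""
    if len(user_chars) > 5:
        raise ValueError("at most five user-specified characters are allowed")
    return BASIC | set("&,()") | set(user_chars)


def is_word(token, extras, no_edge):
    """Return True if token obeys the character and edge rules."""
    if not token or token[0] in no_edge or token[-1] in no_edge:
        return False
    return all(ch.isalpha() or ch in extras for ch in token)


def count_words(text, extras, no_edge):
    """Count space-bounded tokens that qualify as words (others are skipped)."""
    return Counter(t for t in text.split() if is_word(t, extras, no_edge))
```

Calling count_words(text, advanced_chars(), ADVANCED_NO_EDGE) on a text that mentions 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde counts the compound as a single word. With BASIC and BASIC_NO_EDGE the same token is rejected, because commas and parentheses are not admissible in the basic version.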