Phrase Frequency Counter Advanced
What is a Word?

The word 'word' usually means a word in a natural language such as English or German, but in this software it has an extended meaning: a word is a sequence of characters (mostly letters) bounded by spaces. This definition requires specifying exactly which characters are admissible in words.

This program is intended mainly for counting words in natural language text and in documents containing natural language text (including HTML and XML files). So a word is any sequence of characters consisting of letters from a European language plus (optionally) numerals, hyphens, apostrophes, underscores (_), colons (:), periods (.), forward and backward slashes (/\), @-signs, ampersands (&), commas and opening and closing parentheses. The following characters are not admissible in words: plus signs (+), semicolons (;), double quotes (") and left and right angle-brackets (<>).

A word may begin or end with any alphabetic character, or with any admissible non-alphabetic character (if that character is allowed in the Settings window) other than an apostrophe, a period, a comma or a parenthesis.
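The rules above can be sketched in a few lines of Python. This is only an illustration, not the program's actual algorithm: the function name extract_words and both constants are hypothetical, every optional character is assumed to be enabled in the Settings window, any inadmissible character is assumed to end a word, and Python's str.isalpha() accepts letters from any script, which is broader than the European-language letters described here.

```python
# Optional characters listed in the manual, all assumed enabled in Settings.
OPTIONAL_CHARS = set("0123456789-'_:./\\@&,()")
# Characters that may occur inside a word but may not begin or end it.
EDGE_FORBIDDEN = "'.,()"

def extract_words(text, allowed=OPTIONAL_CHARS):
    """Split text into words using the admissibility rules sketched above."""
    words, current = [], []
    for ch in text + " ":  # the trailing space flushes the last word
        if ch.isalpha() or ch in allowed:
            current.append(ch)
        elif current:
            # Assumption: any inadmissible character (space, ?, <, > ...) ends the word.
            word = "".join(current).strip(EDGE_FORBIDDEN)
            if word:  # a run such as "..." leaves nothing after trimming
                words.append(word)
            current = []
    return words

print(extract_words("It's a 'test', isn't it? (yes)"))
```

In this sketch the apostrophes inside "It's" and "isn't" are kept, while the quotes, comma and parentheses around "test" and "yes" are trimmed because they may not begin or end a word.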

If allowed in the Settings window, periods and @-signs may occur within a word, so email addresses can be counted as single words. Likewise, allowing colons, forward slashes, hyphens, underscores and periods within words lets you count URLs.

Because this program allows commas and parentheses within words, the names of chemical compounds can be treated as single words, e.g., 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde.
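The effect of these settings on email addresses, URLs and chemical names can be illustrated with a short Python sketch. It is a simplified model, not the program's own code: the function words_in and its optional parameter (standing in for the characters enabled in the Settings window) are hypothetical, the sample address and URL are made up, and every inadmissible character is assumed to act as a separator.

```python
def words_in(text, optional):
    """Return the words in text, given the optional characters that are enabled."""
    # Replace every inadmissible character with a space, then split on whitespace.
    cleaned = "".join(ch if ch.isalpha() or ch in optional else " " for ch in text)
    # Trim characters that may not begin or end a word.
    trimmed = (w.strip("'.,()") for w in cleaned.split())
    return [w for w in trimmed if w]

# Periods and @-signs enabled: the email address is one word.
print(words_in("Write to info@example.com.", ".@"))
# Nothing enabled: the same address falls apart into three words.
print(words_in("Write to info@example.com.", ""))
# Colons, slashes, hyphens, underscores and periods enabled: the URL is one word.
print(words_in("See http://www.example.com/user_guide/index.html", ":/-_."))
# Numerals, hyphens, commas and parentheses enabled: the compound name is one word.
print(words_in("2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde", "0123456789-,()"))
```

Note that the period ending the first sentence is trimmed from the email address, since a word may not end with a period.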
