A Customizable Word Count Program for Windows
Hermetic Word Frequency Counter scans an MS Word docx file or a text or text-like file (including HTML and XML files encoded via ANSI or UTF-8) and counts the number of occurrences of the different words, optionally ignoring common words such as 'the' and 'this'. It is possible to specify exactly what counts as a word (e.g., words with hyphens or numerals). The words which are found can be listed alphabetically or by frequency, with rank and frequency count displayed for each word.
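The counting and ranking described above can be illustrated with a minimal Python sketch. It is not the program's own code: the stop list `COMMON_WORDS`, the function names, the letters-only definition of a word and the way ties are ranked are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop list; the program's actual list of common words is not shown here.
COMMON_WORDS = {"the", "this", "a", "and", "of"}

def word_frequencies(text, ignore_common=True):
    """Return (rank, word, count) rows sorted by descending frequency."""
    # Assumption: a word is a run of ASCII letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    if ignore_common:
        words = [w for w in words if w not in COMMON_WORDS]
    counts = Counter(words)
    # Sort by count (highest first), then alphabetically so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Assumption: ranks are simply 1, 2, 3, ... even when counts are tied.
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]

def alphabetical(rows):
    """Re-sort the ranked rows by word, keeping each word's rank and count."""
    return sorted(rows, key=lambda row: row[1])
```

Calling `word_frequencies(text)` gives the list sorted by frequency, and passing that result to `alphabetical` gives the same rows in alphabetical order.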
There are two versions of this word count software: basic (WFC) and advanced (WFCA, which does everything that WFC does, including scanning docx files). The main difference is that WFC counts words in one docx, text or text-like file at a time, whereas WFCA counts words in multiple files (in multiple folders) in a single operation and also counts phrases. If you need to count words in only one file at a time, then WFC may be what you need. If you have many files or need more options and greater functionality, then you need WFCA. Click on this link for the WFCA page.
[Screenshot: the results of counting words in a 2.63 MB file, with common words ignored and with the words sorted by frequency.]
There is no limit on the size of an input text file or the number of words in it. There is a limit of about 10 MB on the amount of text in an MS Word docx file (though a docx file can be larger than this if it contains many images). For a docx file, only words in the body of the document are counted, not words in footnotes or endnotes.
For text and text-like files (including HTML and XML files) the text may be encoded via ANSI or UTF-8. The program does not act directly on binary files such as PDF and MS Word doc files (as distinct from docx files); such files can be scanned if saved as "Plain Text" files (see Scannable Files).
Note: ANSI is the single-byte text encoding which is the default encoding on your PC. UTF-8 is a variable-byte-length encoding of Unicode characters, often used in HTML and XML files.
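To make the difference between the two encodings concrete, here is a small Python sketch of one common way to read a file that may be in either. It is an illustration only, not a description of how the program detects encodings. The function name `decode_text` is hypothetical, and Windows-1252 is assumed as the ANSI code page (the usual default on Western European and US Windows systems).

```python
def decode_text(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Decode bytes as UTF-8 if they are valid UTF-8, otherwise as ANSI."""
    try:
        # "utf-8-sig" also removes a leading byte order mark if one is present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is single-byte ANSI text.
        return raw.decode(ansi_codec, errors="replace")
```

For example, the word "café" is five bytes in UTF-8 but four bytes in Windows-1252, and the function returns the same string for both byte sequences.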
The program counts the frequencies of all words in the file (or optionally all words other than common words). If you just want to count the occurrences of a single word (or of each word in a set of words, or of any word matching a given pattern) then you can do this with the Advanced Version of this program.
The 'rank' and 'frequency' values may each be included in, or excluded from, the displayed results.
If the output file consists only of words, with no rank or frequency count values, then you can get these either as a list (one word per line) or as comma-separated values. This is done by making the appropriate selection in the Display format drop-down menu.
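The two words-only layouts can be sketched in Python as follows. The function name and the option values `"list"` and `"comma"` are hypothetical and do not correspond to the actual entries in the Display format menu; the exact separator the program writes (with or without a space after the comma) is also an assumption.

```python
def format_words(words, display_format="list"):
    """Render a sequence of words as one word per line or as comma-separated text."""
    if display_format == "list":
        return "\n".join(words)
    if display_format == "comma":
        # Assumption: a comma followed by a single space separates the words.
        return ", ".join(words)
    raise ValueError(f"unknown display format: {display_format!r}")
```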
What Counts as a Word?
This program is intended mainly for counting words in natural language text and in documents containing natural language text plus markup such as found in HTML and XML files. Thus there are some restrictions on which characters are admissible in words, and (for some characters) whether they may occur at the start or end of a word.
The word 'word' usually means a word in a natural language such as English or German, but for this software it has an extended meaning. A word is a sequence of characters bounded by spaces, but it is necessary to specify which characters exactly are admissible in words.
The following characters are not admissible in words: plus signs (+), semicolons (;), double quotes (") and left and right angle-brackets (<>). In the Advanced Version the tilde (~) is also not permitted.
- In the basic (non-Advanced) version a word is any sequence of characters consisting of letters from a European language plus (optionally) hyphens (-), underscores (_), colons (:), periods (.), apostrophes ('), forward and backward slashes (/\), @-signs and numerals.
- In the Advanced Version a word may (optionally) in addition to these also include ampersands (&), commas and opening and closing parentheses, plus up to five user-specified characters (such as currency signs and asterisks).
A word may begin or end with any alphabetic character or with any admissible non-alphabetic character (if such a character is allowed in the Settings window), except for an apostrophe or a period (and, in the Advanced Version, except for a comma or a parenthesis).
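The rules for the basic version can be approximated with a short Python sketch. This is an illustration under stated assumptions, not the program's algorithm: the function name `extract_words` and the `allowed` parameter (standing in for the choices made in the Settings window) are hypothetical, "letters from a European language" is approximated as any Unicode letter, and every character outside the allowed set is treated as a word boundary.

```python
import re

# Optional characters the basic version may admit in a word (all enabled here).
OPTIONAL_CHARS = "-_:.'/\\@0123456789"

def extract_words(text, allowed=OPTIONAL_CHARS):
    """Return the words in text, where `allowed` lists the optional characters."""
    letter = r"[^\W\d_]"  # any Unicode letter (word characters minus digits and underscore)
    if allowed:
        unit = f"(?:{letter}|[{re.escape(allowed)}])"
    else:
        unit = letter
    pattern = re.compile(f"{unit}+")
    words = []
    for match in pattern.finditer(text):
        # A word may not begin or end with an apostrophe or a period.
        word = match.group().strip("'.")
        if word:
            words.append(word)
    return words
```

With all optional characters enabled, "Don't" stays one word and a sentence-final period is dropped from the word before it; with `allowed=""` the same text is split at every apostrophe and hyphen.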
Periods and @-signs may (if allowed in the Settings window) occur within a word, thus allowing you to count email addresses. Allowing colons, forward slashes, hyphens, underscores and periods in a word allows you to count URLs.
The fact that the Advanced Version allows words with commas and parentheses means that the names of chemical compounds can be treated as single words, e.g., 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde. (For more details on this possibility see here.)
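As a rough Python sketch of why admitting commas and parentheses matters, the function below treats letters, numerals, hyphens, commas and parentheses as word characters and then trims any comma or parenthesis from the start or end of each word. The function name is hypothetical and the character set is an assumption chosen for this example, not the Advanced Version's actual settings.

```python
import re

def extract_words_advanced(text):
    """Return words that may contain hyphens, commas, parentheses and numerals."""
    # [^\W_] matches any letter or digit; the second class adds the extra characters.
    pattern = re.compile(r"(?:[^\W_]|[-,()])+")
    words = []
    for match in pattern.finditer(text):
        # A word may not begin or end with a comma or a parenthesis.
        word = match.group().strip(",()")
        if word:
            words.append(word)
    return words
```

Applied to a sentence containing 2,5-dimethoxy-4-(N-propyl-thio)benzaldehyde followed by a comma, the compound name comes back as a single word with the trailing comma removed.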
Trial version: A copy of the Hermetic Word Frequency Counter installation program can be freely downloaded from this website for the purpose of evaluation. Click on the following link for further information:
Download Hermetic Word Frequency Counter ...
Price and ordering: A single-user license for the fully-functional software is available for a period of 3 months, 1 year or with no time limit (a 'perpetual' license). Prices for each type of license are given at Purchase a User License. (A multiple-user license is available for this program.) An activation key is required in order to make the trial version permanently fully functional. An activation key can be obtained immediately if you purchase a user license either via PayPal or via Share-it.
Refund: A refund will be provided promptly up to 30 days after purchase if the software does not perform satisfactorily.
Updates: Purchasers of a user license for this software are entitled to an update to any later version at no additional cost.
Upgrading to the Advanced Version: Purchasers of a perpetual user license for Hermetic Word Frequency Counter may upgrade to a perpetual user license for the Advanced Version by paying $32.75, €28.75 or £20.75 (excluding any sales tax). To purchase the upgrade click on one of the links below. Note that this is available only if a perpetual single-user license for Hermetic Word Frequency Counter has already been purchased.