A Customizable Monolingual Text Extraction Program for Windows
Hermetic Word Frequency Counter scans an MS Word docx file or a text or text-like file (including HTML and XML files encoded in ANSI or UTF-8) and counts the number of occurrences of each distinct word, optionally ignoring common words such as "the" and "this". It can therefore also be used as a word-search program. You can specify exactly what counts as a word (e.g., whether words may contain hyphens or numerals). The words found can be listed alphabetically or by frequency, with rank and frequency count displayed for each word.
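To illustrate the kind of counting described above, here is a minimal Python sketch. It is not the program's own code: the names count_words, ranked and COMMON_WORDS, the default word pattern, and the short common-word list are all assumptions made for this example. Rank is assigned by descending frequency, with ties broken alphabetically, which is also an assumption.

```python
import re
from collections import Counter

# Hypothetical common-word list; the program's actual list is not shown here.
COMMON_WORDS = {"the", "this", "a", "an", "and", "of", "to", "in", "is", "it"}

def count_words(text, word_pattern=r"[A-Za-z]+(?:-[A-Za-z]+)*",
                ignore_common=True, case_sensitive=False):
    """Count occurrences of each word matched by word_pattern."""
    if not case_sensitive:
        text = text.lower()
    words = re.findall(word_pattern, text)
    if ignore_common:
        words = [w for w in words if w.lower() not in COMMON_WORDS]
    return Counter(words)

def ranked(counts, by_frequency=True):
    """Return (rank, word, frequency) rows; rank 1 is the most frequent word."""
    by_freq = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    rows = [(rank, word, freq)
            for rank, (word, freq) in enumerate(by_freq, start=1)]
    if not by_frequency:
        rows.sort(key=lambda row: row[1])  # alphabetical listing
    return rows

sample = "The well-known cat saw this cat and the well-known dog."
for row in ranked(count_words(sample)):
    print(row)
```

Changing word_pattern is how this sketch models "what counts as a word": the default treats hyphenated words as single words and excludes numerals.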
There are two versions of this word count software: basic (WFC) and advanced (WFCA, which does everything that WFC does, including scanning docx files). The main differences are that WFC counts words in only one docx, text or text-like file at a time, whereas WFCA counts words in multiple files (in multiple folders) in a single operation and also counts phrases. If you need to count words in only one file at a time, then WFC may be what you need. If you have many files, or need more options and greater functionality, then you need WFCA. Click on this link for the WFCA page.
Here is a typical screenshot, showing word counts for a 540.80 KB text file, with common words ignored, upper/lower case distinguished, and the words sorted by frequency:
Here is another screenshot, showing word counts for a 187.62 KB MS Word docx file (the text itself, when unpacked, is 340.88 KB), with common words ignored, upper/lower case not distinguished, and the words again sorted by frequency:
In both cases the process took less than 20 seconds.
In theory there is no limit on the size of an input file or the number of words in it, but in practice (because of the processing time needed) there is a limit of about 10 MB on text files and on text-like files such as XML and HTML files. There is also a limit of about 10 MB on the amount of text in an MS Word docx file (though the docx file itself can be larger than this if it contains many images). For a docx file, only words in the body of the document are counted, not words in footnotes or endnotes.
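The distinction between body text and footnotes or endnotes follows from the structure of the docx format: a docx file is a ZIP archive in which the document body is stored in word/document.xml, while footnotes and endnotes are stored in separate parts. The following Python sketch shows one way to read only the body text. It is an illustration, not the program's own method; the function name docx_body_text is hypothetical, and documents that nest paragraphs (for example in text boxes) would need extra handling.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def docx_body_text(source):
    """Return body text of a docx file (path or file-like), one line per paragraph.

    Only word/document.xml is read, so footnote and endnote parts are ignored.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Text is stored in w:t elements inside the runs of each paragraph.
        runs = [t.text or "" for t in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Because the archive is compressed, the unpacked text can be larger than the docx file itself, as in the second screenshot example above.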
For text and text-like files (including HTML and XML files) the text may be encoded in ANSI or UTF-8. The program does not act directly on binary files such as pdf and MS Word doc files (as distinct from docx files); such files can be scanned if they are first saved as "Plain Text" files (see Scannable Files).
ANSI is the single-byte text encoding which is the default encoding on your PC. UTF-8 is a variable-byte-length encoding of Unicode characters, often used in HTML and XML files.
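The difference between the two encodings can be shown with a short Python sketch. This is an illustration only, not how the program itself detects encodings. The function name decode_text is hypothetical, and cp1252 (the Western European Windows code page) is assumed as the ANSI encoding; the actual ANSI code page depends on the system locale.

```python
def decode_text(raw: bytes, ansi_codec: str = "cp1252") -> str:
    """Decode as UTF-8 when the bytes are valid UTF-8, else as an ANSI code page."""
    try:
        # "utf-8-sig" also removes a leading byte order mark if one is present.
        return raw.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumed fallback: a single-byte code page; undefined bytes are replaced.
        return raw.decode(ansi_codec, errors="replace")

# The same word occupies a different number of bytes in each encoding.
print(len("café".encode("cp1252")))  # one byte per character
print(len("café".encode("utf-8")))   # "é" takes two bytes
```

Trying UTF-8 first is a common heuristic, since text in a single-byte encoding that contains accented characters is rarely valid UTF-8.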
The program counts the frequencies of all words in the file (or optionally all words other than common words). If you just want to count the occurrences of a single word (or of each word in a set of words, or of any word matching a given pattern) then you can do this with the Advanced Version of this program.
The 'rank' and 'frequency' values may each be included in, or excluded from, the displayed results.
If the output consists only of words, with no rank or frequency count values, then you can get the words either as a list (one word per line) or as comma-separated values. To do this, make the appropriate selection in the Display format drop-down menu.
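The output options described above can be sketched in Python as follows. The function name format_rows, its parameters, and the use of tabs between columns are assumptions made for this example; the program's actual output layout may differ.

```python
def format_rows(rows, show_rank=True, show_frequency=True, comma_separated=False):
    """Format (rank, word, frequency) rows as text.

    Comma-separated output applies only when both rank and frequency
    are excluded, mirroring the option described above.
    """
    if comma_separated and not show_rank and not show_frequency:
        return ",".join(word for _, word, _ in rows)
    lines = []
    for rank, word, freq in rows:
        fields = []
        if show_rank:
            fields.append(str(rank))
        fields.append(word)
        if show_frequency:
            fields.append(str(freq))
        lines.append("\t".join(fields))  # tab-separated columns (assumed)
    return "\n".join(lines)

rows = [(1, "cat", 2), (2, "dog", 1)]
print(format_rows(rows))
print(format_rows(rows, show_rank=False, show_frequency=False, comma_separated=True))
```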
The input file need not consist simply of natural language text, but may be an HTML, XML, PHP or C/C++ file, or may mix natural language with tags such as "<table>".
When processing HTML files, HTML tags such as "<center>" are skipped. When processing XML files all text within "<" and ">" is skipped. PHP files are processed as HTML files in which C-style comments are possible. When processing PHP files, text within "<?php" and "?>" is not skipped.
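A simplified Python sketch of this kind of tag skipping is given below. It is an assumption-laden illustration rather than the program's own logic: the function name strip_markup and its kind parameter are hypothetical, everything between "<" and ">" is treated as a tag, and HTML comments, scripts, character entities and C-style comments are not handled.

```python
import re

TAG = re.compile(r"<[^<>]*>")                         # anything between "<" and ">"
PHP_BLOCK = re.compile(r"<\?php(.*?)\?>", re.DOTALL)  # a <?php ... ?> section

def strip_markup(text, kind="html"):
    """Remove tags, keeping the text inside <?php ... ?> when kind is "php"."""
    if kind != "php":
        return TAG.sub(" ", text)
    # re.split with one capturing group alternates between markup segments
    # (even indices) and captured PHP code (odd indices).
    parts = PHP_BLOCK.split(text)
    kept = [part if i % 2 else TAG.sub(" ", part)
            for i, part in enumerate(parts)]
    return " ".join(kept)

print(strip_markup("<p>Hello <center>world</center></p>").split())
print(strip_markup("<h1>Title</h1><?php echo $a < $b; ?><p>End</p>", kind="php").split())
```

Handling the PHP sections separately keeps comparison operators such as "<" inside PHP code from being mistaken for the start of a tag.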
Trial version: A copy of the Hermetic Word Frequency Counter installation program can be freely downloaded from this website for the purpose of evaluation. Click on the following link for further information:
Download Hermetic Word Frequency Counter ...
Price and ordering: A single-user license for the fully-functional software is available for a period of 3 months, 1 year or with no time limit (a 'perpetual' license). Prices for each type of license are given at Purchase a User License. (A multiple-user license is available for this program.) An activation key is required in order to make the trial version permanently fully functional. An activation key can be obtained immediately if you purchase a user license either via PayPal or via Share-it.
Refund: A refund will be provided promptly up to 30 days after purchase if the software does not perform satisfactorily.
Upgrading to the Advanced Version: Purchasers of a perpetual user license for Hermetic Word Frequency Counter may upgrade to a perpetual user license for the Advanced Version by paying $32.75, €28.75 or £20.75 (excluding any sales tax). To purchase the upgrade click on one of the links below. Note that this is available only if a perpetual single-user license for Hermetic Word Frequency Counter has already been purchased.
Updates: Purchasers of a user license for this software are entitled to an update to any later version at no additional cost.