Hermetic Word Frequency Counter
A Customizable Multiple-File Word and Phrase Counting Program for Windows
This text extraction software comes in two versions: Hermetic Word Frequency Counter (WFC) and Hermetic Word Frequency Counter Advanced Version (WFCA). These are two separate programs. The main difference is that WFC counts words only in single docx, text and text-like files (including HTML and XML files), whereas WFCA counts words and phrases in multiple files (in multiple folders) in a single operation. If you need to count words in only one file at a time then WFC is what you need. (Click on this link for the WFC page.) If you have many files or wish to count phrases or need more options and functionality (such as the ability to search for words and phrases), then you need WFCA (so read on). More details are given below in Differences [of WFCA] from the Basic Version.
Hermetic Word Frequency Counter Advanced Version scans an MS Word docx file or a text file (an ANSI text file, an HTML file, an XML file, etc.), multiple such files, or text on the clipboard, and counts the number of occurrences of the different words and phrases (optionally ignoring common words such as 'the' and 'this' or words matching specified patterns). As well as being able to count all words and phrases, it can also count the number of occurrences of specified words and phrases in a given list (optionally matching specified patterns), that is, it can search for target words and phrases. The words or phrases which are found can be listed alphabetically, reverse alphabetically or by frequency, with rank and frequency displayed for each word or phrase. The results can be written to a file which can be read into Excel for further processing.
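For readers who want to see the idea in concrete terms, here is a minimal Python sketch of word counting with a list of ignored common words and the three listing orders. It is an illustration only and not the program's own code: the stop list, the tokenizing rule, the function names and the way tied frequencies are ranked are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical stop list; the program's actual list of common words is not shown on this page.
COMMON_WORDS = {"the", "this", "a", "an", "and", "of", "in", "to", "is"}

def count_words(text, ignore=COMMON_WORDS):
    """Return a Counter of lowercased words, skipping those in `ignore`."""
    # Assumed tokenizing rule: runs of letters, with an optional internal apostrophe.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    return Counter(w for w in words if w not in ignore)

def ranked(counts, order="frequency"):
    """Return (rank, word, frequency) rows; rank is the position by descending frequency."""
    by_freq = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Assumption: ties get consecutive ranks, broken alphabetically.
    rank = {word: i for i, (word, _) in enumerate(by_freq, start=1)}
    if order == "alphabetical":
        items = sorted(counts.items())
    elif order == "reverse":
        items = sorted(counts.items(), reverse=True)
    else:
        items = by_freq
    return [(rank[w], w, n) for w, n in items]

sample = "The calendar is a lunisolar calendar. This calendar has months."
rows = ranked(count_words(sample))
for r, w, n in rows:
    print(f"{r}\t{w}\t{n}")
```

Writing the rows out tab-separated, as in the loop above, gives a file that a spreadsheet can open directly.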
The Advanced Version does everything that the basic version does, including support for UTF-8 encoded text. The section below details the additional functionality of the Advanced Version, mainly, the ability to count words in multiple files, the ability to count phrases as well as words, and the ability to count occurrences of a word or a phrase which matches a specified pattern (so it is also a multiple-file search program). Thus the user manual for the basic version should be read before (or after) reading this page (but note that the appearance of the main window and of the 'Settings' window differ somewhat in the two versions).
This software counts words and phrases in docx and text and text-like files (including HTML and XML files). It does not act directly on binary files (other than docx files) such as pdf files; such files can be scanned if they can be converted to docx files or to text files (see Scannable Files in the user manual for the basic version).
Below is a screenshot when all words in 15 HTML files in a folder containing pages about the Chinese calendar downloaded from the web are counted:
Click here for a screenshot showing the output when relative frequencies (instead of absolute frequencies) are calculated.
Here is another screenshot when phrases (in the same set of files) are counted.
Here is a screenshot showing the result of counting all words in 65 Word docx files (sizes ranging from 13 KB to 1400 KB) with common words ignored:
Here is a screenshot showing the result of counting all phrases of from 4 to 6 words which occur at least 3 times in a Word docx file of size 43.37 KB containing 40.20 KB of text:
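As a rough illustration of the kind of phrase count described in these captions (phrases of from 4 to 6 words occurring at least 3 times), the following Python sketch counts every run of consecutive words within the length bounds. It is not the program's algorithm. In particular, this sketch assumes that a phrase may run across punctuation and sentence boundaries, which the program itself may or may not allow, and the function and parameter names are invented for the example.

```python
import re
from collections import Counter

def count_phrases(text, min_len=4, max_len=6, min_count=3):
    """Count all runs of min_len..max_len consecutive words occurring at least min_count times."""
    # Assumption: words are runs of letters, and punctuation does not end a phrase.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

text = "the year of the dragon. " * 3
result = count_phrases(text)
for phrase, c in sorted(result.items()):
    print(c, phrase)
```

The number of candidate phrases grows quickly with the size of the text and the width of the length bounds, which is one reason to set a minimum count.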
This software has many different uses. One example is for searching for words and phrases in news stories. You can download multiple pages from the web then search through them for such terms as “economic recovery”, “chinese stocks”, “air traffic controller strike” and “IMF payment”. Searches can return the names of the files in which the target phrases occur, as explained at Report Formats (see also Sorting Documents by the Number of Occurrences of a Word or Phrase).
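The search use case can be sketched in a few lines of Python: look for each target phrase in every text file under a folder and record the files in which it occurs, and how many times. This is only an approximation written for illustration. The function name, the `*.txt` file filter and the case-insensitive whole-word matching are assumptions of the sketch, not a description of how the program searches.

```python
import re
import tempfile
from pathlib import Path

def find_phrases(folder, targets, pattern="*.txt"):
    """Map each target phrase to {file name: occurrences} for matching files under `folder`."""
    hits = {t: {} for t in targets}
    for path in sorted(Path(folder).rglob(pattern)):  # rglob also descends into subfolders
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        text = re.sub(r"\s+", " ", text)  # let a phrase match across line breaks
        for t in targets:
            n = len(re.findall(r"\b" + re.escape(t.lower()) + r"\b", text))
            if n:
                # Keyed by bare file name for brevity; same-named files in different
                # subfolders would overwrite each other in this sketch.
                hits[t][path.name] = n
    return hits

# Small demonstration using two throwaway files.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.txt").write_text("Economic recovery stalls. IMF payment due.", encoding="utf-8")
    Path(d, "b.txt").write_text("Talk of economic\nrecovery; economic recovery again.", encoding="utf-8")
    hits = find_phrases(d, ["economic recovery", "IMF payment"])
print(hits)
```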
Differences from the Basic Version
The following are some (but not all) features of the Advanced Version (WFCA) which are not present in the basic version (WFC):
The ability to:
- count not just all words in a file but also all phrases (within bounds of phrase length).
- scan not just one file but all files in a folder, and optionally in all subfolders of that folder, and to return a single report on the frequencies of words and phrases in all files scanned.
- specify not only a list of words to be ignored (such as common words in a natural language) but also specify a list of words and phrases which are to be counted (or searched for).
- count words or phrases matching a given pattern.
- ignore words matching a given pattern.
- display words and phrases counted in reverse alphabetical order as well as in alphabetical order and by frequency.
- display, for each word or phrase found when scanning multiple files, the files in which it occurs, and how many times.
- include or exclude files of certain types.
- generate an Excel-readable file containing a table of frequencies of words and phrases vs the files in which they occur.
- calculate relative frequencies of words and phrases as well as absolute frequencies.
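Two of the features listed above, counting only words that match a pattern and calculating relative frequencies, can be illustrated with a short Python sketch. Both parts rest on assumptions: the program's own pattern syntax is not described on this page, so shell-style wildcards (via Python's `fnmatch` module) stand in for it here, and relative frequency is taken to mean a word's count divided by the total number of words counted.

```python
from collections import Counter
from fnmatch import fnmatchcase

def matching(counts, pattern):
    """Keep only the words that match a shell-style wildcard pattern (an assumed syntax)."""
    return Counter({w: n for w, n in counts.items() if fnmatchcase(w, pattern)})

def relative_frequencies(counts):
    """Return each count as a fraction of the total (assumed definition of relative frequency)."""
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()} if total else {}

counts = Counter("month months moon year year monthly".split())
only_month = matching(counts, "month*")
rel = relative_frequencies(counts)
for w in sorted(rel):
    print(f"{w}\t{counts[w]}\t{rel[w]:.3f}")
```

Ignoring words that match a pattern is the same filter with the test inverted.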
Both the main screen and the 'Settings' screen differ somewhat from those in the basic version, although all the functions of the basic version are retained. Here is what the 'Settings' screen looks like in the Advanced Version:
See Setting the Operation Parameters (in the user manual for the basic version) for further information.
New in version 13.19 and later: the ability to create an Excel file containing a table of frequency of words or phrases vs multiple files.
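To give an idea of what a table of words or phrases vs files looks like, here is a minimal Python sketch that builds one from per-file counts and writes it as CSV, a format that Excel can read. The layout (one row per word, one column per file, plus a total column) and all names are assumptions made for this illustration; the file the program actually produces may be arranged differently.

```python
import csv
import io
from collections import Counter

def frequency_table(per_file):
    """per_file maps file name -> Counter. Return rows (with header) of a word-by-file table."""
    files = sorted(per_file)
    words = sorted(set().union(*per_file.values()))
    rows = [["word"] + files + ["total"]]
    for w in words:
        counts = [per_file[f][w] for f in files]  # a Counter yields 0 for absent words
        rows.append([w] + counts + [sum(counts)])
    return rows

def to_csv(rows):
    """Render the rows as CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

per_file = {"a.html": Counter(year=2, month=1), "b.html": Counter(year=1)}
table = to_csv(frequency_table(per_file))
print(table)
```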
- Multiple Input Files
- Two Modes of Operation: Count-All and Count-Only (Search)
- Count All Words or Count Specified Words And Phrases
- Count All Phrases
- Ignoring Common Words in a Particular Language
- Embedded Comments
- Exporting Results to Excel
- Creation of an Excel File with a Table of Words/Phrases vs Files
- Limiting Phrase Counts
- Report Formats
- Ordering Documents by the Frequency of a Word or Phrase
- Ordering Words or Phrases by the Number of Documents in Which They Occur
- Counting and Searching for Words and Phrases which Match Patterns
- Ignoring Words which Match Patterns
- Multiple Words-to-Ignore and Count-Only-Words/Phrases Files
- Use of this Program to Illustrate Zipf's Law
As stated above, the Advanced Version does everything that the basic version does, so the following sections of the user manual for the basic version apply also to the Advanced Version.
Trial version: A copy of the Hermetic Word Frequency Counter Advanced Version installation program can be downloaded for the purpose of evaluation. Click on the following link for further information:
Download Hermetic Word Frequency Counter Advanced ...
Price and ordering: A single-user license for the fully-functional software is available for a period of 3 months, 1 year or with no time limit (a 'perpetual' license). Prices for each type of license are given at Purchase a User License. (A multiple-user license is available for this program.) An activation key is required in order to make the trial version permanently fully functional. An activation key can be obtained immediately if you purchase a user license either via PayPal or via Share-it.
Refund: A refund will be provided promptly up to 30 days after purchase if the software does not perform satisfactorily.
Updates: Purchasers of a user license for this software are entitled to an update to any later version at no additional cost.
Upgrading from the basic version: Purchasers of a perpetual user license for Hermetic Word Frequency Counter may upgrade to a perpetual user license for the Advanced Version by paying $32.75, €28.75 or £20.75 (excluding any sales tax). To purchase the upgrade click on one of the links below. Note that this is available only if a perpetual single-user license for Hermetic Word Frequency Counter has already been purchased.