Hermetic Word Frequency Counter
A Customizable Multiple-File Word and Phrase Counting Program for Windows
This software comes in two versions: Hermetic Word Frequency Counter (WFC) and Hermetic Word Frequency Counter Advanced Version (WFCA). These are two separate programs. The main difference is that WFC counts words only in a single docx, text or text-like file (including HTML and XML files), whereas WFCA counts words and phrases in multiple files (in multiple folders) in a single operation. If you need to count words in only one file at a time, then WFC is what you need (see the WFC page). If you have many files, wish to count phrases, or need more options and functionality (such as the ability to search for words and phrases), then you need WFCA (so read on). More details are given below in Differences from the Basic Version.
Hermetic Word Frequency Counter Advanced Version scans an MS Word docx file or a text file (an ANSI text file, an HTML file, an XML file, etc.), multiple such files, or text on the clipboard, and counts the number of occurrences of the different words and phrases (optionally ignoring common words such as the and this, or words matching specified patterns). As well as counting all words and phrases, it can count the occurrences of specified words and phrases in a given list (optionally matching specified patterns); that is, it can search for target words and phrases. The words or phrases found can be listed alphabetically, reverse alphabetically or by frequency, with the rank and frequency displayed for each word or phrase. When sorting by frequency, the frequency can be either (a) the absolute or relative frequency of occurrence of each word/phrase across all files or (b) the number of files in which a word/phrase occurs. The results can be written to a file which can be read into Excel for further processing.
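The counting and listing options just described can be illustrated with a short Python sketch. It is not the program's own code: the tokenizer (runs of letters, compared case-insensitively), the shell-style wildcard syntax used for ignore patterns, and the names `count_words` and `ranked` are all assumptions made for illustration. Relative frequency is taken here to be a word's count divided by the total number of counted words.

```python
import re
from collections import Counter
from fnmatch import fnmatchcase

# Assumption: a "word" is a run of letters, optionally with internal
# apostrophes, and words are compared case-insensitively.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def tokenize(text):
    return WORD_RE.findall(text.lower())


def count_words(texts, ignore=frozenset(), patterns=()):
    """Return (occurrence counts, document counts) over several texts.

    Words in `ignore`, or matching any shell-style wildcard in `patterns`
    (an assumed pattern syntax), are skipped.
    """
    occurrences = Counter()  # total occurrences in all texts
    documents = Counter()    # number of texts containing each word
    for text in texts:
        words = [w for w in tokenize(text)
                 if w not in ignore
                 and not any(fnmatchcase(w, p) for p in patterns)]
        occurrences.update(words)
        documents.update(set(words))
    return occurrences, documents


def ranked(counts, order="frequency", relative=False):
    """Return (rank, word, frequency) tuples in the requested order."""
    if order == "alphabetical":
        items = sorted(counts.items())
    elif order == "reverse":
        items = sorted(counts.items(), reverse=True)
    else:  # by frequency, highest first; ties broken alphabetically
        items = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(counts.values())
    return [(rank, word, n / total if relative else n)
            for rank, (word, n) in enumerate(items, start=1)]
```

Passing the second result of `count_words` to `ranked` orders words by the number of texts in which they occur, which corresponds to option (b) above.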
Theoretically there is no limit on the size of an input file or the number of words in it, but in practice (because of the processing time needed) there is a limit of about 10 MB on text files (and text-like files such as XML and HTML files). There is also a limit of about 10 MB on the amount of text in an MS Word docx file (though a docx file can be larger than this if it contains many images). For a docx file, only words in the body of the document are counted, not words in footnotes or endnotes.
The Advanced Version does everything that the basic version does, including support for UTF-8 encoded text. The sections below detail the additional functionality of the Advanced Version, mainly the ability to count words in multiple files, the ability to count phrases as well as words, and the ability to count occurrences of words or phrases which match a specified pattern (so it is also a multiple-file search program). The user manual for the basic version should therefore be read before (or after) reading this page, but note that the appearance of the main window and of the 'Settings' window differs somewhat between the two versions.
This software counts words and phrases in docx files, text files and text-like files (including HTML and XML files). It does not act directly on binary files other than docx files, such as PDF files; such files can be scanned if they can first be converted to docx or text files (see Scannable Files in the user manual for the basic version).
Below is a screenshot showing the results of counting words in all .txt and .docx files (except for files ending in '_e.docx') in a folder:
Click here for a screenshot showing the output when relative frequencies (instead of absolute frequencies) are calculated.
Here is another screenshot showing the result when 3-word phrases (in the same set of files) are counted.
Here is a screenshot showing the result of counting all words in 65 Word docx files (sizes ranging from 13 KB to 1400 KB) with common words ignored:
Here is a screenshot showing the result of counting all phrases of 4 to 6 words which occur at least 3 times in a Word docx file of size 43.37 KB containing 40.20 KB of text:
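Phrase counting of the kind shown in the last screenshot (phrases of 4 to 6 words occurring at least 3 times) amounts to counting every run of consecutive words within a length range and then discarding rare runs. The following Python sketch shows the idea. The function name `count_phrases`, its parameters and its tokenizer are hypothetical, and the sketch makes no attempt to stop phrases at sentence or paragraph boundaries, which the program itself may do.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lower-cased runs of letters (with internal apostrophes).
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())


def count_phrases(text, min_len=2, max_len=3, min_count=1):
    """Count every run of min_len..max_len consecutive words in text,
    keeping only phrases that occur at least min_count times."""
    words = tokenize(text)
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of n words along the text.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {p: c for p, c in counts.items() if c >= min_count}
```

A call such as `count_phrases(text, min_len=4, max_len=6, min_count=3)` corresponds to the settings described for the last screenshot.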
This software has many different uses. One example is searching for words and phrases in news stories: you can download multiple pages from the web and then search through them for such terms as “economic recovery”, “Chinese stocks”, “air traffic controller strike” and “IMF payment”. Searches can return the names of the files in which the target phrases occur, as explained at Report Formats (see also Sorting Documents by the Number of Occurrences of a Word or Phrase). There are, of course, many other possible uses for this software.
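A count-only search of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's implementation: the hypothetical `find_targets` takes the files as a dictionary mapping names to text, matches whole words case-insensitively, and accepts any whitespace (including line breaks) between the words of a target phrase.

```python
import re


def find_targets(files, targets):
    """Return {target: {file name: occurrences}} for each target phrase."""
    results = {}
    for target in targets:
        # Whole-word, case-insensitive match; any whitespace may separate
        # the words of a phrase (e.g. a phrase split across lines).
        body = r"\s+".join(re.escape(w) for w in target.split())
        pattern = re.compile(r"\b" + body + r"\b", re.IGNORECASE)
        hits = {}
        for name, text in files.items():
            n = len(pattern.findall(text))
            if n:  # list only files in which the target occurs
                hits[name] = n
        results[target] = hits
    return results
```

Sorting one target's entry, for example `sorted(hits.items(), key=lambda kv: -kv[1])`, lists the documents in decreasing order of the number of occurrences of that target.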
Differences from the Basic Version
The following are some (but not all) features of the Advanced Version (WFCA) which are not present in the basic version (WFC):
The ability to:
- count not just all words in a file but also all phrases (within bounds of phrase length).
- scan not just one file but all files in a folder, and optionally in all subfolders of that folder, and to return a single report on the frequencies of words and phrases in all files scanned.
- specify not only a list of words to be ignored (such as common words in a natural language) but also specify a list of words and phrases which are to be counted (or searched for).
- count words or phrases matching a given pattern.
- ignore words matching a given pattern.
- display words and phrases counted in reverse alphabetical order as well as in alphabetical order and by frequency.
- display relative frequency of occurrence as well as absolute frequency.
- display, for each word or phrase found when scanning multiple files, the files in which it occurs, and how many times.
- order words or phrases according to the number of files in a set of files in which those words or phrases occur.
- include or exclude files of certain types.
- generate an Excel-readable file containing a table of frequencies of words and phrases vs the files in which they occur.
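The last three abilities in this list (per-file counts, ordering by the number of files, and a table of words vs files) can be pictured with the following Python sketch. It writes CSV, which Excel can open, rather than a native Excel file; the function name `words_vs_files` and the column layout are hypothetical and are not the program's actual output format.

```python
import csv
import io
from collections import Counter


def words_vs_files(file_counts):
    """Build a CSV table of word frequencies vs files.

    file_counts maps each file name to a Counter (or dict) of word counts.
    Rows are ordered by the number of files containing the word, then by
    total occurrences, then alphabetically.
    """
    files = sorted(file_counts)
    totals = Counter()
    for counts in file_counts.values():
        totals.update(counts)

    def n_files(word):
        # Number of files in which the word occurs at least once.
        return sum(1 for f in files if file_counts[f].get(word, 0) > 0)

    rows = sorted(totals, key=lambda w: (-n_files(w), -totals[w], w))
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["word", "files", "total"] + files)
    for w in rows:
        writer.writerow([w, n_files(w), totals[w]]
                        + [file_counts[f].get(w, 0) for f in files])
    return out.getvalue()
```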
Most users will need just a few of these abilities, so only the relevant parts of the user manual need be consulted.
Both the main screen and the 'Settings' screen differ somewhat from those in the basic version, although all the functions of the basic version are retained. Here is what the 'Settings' screen looks like in the Advanced Version:
See Setting the Operation Parameters (in the user manual for the basic version) for further information.
Conflation of counts: Note that there are two checkboxes intended to combine the counts for the singular and plural forms of certain words. If Drop final 's' unless 'ss' or vowel+'s' is checked then, for example, the count for dogs will be added to the count for dog to give a single count for dog, but the count for dresses will not be conflated with the count for dress. Similarly, if Convert final 'ies' to 'y' is checked then the count for canaries will be added to the count for canary to give a single count for canary. This conversion applies only to words longer than five characters so, for example, the count for pies will not be conflated with the count for pie.
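The two conflation rules can be expressed as a small Python function. This is a sketch based only on the description above: the names `conflate` and `conflated_counts` are hypothetical, and treating only a, e, i, o and u (not y) as vowels is an assumption. Since a word ending in 'ies' always ends in vowel+'s', the "drop final 's'" rule never applies to it, so the order in which the two rules are tried does not matter.

```python
from collections import Counter

VOWELS = "aeiouAEIOU"  # assumption: 'y' is not treated as a vowel


def conflate(word, drop_final_s=True, ies_to_y=True):
    """Map a word to the key whose count it is added to."""
    # 'ies' -> 'y', only for words longer than five characters.
    if ies_to_y and len(word) > 5 and word.lower().endswith("ies"):
        return word[:-3] + "y"
    # Drop a final 's' unless the word ends in 'ss' or vowel + 's'.
    if (drop_final_s and len(word) > 1 and word[-1] in "sS"
            and word[-2] not in "sS" and word[-2] not in VOWELS):
        return word[:-1]
    return word


def conflated_counts(words, **options):
    """Count words after applying the conflation rules to each one."""
    return Counter(conflate(w, **options) for w in words)
```

With both options on, dogs, canaries and dresses map to dog, canary and dresses respectively, while pies is left unchanged.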
New in version 13.19 and later: the ability to create an Excel file containing a table of frequency of words or phrases vs multiple files.
- Multiple Input Files
- Two Modes of Operation: Count-All and Count-Only (Search)
- Count All Words or Count Specified Words And Phrases
- Count All Phrases
- Ignoring Common Words in a Particular Language
- Embedded Comments
- Exporting Results to Excel
- Creation of an Excel File with a Table of Words/Phrases vs Files
- Limiting Phrase Counts
- Report Formats
- Ordering Documents by the Frequency of a Word or Phrase
- Ordering Words or Phrases by the Number of Documents in Which They Occur
- Counting and Searching for Words and Phrases which Match Patterns
- Ignoring Words which Match Patterns
- Multiple Words-to-Ignore and Count-Only-Words/Phrases Files
- Use of this Program to Illustrate Zipf's Law
As stated above, the Advanced Version does everything that the basic version does, so the relevant sections of the user manual for the basic version also apply to the Advanced Version.
Trial version: A copy of the Hermetic Word Frequency Counter Advanced Version installation program can be downloaded for the purpose of evaluation. Click on the following link for further information:
Download Hermetic Word Frequency Counter Advanced ...
Price and ordering: A single-user license for the fully-functional software is available for a period of 3 months, 1 year or with no time limit (a 'perpetual' license). Prices for each type of license are given at Purchase a User License. (A multiple-user license is available for this program.) An activation key is required in order to make the trial version permanently fully functional. An activation key can be obtained immediately if you purchase a user license either via PayPal or via Share-it.
Refund: A refund will be provided promptly up to 30 days after purchase if the software does not perform satisfactorily.
Updates: Purchasers of a user license for this software are entitled to an update to any later version at no additional cost.
Upgrading from the basic version: Purchasers of a perpetual user license for Hermetic Word Frequency Counter may upgrade to a perpetual user license for the Advanced Version by paying $32.75, €28.75 or £20.75 (excluding any sales tax). To purchase the upgrade click on one of the links below. Note that this is available only if a perpetual single-user license for Hermetic Word Frequency Counter has already been purchased.