This text extraction software comes in two versions: Hermetic Word Frequency Counter (WFC) and Hermetic Word Frequency Counter Advanced Version (WFCA). These are two separate programs. The main difference is that WFC counts words only, in a single docx, text or text-like file (including HTML and XML files) at a time, whereas WFCA counts words and phrases in multiple files (in multiple folders) in a single operation. If you need to count words in only one file at a time, then WFC is what you need. If you have many files, wish to count phrases, or need more options and functionality (such as the ability to search for words and phrases), then you need WFCA (so read on). More details are given below in Differences [of WFCA] from the Basic Version.

Hermetic Word Frequency Counter Advanced Version scans an MS Word docx file or a text file (an ANSI text file, an HTML file, an XML file, etc.), multiple such files, or text on the clipboard, and counts the number of occurrences of the different words and phrases (optionally ignoring common words such as "the" and "this", or words matching specified patterns). As well as counting all words and phrases, it can count the number of occurrences of specified words and phrases in a given list (optionally matching specified patterns); that is, it can search for target words and phrases. The words or phrases which are found can be listed alphabetically, reverse alphabetically or by frequency, with rank and frequency displayed for each word or phrase. The results can be written to a file which can be read into Excel for further processing.
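The kind of counting and listing described above can be illustrated with a short Python sketch. This is not the program's own code, only a minimal illustration under stated assumptions: the list of common words, the definition of a word (a run of letters, optionally with one internal apostrophe), the tie-breaking rule for ranks, and the names count_words and ranked_rows are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore; the program's own list is not reproduced here.
COMMON_WORDS = {"the", "this", "a", "an", "and", "of", "to", "in"}

def count_words(text, ignore=COMMON_WORDS):
    """Count word occurrences, ignoring case and skipping the ignored words."""
    # Assumption: a word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    return Counter(word for word in words if word not in ignore)

def ranked_rows(counts, order="frequency"):
    """Return (rank, word, frequency) rows in the requested listing order."""
    # Rank is always by frequency; ties are broken alphabetically (an assumption).
    by_freq = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    rows = [(rank, word, freq) for rank, (word, freq) in enumerate(by_freq, start=1)]
    if order == "alphabetical":
        rows.sort(key=lambda row: row[1])
    elif order == "reverse":
        rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

For example, ranked_rows(count_words(text), order="alphabetical") lists the words from A to Z while still showing the frequency rank of each word.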

Compatible with Windows 7 and 8.

The Advanced Version does everything that the basic version does, including support for UTF-8 encoded text. The section below details the additional functionality of the Advanced Version, mainly, the ability to count words in multiple files, the ability to count phrases as well as words, and the ability to count occurrences of a word or a phrase which matches a specified pattern (so it is also a multiple-file search program). Thus the user manual for the basic version should be read before (or after) reading this page (but note that the appearance of the main window and of the 'Settings' window differ somewhat in the two versions).

This software counts words and phrases in docx, text and text-like files (including HTML and XML files). It does not act directly on binary files other than docx files (PDF files, for example); such files can be scanned if they can first be converted to docx files or to text files (see Scannable Files in the user manual for the basic version).

Below is a screenshot showing the result of counting all words in 15 HTML files in a folder (pages about the Chinese calendar, downloaded from the web):

Here is another screenshot, showing the result when phrases (in the same set of files) are counted:

Here is a screenshot showing the result of counting all words in 65 Word docx files (sizes ranging from 13 KB to 1400 KB) with common words ignored:

Here is a screenshot showing the result of counting all phrases of from 4 to 6 words which occur at least 3 times in a Word docx file of size 43.37 KB containing 40.20 KB of text:
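To make the idea of counting phrases of from 4 to 6 words which occur at least 3 times more concrete, here is a minimal Python sketch. It is an illustration only, not the method used by the program: the function name count_phrases and its parameters are hypothetical, and the sketch assumes that a phrase is any run of consecutive words, ignoring case, punctuation and sentence boundaries.

```python
import re
from collections import Counter

def count_phrases(text, min_words=4, max_words=6, min_count=3):
    """Count phrases of min_words..max_words consecutive words occurring at least min_count times."""
    # Assumption: words are runs of letters; punctuation and sentence breaks are ignored.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter()
    for n in range(min_words, max_words + 1):
        # Slide a window of n words across the text and count each phrase seen.
        for start in range(len(words) - n + 1):
            counts[" ".join(words[start:start + n])] += 1
    # Keep only the phrases that occur often enough.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}
```

Because every window of words is counted, a long repeated phrase also contributes its shorter sub-phrases to the result.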

This software has many different uses. One example is for searching for words and phrases in news stories. You can download multiple pages from the web then search through them for such terms as “economic recovery”, “chinese stocks”, “air traffic controller strike” and “IMF payment”. Searches can return the names of the files in which the target phrases occur, as explained at Report Formats (see also Sorting Documents by the Number of Occurrences of a Word or Phrase).
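A search of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's own search: the function name search_documents is hypothetical, the documents are passed in as a mapping from file name to text (reading the files is left out), and matching is assumed to be case-insensitive and on whole words.

```python
import re

def search_documents(documents, phrases):
    """Map each target phrase to [(file name, occurrences), ...], most occurrences first."""
    results = {}
    for phrase in phrases:
        # Whole-word, case-insensitive match; any run of whitespace may separate the words.
        pattern = re.compile(
            r"\b" + r"\s+".join(re.escape(word) for word in phrase.split()) + r"\b",
            re.IGNORECASE,
        )
        hits = []
        for name, text in documents.items():
            occurrences = len(pattern.findall(text))
            if occurrences > 0:
                hits.append((name, occurrences))
        # Sort by number of occurrences (descending), then by file name.
        results[phrase] = sorted(hits, key=lambda hit: (-hit[1], hit[0]))
    return results
```

The sorted list for each phrase gives both the names of the files in which the phrase occurs and an ordering of those files by the number of occurrences.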

Differences from the Basic Version

The following are some (but not all) features of the Advanced Version (WFCA) which are not present in the basic version (WFC):

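The ability to count words in multiple files (in multiple folders) in a single operation, to count phrases as well as words, and to count occurrences of (that is, to search for) words and phrases which match a specified pattern.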

Both the main screen and the 'Settings' screen differ somewhat from those in the basic version, although all the functions of the basic version are retained. Here is what the 'Settings' screen looks like in the Advanced Version:

See Setting the Operation Parameters (in the user manual for the basic version) for further information.

New in version 13.19 and later: the ability to create an Excel file containing a table of frequency of words or phrases vs multiple files.
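The layout of such a table (one row per word, one column per file) can be sketched in Python. This is a hypothetical illustration, not the program's output format: it handles single words only, it writes CSV text (which Excel can open) rather than a native Excel file, and the names frequency_table and to_csv_text are assumptions made for this example.

```python
import csv
import io
import re
from collections import Counter

def frequency_table(documents, terms):
    """Build rows of word frequencies: a header row, then one row per word, one column per file."""
    names = sorted(documents)
    # Count the words of each document once, ignoring case.
    counts = {
        name: Counter(re.findall(r"[^\W\d_]+", documents[name].lower()))
        for name in names
    }
    rows = [["word"] + names]
    for term in terms:
        rows.append([term] + [counts[name][term.lower()] for name in names])
    return rows

def to_csv_text(rows):
    """Serialize the rows as CSV text, which can be saved to a file and opened in Excel."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

A word which does not occur in a file appears in the table with a frequency of 0 for that file.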

User Manual

As stated above, the Advanced Version does everything that the basic version does, so the following sections of the user manual for the basic version apply also to the Advanced Version.

