Hermetic Word Frequency Counter Advanced Version
Count All Phrases

This section of the user manual explains what the "Count all phrases" button does. The previous section (Count All Words or Count Specified Words and Phrases) explains what the "Count words/phrases" button does.


A letter is an element of an alphabet in some language. A word is a sequence of letters and optionally some non-letters such as digits and the hyphen, as in little‑used. (What counts as a word, for this software, is discussed in more detail at What is a Word?) A phrase is a sequence of words separated by spaces and non-letters.

If all phrases are counted then (in any moderately-sized section of text) there are a huge number of them, so a (user-specified) limit has to be placed on the maximum number of words in a phrase. All phrases must be checked to ascertain how often they occur. Often (but not always) only the phrases which occur more than once are of interest.
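The idea of counting every phrase between a minimum and a maximum length, then keeping only those that occur often enough, can be illustrated with a short Python sketch. This is not the program's own code: the function names `tokenize` and `count_phrases`, the ASCII-only definition of a word, the lower-casing, and the choice to let phrases run across punctuation are all assumptions made for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplified, ASCII-only notion of a word (an assumption): letters and
    # digits, optionally joined by internal hyphens or apostrophes.
    return re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text.lower())

def count_phrases(text, min_len=2, max_len=3, min_occurrences=2):
    words = tokenize(text)
    counts = Counter()
    # Slide a window of every allowed length over the word sequence.
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    # Keep only the phrases that occur at least min_occurrences times.
    return {p: c for p, c in counts.items() if c >= min_occurrences}

result = count_phrases("The cat sat on the mat and the cat sat down.")
```

Here `result` holds the three phrases that occur twice: "the cat", "cat sat" and "the cat sat".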


As noted previously, the main screen has two buttons for counting words and phrases. If you wish to (a) count all words (not phrases) in one or more files or (b) count only specified words or phrases then click on the "Count words/phrases" button; what happens then is explained in the previous section.

If you wish to count all phrases in one or more files then click on the "Count all phrases" button. This brings up a panel such as this:

You can then specify the lengths of the shortest and longest phrases that you are interested in, and the minimum number of times a phrase must occur in order to be included in the results. Select one of the options beside the radio buttons. (These are disabled if there are no words to ignore.) The result is something like this:

Suppose you have requested a count of phrases with the minimum number of words less than the maximum number, e.g., phrases of length 2 and 3. Then all 3-word phrases such as "he remembered aeroplanes" will contain two proper sub-phrases, in this case "he remembered" and "remembered aeroplanes". This program does not count proper sub-phrases separately except when a sub-phrase is part of two (or more) different phrases, as in "he remembered aeroplanes" and "he remembered vividly". In such a case the sub-phrase gets its own count, as shown at right.
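One possible reading of this sub-phrase rule is sketched below in Python. It is an interpretation, not the program's actual algorithm: the helper names `contains` and `suppress_subphrases` are hypothetical, and the sketch assumes that a shorter phrase is dropped only when exactly one longer counted phrase contains it.

```python
def contains(longer, shorter):
    # True if the word list `shorter` occurs contiguously inside `longer`.
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))

def suppress_subphrases(counts):
    # counts maps a phrase (words joined by single spaces) to its frequency.
    phrases = [p.split() for p in counts]
    kept = {}
    for phrase, count in counts.items():
        words = phrase.split()
        containers = [p for p in phrases
                      if len(p) > len(words) and contains(p, words)]
        # Drop a phrase that is a proper sub-phrase of exactly one longer
        # phrase; keep it if it belongs to none, or to two or more.
        if len(containers) != 1:
            kept[phrase] = count
    return kept

counts = {
    "he remembered aeroplanes": 2,
    "he remembered vividly": 3,
    "he remembered": 5,
    "remembered aeroplanes": 2,
    "remembered vividly": 3,
}
kept = suppress_subphrases(counts)
```

With these made-up counts, "he remembered" is kept because it is part of two different 3-word phrases, while "remembered aeroplanes" and "remembered vividly" are each part of only one and are dropped.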


If you select "Allow phrases consisting entirely of words to ignore" then the result will usually contain many phrases which are of no interest.

If you select "Remove words to ignore from phrases" then the result (in this case) is:

Such sequences of words are usually not actually occurring phrases, but this form of display brings together those words-not-to-ignore which are more-or-less adjacent, so may be more useful.
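A rough Python illustration of why removing words to ignore yields sequences that do not occur in the text is given below. It assumes that the ignored words are dropped from the word sequence before the phrases are formed, which may not be exactly how the program proceeds. The name `phrases_without_ignored` is hypothetical.

```python
from collections import Counter

def phrases_without_ignored(words, ignore, min_len=2, max_len=3):
    # Assumption: ignored words are removed first, so the remaining words
    # become neighbours even if they were not adjacent in the text.
    kept = [w for w in words if w not in ignore]
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(kept) - n + 1):
            counts[" ".join(kept[i:i + n])] += 1
    return counts

words = "he remembered the aeroplanes of the war".split()
counts = phrases_without_ignored(words, ignore={"the", "of"})
```

In this example "remembered aeroplanes" and "aeroplanes war" are counted, although neither occurs as such in the original sentence.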


One can count pairs of words, triples of words, etc., by setting the maximum length equal to the minimum length (e.g. 2,2 or 3,3) and by selecting "Exclude phrases containing a word-to-ignore" to give, for example:
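The effect of such a setting (a fixed phrase length, with any phrase containing a word-to-ignore left out) can be sketched in Python as follows. This illustrates the idea only and is not the program's output or code; the function name `count_fixed_length` and the sample text are made up.

```python
from collections import Counter

def count_fixed_length(words, n, ignore):
    counts = Counter()
    for i in range(len(words) - n + 1):
        window = words[i:i + n]
        # Skip any phrase that contains at least one word-to-ignore.
        if any(w in ignore for w in window):
            continue
        counts[" ".join(window)] += 1
    return counts

words = "he remembered the aeroplanes and he remembered the war".split()
pairs = count_fixed_length(words, n=2, ignore={"the", "and"})
```

Only the pair "he remembered" survives here, with a count of 2, because every other pair contains "the" or "and".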

 


As when counting words/phrases, results can be displayed in various formats by selecting one from the "Format" drop-down list. If the file source is a folder then the most detailed format is "word file-list (+freq)". A less detailed format, which gives the number of files in which a word or phrase occurs, is "word freq. no.files". If you have not already selected this format then this panel gives you an opportunity to do so.
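The information behind these two formats (a per-file frequency for each phrase, and a total frequency together with the number of files) can be sketched in Python as follows. The function names, the file names and the data layout are hypothetical, and the sketch does not attempt to reproduce the program's actual display.

```python
from collections import Counter, defaultdict

def per_file_counts(file_words, n=2):
    # file_words maps a file name to that file's list of words.
    per_file = defaultdict(Counter)
    for name, words in file_words.items():
        for i in range(len(words) - n + 1):
            per_file[" ".join(words[i:i + n])][name] += 1
    return per_file

def freq_and_file_count(per_file):
    # For each phrase: (total frequency, number of files containing it).
    return {p: (sum(c.values()), len(c)) for p, c in per_file.items()}

files = {
    "a.txt": "he remembered aeroplanes he remembered".split(),
    "b.txt": "he remembered vividly".split(),
}
detail = per_file_counts(files)
summary = freq_and_file_count(detail)
```

Here `detail["he remembered"]` records 2 occurrences in a.txt and 1 in b.txt, and `summary["he remembered"]` is `(3, 2)`: three occurrences in two files.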

Other display formats which show the files in which words/phrases occur are described at Display Formats.

Larger values for the length of the longest phrase increase the processing time. The value for "Minimum number of occurrences" does not affect the processing time.
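A back-of-envelope way to see this is to count the candidate phrases that must be examined. The Python sketch below assumes that every window of every allowed length is examined once and that the occurrence threshold is applied only afterwards. That assumption is consistent with the statement above but is not confirmed as the program's actual method. The function name `candidate_windows` is hypothetical.

```python
def candidate_windows(num_words, min_len, max_len):
    # Number of word windows of each allowed length in a text of num_words
    # words. The minimum number of occurrences plays no part here.
    return sum(max(num_words - n + 1, 0) for n in range(min_len, max_len + 1))

short_range = candidate_windows(1000, 2, 3)   # 999 + 998
long_range = candidate_windows(1000, 2, 5)    # 999 + 998 + 997 + 996
```

For a 1000-word text, raising the longest phrase length from 3 to 5 roughly doubles the number of candidates, from 1997 to 3990.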


To reduce the number of phrases found by specifying 'filter words', see Filter Found Phrases.

To count all phrases which match a certain pattern, see Counting Words and Phrases with Pattern-Matching.
