Hermetic Word Frequency Counter Advanced Version
Count All Phrases

This section of the user manual explains what the "Count all phrases" button does. The previous section (Count All Words or Count Specified Words and Phrases) explains what the "Count words/phrases" button does.


A phrase is a sequence of two or more words separated by spaces. Even a moderately sized section of text contains a huge number of phrases, so a user-specified limit has to be placed on the maximum number of words in a phrase. Every phrase must be checked to ascertain how often it occurs. Often (but not always) only the phrases which occur more than once are of interest.
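
The idea of counting every phrase up to a maximum length can be illustrated with a short Python sketch. This is not the program's own code: the function name count_phrases and its parameters are hypothetical, and the sketch assumes that words are separated by whitespace and compared without regard to case.

```python
from collections import Counter

def count_phrases(text, min_len=2, max_len=3):
    """Count every run of min_len..max_len consecutive words in text.

    Hypothetical helper for illustration only: assumes whitespace-separated
    words and case-insensitive comparison.
    """
    words = text.lower().split()
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # Slide a window of n words across the text.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

text = "the cat sat on the mat and the cat sat on the rug"
counts = count_phrases(text, min_len=2, max_len=3)

# Keep only the phrases which occur more than once.
repeated = {phrase: n for phrase, n in counts.items() if n > 1}
print(len(counts), "distinct phrases;", len(repeated), "occur more than once")
```

Even this thirteen-word sentence yields many distinct phrases, which is why an upper limit on phrase length is needed for real texts.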


As noted previously, the main screen has two buttons for counting words and phrases. If you wish to (a) count all words (not phrases) in one or more files or (b) count only specified words or phrases then click on the Count words/phrases button; what happens then is explained in the previous section.

If you wish to count all phrases in one or more files then click on the Count all phrases button. This brings up a panel whose content depends on the settings, specifically on (a) whether there is a specification of words to ignore and (b) whether there is a specification of words and phrases to count.

In this panel you can specify the lengths of the shortest and longest phrases that you are interested in, and the minimum number of times a phrase must occur in order to be included in the results. Clicking on the Count phrases button then counts all such phrases, excluding either (a) those consisting entirely of words-to-ignore or (b) those containing at least one word-to-ignore, depending on which checkboxes are checked.
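
The difference between the two exclusion rules can be sketched in Python as follows. The function filter_phrases, its parameter names and the sample counts are all hypothetical and are made up for illustration; they are not output from the program, and they only mirror the two checkboxes and the minimum-occurrences setting described above.

```python
def filter_phrases(counts, ignore, min_count=1,
                   exclude_all_ignored=True, exclude_any_ignored=False):
    """Apply a minimum count and the two word-to-ignore rules (hypothetical)."""
    kept = {}
    for phrase, n in counts.items():
        words = phrase.split()
        if n < min_count:
            continue  # occurs too rarely
        if exclude_any_ignored and any(w in ignore for w in words):
            continue  # contains at least one word-to-ignore
        if exclude_all_ignored and all(w in ignore for w in words):
            continue  # consists only of words-to-ignore
        kept[phrase] = n
    return kept

# Made-up phrase counts, for illustration only.
counts = {"of the": 5, "the end of": 3,
          "word frequency counter": 2, "frequency counter": 1}
ignore = {"the", "of"}

only_rule = filter_phrases(counts, ignore, min_count=2)
any_rule = filter_phrases(counts, ignore, min_count=2, exclude_any_ignored=True)
print(only_rule)
print(any_rule)
```

With only the first rule active, "of the" is dropped but "the end of" survives because it contains "end"; with the second rule active as well, every phrase containing "the" or "of" is dropped.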

Suppose we are counting phrases of length 3, 4 or 5, with the checkbox "Exclude phrases consisting only of words to ignore" checked, then we might obtain:

If, in addition, the checkbox "Exclude phrases containing a word to ignore" is checked then we obtain (because all the phrases shown above contain "the" or "of"):

Thus one can count word pairs, word triples, etc.:

 


As when counting words/phrases, results can be displayed in various formats by selecting one from the Format drop-down list. If the file source is a folder then the most detailed option is word file-list (+freq). A less detailed format, word freq. no.files, gives the number of files in which each word or phrase occurs. If you have not already selected one of these formats then this panel gives you an opportunity to do so.
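
The two quantities reported for each phrase when the source is a folder (its total frequency and the number of files in which it occurs) can be sketched in Python as below. The function phrase_file_stats, the file names and the file contents are hypothetical; the sketch shows how such figures could be derived, not the exact layout of the program's output.

```python
from collections import Counter

def phrase_file_stats(files, n=2):
    """Return (total frequency, number of files) per n-word phrase.

    files maps a file name to its text (hypothetical in-memory stand-in
    for a folder of text files).
    """
    totals = Counter()      # occurrences over all files
    file_counts = Counter() # number of files containing the phrase
    for name, text in files.items():
        words = text.lower().split()
        phrases = Counter(" ".join(words[i:i + n])
                          for i in range(len(words) - n + 1))
        totals.update(phrases)
        file_counts.update(set(phrases))  # each file counts once per phrase
    return totals, file_counts

files = {
    "a.txt": "word frequency counter counts word frequency",
    "b.txt": "a word frequency list",
}
totals, in_files = phrase_file_stats(files)
for phrase in sorted(totals):
    print(phrase, totals[phrase], in_files[phrase])
```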

Other display formats which show the files in which words/phrases occur are described at Display Formats.

Larger values for the length of the longest phrase increase the processing time. The value for Minimum number of occurrences does not affect the processing time.
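
A rough way to see why this is so: a text of N words contains N - k + 1 runs of k consecutive words, and every one of them has to be examined for each phrase length in the chosen range, whereas a minimum number of occurrences can only be applied once the counts are known. The Python sketch below computes the number of such runs; the function name candidate_windows is hypothetical, and the figures describe the size of the task rather than measured running times of the program.

```python
def candidate_windows(n_words, min_len, max_len):
    """Number of word runs to examine for phrase lengths min_len..max_len."""
    return sum(max(n_words - k + 1, 0) for k in range(min_len, max_len + 1))

# A 10,000-word text, shortest phrase 2 words.
print(candidate_windows(10_000, 2, 3))  # longest phrase 3 words
print(candidate_windows(10_000, 2, 8))  # longest phrase 8 words
```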

If you uncheck the checkbox "Exclude phrases consisting only of words to ignore", or if you have not specified any words-to-ignore, then the results will often contain phrases which are of no interest.

If you have specified a list of words or phrases to be counted (either in a count-only words/phrases file or in the Extra count-only words/phrases textbox in the Settings window) then clicking on the Count all phrases button brings up a similar panel which advises you that the operation will ignore the specification of count-only words/phrases (since you wish to count all phrases, not just particular phrases).


To count all phrases which match a certain pattern see Counting Words and Phrases with Pattern-Matching.

To reduce the number of phrases found by specifying 'filter words' see Filter Found Phrases.
