Hermetic Word Frequency Counter Advanced Version
Counting Phrases
A phrase is a sequence of one or more words (separated by spaces). If all phrases were counted then (in any moderately-sized section of text) there would be a huge number of them, most of which would be of little interest. Thus to count phrases (as distinct from words) you must specify which phrases are to be counted.
If you are interested in just a few words or phrases, they can be added to the 'Extra count-only words/phrases' textbox. If there is more than one word/phrase to be counted then they must be separated by a comma+space (", "), not just a comma. A word or phrase to be counted may include a comma, but may not include a comma+space.
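The separator rule can be illustrated with a minimal Python sketch. It is not the program's own code; the function name `split_count_only` is hypothetical, and the sketch only assumes that entries are split on the two-character sequence comma+space, so a bare comma stays inside an item.

```python
def split_count_only(entry: str) -> list[str]:
    """Split a textbox entry on comma+space (", ").

    A bare comma is not a separator, so it remains part of the item.
    Hypothetical helper for illustration only.
    """
    return [item for item in entry.split(", ") if item]


# "1,000 years" contains a comma but no comma+space, so it stays whole.
print(split_count_only("calendar, common era, 1,000 years"))
# ['calendar', 'common era', '1,000 years']
```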
If there are many words/phrases to be counted (termed count-only words/phrases) then they should be placed in a text file, and that file specified by means of the button labelled 'Count-only words/phrases files'. (Phrases in that file are best kept one per line, but more than one is possible if they are separated by a comma+space.) It is possible to switch between multiple files containing words/phrases to be counted, as explained in the section Multiple Words-to-Ignore and Count-Only-Words/Phrases Files.
If any count-only words or phrases are specified then any specification of words to be ignored is inoperative. Thus a word that would otherwise be ignored is still counted when it occurs in a count-only phrase. For example, if "is" is to be ignored then it will not be ignored in a phrase to be counted such as "service is available".
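The precedence described here might be pictured as follows. This is only a sketch of the rule, not of how the program is actually implemented; the function `select_counts` and its arguments are hypothetical, and all keys are assumed to be lowercased already.

```python
def select_counts(counts: dict, ignore: set, count_only: set) -> dict:
    """Return the counts that would be reported.

    Assumption: counts maps lowercased words/phrases to their tallies.
    Hypothetical helper for illustration only.
    """
    if count_only:
        # A count-only list takes over; the ignore list has no effect.
        return {k: v for k, v in counts.items() if k in count_only}
    # With no count-only list, the ignore list is applied as usual.
    return {k: v for k, v in counts.items() if k not in ignore}


phrase_counts = {"service is available": 2, "is": 7, "service": 5}
print(select_counts(phrase_counts, ignore={"is"}, count_only={"service is available"}))
# {'service is available': 2}

word_counts = {"is": 7, "service": 5}
print(select_counts(word_counts, ignore={"is"}, count_only=set()))
# {'service': 5}
```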
Upper/lower case in count-only words/phrases is not distinguished. Upper and lower case in the results is or is not distinguished depending on the setting of 'Upper/lower case significant' in the Parameters panel. For example, if a text file contains "Jack Smith" three times and "Jack smith" once, and the phrase "Jack Smith" is to be counted, then the result depends on that setting: if upper/lower case is not significant the result is "4 jack smith"; if it is significant the result is "3 Jack Smith, 1 Jack smith".
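The "Jack Smith" example can be reproduced with a short Python sketch. It assumes (this is not taken from the program itself) that the phrase is always matched without regard to case and that the setting only controls how matches are grouped in the results. The function name `count_phrase` and the parameter `case_significant` are hypothetical.

```python
import re
from collections import Counter


def count_phrase(text: str, phrase: str, case_significant: bool) -> Counter:
    """Count occurrences of phrase in text.

    The phrase is always matched case-insensitively; case_significant
    only decides whether differently-cased matches are tallied separately.
    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    matches = pattern.findall(text)
    if case_significant:
        return Counter(matches)
    return Counter(m.lower() for m in matches)


text = "Jack Smith arrived. Jack Smith spoke to Jack smith, and then Jack Smith left."
print(count_phrase(text, "Jack Smith", case_significant=False))
# Counter({'jack smith': 4})
print(count_phrase(text, "Jack Smith", case_significant=True))
# Counter({'Jack Smith': 3, 'Jack smith': 1})
```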
If words and phrases to be counted are placed in a count-only words/phrases file then two conventions should be observed:
- A phrase must be contained within one line in the file (that is, a phrase cannot extend over more than one line).
- A line may contain more than one word or phrase, but if so then they must be separated by a comma plus a space (not just a comma), e.g., "calendar, common era, common era calendar".
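The two file conventions above can be sketched in Python as follows. This is an illustration under stated assumptions, not the program's own reader: the function name `load_count_only` is hypothetical, blank lines are assumed to be skipped, and `io.StringIO` stands in for a real text file.

```python
import io


def load_count_only(handle) -> list[str]:
    """Read count-only words/phrases from an open text file.

    Each phrase lies within a single line; a line may hold several
    items separated by comma+space. Hypothetical helper for
    illustration only.
    """
    phrases = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no phrases
        phrases.extend(item for item in line.split(", ") if item)
    return phrases


sample = io.StringIO("calendar, common era, common era calendar\nservice is available\n")
print(load_count_only(sample))
# ['calendar', 'common era', 'common era calendar', 'service is available']
```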
There is no limit on the number of words in a phrase, but long phrases significantly increase processing time.
To count only words and phrases which match a certain pattern see Counting Words and Phrases with Pattern-Matching.