Phrase Frequency Counter Advanced
Non-English Text

Use of the program to count phrases in non-English text

Phrase Frequency Counter Advanced may be used with text in most (but not all) European languages, including German, French, Italian, Spanish and Portuguese. In fact, it may be used with any language whose characters can be encoded using ISO 8859-1, a subset of Windows 1252. (For more details see Scannable Files and Languages Supported.) Some European languages (such as Polish and Czech) and all non-European languages (such as Arabic and Hebrew) are not supported.
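To check in advance whether a given text stays within this character set, one can test whether it encodes to ISO 8859-1. The following Python sketch is only an illustration and is not part of the program; the function name fits_latin1 is hypothetical, and the check says nothing about how the program itself reads files.

```python
def fits_latin1(text):
    """Return True if every character in text can be encoded as ISO 8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

# Sample words chosen for illustration only.
print(fits_latin1("Größe, garçon, señor, coração"))  # German, French, Spanish, Portuguese
print(fits_latin1("Łódź"))                            # Polish
print(fits_latin1("שלום"))                            # Hebrew
```

The first call should report that the text fits, while the Polish and Hebrew samples contain characters outside ISO 8859-1.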

The screenshot below shows the result of counting phrases in a 1.04 MB HTML page in French (using the French words-to-ignore file, so that phrases consisting entirely of words-to-ignore are excluded):

Phrase Frequency Counter Advanced counting words in French text
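The exclusion of phrases that consist entirely of words-to-ignore can be pictured as a simple filter. The sketch below is an assumption about the idea, not the program's actual code; the function name is_all_ignored and the tiny French word set are hypothetical and are not taken from the program's French words-to-ignore file.

```python
def is_all_ignored(phrase, ignore):
    """True if every word of the phrase is in the words-to-ignore set."""
    words = phrase.lower().split()
    return all(w in ignore for w in words)

# Tiny illustrative sample, not the program's French file.
ignore_fr = {"de", "la", "le", "et", "à"}

phrases = ["de la", "la maison", "et le"]
kept = [p for p in phrases if not is_all_ignored(p, ignore_fr)]
print(kept)
```

Here "la maison" is kept because it contains at least one word that is not in the set, whereas "de la" and "et le" are dropped.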

Here is an example of the output when scanning German text for phrases with exactly 4 words occurring at least twice:

Phrases in German text
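A count of this kind (phrases with exactly 4 words occurring at least twice) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's method: the function name count_phrases is hypothetical, words are taken to be runs of letters, digits and underscores, case is ignored, and phrases are allowed to span sentence boundaries.

```python
import re
from collections import Counter

def count_phrases(text, n=4, min_count=2):
    """Count every run of n consecutive words; keep those seen at least min_count times."""
    # In Python 3, \w matches accented letters such as ä, ö and ü.
    words = re.findall(r"\w+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    counts = Counter(grams)
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

# Invented sample text for illustration.
sample = "Der Hund läuft schnell nach Hause. Der Hund läuft schnell in den Garten."
print(count_phrases(sample))
```

In this sample only "der hund läuft schnell" occurs twice, so it is the only phrase reported.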

When we select the option Remove words to ignore from phrases and recount we obtain "condensed" phrases:

German phrases with words-to-ignore removed
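One possible reading of "remove and recount" is sketched below: the words-to-ignore are stripped from each counted phrase, and the counts of phrases that condense to the same text are added together. This is an assumption about the behavior, not a description of the program's actual algorithm; the function name condense, the word set and the counts are all hypothetical.

```python
from collections import Counter

def condense(phrase_counts, ignore):
    """Strip ignored words from each phrase and merge the counts of identical results."""
    condensed = Counter()
    for phrase, count in phrase_counts.items():
        kept = [w for w in phrase.split() if w.lower() not in ignore]
        if kept:  # skip phrases made up entirely of ignored words
            condensed[" ".join(kept)] += count
    return condensed

# Illustrative data, not taken from the program or its German file.
ignore_de = {"der", "die", "das", "in", "und"}
counts = {"der hund in der": 2, "und der hund in": 3, "die katze in der": 2}
result = condense(counts, ignore_de)
print(result)
```

In this example the first two phrases both condense to "hund", so their counts are combined.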

Here is an example of the output when scanning Portuguese text (the phrases are ordered alphabetically):

Counting Portuguese phrases

Words to ignore for all six languages

Note that when scanning text in a given language, one must normally first load the words-to-ignore file for that language (via the Settings window). This program provides such files for English, German, French, Spanish, Portuguese and Italian. When processing a folder which contains files in more than one of these languages (or when successively scanning single files in more than one language), one can use the words-to-ignore file for all six languages.
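The idea behind a combined words-to-ignore list is simply the union of the per-language lists. The sketch below illustrates this under assumptions: the file format (one word per line), the ISO 8859-1 file encoding, the file names and the function name load_ignore_words are all hypothetical and may differ from the files supplied with the program.

```python
import tempfile
from pathlib import Path

def load_ignore_words(paths, encoding="iso-8859-1"):
    """Read one word per line from each file and return the union as a set."""
    words = set()
    for path in paths:
        for line in Path(path).read_text(encoding=encoding).splitlines():
            word = line.strip().lower()
            if word:
                words.add(word)
    return words

# Demonstration with two small temporary files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    en = Path(tmp, "ignore_en.txt")
    de = Path(tmp, "ignore_de.txt")
    en.write_text("the\nand\n", encoding="iso-8859-1")
    de.write_text("der\nund\nfür\n", encoding="iso-8859-1")
    combined = load_ignore_words([en, de])

print(sorted(combined))
```

Because the result is a set, a word that appears in more than one language's list is stored only once.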
