Ignoring Common Words and Stop Words

A stop word (in the terminology of linguistics) is a word to be ignored or removed. When counting words in natural-language text, stop words are usually common words such as 'the' and 'and'. You can tell the program to ignore such words by listing them in a file of your choice (normally one chosen to suit the language of the text). When a file has been specified and "Ignore common words in file" is checked, the program ignores any word in the text that it finds in that file. No distinction is made between upper and lower case in words to be ignored. For example, if "and" is to be ignored then "AND" will also be ignored.
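The case-insensitive matching described above can be illustrated with a short Python sketch. This is not the program's own code: the function name count_words, the regular expression used to split text into words, and the decision to count the remaining words exactly as written are all assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(text, ignore_words):
    """Count words in text, skipping any word found in ignore_words.

    Hypothetical helper for illustration only. Matching against the
    ignore list disregards case, so "and" also removes "AND" and "And".
    """
    # Fold the ignore list once so each lookup is case-insensitive.
    ignored = {w.casefold() for w in ignore_words}
    # Assumed word pattern: runs of word characters, with an optional
    # internal apostrophe (as in "don't").
    words = re.findall(r"\w+(?:'\w+)?", text)
    return Counter(w for w in words if w.casefold() not in ignored)

counts = count_words("The cat AND the dog and the bird", ["and", "the"])
print(counts)
```

Here every form of "the" and "and" is dropped regardless of case, and only "cat", "dog" and "bird" are counted.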

If only a few special words are to be ignored, they can be entered in the "Ignore these words" textbox, as shown above.

To ignore common words in a particular language, click the "Select common words in some language" button to bring up the language selection panel, then select the language.

Eight files are provided containing common words in English (cwds_en.txt, 353 words), German (cwds_de.txt, 373), French (cwds_fr.txt, 325), Italian (cwds_it.txt, 273), Spanish (cwds_es.txt, 277), Portuguese (cwds_pt.txt, 276), both English and German (cwds_en_de.txt) and common words in all six languages (cwd_en_de_es_pt_it_fr.txt). These files are in the folder containing the program files (created during program installation), and there is a download link in the Windows Explorer program menu after installation to a ZIP file containing all eight files.

You can add or remove words as you wish. Words do not have to be in alphabetical order or on separate lines, but words on the same line must be separated by a space (not by a comma, and not by a comma followed by a space). As noted above, upper/lower case is not significant in the list of common words. The file must be a plain text file (that is, binary files such as MS Word documents cannot be used).
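As a rough illustration of the file format just described, the following Python sketch reads a list of words separated by spaces or line breaks. It is only a sketch under stated assumptions: the function names parse_ignore_words and load_ignore_words are hypothetical, UTF-8 encoding is assumed, and the program's actual parsing may differ.

```python
def parse_ignore_words(file_text):
    """Return the set of words in file_text, folded to a single case.

    Hypothetical helper: words may be in any order, several to a line
    or one per line, separated by spaces or line breaks.
    """
    return {w.casefold() for w in file_text.split()}

def load_ignore_words(path, encoding="utf-8"):
    """Read a plain text word file (the encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return parse_ignore_words(f.read())

# Words in no particular order, mixed case, several per line.
sample = "the and of\nTHE a\n  an   in\n"
words = parse_ignore_words(sample)
print(sorted(words))

# In this sketch a comma is not a separator, so it stays attached
# to the word and "the," would never match "the" in the text.
with_commas = parse_ignore_words("the, and")
print(sorted(with_commas))
```

The second example shows why spaces rather than commas are needed: in this sketch a comma-separated list yields the entry "the," rather than "the".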

The "common words" file need not actually consist of common words. You can include any words in the file, common or otherwise. For some reason you might wish to ignore a set of words which are not common words.

In the user manual for the Advanced Version the term "common words" is replaced by the term "words-to-ignore". The words-to-ignore file cannot contain phrases, only words.

