Hermetic Word Frequency Counter
Counting Words in One File Which are Not in Another

Few users are likely to need this feature, but it is available.

To get a list of all the non-common words (subject to the parameter settings) that occur in one file (say, File B) but do not occur in another file (say, File A), proceed as follows:

  1. Specify File A as the input file.
  2. In the Parameters panel select (if not already selected) the usual common words file (for English, cwds_en.txt).
  3. Set word order to 'Alphabetical' and display format to 'Word'.
  4. Specify an output file (say File C) with a .txt extension.
  5. Count word frequencies (this creates File C).
  6. Open File C in a text editor and delete the first line, "Word (longest has n characters)".
  7. Open the common words file, delete the four comment lines at the top, select all the remaining text and copy it, paste the copied common words after the last word in File C, and save File C under the same name.
  8. Back at the program specify File B as the input file.
  9. In the Parameters panel select File C as the common words file.
  10. Specify an output file (say File D) with a .txt extension.
  11. Count word frequencies.

File D will then contain all words (except for the usual common words) which occur in File B but which do not occur in File A.
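The logic behind the steps above can be sketched in a few lines of Python. This is only an illustration of the set arithmetic involved, not the program's own code: the function names, the letters-only definition of a word, and the sample texts are all assumptions made for the example.

```python
import re


def words_in(text):
    """Return the set of lower-cased words in text.

    Assumption for this sketch: a word is a run of the letters a-z.
    The real program's definition depends on its parameter settings.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def words_only_in_b(text_a, text_b, common_words):
    """Return, in alphabetical order, the words that occur in text_b
    but neither in text_a nor in common_words."""
    # Steps 1-7: File C holds the non-common words of File A
    # followed by the usual common words.
    file_c = (words_in(text_a) - common_words) | common_words
    # Steps 8-11: File B is counted with File C as the common words file,
    # so every word listed in File C is left out of File D.
    return sorted(words_in(text_b) - file_c)


# Hypothetical sample data
common = {"the", "on"}
file_d = words_only_in_b("The cat sat on the mat",
                         "The dog sat on the log",
                         common)
print(file_d)
```

Here "sat" is left out because it occurs in both texts, and "the" and "on" are left out because they are common words, so only "dog" and "log" remain.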

File C can then be used with further files B2, B3, etc., to find all non-common words which occur in File B2 but not in File A, in File B3 but not in File A, and so on.

Note that the results must be seen in the context of the parameter settings. For example, if hyphens are permitted in words then "comma" may appear in File D (indicating that it occurs in File B but not in File A) even though "comma-separated" occurs in File A. This is correct provided that "comma" occurs by itself (i.e., other than in a hyphenated word) in File B but does not occur by itself in File A.
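The effect of the hyphen setting can be illustrated with a short Python sketch. The tokenize function and its allow_hyphens flag are hypothetical stand-ins for the program's parameter, and the two sample sentences are invented for the example.

```python
import re


def tokenize(text, allow_hyphens):
    """Return the set of lower-cased words in text.

    With allow_hyphens set, "comma-separated" is one word;
    otherwise it is split into "comma" and "separated".
    """
    if allow_hyphens:
        pattern = r"[a-z]+(?:-[a-z]+)*"
    else:
        pattern = r"[a-z]+"
    return set(re.findall(pattern, text.lower()))


# Hypothetical contents of File A and File B
file_a = "Export the data as a comma-separated list."
file_b = "Put a comma after each item."

# Hyphens permitted: File A contains "comma-separated" but not "comma".
with_hyphens = tokenize(file_b, True) - tokenize(file_a, True)

# Hyphens not permitted: File A contains "comma" as a word of its own.
without_hyphens = tokenize(file_b, False) - tokenize(file_a, False)

print("comma" in with_hyphens)
print("comma" in without_hyphens)
```

With hyphens permitted, "comma" counts as occurring in File B only and so would appear in File D. With hyphens not permitted it would not.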
