Problem: You have two large sets of ANSI text files. You suspect that some of the files in one set are duplicates, or near duplicates, of files in the other set. You want to remove duplicate files so that there are no files common to both sets. But the number of files in these sets is so large (say, ten or more in each set) that it is impractical to compare the two sets by inspecting each file in one set and comparing it with all the files in the other set. A program to find exact duplicates might give some matches, but that depends on the files being exactly the same. What if some files in one set are almost the same as files in the other set, or are the same except for some header text at the start?

Solution: You put each set of files into a separate folder. You run Duplicate Text Finder and specify the two folders. You click on the Start button and the program begins comparing all the files in the 2nd folder with the files in the 1st folder. If it finds any duplicates then it tells you.

When the program starts up the first time it looks like this:

Duplicate Text Finder startup

How to Test the Trial Version

You can test the trial version as follows: In some temporary folder create two subfolders, say, 1st folder and 2nd folder. Download the zip file dtf_test_files_1st_folder.zip and extract the files into the subfolder 1st folder. Then download the zip file dtf_test_files_2nd_folder.zip and extract the files into the subfolder 2nd folder. Download and install the trial version (follow the link below). Run the program and specify 1st folder as the 1st folder and 2nd folder as the 2nd folder. Then click on the 'Start' button and the result should be as shown below:

Duplicate Text Finder result

What the Program Does

When the program is started (after the 1st and 2nd folders are specified) it first reads an initial part of each file in the 1st folder, extracts the words and stores them. Then for each file in the 2nd folder it reads an initial part of that file, extracts the words, and compares them with the words extracted from each of the files in the 1st folder. The words do not have to match exactly, but should be close. If it finds a match then it displays the file names and (optionally) the words from each file which justify the match. If matches are found then you can copy the results to the clipboard or save them to a file, then eliminate the duplicates if you wish.
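
For readers who want a feel for this kind of comparison, the steps described above can be roughly sketched in Python. This is only an illustration under stated assumptions, not the program's actual code: the prefix size (PREFIX_CHARS), the similarity cutoff (THRESHOLD), the function names, and the use of Python's difflib to score closeness are all choices made for this sketch. As a simplification, it measures how closely the two word sequences agree rather than how close individual words are.

```python
import re
from difflib import SequenceMatcher
from pathlib import Path

PREFIX_CHARS = 2000  # assumed size of the "initial part" read from each file
THRESHOLD = 0.8      # assumed similarity cutoff for reporting a match


def leading_words(path, limit=PREFIX_CHARS):
    """Read the start of a file and return its words in lowercase."""
    # latin-1 decodes any byte, which suits ANSI text files.
    with open(path, encoding="latin-1") as f:
        text = f.read(limit)
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(words_a, words_b):
    """Return a ratio in [0, 1] for how closely two word lists agree."""
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def find_duplicates(first_dir, second_dir, threshold=THRESHOLD):
    """Compare each file in second_dir with every file in first_dir."""
    # Step 1: extract and store the words from each file in the 1st folder.
    first = {
        p: leading_words(p)
        for p in sorted(Path(first_dir).iterdir())
        if p.is_file()
    }
    matches = []
    # Step 2: for each file in the 2nd folder, compare against the stored words.
    for q in sorted(Path(second_dir).iterdir()):
        if not q.is_file():
            continue
        q_words = leading_words(q)
        for p, p_words in first.items():
            score = similarity(p_words, q_words)
            if score >= threshold:
                matches.append((p.name, q.name, round(score, 2)))
    return matches
```

Calling find_duplicates("1st folder", "2nd folder") would return a list of (1st-folder file name, 2nd-folder file name, score) tuples. Lowering the threshold reports looser matches, and raising it reports only files that are nearly identical.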


There are a few ways to adjust the operation of the program.

