Search Indexed File Hierarchy
Formerly Hermetic MultiFile Search
Hermetic Systems

Windows software for indexing and rapidly searching a hierarchy of files

By hierarchy of files (a.k.a. file hierarchy) we mean a set of files and folders (a.k.a. directories) in a single folder (or at the top level), together with all files (and subfolders) in all subfolders (if any) of that folder.
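To make the definition concrete, the following is a minimal Python sketch that lists every file in a hierarchy. It is only an illustration, not the program's own code; the function name list_hierarchy is hypothetical.

```python
import os

def list_hierarchy(top):
    """Return the paths of all files under `top`, including files in all subfolders."""
    paths = []
    # os.walk visits `top` and then every subfolder beneath it, at any depth.
    for folder, _subfolders, filenames in os.walk(top):
        for name in filenames:
            paths.append(os.path.join(folder, name))
    return sorted(paths)
```

Calling list_hierarchy on the top-level folder returns the files directly in that folder as well as those in every subfolder.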

Search Indexed File Hierarchy is a program for indexing (and subsequently searching quickly) text files, HTML files, XML files and MS Word DOCX files in a file hierarchy on a hard disk, USB flash drive, CD-ROM or similar storage medium.

Download and installation of this software has been temporarily suspended pending revision.

Note that this software is not intended to be used to index or to search online web pages.

Search Indexed File Hierarchy does not simply create a list of files or simply search for character strings in a file. It indexes every word in every file, and any file containing a particular set of words (or any of those words, or an exact phrase) can be found by searching for those words (or that phrase). Minor changes can be made to the set of files to be searched without the need for re-indexing.
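The general idea of indexing every word and then searching for all words, any word or an exact phrase can be sketched in Python as below. This is a generic inverted index, not the program's actual implementation or index file format. The function names, the simplified ASCII-only word pattern and the case-insensitive matching are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Simplified, ASCII-only word pattern (an assumption for this sketch).
WORD = re.compile(r"[a-z][a-z0-9-]*")

def build_index(docs):
    """Map each lower-cased word to {file name: [word positions in that file]}."""
    index = defaultdict(dict)
    for name, text in docs.items():
        for pos, word in enumerate(WORD.findall(text.lower())):
            index[word].setdefault(name, []).append(pos)
    return index

def search_all(index, words):
    """Files containing every one of the words."""
    sets = [set(index.get(w.lower(), {})) for w in words]
    return set.intersection(*sets) if sets else set()

def search_any(index, words):
    """Files containing at least one of the words."""
    found = set()
    for w in words:
        found |= set(index.get(w.lower(), {}))
    return found

def search_phrase(index, words):
    """Files containing the words consecutively and in the given order."""
    words = [w.lower() for w in words]
    hits = set()
    for name in search_all(index, words):
        # A phrase matches if some occurrence of the first word is followed,
        # at successive positions, by each of the remaining words.
        for start in index[words[0]][name]:
            if all(start + i in index[words[i]][name] for i in range(1, len(words))):
                hits.add(name)
                break
    return hits
```

Because the word positions are stored at indexing time, a search only consults the index and never has to reread the files themselves.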

Examples of uses for this software:

  1. To search files on a flash drive or a CD-ROM (e.g. when that is a copy of a website).
  2. To search the chapters of a book or the sections of a report.
  3. To search the local source files of your website (files on your PC).
  4. To search files in a single folder containing many miscellaneous and unrelated files.

For an example of the output of the search program see here.



Search Indexed File Hierarchy consists of two integrated modules: an index module, which is used to create an index of the searchable files, and a search module, which is used to search those files. When the program starts up you choose which you wish to work with. For an explanation of each, click on the links below.

The software 'Search Indexed File Hierarchy' at startup

Result of indexing 475 files

The index module is capable of indexing a hierarchy of over 1000 text, DOCX or HTML files, totalling over 20 MB, and containing over 50,000 different words. In one test, the index module indexed 475 files (a mixture of text files, HTML files and Word DOCX files) containing 55.61 MB of text, in about 29 minutes, to produce an index consisting of 117,454 different words. Even with that number of different words the search module returns results in just a few seconds.

The results of a search are displayed automatically in a text box or in your default web browser (with links to the files containing the words searched for).

This software is mainly intended to be used to search for words and phrases in a set of files which is permanent or does not change much. Minor editing of the files will not affect performance, files can be removed, and the set of searchable files can be moved to a different place.

It may also be used with a temporary set of files, in the sense that you might just want to search a particular set of files for a short time. If there are fewer than 100 files that you wish to search, containing fewer than 10,000 different words, then the index module will create a new index file in less than a minute and you can then search those files.


Which files can be indexed (and thus searched)?

This program regards a word as any consecutive sequence of letters, numbers (that is, digits, 0-9) and optionally one or more hyphens, subject to the condition that the first character is a letter. Thus a word may not include an apostrophe or any other punctuation. Non-English characters such as ü and é are allowed in words, so this software can be used with text in most European languages such as German, Spanish, etc.
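The word rule just described can be approximated with a regular expression. The Python sketch below is illustrative only: the set of characters treated as letters is an assumption, as is the choice to skip a letter that directly follows a digit (so that "3D" yields no word). The program's behaviour in such edge cases is not stated here.

```python
import re

# Characters treated as letters: ASCII letters plus accented letters.
# This particular set is an assumption made for the sketch.
LETTERS = "A-Za-zÀ-ÖØ-öø-ÿŠšŽžŒœŸ"

# A word starts with a letter (not directly preceded by a letter or digit)
# and continues with letters, digits or hyphens.
WORD_RE = re.compile(rf"(?<![{LETTERS}0-9])[{LETTERS}][{LETTERS}0-9-]*")

def words(text):
    """Return the words found in `text`, in order of appearance."""
    return WORD_RE.findall(text)
```

Under these assumptions, "It's" splits at the apostrophe into "It" and "s", while "well-known", "café" and "x2" each count as a single word.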

ANSI is the single-byte text encoding which is the default encoding on your PC. UTF-8 is a variable-byte-length encoding of Unicode characters, often used in HTML and XML files.
Indexable (and thus searchable) files are MS Word DOCX files (but not Word DOC files) and text or text-like files: ordinary text files, HTML files, XML files and, in general, any non-binary file, that is, any file which consists only of ANSI text characters plus whitespace. This allows text which includes non-English letters such as ä, é and ñ. The program works with DOCX files and with text-like files which consist entirely of text encoded via the 8-bit Windows-1252 encoding (this is a superset of ISO 8859-1). Languages which can be 8-bit encoded using Windows-1252 include English, German, French, Danish, Italian, Norwegian, Portuguese, Spanish, Swedish and Finnish.
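One way to test whether a file is text-like in this sense is to reject unexpected control bytes and then try to decode the content as Windows-1252. The Python sketch below shows that approach. It is an assumption about how such a check could be done, not a description of the program's own test, and the function name decode_if_text is hypothetical.

```python
# Control bytes that normally occur in text files: tab, line feed, carriage return.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}

def decode_if_text(data):
    """Return `data` decoded as Windows-1252, or None if it looks like binary data."""
    # Any other byte below 0x20 (for example a NUL byte) suggests a binary file.
    if any(b < 0x20 and b not in ALLOWED_CONTROLS for b in data):
        return None
    try:
        # Python's "cp1252" codec rejects the few byte values that
        # Windows-1252 leaves undefined (such as 0x81).
        return data.decode("cp1252")
    except UnicodeDecodeError:
        return None
```

To check a file, read it in binary mode and pass the bytes to decode_if_text; a return value of None means the content does not look like Windows-1252 text.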

Some text files are automatically excluded if they have file extensions such as css and js (a complete list of excluded file extensions is given here).
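Exclusion by file extension could look like the following Python sketch. Only css and js are named above, so the set below is deliberately incomplete. The function name is_excluded and the case-insensitive comparison are assumptions.

```python
import os

# Only these two extensions are named in the text; the complete list is not reproduced here.
EXCLUDED_EXTENSIONS = {"css", "js"}

def is_excluded(path):
    """Return True if the file's extension is in the excluded set (ignoring case)."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    return extension in EXCLUDED_EXTENSIONS
```

With this set, a file named style.CSS would be skipped, whereas index.html and files without an extension would still be indexed.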


To use this software it is not necessary to study a lengthy manual or follow flow charts. Its use is fully explained in these two web pages:

The Index Module   The Search Module



System requirements: This software will run under all versions of the Windows operating system from Windows 98 to Windows 10 with at least 64 MB of RAM. Installation requires about 3 MB of disk space.

Trial version: A copy of the software is available for free download from this website for the purpose of evaluation. Click on the following link for further information:
 
Download ...