Index Files Search Words Lite
User Manual
 




Which files can be indexed?

Indexable (and thus searchable) files are text and text-like files, which may be ordinary text files, HTML files, XML files or in general any non-binary file, that is, any file which consists only of text characters plus whitespace. This excludes files produced by MS-Word and Adobe Acrobat, but allows text which includes non-English letters such as ä, é and ñ. The software does not allow words to include apostrophes (as occur in French text), numerals or other non-letter characters such as colons.

A text-like file is one which contains only bytes with values in the range 32 through 255, plus whitespace bytes (that is, bytes with values 32, 13, 10, 9, 12 and 8, which are the byte values for space, carriage return, linefeed, tab, formfeed and backspace respectively). Non-text-like files cannot be indexed, but text-like files with file extensions such as .c, .cpp, .css, .js and .php can be.
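
The byte rule above can be sketched in Python as follows. This is only an illustration of the rule as stated, not the software's own code; the names WHITESPACE_BYTES and is_text_like are hypothetical. It applies to 8-bit text only, since a 16-bit Unicode file normally contains zero bytes and so presumably has to be recognized separately.

```python
# Whitespace byte values named in the manual: backspace, tab, linefeed,
# formfeed and carriage return. Space (32) already lies in the 32-255 range.
WHITESPACE_BYTES = {8, 9, 10, 12, 13}


def is_text_like(data: bytes) -> bool:
    """Return True if every byte is in 32..255 or is a listed whitespace byte."""
    # A byte can never exceed 255, so only the lower bound needs checking.
    return all(b >= 32 or b in WHITESPACE_BYTES for b in data)
```

For example, is_text_like(open("notes.txt", "rb").read()) would test a whole file, where notes.txt is a placeholder file name.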

The software reads 16-bit Unicode text files as well as 8-bit ASCII text files.


Creating the index file

When you specify the file to be indexed and searched the software looks for a corresponding index file in the same folder (the index file for a particular search file has the same name with .ifswl appended) and reports whether or not an index file was found there. At this point you can tell the program to use some other folder as the location of the index file (and it will then report whether it finds the corresponding index file there). This allows you to keep all index files in one folder, separate from the files which are searched (in case you don't want the index files in the same folders as the search files).
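
The naming rule for index files can be pictured with a short Python sketch. The function name index_path_for and its parameters are hypothetical; only the rule itself (same name with .ifswl appended, same folder unless another is chosen) comes from this manual.

```python
from pathlib import Path
from typing import Optional


def index_path_for(search_file: str, index_folder: Optional[str] = None) -> Path:
    """Return where the index file for search_file would be looked for."""
    source = Path(search_file)
    # By default the index file lives beside the file to be searched.
    folder = Path(index_folder) if index_folder is not None else source.parent
    # The index file has the same name with ".ifswl" appended.
    return folder / (source.name + ".ifswl")
```

Calling index_path_for(...).exists() would then correspond to the report of whether or not an index file was found.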

After you have decided on a folder for the index file (by default it is the folder containing the file to be searched) you must create the index file, unless it already exists (having previously been created). If the index file already exists then you can re-create it.

If the 'On quit delete' checkbox is checked then the index file is deleted when the program ends (if it is large then the program will ask for confirmation). Not deleting it means that it will be available next time the program is run and does not have to be recreated (which may take a while if the file is large). If searches of a particular file are performed frequently then it is better to keep the index file. The size of the index file depends on the number of different words found in the file to be searched, and ranges from less than 10 KB (for a file with a few hundred different words) to more than 80 KB (for a file with over 10,000 different words).


Search words and search type

There is no limit on the number of search words which can be given for a search. Searches are not case-sensitive (that is, no distinction is made between upper and lower case). You can search for any of the search words, all of the search words or the exact phrase as entered.

Characters other than letters are ignored, and the words in the searchable files and in the 'Search words' textbox are delimited by non-letters. Thus Jack and Jill is three words, Jack-Jill is two words (Jack and Jill) and Jack1 likes Jill2 is three words (Jack, likes and Jill).
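
The word-splitting rule can be illustrated with a small Python sketch. The function split_words is hypothetical, and it assumes that Python's str.isalpha() is a reasonable stand-in for what the software counts as a letter (it may accept more characters than the software does).

```python
from typing import List


def split_words(text: str) -> List[str]:
    """Split text into words, treating every non-letter as a delimiter."""
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalpha():
            current.append(ch)
        elif current:
            # A non-letter ends the word collected so far.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

With this rule split_words("Jack1 likes Jill2") gives Jack, likes and Jill, and an apostrophe splits a word in two, which is why French text is not fully supported.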

Text in languages other than English is searchable. That is, the search words and words in the searchable text may contain non-English letters, such as ä, é and ñ, but may not contain numerals or apostrophes. Thus the software is fully compatible with text in English, German, Spanish and most European languages (but not French).

Hyphenated search words can be used, but only in exact-phrase searches, such as a search for end-of-the-world schemes.

When an 'any-word' search is performed with more than one search word, a report can be generated showing occurrences of any of the search words which occur in the file. When an 'all-words' search is performed on the same set of search words, a report can be generated only when all the words occur in the file. If they do not, you are informed as to how many of the search words were found. If all the search words occur in the file then an 'all-words' search and an 'any-word' search give rise to the same report.
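
The difference between the two search types can be sketched in Python. This is a simplified model, not the software's implementation: the function search, the mode strings "any" and "all", and the assumption that an any-word search needs at least one word found before a report is possible are all illustrative.

```python
from typing import List, Set, Tuple


def search(file_words: Set[str], search_words: List[str], mode: str) -> Tuple[List[str], bool]:
    """Return (search words found in the file, whether a report can be generated).

    file_words is assumed to hold the lower-cased words of the searched file.
    """
    wanted = [w.lower() for w in search_words]  # searches are not case-sensitive
    found = [w for w in wanted if w in file_words]
    if mode == "any":
        return found, len(found) > 0
    if mode == "all":
        # A report is possible only if every search word occurs in the file;
        # otherwise len(found) is the count reported to the user.
        return found, len(found) == len(wanted)
    raise ValueError("mode must be 'any' or 'all'")
```

When every search word occurs in the file, both modes return the same list of words, which matches the statement that the two searches then give rise to the same report.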


What if the indexed file is edited?

Suppose you have created an index file for a file, then someone edits that file. Does this mean that the search file has to be re-indexed? No. The index file does not take note of the exact positions of the words in the search file, so some minor editing will not affect the search results, except in the case that editing adds a word to the file which did not previously occur in the file; in this case a search for that word will not find it.

On the other hand, if editing removes all occurrences of a word from the file then a search on that word will not find it because, even though the word remains in the index, the software looks for it in the file and if it does not find it then it will not report that it is there.

The phrases displayed in the search results always accord with the current contents of a file, even if that file has been edited since the last time the index file was created.
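
The behaviour described in this section is consistent with a two-step lookup, sketched below in Python. This is an inference from the text, not a description of the actual code; word_is_found and its parameters are hypothetical, and the index is modelled simply as a set of lower-cased words.

```python
from typing import Set


def word_is_found(word: str, index_words: Set[str], current_text: str) -> bool:
    """Model a search for one word against a possibly out-of-date index."""
    word = word.lower()
    # Step 1: a word missing from the index is never looked for in the file,
    # so a word added by later editing is not found.
    if word not in index_words:
        return False
    # Step 2: the word is then looked for in the file as it is now,
    # so a word removed by later editing is not reported either.
    letters_only = "".join(ch if ch.isalpha() else " " for ch in current_text.lower())
    return word in letters_only.split()
```

Re-creating the index file after substantial editing brings step 1 back in line with the current contents of the file.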


Stem searches

A stem is a sequence of letters which may be the initial segment of some word. Stems are marked by a terminating asterisk, e.g., comput*. This software allows searching for multiple words by using stems as search words.

For example, if you search for psycho* the software will find all occurrences of the following words in the file: psychoactive, psychological, psychology, psychosis, psychospiritual, psychotherapies, psychotherapy and psychotic.

The asterisk can only be used at the end of a search word. You cannot search for, e.g., *like.

The search words may include more than one stem, e.g., myster* school*.

A stem is equivalent to the set of all words which occur in the file and which begin with that stem, so searching on a stem is equivalent to doing an any-word search on all the words which begin with that stem. Thus a stem search must be an any-word search; stems may not be used in an all-words search or an exact-phrase search.

When a report is generated, if the search words include one or more stems then the actual words searched for will be displayed (labelled as 'Expanded [search words]'). If no words in the file match a search word (whether or not it is a stem) then that search word (or its expansion) will not appear in this list of actual words searched for. So, for example, if you search on bird gibbon dog* emu*, and bird, dog and dogs occur in the file but gibbon does not, nor does any word beginning with emu (such as emulate), then the list of expanded words will be bird, dog, dogs.
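
The expansion of stems into actual words can be sketched in Python as follows. The function expand_search_words is hypothetical, and the alphabetical ordering of the words matched by a stem is an assumption made for the sake of a definite result.

```python
from typing import Iterable, List, Set


def expand_search_words(search_words: Iterable[str], file_words: Set[str]) -> List[str]:
    """Return the actual words searched for, dropping words absent from the file.

    file_words is assumed to hold the lower-cased words of the searched file.
    """
    expanded: List[str] = []
    for word in search_words:
        word = word.lower()
        if word.endswith("*"):
            stem = word[:-1]
            # A stem stands for every word in the file which begins with it.
            matches = sorted(w for w in file_words if w.startswith(stem))
        else:
            matches = [word] if word in file_words else []
        for match in matches:
            if match not in expanded:
                expanded.append(match)
    return expanded
```

Applied to the example above, expand_search_words(["bird", "gibbon", "dog*", "emu*"], {"bird", "dog", "dogs"}) yields bird, dog, dogs.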


Output options

The results of a search can be displayed either in a textbox within the software or as a web page in your default web browser. The display of results in the web page is preferable, since the search words are then displayed in boldface. The textbox option is provided in case there is some problem with displaying results in the default web browser.

If you have checked the 'Generate report' checkbox then the software will display the results automatically either by opening a textbox or by displaying a web page in your default web browser (you may have to switch to the browser manually). If you have not checked this checkbox then you can generate the report by clicking on the 'Report' button. The results can be preserved either by copying from the textbox to the clipboard (and from there to some text editor program such as Notepad) or by saving the web page to disk.

You can control whether the filepath of the search file is displayed in the report by checking or unchecking the corresponding checkbox. If the search file is an HTML file then the description tag will be displayed in the report if the corresponding checkbox is checked.

You can also control the maximum number of extracts which will be displayed in the report and the size of these extracts; see the next section for more on this.


Number and size of extracts

When the report is generated (but not during the search itself) every occurrence of a search word is extracted together with several words before and after each search word, making a phrase called an extract. If the textbox next to 'Maximum number of extracts' is left blank then all these extracts will be displayed in the report. If the search file is large then in an any-word search, or in a stem search, there could be over a hundred extracts. If you don't wish to see them all, but only, say, the first ten, then you can limit the output by specifying the maximum number of extracts to be displayed.

The number of words before and after the occurrence of a search word is determined by the value selected for 'Size of extract'. For example, here are extracts (following a search of a physics text for ferromagnetic) with this value set to 1, 3, 5, 7 and 9 respectively:

... models of ferromagnetic material. ...

... research concerns only models of ferromagnetic material. / The standard q-state ...

... material J < 0. This research concerns only models of ferromagnetic material. / The standard q-state Potts spin model is a ...

... of an antiferromagnetic material J < 0. This research concerns only models of ferromagnetic material. / The standard q-state Potts spin model is a spin model in which there ...

... J > 0, and for a model of an antiferromagnetic material J < 0. This research concerns only models of ferromagnetic material. / The standard q-state Potts spin model is a spin model in which there can be q different spin ...
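
The idea of an extract, and of limiting the number of extracts, can be sketched in Python. This is a simplified model under stated assumptions: make_extracts and its parameters are hypothetical, words are split on whitespace only, and words_each_side is a literal word count. The sample extracts above contain more words than the selected value, so the software's mapping from 'Size of extract' to extract length does not appear to be a literal count and is not reproduced here.

```python
import string
from typing import List, Optional


def make_extracts(text: str, search_word: str, words_each_side: int,
                  max_extracts: Optional[int] = None) -> List[str]:
    """Return phrases made of each occurrence of search_word plus nearby words."""
    tokens = text.split()
    target = search_word.lower()
    extracts: List[str] = []
    for i, token in enumerate(tokens):
        # Compare whole words only, ignoring case and surrounding punctuation.
        if token.strip(string.punctuation).lower() != target:
            continue
        start = max(0, i - words_each_side)
        end = i + words_each_side + 1
        extracts.append("... " + " ".join(tokens[start:end]) + " ...")
        # Leaving max_extracts as None corresponds to a blank textbox.
        if max_extracts is not None and len(extracts) >= max_extracts:
            break
    return extracts
```

Because whole words are compared, a search for ferromagnetic in this sketch does not match antiferromagnetic.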
