The search module of Index Files Search Words searches text, HTML, XML and similar files held in a hierarchy of folders on a hard disk or other local storage (not on an online website). It requires an index file, previously created by the index module, containing data regarding the contents of the searchable files.
There is no limit on the number of search words which can be given for a search. Searches are not case-sensitive (that is, no distinction is made between upper and lower case). You can search for files which contain any of the search words, all of the search words or the exact phrase as entered.
Characters other than letters are ignored, and the words in the searchable files and in the 'Search words' textbox are delimited by non-letters. Thus Jack and Jill is three words, Jack-Jill is two words (Jack and Jill) and Jack1 likes Jill2 is three words (Jack, likes and Jill).
Text in languages other than English is searchable. That is, the search words and words in the searchable text may contain non-English letters, such as ä, é and ñ (unless such words were excluded when the index file was created), but may not contain numerals or apostrophes. Thus the software is fully compatible with text in English, German, Spanish and most European languages, but is only partially compatible with French text (which has words containing apostrophes).
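The word-splitting rule described above can be illustrated with a minimal Python sketch. The function name split_words is hypothetical and this is not the software's own code; it assumes only what is stated above: every non-letter is a delimiter, case is ignored, and accented letters count as letters.

```python
def split_words(text: str) -> list[str]:
    """Split text into lower-case words, treating every non-letter as a delimiter."""
    # Replace each non-letter (digits, hyphens, apostrophes, spaces, ...) with a
    # space, lower-case the letters, then split on whitespace.
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return cleaned.split()


print(split_words("Jack and Jill"))      # ['jack', 'and', 'jill']
print(split_words("Jack-Jill"))          # ['jack', 'jill']
print(split_words("Jack1 likes Jill2"))  # ['jack', 'likes', 'jill']
print(split_words("l'eau"))              # ['l', 'eau']
```

The last line shows why French text is only partially compatible: the apostrophe acts as a delimiter, so l'eau is treated as two words.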
Hyphenated search words can be used, but only in exact-phrase searches, such as a search for end-of-the-world schemes.
What if the indexed files are edited?
Suppose you have created an index file for a set of files, then someone edits those files. Will this mean that the search module won't work properly and that those files have to be re-indexed? No. The index file does not take note of the exact positions of words in a file, so some minor editing will not affect the search results, except in the case that editing adds a word to a file which was not previously in that file and which is not in any other indexed file. In this case a search for that word will not return that file among the search results.
On the other hand, if editing removes all occurrences of a word from a file then a search on that word will not bring up the file in the search results (even though the word remains in the index) because the search module looks for that word in the file (in which it previously occurred), and if it does not find it then it will not display that file in the results.
The phrases displayed in the search results always accord with the current contents of a file, even if that file has been edited since the last time the index file was created.
Also, if an indexed file is deleted then this will not faze the search module. The deleted file will simply not show up in search results.
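The behaviour described in this section amounts to a two-step lookup: the index supplies candidate files, and each candidate is then checked against its current contents. The Python sketch below illustrates the idea only. The index layout (a dictionary mapping each word to a list of relative file paths) and the names letter_words and current_matches are assumptions made for the example, not the actual format of an .ifsw file or the program's code.

```python
import tempfile
from pathlib import Path


def letter_words(text: str) -> set[str]:
    """Return the set of lower-case words in text, splitting on non-letters."""
    cleaned = "".join(ch.lower() if ch.isalpha() else " " for ch in text)
    return set(cleaned.split())


def current_matches(index: dict[str, list[str]], top_folder: Path, word: str) -> list[str]:
    """Return the indexed files that still exist and still contain the word."""
    found = []
    for rel_path in index.get(word.lower(), []):  # candidates recorded at indexing time
        path = top_folder / rel_path
        if not path.is_file():                    # deleted since indexing: skip
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if word.lower() in letter_words(text):    # word edited out: skip
            found.append(rel_path)
    return found


# Demonstration with a hypothetical index and temporary files.
with tempfile.TemporaryDirectory() as folder:
    top = Path(folder)
    (top / "a.txt").write_text("The dynamic exponent Z", encoding="utf-8")
    (top / "b.txt").write_text("The word was edited out", encoding="utf-8")
    index = {"exponent": ["a.txt", "b.txt", "deleted.txt"]}
    still_matching = current_matches(index, top, "exponent")

print(still_matching)  # ['a.txt']
```

Note that a word absent from the hypothetical index yields no candidates at all, which mirrors the exception described above for words added after indexing.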
What if the indexed files are moved?
No problem. When the files to be searched are indexed only their paths relative to the original top folder are recorded. Thus the indexed files (in the original top folder and its lower folders) can be moved to a new location, that is, to a new top folder (on the same drive or a different drive). If you then specify this new folder as the top folder in the search module then the indexed files in their new location can be searched.
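As an illustration of why moving the files causes no difficulty, here is a minimal Python sketch of storing paths relative to a top folder and resolving them against a new one. The drive letters, the chapters subfolder and the file name ch1.htm are hypothetical, as are the function names; only the idea of recording relative paths comes from the description above.

```python
from pathlib import PureWindowsPath


def to_relative(file_path: str, top_folder: str) -> PureWindowsPath:
    """Path of a file relative to the top folder, as it might be stored in an index."""
    return PureWindowsPath(file_path).relative_to(top_folder)


def locate(relative_path: PureWindowsPath, top_folder: str) -> PureWindowsPath:
    """Full path of an indexed file under whichever top folder is currently specified."""
    return PureWindowsPath(top_folder) / relative_path


# At indexing time the files are in C:\docs\spin_models (hypothetical location).
rel = to_relative(r"C:\docs\spin_models\chapters\ch1.htm", r"C:\docs\spin_models")
print(rel)  # chapters\ch1.htm

# Later the whole folder is moved to another drive and given as the new top folder.
print(locate(rel, r"D:\archive\spin_models"))  # D:\archive\spin_models\chapters\ch1.htm
```

Because only the relative part is stored, the same stored path works under any top folder that preserves the original folder structure.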
In this user manual, a search of the author's M.Phil. thesis on spin models is used to demonstrate the operation of the software. If you wish to follow this example in a hands-on manner, first install the trial version of the software. Download the file spin_models.zip (370 KB) and unzip the files into some folder on your hard disk, say, spin_models. Then download the index file spin_models_index.ifsw (30 KB) to some folder, say, ifsw_files. (The index file may be in the same folder as the files to be searched, but need not be.) If you have followed the example of the use of the index module then you will already have these files in these folders. If the folders do not already exist then you should create them using Windows Explorer.
When first run the search module appears thus:
The first thing to do is to click on the 'Index file' button to specify the location of the index file for this set of files. In this example it is spin_models_index.ifsw in the ifsw_files folder. (All index files created by this program end in _index.ifsw.) The software reads this index file and displays the title of the searchable files, the number of those files and the number of different words in those files.
Once an index file has been specified you can click on the 'List indexed files' button to see which files are included in this index.
Now click on the 'Top folder' button to specify the location of the top folder containing the searchable files. In this example it is the spin_models folder.
The top part of the window should now look like this:
Now for a search. Enter "dynamic exponent" in the 'Search words' textbox, select "Exact phrase" as the search type and click on the 'Search' button. After a few seconds the results will appear:
If you now click on the 'Report' button (with the 'HTML' option selected), or if you repeat the search with the 'Generate report' checkbox checked, then a web page will appear in your default web browser (this may take a few seconds and you may have to switch to the browser window manually). In the trial version the titles are simply titles. In the fully-functional version (after activation of the software) they would link to the files in the spin_models folder on your PC. In the example below they go to pages on this website.
Appendix 8: Relaxation Time and Singular Dynamic Scaling
... scaling Aeppli et al. found that the dynamic exponent Z for the relaxation time had a value close ...
... time ~ x Z eff where Z eff (the "effective dynamic exponent") = A(ln x T ) + B, with constant A and B . / ...
Footnotes
... the symbol q denotes only the new critical dynamic exponent, but sometimes here it is used to mean the ...
Now (i) enter "computational theoretical" into the 'Search words' textbox, (ii) select "Any word" as the search type, (iii) set the maximum number of extracts to 8 and (iv) click on the 'Search' button. Six files will be returned (as at right) and (if the 'Generate report' checkbox was checked) this page should appear in your web browser (with links to the files in the spin_models folder; here the links have been changed to point to the pages on this website).
Note that in the case of "Chapter 1: Spin Models" the number of extracts is reported as "8 (29 possible)". This tells us that if no limit had been set on the maximum number of extracts then 29 would have been displayed (showing all occurrences of the search words in this file).
It may happen that you are using an index file for a set of files but wish to search just one of those files. In this case simply select the 'Search a single indexed file' option and specify the file in the usual way.
For a more detailed discussion see Indexing and Searching a Single File. Note that searching a single file can be done regardless of whether the index file was created by indexing just that file or by indexing many files.
A stem is a sequence of letters which may be the initial segment of some word. Stems are marked by a terminating asterisk, e.g., comput*. This software allows searching for multiple words by using stems as search words.
For example, if you search for dog* the software will find all files containing words which begin with the stem, e.g., dog, dogs, doglike, doggy, dogged, doggone, dogon, etc. As another example, if you search on hypothe* then the software will find all files which contain hypothesis, hypotheses, hypothesize and hypothetical.
The asterisk can only be used at the end of a search word. You cannot search for, e.g., *like.
The search words may include more than one stem, e.g., myster* school*.
A stem is equivalent to the set of all words which occur in some file and which begin with that stem, so searching on a stem is equivalent to doing an any-word search on all the words which begin with that stem. Thus a stem search must be an any-word search; stems may not be used in an all-words search or an exact-phrase search.
When a report is generated, if the search words include one or more stems then the actual words searched for will be displayed (labelled as 'Expanded [search words]'). If no words in any file match a search word (whether or not it is a stem) then that search word (or its expansion) will not appear in this list of actual words searched for. So, for example, if you search on bird gibbon dog* emu*, and bird, dog and dogs occur in some file (not necessarily the same file) but gibbon does not, nor does any word beginning with emu (such as emulate), then the list of expanded words will be bird, dog, dogs.
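The expansion of stems into actual search words can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the program's code: the function name expand_search_words is hypothetical, the set of indexed words is passed in as a plain Python set, and the alphabetical ordering of the words matching a stem is a choice made for the example.

```python
def expand_search_words(search_words: str, indexed_words: set[str]) -> list[str]:
    """Expand stems (words ending in *) and drop search words that match nothing."""
    expanded = []
    for term in search_words.lower().split():
        if term.endswith("*"):
            stem = term[:-1]
            # Every indexed word beginning with the stem (order is an assumption).
            matches = sorted(w for w in indexed_words if w.startswith(stem))
        else:
            # An ordinary word is kept only if it occurs in some indexed file.
            matches = [term] if term in indexed_words else []
        for word in matches:
            if word not in expanded:
                expanded.append(word)
    return expanded


vocabulary = {"bird", "cat", "dog", "dogs"}
print(expand_search_words("bird gibbon dog* emu*", vocabulary))  # ['bird', 'dog', 'dogs']
```

The resulting list is what an any-word search would then be run on, which is why stems cannot be combined with all-words or exact-phrase searches.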
The results of a search can be displayed either in a textbox within the software or as a web page in your default web browser (as shown in the example of use above). The display of results in the web page is preferable, since the search words are then displayed in boldface and there are links to the files found. The textbox option is provided in case there is some problem with displaying results in the default web browser.
If you have checked the 'Generate report' checkbox then the software will display the results automatically either by opening a textbox or by displaying a web page in your default web browser (you may have to switch to the browser manually). If you have not checked this checkbox then you can generate the report by clicking on the 'Report' button. The results can be preserved either by copying from the textbox to the clipboard (and from there to some text editor program such as Notepad) or by saving the web page to disk.
You can control whether the filepaths of the files found are displayed in the report by checking or unchecking the corresponding checkbox. If a file found is an HTML file then the description tag will be displayed in the report if the corresponding checkbox is checked.
As illustrated in the example of use above, you can also control the maximum number of extracts which will be displayed in the report. You can also control the size of these extracts (see more on both of these points below).
If the 'Sort files found by search word occurrence' checkbox is not checked then the files will be displayed in their physical order, that is, the order in which the index module found them when creating the index file.
If this checkbox is checked then the program will take note of the relative frequencies of the search words found in the files. Files are then displayed in descending order of the sum of these frequencies for all search words. Thus files in which the relative frequencies of the search words are higher will be displayed earlier in the output.
"Relative frequency" here means: relative to the number of occurrences in the file of the word most-frequently occurring in the file, and is different from the absolute frequency of a word. The use of relative rather than absolute frequencies means that the size of the file in which a search word is found does not affect the position of that file in the search results.
If this checkbox is checked then the search will take a little longer.
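The ranking by relative frequency described above can be sketched in Python as follows. The function name relevance and the two sample files are hypothetical; the calculation itself follows the definition given: each search word's count is divided by the count of the most frequent word in the file, and the results are summed.

```python
from collections import Counter


def relevance(words_in_file: list[str], search_words: list[str]) -> float:
    """Sum of the relative frequencies of the search words in one file."""
    counts = Counter(words_in_file)
    if not counts:
        return 0.0
    most_frequent = max(counts.values())  # count of the file's commonest word
    return sum(counts[word] / most_frequent for word in search_words)


files = {
    # 'spin' occurs 5 times, but 'the' occurs 10 times: relative frequency 0.5
    "long.txt": ("the " * 10 + "spin " * 5 + "model").split(),
    # 'spin' occurs twice and is itself the commonest word: relative frequency 1.0
    "short.txt": "spin model spin".split(),
}

ranked = sorted(files, key=lambda name: relevance(files[name], ["spin"]), reverse=True)
print(ranked)  # ['short.txt', 'long.txt']
```

The shorter file is listed first even though the search word occurs fewer times in it, which is the effect of using relative rather than absolute frequencies.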
There is no limit on the number of files which can be searched (other than that this is the number of files which were indexed) but there is a limit on the number of matches (files found containing one or more of the search words). At most 300 files can be returned in a search.
When the report is generated (but not during the search itself) every occurrence of a search word is extracted together with several words before and after each search word, making a phrase called an extract. If the textbox next to 'Maximum number of extracts' is left blank then all these extracts will be displayed in the report. In an any-word search, or in a stem search, for each file in which at least one search word is found there could be over a hundred extracts. If you don't wish to see them all, but only, say, the first ten, then you can limit the output by specifying the maximum number of extracts to be displayed.
The number of words before and after the occurrence of a search word is determined by the value selected for 'Size of extract'. For example, here are extracts (following a search for ferromagnetic) with this value set to 1, 3, 5, 7 and 9 respectively:
... models of ferromagnetic material. ...
... research concerns only models of ferromagnetic material. / The standard q-state ...
... material J < 0. This research concerns only models of ferromagnetic material. / The standard q-state Potts spin model is a ...
... of an antiferromagnetic material J < 0. This research concerns only models of ferromagnetic material. / The standard q-state Potts spin model is a spin model in which there ...
... J > 0, and for a model of an antiferromagnetic material J < 0. This research concerns only models of ferromagnetic material. / The standard q-state Potts spin model is a spin model in which there can be q different spin ...
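The way extracts are built around each occurrence of a search word can be sketched in Python. This is an illustration only: the function name make_extracts is hypothetical, and the sketch assumes a symmetric window of a fixed number of words on each side, whereas the examples above show that the program's 'Size of extract' value does not map one-to-one onto a word count.

```python
from typing import Optional


def make_extracts(text: str, search_word: str, context: int,
                  max_extracts: Optional[int] = None) -> list[str]:
    """Return phrases of `context` words either side of each occurrence of search_word."""
    tokens = text.split()
    target = search_word.lower()
    extracts = []
    for i, token in enumerate(tokens):
        # Split the token on non-letters so that 'material.' matches 'material'.
        pieces = "".join(ch.lower() if ch.isalpha() else " " for ch in token).split()
        if target not in pieces:
            continue
        start = max(0, i - context)
        end = min(len(tokens), i + context + 1)
        extracts.append("... " + " ".join(tokens[start:end]) + " ...")
        if max_extracts is not None and len(extracts) >= max_extracts:
            break  # stop once the requested maximum has been reached
    return extracts


sample = ("This research concerns only models of ferromagnetic material. "
          "The standard q-state Potts spin model is a spin model")
print(make_extracts(sample, "ferromagnetic", 2))
# ['... models of ferromagnetic material. The ...']
```

Leaving max_extracts as None corresponds to leaving the 'Maximum number of extracts' textbox blank, so that every occurrence produces an extract.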
When you specify an index file, or when you load the settings saved at the last run and initiate a search, the program looks for the data file associated with the index file. As explained in the section on the index module, a data file is created along with an index file, but need not be present when the search module is run. The search module looks for the associated data file in the same folder as the index file; if it is not found then the search module recreates the data file (in the same folder as the index file).
If the 'On quit delete' checkbox is checked then the data file is deleted when the program ends. Not deleting it means that it will be available next time the program is run and does not have to be recreated by the search module (though this does not take long to do). The data file may be rather large if there are many searchable files and they contain many different words, so if searches are not performed on a daily basis then this file may be deleted to recover disk space.