Hermetic Sitemap Generator
As we said earlier, Hermetic Sitemap Generator is an offline XML Sitemap builder. It scans the files in a specified folder on the local hard drive, and in selected subfolders, and generates an XML Sitemap, which can then be submitted to search engines to help them index your website. The program can automatically set appropriate values of the change-frequency and priority tags for each file in the Sitemap.
Here is a sample screenshot:
Specifying the Top Folder
It is assumed that:
- All the files in the online website are contained in a local copy on your hard drive.
- All the files in the local copy are contained in a folder tree (starting with a top folder such as C:\websites\ace_hardware\).
- The structure of the folder tree on your website is mirrored in the structure of the folder tree on your local copy. (There may be subfolders of the top folder on your hard drive which do not correspond to any subfolders on your website.)
- There may be (and probably are) files in the local copy which are not to be included in the Sitemap.
Hermetic Sitemap Generator scans the files in the top folder of your local copy and in all subfolders recursively to build the Sitemap.
Click on the 'Top folder' button to specify the top folder in the usual way, as shown above right.
Specifying the Website URL
The top folder on your local hard drive should correspond to the top folder on your website. When Hermetic Sitemap Generator builds the Sitemap it has to state a URL for each of the files on your website that are included in the Sitemap, so it has to know the URL for the top folder.
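As a rough illustration of this correspondence, the following Python sketch maps a local filepath to a URL. The function name local_path_to_url and the percent-encoding step are assumptions made for the example, not a description of how Hermetic Sitemap Generator works internally.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def local_path_to_url(top_folder: str, base_url: str, filepath: str) -> str:
    """Map a file under the local top folder to a URL under the website's top folder."""
    # relative_to() raises ValueError if filepath is not inside top_folder.
    relative = PureWindowsPath(filepath).relative_to(PureWindowsPath(top_folder))
    # URLs use forward slashes; quote() percent-encodes spaces and similar characters
    # (an assumption of this sketch).
    url_path = "/".join(quote(part) for part in relative.parts)
    return base_url.rstrip("/") + "/" + url_path


print(local_path_to_url(
    r"C:\websites\ace_hardware",
    "http://www.yoursite.com/",
    r"C:\websites\ace_hardware\docs\contents.htm",
))
```

PureWindowsPath is used so that the sketch handles backslash-separated paths on any operating system.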
Included File Types
You can specify the file types to be included in the Sitemap by clicking on the 'Set file types' button, which brings up a panel as at right, with a list of all file types which can be included in Google search results. Only file types which are checked will be included when Hermetic Sitemap Generator generates the Sitemap file.
This panel must be displayed, and the 'Confirm' button clicked, at least once before files can be listed or a Sitemap built. This is to ensure that you are aware of what file types are allowed in the Sitemap (and thus what file types will automatically be excluded).
Clicking on the 'Default' button selects the file types .htm, .html, .doc, .docx, .pdf and .txt.
Note that if you include .php files then there may be some which you wish to exclude. They can be excluded as explained in Exclusion of Files and Lower Subfolders below.
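The effect of the checked file types can be sketched in Python as a simple extension test. The names DEFAULT_TYPES and has_included_type are hypothetical, and matching extensions without regard to case is an assumption of this sketch.

```python
import os

# The default selection named above; the real panel offers a longer checklist.
DEFAULT_TYPES = frozenset({".htm", ".html", ".doc", ".docx", ".pdf", ".txt"})


def has_included_type(filename: str, included_types=DEFAULT_TYPES) -> bool:
    """Return True if the file's extension is one of the checked file types."""
    extension = os.path.splitext(filename)[1].lower()
    return extension in included_types
```

For example, has_included_type("script.php") is False with the defaults, but becomes True if ".php" is added to the set of included types.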
Exclusion of Top Subfolders
If there are some files in the folder tree which are not on your website, or which you don't want included in the Sitemap, then before instructing the program to scan the folder tree and build the Sitemap you have to specify which files are to be excluded from the scan.
The first step in doing this is to exclude unwanted top subfolders (these are the immediate subfolders of the top folder). Normally you will wish to scan most, if not all, of the top subfolders. Some of them, however, may contain only files which are not to be included in the Sitemap, so these subfolders should be excluded. To do this click on the 'Exclude top subfolders' button and check those which are to be excluded.
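A recursive scan that skips excluded top subfolders might look like the following Python sketch. The function scan_tree is hypothetical; it only illustrates the idea that exclusion applies to the immediate subfolders of the top folder and not to same-named folders deeper in the tree.

```python
import os


def scan_tree(top_folder: str, excluded_top_subfolders=()):
    """Yield paths (relative to top_folder) of all files, skipping excluded top subfolders."""
    excluded = {name.lower() for name in excluded_top_subfolders}
    for folder, subfolders, files in os.walk(top_folder):
        if folder == top_folder:
            # Prune only at the top level; os.walk will not descend into removed names.
            subfolders[:] = [s for s in subfolders if s.lower() not in excluded]
        for name in files:
            yield os.path.relpath(os.path.join(folder, name), top_folder)
```

Editing the subfolder list in place is the documented way to stop os.walk from descending into a folder.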
Exclusion of Files and Lower Subfolders
The following files are automatically excluded:
- Those HTML files which contain a robots meta tag which includes noindex.
- Files of a type which do not appear in Google search results.
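The first of these automatic exclusions can be illustrated with a short Python sketch that looks for a robots meta tag whose content includes noindex. The class and function names are hypothetical, and the program's own detection logic may differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag lists the noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lower-cases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html_text: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex
```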
Even after selecting some top subfolders for exclusion there may still be files in the included top subfolders (and in subfolders of these folders) which you don't wish to include in the Sitemap (in particular, some .php files). There are three ways to exclude these files:
- By specifying a minimum file size for inclusion.
- By giving a set of character strings such that any file whose pathname includes one of these strings will be excluded.
- By specifically selecting files for exclusion.
- By Filesize
Files of less than 1000 bytes usually don't contain anything worth returning in a search, but rather are redirection files and such.
- By Filepath Exclusion
A file is identified by a filepath, which is the name of the file preceded by the name of the folder containing it, preceded by the name of the folder containing that folder, and so on, up to the root folder. For example, if folder websites is in the root folder of Drive C:, and contains a subfolder example which contains a subfolder bak which contains a file file.txt then the filepath for that file is C:\websites\example\bak\file.txt.
Hermetic Sitemap Generator allows you to exclude files and subfolders whose filepaths contain a given character string (or any string in a given list of strings). This may be a substring of a folder name in the filepath, a substring of a file name, or any substring of a filepath (e.g., docs\2008-). This allows you to exclude files with the same (or similar) names in several subfolders, e.g., template.htm. It also allows you to exclude files in subfolders of included top subfolders, e.g., if you have some deeply-nested subfolders named bak which contain backups of files in immediately higher subfolders. When excluding subfolders it is advisable to include a leading and trailing backslash, e.g., \bak\.
These character strings must be separated by commas, may contain spaces and are not case-sensitive. (Remember that in Windows "ABC.TXT" is equivalent to "abc.txt", but this is usually not true for files on your webserver.)
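The matching rule described above (comma-separated strings, compared without regard to case, matched anywhere in the filepath) can be sketched in Python as follows. The function names are hypothetical, and trimming whitespace around each string is an assumption of this sketch.

```python
def parse_exclusions(text: str) -> list:
    """Split the comma-separated textbox contents into lower-case strings."""
    return [s.strip().lower() for s in text.split(",") if s.strip()]


def is_excluded(filepath: str, exclusions) -> bool:
    """Return True if the filepath contains any of the exclusion strings."""
    lowered = filepath.lower()
    return any(s in lowered for s in exclusions)


exclusions = parse_exclusions(r"\bak\, Template.htm, docs\2008-")
print(is_excluded(r"C:\websites\example\bak\file.txt", exclusions))
print(is_excluded(r"C:\websites\example\bakery\index.htm", exclusions))
```

The second path is not excluded because \bak\ carries a leading and trailing backslash, which is why that form is advisable when excluding subfolders.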
If you have too many character strings to be visible all at once in the textbox then you can click on the 'Expand' button and a window will open displaying them one per line. You can edit this textbox by adding or deleting character strings, and then cancel or confirm. If you confirm then the strings will reappear in alphabetical order in the 'Exclude filepaths' textbox.
This facility may be used in conjunction with the file listing facility (see below). The file list may be inspected for files which you wish to exclude, then some identifying part of the filepaths (such as a subfolder name) may be added to the 'Exclude filepaths' textbox or to the 'Filepath exclusions' textbox.
If there are still files which you don't want included in the Sitemap, and which have not been excluded by either of the two preceding methods, then they must be excluded specifically. There are two ways to do this:
(a) If you have listed files to be included (see the next section), and the list includes some that you wish to exclude, you can simply copy them and paste them into the "Excluded files" window.
(b) You can also click on the 'Exclude files' button, select a folder, and select the file(s) to be excluded. Only files with file types specified in the 'Set file types' window will be displayed.
In the example at right four files are excluded. To select multiple files first select one file, then hold down the ctrl key and select the other files, then click on the 'Open' button. The files will then be added to the textbox which lists the excluded files (unless the selected file is not in an included folder). The result in this case would be:
Do this for each folder which contains unwanted files which are neither automatically excluded nor excluded by file size or by filepath exclusion. Select no more than about eight files at one time, otherwise a "Too many files selected" error message will result.
The 'Excluded files' textbox is editable. Thus if you wish to remove a file which you have selected for exclusion then you can delete it from the textbox; just be sure that the excluded files are one to a line (blank lines are ignored).
Listing Files to be Included
After specifying the top folder and the allowed file types, and after excluding unwanted folders and files, you can get a list of the files which will be included in the Sitemap by clicking on the 'List files' button.
It is advisable to view the files to be indexed before building the Sitemap. If you see a file that you don't want included then you can exclude it either by means of one of the three methods of file exclusion described above or by cutting and pasting the filepath (without the top folder) into the 'Excluded Files' textbox (but make sure to put this file on a separate line).
Folder and File Names with Special Characters are not Included
A character in the name of a folder or a file is special if its ASCII value is less than 32 or greater than 127. All letters of the English alphabet (plus numerals, etc.) are non-special. All letters of non-English alphabets which have diacritical marks (e.g., ü, é and ñ) are special characters. At the present time this program does not allow inclusion in the Sitemap of files whose filepaths contain special characters.
When files are listed, and folders or files with special characters are found, a note at the bottom of the listing will inform you of this, as in:
Renaming a file using only non-special characters will allow it to be included.
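The definition of a special character given above translates directly into a one-line test. This Python sketch uses a hypothetical function name and simply applies the stated rule (character value below 32 or above 127).

```python
def has_special_characters(filepath: str) -> bool:
    """Return True if any character has a value below 32 or above 127."""
    return any(ord(ch) < 32 or ord(ch) > 127 for ch in filepath)


# "\u00fc" is the letter u with a diaeresis, as in the example above.
print(has_special_characters("docs\\m\u00fcnchen.htm"))
print(has_special_characters("docs\\munchen.htm"))
```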
Options for the XML Sitemap
A Sitemap file consists mainly of items of the following form, one for each file, which provide for search engines the location of the file, the date (and optionally the time) of the last modification of the file, the frequency with which the file is modified, and its relative importance with respect to other files on the website.
<url>
  <loc>http://www.yoursite.com/contents.htm</loc>
  <lastmod>2009-04-29T14:30:22+00:00</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.5</priority>
</url>
Only the <loc> tag is required; the other three are optional. A "+00:00" after the time means GMT. The priority value can range from zero to 1.0.
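To make the structure of an item concrete, here is a minimal Python sketch that assembles one. The helper names format_lastmod and url_item are hypothetical; the sketch only shows that <loc> is always written, that the other tags are written when supplied, and that a UTC timestamp produces the "+00:00" suffix.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def format_lastmod(moment: datetime, include_time: bool = True) -> str:
    """Format a modification time as a date, or as date plus time."""
    if not include_time:
        return moment.date().isoformat()
    return moment.isoformat(timespec="seconds")


def url_item(loc: str, lastmod=None, changefreq=None, priority=None) -> str:
    """Build one <url> item; only <loc> is mandatory."""
    lines = ["<url>", f"  <loc>{escape(loc)}</loc>"]  # escape() handles &, < and >
    if lastmod is not None:
        lines.append(f"  <lastmod>{lastmod}</lastmod>")
    if changefreq is not None:
        lines.append(f"  <changefreq>{changefreq}</changefreq>")
    if priority is not None:
        lines.append(f"  <priority>{priority}</priority>")
    lines.append("</url>")
    return "\n".join(lines)


modified = datetime(2009, 4, 29, 14, 30, 22, tzinfo=timezone.utc)
print(url_item("http://www.yoursite.com/contents.htm",
               format_lastmod(modified), "weekly", 0.5))
```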
Hermetic Sitemap Generator allows you to select from several options for these tags when building the Sitemap, as shown here:
- Last Modified Tag
You can include just the date, e.g., "2009-04-29", or the date plus the time, or you can omit the lastmod tag entirely.
- Change Frequency Tag
You can specify a single value for all files ("daily", "weekly", "monthly", "yearly", "always" or "never"), you can omit the tag, or you can have the program assign a value automatically.
If you choose automatic assignment then the program will select from "daily", "weekly", "monthly" and "yearly" depending on how long it has been since the file was last modified. You can exclude "daily" or "yearly" by selecting the appropriate option.
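The idea of choosing a change frequency from the age of a file can be sketched in Python as below. The thresholds (2, 14 and 90 days) and the fallback values used when "daily" or "yearly" is excluded are assumptions made purely for illustration; the values actually used by Hermetic Sitemap Generator are not stated here.

```python
from datetime import datetime, timedelta


def auto_changefreq(last_modified: datetime, now: datetime,
                    allow_daily: bool = True, allow_yearly: bool = True) -> str:
    """Pick a changefreq value from the time since last modification."""
    age = now - last_modified
    if age < timedelta(days=2):          # hypothetical threshold
        return "daily" if allow_daily else "weekly"
    if age < timedelta(days=14):         # hypothetical threshold
        return "weekly"
    if age < timedelta(days=90) or not allow_yearly:  # hypothetical threshold
        return "monthly"
    return "yearly"
```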
- Priority Tag
You can specify the value 0.5 for all files, you can omit the tag, or you can have the program assign a priority automatically. In the last case the program will assign a value between 0.01 and 1.0 depending on how deeply nested is the subfolder containing this file. Files which are lower in the folder tree receive lower priority values.
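A depth-based priority could be computed along the following lines. The linear step of 0.2 per folder level is an assumption chosen for the example; only the range (0.01 to 1.0) and the rule that deeper files receive lower values come from the description above.

```python
from pathlib import PureWindowsPath


def auto_priority(relative_path: str, step: float = 0.2) -> float:
    """Assign a priority that falls with folder depth, never below 0.01."""
    # Depth 0 means the file sits directly in the top folder.
    depth = len(PureWindowsPath(relative_path).parts) - 1
    return max(0.01, round(1.0 - step * depth, 2))


print(auto_priority("index.htm"))
print(auto_priority(r"docs\contents.htm"))
```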
- Order of Items
Usually Sitemap generators simply write the file items to the Sitemap file in the order in which they are found when traversing the folder tree (this is called the physical order). Hermetic Sitemap Generator allows you to order the items in three ways:
- Physical order.
- File type. In this case all doc files (if any) will be grouped together at the top, with all htm files and then all html files appearing later, and all xls files (if any) at the end.
- Date of last modification of the file (with later files appearing earlier in the Sitemap).
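The three orderings can be expressed compactly in Python. In this sketch each item is a hypothetical (filepath, last-modified) pair, and ordering by file type is assumed to mean alphabetical order of the file extension, which matches the doc, htm, html, xls sequence described above.

```python
import os


def order_items(items, order="physical"):
    """Return items (filepath, last_modified) in the requested order."""
    if order == "physical":
        return list(items)  # keep the order in which the files were found
    if order == "type":
        # sorted() is stable, so files of one type stay in physical order.
        return sorted(items, key=lambda item: os.path.splitext(item[0])[1].lower())
    if order == "date":
        return sorted(items, key=lambda item: item[1], reverse=True)  # latest first
    raise ValueError(f"unknown order: {order}")
```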
- Item format
An item corresponds to a file. There are two item formats available:
- Multi-line, for example:
<url>
  <loc>http://www.yoursite.com/contents.htm</loc>
  <lastmod>2011-02-19T00:15:33-06:00</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.5</priority>
</url>
- Single line, the same, but all on one line:
<url><loc>http://www.yoursite.com/contents.htm</loc><lastmod>2011-02-19T00:15:33-06:00</lastmod>...
Search engines don't care which format you use. The difference is simply that a Sitemap file with a multi-line format is easier to inspect visually.
Building and Inspecting the XML Sitemap
After you have set up the operation as described above, building the Sitemap is simply a matter of clicking on the 'Build XML Sitemap' button. The process is quick — a Sitemap with over 1000 items takes less than twenty seconds to build.
After the Sitemap has been built you may be able to view it simply by clicking on the 'View' button. This works only if the xml file extension is associated with some program (in which case clicking on the file name in Windows Explorer will cause that program to run and open the file).
You can view the Sitemap file using any text editor. Sitemap files must use UTF-8 character encoding, and the files built by this program begin with the three bytes EF, BB and BF (in hexadecimal), known as the UTF-8 byte order mark. These are not displayed by text editors (such as Windows Notepad) which can handle UTF-8 files, but in non-UTF-8-compatible text editors these three bytes will appear as “ï»¿”.
The generated Sitemap can usually be uploaded to your website without change. If you edit a Sitemap file you must use a UTF-8-compatible text editor (such as Notepad), which will save the file with the required initial three hexadecimal bytes.
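If you want to confirm that an edited file still begins with those three bytes, a small Python sketch such as the following can check it. The function names are hypothetical; the "utf-8-sig" codec is Python's standard way of writing and reading UTF-8 text with the initial EF BB BF bytes.

```python
import codecs


def write_sitemap(path: str, xml_text: str) -> None:
    """Write text as UTF-8 preceded by the bytes EF BB BF."""
    with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
        f.write(xml_text)


def starts_with_bom(path: str) -> bool:
    """Return True if the file begins with the three bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def read_sitemap(path: str) -> str:
    """Read the file as text; utf-8-sig drops the initial three bytes if present."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()
```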
Uploading and Validating the XML Sitemap
The Sitemap file can be uploaded to your web server via FTP like any other file. It should be placed in the root directory, and you should have a robots.txt file in the root directory containing a line such as:
Sitemap: http://www.yoursite.com/sitemap.xml
so that search engines visiting your site know where to find the Sitemap.
After uploading the Sitemap you should (if you have any doubts) validate it (that is, ascertain that there are no XML errors present). Here are two web pages which provide this validation:
Informing Search Engines
There are three ways of drawing the attention of search engines to the existence of the Sitemap:
- In a robots.txt file in the root directory on your website include a line such as this, directing a visiting search engine to your Sitemap:
Sitemap: http://www.yoursite.com/sitemap.xml
- Some search engines (including Google and Yahoo) allow you to "ping" them with the URL of your Sitemap. This is done by using your web browser to make a request of the form
- Some search engines provide web pages which allow you to submit Sitemaps (see www.google.com/webmasters).