Hermetic Sitemap Builder
User Manual
Here is a sample screenshot:
When this user manual was originally written it treated the generation of XML Sitemaps as the primary function of Hermetic Sitemap Builder and the generation of HTML sitemaps as secondary. (The software now treats them both as of equal interest.) Thus the word "Sitemap" is usually used rather than the word "sitemap". What is said about exclusion of files and folders, and listing and building sitemaps, applies to both types of sitemap except where it is clear that one or the other kind is being discussed.
Specifying the Top Folder
It is assumed that:
- All the files in the online website are contained in a local copy on your hard drive.
- All the files in the local copy are contained in a folder tree (starting with a top folder such as C:\websites\ace_hardware\).
- The structure of the folder tree on your website is mirrored in the structure of the folder tree on your local copy. (There may be subfolders of the top folder on your hard drive which do not correspond to any subfolders on your website.)
- There may be (and probably are) files in the local copy which are not to be included in the Sitemap.
Hermetic Sitemap Builder scans the files in the top folder of your local copy and in all subfolders recursively to build the Sitemap.
Click on the 'Top folder' button to specify the top folder in the usual way:
Specifying the Website URL
The top folder on your local hard drive should correspond to the top folder on your website. When Hermetic Sitemap Builder builds the Sitemap it has to state a URL for each of the files on your website that are included in the Sitemap, so it has to know the URL for the top folder.
The URL is not needed, however, if you are building an HTML sitemap using relative filepaths rather than absolute filepaths.
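The correspondence between a local file and its URL can be sketched as follows. This is only an illustration of the mapping described above, not the program's own code; the function name `local_path_to_url` and the sample paths are assumptions made for the example.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote


def local_path_to_url(top_folder: str, site_url: str, filepath: str) -> str:
    """Map a file under the local top folder to the URL it has on the website.

    Hypothetical helper: assumes the local folder tree mirrors the website.
    """
    # Strip the top folder to get the path relative to the website root.
    relative = PureWindowsPath(filepath).relative_to(PureWindowsPath(top_folder))
    # as_posix() turns backslashes into the forward slashes used in URLs;
    # quote() percent-encodes characters such as spaces.
    return site_url.rstrip("/") + "/" + quote(relative.as_posix())


url = local_path_to_url(
    r"C:\websites\ace_hardware",
    "http://www.yoursite.com/",
    r"C:\websites\ace_hardware\docs\contents.htm",
)
print(url)
```

With a relative-filepath HTML sitemap only the `relative` part would be needed, which is why the website URL can be omitted in that case.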
Included File Types
For HTML sitemaps:
Only HTML files (those with file extension htm, html, shtm or shtml) are included in HTML sitemaps generated by this program.
For XML Sitemaps:
You can specify the file types to be included in the Sitemap by clicking on the 'Set file types' button, which brings up a panel as at right (click to enlarge), with a list of all Google file types. Only file types which are checked will be included when Hermetic Sitemap Builder generates the Sitemap file.
This panel must be displayed, and the 'Confirm' button clicked, at least once before files can be listed or a Sitemap built. This is to ensure that you are aware of what file types are allowed in the Sitemap (and thus what file types will automatically be excluded).
Clicking on the 'Default' button selects the file types .htm, .html, .doc, .docx, .pdf and .txt.
Exclusion of Top Subfolders
If there are some files (of Google file type) in the folder tree which are not on your website, or which you don't want included in the Sitemap, then before instructing the program to scan the folder tree and build the Sitemap you have to specify which files are to be excluded from the scan.
The first step in doing this is to exclude unwanted top subfolders (these are the immediate subfolders of the top folder). Normally you will wish to scan most, if not all, of the top subfolders. Some of them, however, may contain only files which are not to be included in the Sitemap, so these subfolders should be excluded. To do this click on the 'Exclude top subfolders' button and check those which are to be excluded.
Exclusion of Files and Lower Subfolders
The following files are automatically excluded:
- Those HTML files which contain a robots meta tag which includes noindex.
- Those which are not of Google file type (if building an XML Sitemap) or those which are not HTML files (if building an HTML sitemap).
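The first of these automatic exclusions, the robots meta tag check, could be approximated as below. This is a minimal sketch using Python's standard `html.parser` module; the class and function names are hypothetical, and the program's actual parsing rules may differ.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag containing noindex was seen."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases tag and attribute names; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots" and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html_text: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.noindex
```

A page for which `has_noindex` returns True would be left out of the Sitemap without any action on your part.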
Even after selecting some top subfolders for exclusion there may still be files in the included top subfolders (and in subfolders of these folders) which you don't wish to include in the Sitemap. There are three ways to exclude these files:
- By specifying a minimum file size for inclusion.
- By giving a set of character strings such that any file whose pathname includes one of these strings will be excluded.
- By specifically selecting files for exclusion.
- By Filesize
Files of less than 1000 bytes usually don't contain anything worth returning in a search, but rather are redirection files and such.
- By Filepath Exclusion
A file is identified by a filepath, which is the name of the file preceded by the name of the folder containing it, preceded by the name of the folder containing that folder, and so on, up to the root folder. For example, if folder websites is in the root folder of Drive C:, and contains a subfolder example which contains a subfolder bak which contains a file file.txt then the filepath for that file is C:\websites\example\bak\file.txt.
Hermetic Sitemap Builder allows you to exclude files and subfolders whose filepaths contain a given character string (or any string in a given list of strings). This may be a substring of a folder name in the filepath, a substring of a file name, or any substring of a filepath (e.g., docs\2008-). This allows you to exclude files with the same (or similar) names in several subfolders, e.g., template.htm. It also allows you to exclude files in subfolders of included top subfolders, e.g., if you have some deeply-nested subfolders named bak which contain backups of files in immediately higher subfolders. When excluding subfolders it is advisable to include a leading and trailing backslash, e.g., \bak\.
These character strings must be separated by commas, may contain spaces and are not case-sensitive. (Remember that in Windows "ABC.TXT" is equivalent to "abc.txt", but this is usually not true for files on your webserver.)
If you have too many character strings to be visible all at once in the textbox then you can click on the 'Expand' button and a window will open displaying them one per line. You can edit this textbox by adding or deleting character strings, and then cancel or confirm. If you confirm then the strings will reappear in alphabetical order in the 'Exclude filepaths' textbox.
This facility may be used in conjunction with the file listing facility (see below). The file list may be inspected for files which you wish to exclude, then some identifying part of the filepaths (such as a subfolder name) may be added to the 'Exclude filepaths' textbox or to the 'Filepath exclusions' textbox.
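The filesize and filepath rules described so far can be pictured with a short sketch. The function names are hypothetical, the 1000-byte default is only the figure mentioned above, and trimming whitespace around each comma-separated string is an assumption about how the textbox is read.

```python
def parse_exclusions(text: str) -> list:
    """Split the comma-separated exclusion strings, ignoring empty entries.

    Strings are lowercased because matching is not case-sensitive.
    Trimming surrounding whitespace is an assumption of this sketch.
    """
    return [s.strip().lower() for s in text.split(",") if s.strip()]


def is_excluded(filepath: str, size_bytes: int, exclusions: list, min_size: int = 1000) -> bool:
    """Return True if the file is too small or its filepath contains an exclusion string."""
    if size_bytes < min_size:
        return True
    lowered = filepath.lower()
    return any(s in lowered for s in exclusions)


exclusions = parse_exclusions(r"\bak\, template.htm, docs\2008-")
print(is_excluded(r"C:\websites\example\bak\file.txt", 5000, exclusions))
```

Note how `\bak\` with its leading and trailing backslash matches only a folder named bak, whereas plain `bak` would also match a file such as baker.htm.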
- Explicitly
If there are still files which you don't want included in the Sitemap, and which have not been excluded by either of the two preceding methods, then they must be excluded specifically. To do this click on the 'Exclude files' button, select a folder, and select the file(s) to be excluded. Only files with file types specified in the 'Set file types' window will be displayed.
In the example at right four files are excluded. To select multiple files first select one file, then hold down the ctrl key and select the other files, then click on the 'Open' button. The files will then be added to the textbox which lists the excluded files (unless the selected file is not in an included folder). The result in this case would be:
Do this for each folder which contains unwanted files which are not automatically excluded and are not excluded by methods (i) or (ii). Select no more than about eight files at one time, otherwise a "Too many files selected" error message will result. The 'Excluded files' textbox is editable. Thus if you wish to remove a file which you have selected for exclusion then you can delete it from the textbox; just be sure that the excluded files are one to a line (blank lines are ignored).
Selecting the Type of Sitemap
After specifying the top folder, the URL of your website, the allowed file types, and after excluding unwanted folders and files, you are almost ready to generate a sitemap. This can be either an XML Sitemap or an HTML sitemap. Clicking on the button displayed at right brings up the window shown below (here shown after the output file has been selected in the usual way, by clicking on the button labelled 'XML Sitemap file'; XML Sitemap files must have extension xml):
If it's an HTML sitemap that you're after then select that option, and the window changes (again after the output file has been selected, and some checkboxes checked; HTML sitemap files must have extension htm or html) to:
Listing Files to be Included
After specifying the top folder and the allowed file types, and after excluding unwanted folders and files, you can get a list of the files which will be included in the Sitemap by clicking on the 'List files' button. The program writes the list to a text file and then calls whatever is the default program for opening text files (usually Notepad) to display the list.
It is advisable to view the files to be indexed before building the Sitemap. If you see a file that you don't want included then you can exclude it either by means of one of the three methods of file exclusion described above or by cutting and pasting the filepath (without the top folder) into the 'Excluded Files' textbox (but make sure to put this file on a separate line).
For HTML sitemaps only: Only HTML files will be listed.
Folder and File Names with Special Characters are not Included
A character in the name of a folder or a file is special if its ASCII value is less than 32 or greater than 127. All letters of the English alphabet (plus numerals, etc.) are non-special. All letters of non-English alphabets which have diacritical marks (e.g., ü, é and ñ) are special characters. At the present time this program does not allow inclusion in the Sitemap of files whose filepaths contain special characters.
When files are listed, and folders or files with special characters are found, a note at the bottom of the listing will inform you of this, as in:
Renaming a file using only non-special characters will allow it to be included.
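The rule for special characters is simple enough to check yourself before listing files. The following sketch applies the definition given above (character value less than 32 or greater than 127); the function name is hypothetical.

```python
def has_special_characters(filepath: str) -> bool:
    """Return True if any character in the filepath has a value < 32 or > 127."""
    return any(ord(ch) < 32 or ord(ch) > 127 for ch in filepath)


print(has_special_characters("docs\\münchen.htm"))   # contains ü
print(has_special_characters("docs\\munich.htm"))    # plain English letters only
```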
The following two sections are only for HTML sitemaps.
Options for the HTML Sitemap
For options for the XML Sitemap see below.
At the window for selecting the type of sitemap, after 'Build HTML sitemap' is selected, you may specify:
- Which of the page title, description tag and path to the file are to be included.
- Whether a relative filepath or an absolute filepath (i.e., a URL) is to be used.
- Whether the items in the sitemap are to be indented.
- Whether the sitemap is to be inserted in a pre-written template file.
The page title and the description tag (if any) are read from the file.
If items include a filepath then this may either be absolute (at right, top) or relative (bottom). If the HTML sitemap is placed in the root directory of your website then the relative filepath is sufficient for hotlinking to the web pages.
Items in an HTML sitemap always include hotlinks to the files. If the title is included then that is a hotlink, otherwise the filepath itself (relative or absolute) is a hotlink (as at right).
If 'Indent items' is checked then the items will be indented to a degree in accordance with the level of the subfolder containing the corresponding file (for example, see this sitemap).
If no 'sitemap file template' is specified, or the corresponding checkbox is unchecked, then the sitemap has a simple page title, as at right. You might prefer to have something more than this (as in the example above) so you can create a 'sitemap file template', which is simply an HTML file with the character string "[SITEMAP]" occurring somewhere in the middle (as, for example, here), and then tell Hermetic Sitemap Builder where to find that file.
The software will then insert the generated sitemap in place of that character string to create the finished sitemap file.
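In effect the template mechanism is a single text substitution. A minimal sketch, assuming the placeholder occurs exactly as "[SITEMAP]" and that only its first occurrence is replaced (the function name and the sample HTML are made up for the example):

```python
PLACEHOLDER = "[SITEMAP]"


def fill_template(template_html: str, sitemap_html: str) -> str:
    """Insert the generated sitemap in place of the placeholder string."""
    if PLACEHOLDER not in template_html:
        raise ValueError("template does not contain " + PLACEHOLDER)
    # Replace only the first occurrence (an assumption of this sketch).
    return template_html.replace(PLACEHOLDER, sitemap_html, 1)


template = "<html><body><h1>Site Map</h1>\n[SITEMAP]\n</body></html>"
items = '<p><a href="contents.htm">Contents</a></p>'
print(fill_template(template, items))
```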
Excluding Further Files from the HTML Sitemap
It may happen that you have specified the excluded folders, excluded files and excluded filepaths for your XML Sitemap file, and have built it, but when you come to build the HTML sitemap you find that there are some files (e.g., web pages which are not in English) that, although you want them indexed by the search engines, you wish to exclude from the HTML sitemap.
You could exclude those files as described above, but then you would have to include them again next time you want to build the XML Sitemap. There is a better way. Simply place (between <head> and </head>) the following meta tag in any HTML file that you wish to exclude from the HTML sitemap:
<meta name="smb_html_sitemap" content="exclude">
Then when you build the HTML sitemap these files will not be included. This method of exclusion applies only to HTML sitemaps, and does not affect the building of XML Sitemaps.
The remainder of this user manual applies only to XML Sitemaps.
Options for the XML Sitemap
A Sitemap file consists mainly of items of the following form, one for each file, which provide for search engines the location of the file, the date (and optionally the time) of the last modification of the file, the frequency with which the file is modified, and its relative importance with respect to other files on the website.
<url>
  <loc>http://www.yoursite.com/contents.htm</loc>
  <lastmod>2009-04-29T14:30:22+00:00</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.5</priority>
</url>
Only the <loc> tag is required; the other three are optional. A "+00:00" after the time means GMT. The priority value can range from zero to 1.0.
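To make the structure of one such item concrete, here is a minimal sketch that produces it with Python's standard `xml.etree.ElementTree` module. The function name `build_url_entry` and its parameters are hypothetical; it is an illustration of the item format, not the program's own code.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET


def build_url_entry(loc, last_modified=None, changefreq=None, priority=None, include_time=True):
    """Return one <url> item as a string. Only loc is required.

    last_modified must be a timezone-aware datetime; it is converted to GMT.
    """
    url = ET.Element("url")
    ET.SubElement(url, "loc").text = loc
    if last_modified is not None:
        utc = last_modified.astimezone(timezone.utc)
        # Date plus time gives e.g. 2009-04-29T14:30:22+00:00; date only gives 2009-04-29.
        text = utc.isoformat(timespec="seconds") if include_time else utc.date().isoformat()
        ET.SubElement(url, "lastmod").text = text
    if changefreq is not None:
        ET.SubElement(url, "changefreq").text = changefreq
    if priority is not None:
        if not 0.0 <= priority <= 1.0:
            raise ValueError("priority must be between 0 and 1.0")
        ET.SubElement(url, "priority").text = str(priority)
    return ET.tostring(url, encoding="unicode")


modified = datetime(2009, 4, 29, 14, 30, 22, tzinfo=timezone.utc)
print(build_url_entry("http://www.yoursite.com/contents.htm", modified, "weekly", 0.5))
```

Passing `include_time=False`, or leaving out any of the optional arguments, corresponds to the tag options described in this section.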
Hermetic Sitemap Builder allows you to select from several options for these tags when building the Sitemap, as shown here:
- Last Modified Tag
You can include just the date, e.g., "2009-04-29", or the date plus the time, or you can omit the lastmod tag entirely.
- Change Frequency Tag
You can specify a single value for all files ("daily", "weekly", "monthly", "yearly", "always" or "never"), you can omit the tag, or you can have the program assign a value automatically.
If you choose automatic assignment then the program will select from "daily", "weekly", "monthly" and "yearly" depending on how long it has been since the file was last modified. You can exclude "daily" or "yearly" by selecting the appropriate option.
- Priority Tag
You can specify the value 0.5 for all files, you can omit the tag, or you can have the program assign a priority automatically. In the last case the program will assign a value between 0.01 and 1.0 depending on how deeply nested is the subfolder containing this file. Files which are lower in the folder tree receive lower priority values.
- Order of Items
Usually Sitemap generators simply write the file items to the Sitemap file in the order in which they are found when traversing the folder tree (this is called the physical order). Hermetic Sitemap Builder allows you to order the items in three ways:
- Physical order.
- File type. In this case all doc files (if any) will be grouped together at the top, with all htm files and then all html files appearing later, and all xls files (if any) at the end.
- Date of last modification of the file (with later files appearing earlier in the Sitemap).
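The automatic options above can be pictured with the following sketch. The manual does not state the age thresholds or the priority formula that the program uses, so the numbers here (2, 14 and 365 days, and a step of 0.2 per folder level) are purely assumptions for illustration, as are all function names.

```python
from datetime import datetime, timedelta, timezone


def auto_changefreq(last_modified, now, allow_daily=True, allow_yearly=True):
    """Pick a change frequency from the file's age (thresholds are assumed)."""
    age = now - last_modified
    if age < timedelta(days=2) and allow_daily:
        return "daily"
    if age < timedelta(days=14):
        return "weekly"
    if age < timedelta(days=365) or not allow_yearly:
        return "monthly"
    return "yearly"


def auto_priority(depth):
    """Lower priority for more deeply nested files, kept within 0.01 to 1.0.

    depth 0 means the file is in the top folder; the 0.2 step is assumed.
    """
    return max(0.01, round(1.0 - 0.2 * depth, 2))


def order_items(items, order="physical"):
    """Order items (dicts with 'path' and 'modified') in one of the three ways."""
    if order == "type":
        # Alphabetical by extension: doc, htm, html, ..., xls.
        return sorted(items, key=lambda it: it["path"].rsplit(".", 1)[-1].lower())
    if order == "date":
        # Most recently modified files first.
        return sorted(items, key=lambda it: it["modified"], reverse=True)
    return list(items)  # physical order: as found when traversing the folder tree


now = datetime(2009, 5, 1, tzinfo=timezone.utc)
print(auto_changefreq(datetime(2009, 4, 30, 12, tzinfo=timezone.utc), now))
print(auto_priority(2))
```

Because Python's `sorted` is stable, files of the same type (or with the same date) keep their physical order relative to one another.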
Building and Inspecting the XML Sitemap
After you have set up the operation as described above, building the Sitemap is simply a matter of clicking on the 'Build XML Sitemap' button. The process is quick: a Sitemap with over 1000 items takes less than twenty seconds to build.
After the Sitemap has been built you may be able to view it simply by clicking on the 'View' button. This works only if the xml file extension is associated with some program (in which case clicking on the file name in Windows Explorer will cause that program to run and open the file).
You can view the Sitemap file using any text editor. Sitemap files must use UTF-8 character encoding, and such files begin with three bytes whose hexadecimal values are EF, BB and BF. These are not displayed by text editors (such as Windows Notepad) which can handle UTF-8 files, but in non-UTF-8-compatible text editors these three bytes will appear as “ï»¿”.
The generated Sitemap can usually be uploaded to your website without change. If you edit a Sitemap file you must use a UTF-8-compatible text editor (such as Notepad), which will save the file with the required initial three hexadecimal bytes.
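If you edit or regenerate a Sitemap with your own tools, you can confirm that the three initial bytes are still present. A minimal sketch using Python's standard library follows (the function names are hypothetical); the "utf-8-sig" codec writes EF BB BF at the start of the file.

```python
import codecs


def write_sitemap(path, xml_text):
    """Save text as UTF-8 preceded by the three bytes EF BB BF."""
    with open(path, "w", encoding="utf-8-sig", newline="\n") as f:
        f.write(xml_text)


def starts_with_bom(path):
    """Return True if the file begins with the bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

For example, `starts_with_bom("sitemap.xml")` could be called on a file after editing it, before uploading.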
Uploading and Validating the XML Sitemap
The Sitemap file can be uploaded to your web server via FTP like any other file. It should be placed in the root directory, and you should have a robots.txt file in the root directory containing a line such as:
Sitemap: http://www.yoursite.com/example_sitemap.xml
so that search engines visiting your site know where to find the Sitemap.
After uploading the Sitemap you should (if you have any doubts) validate it (that is, ascertain that there are no XML errors present). Here are two web pages which provide this validation:
- www.webmasterwebtools.com/sitemap-validation/
- www.smart-it-consulting.com/internet/google/submit-validate-sitemap/
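As a quick local check before using an online validator, you can at least confirm that the file is well-formed XML. The sketch below uses Python's standard `xml.etree.ElementTree` parser; it detects XML syntax errors only and does not check the file against the Sitemap schema. The function name is hypothetical.

```python
from typing import Optional
import xml.etree.ElementTree as ET


def xml_error(data: bytes) -> Optional[str]:
    """Return a description of the first XML syntax error, or None if well-formed."""
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None


good = b"<urlset><url><loc>http://www.yoursite.com/contents.htm</loc></url></urlset>"
bad = b"<urlset><url><loc>http://www.yoursite.com/contents.htm</loc></urlset>"
print(xml_error(good))
print(xml_error(bad))
```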
Informing Search Engines
There are three ways of drawing the attention of search engines to the existence of the Sitemap:
- In a robots.txt file in the root directory on your website include a line such as this, directing a visiting search engine to your Sitemap:
Sitemap: http://www.yoursite.com/your_sitemap.xml
- Some search engines (including Google and Yahoo) allow you to "ping" them with the URL of your Sitemap. This is done by using your web browser to make a request of the form
http://search_engine_URL/ping?sitemap=www.yoursite.com/your_sitemap.xml
- Some search engines provide web pages which allow you to submit Sitemaps (see www.google.com/webmasters).