Prefix Mapping


To retrieve documents that were indexed by file system path, Information Server needs Web-based URLs that correspond to the file system paths recorded by the Verity Spider. These URLs must also be defined as aliases in your Web server. Two Verity Spider options help create the necessary URLs.

Verity Spider Indexing Options

Option            Description
-prefixmap file   Specifies a control file which contains filesystem-to-URL mapping information. See "Using the Control File" below for more information on the content of this file.
-abspath          Forces the Verity Spider to use the absolute path to file system documents when indexing. This option is necessary to generate document paths that can be understood by the Web server, which would otherwise try to reconcile Verity Spider's generated relative document paths using the Web server's document root path as the starting point.

Using the Control File

The control file, which is specified by -prefixmap, is a text file that contains the information needed to map a file system path to a Web-based URL. Each record in a control file consists of the following columns:

Item           Description
SourceField    Field from which values for SourcePrefix will be read.
SourcePrefix   Specifies the file system path prefix indexed by the Verity Spider. If the path includes a trailing slash, then DestPrefix must also include a trailing slash. If the path includes spaces, enclose it in double quotes, for example "C:\My Documents\Files".
DestField      Field where the value modified by DestPrefix will be stored. Generally, you will use the URL field, which is available as $$doc.URL in view templates.
DestPrefix     Specifies the Web server alias used by Information Server to retrieve documents from a Web server for viewing. The alias must be created manually in the Web server before users are allowed to view documents.
Flag           When specified as a slash ("/"), maps backslashes to forward slashes. This is useful for indexed Windows NT file systems where the SourcePrefix path ends with a trailing backslash and contains subdirectories.
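
The column layout described above can be illustrated with a short Python sketch that reads one control file record. This is not Verity's parser. The names PrefixMapRecord and parse_control_line are hypothetical, and the treatment of blank lines and "#" comment lines is an assumption based on the sample control files in this section.

```python
import re
from typing import NamedTuple, Optional


class PrefixMapRecord(NamedTuple):
    """Hypothetical record type; fields follow the documented column names."""
    source_field: str
    source_prefix: str
    dest_field: str
    dest_prefix: str
    flag: Optional[str] = None  # the Flag column may be absent


# A token is either a double-quoted string (which may contain spaces)
# or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_control_line(line: str) -> Optional[PrefixMapRecord]:
    """Parse one control file line; return None for blank or comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = [quoted if quoted else bare
              for quoted, bare in _TOKEN.findall(stripped)]
    if len(tokens) not in (4, 5):
        raise ValueError(f"expected 4 or 5 columns, got {len(tokens)}: {line!r}")
    record = PrefixMapRecord(*tokens)
    # Documented rule: a trailing slash on SourcePrefix requires one on DestPrefix.
    slashes = ("/", "\\")
    if record.source_prefix.endswith(slashes) and not record.dest_prefix.endswith(slashes):
        raise ValueError("SourcePrefix ends with a slash but DestPrefix does not")
    return record


# The record from Example 2 later in this section.
record = parse_control_line(r"VdkVgwKey  d:\docs\isrel\  URL  http://web/reldocs/  /")
print(record.source_prefix)  # d:\docs\isrel\
print(record.flag)           # /
```

A check like the trailing-slash test can catch a mismatched record before the Verity Spider is run, though the Verity Spider itself may report such problems differently.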

Prefix Mapping Examples

Example 1

In this example, the control file ctrlhtml.txt maps a user's Web documents, indexed through the file system, to a Web server alias for that user. It is assumed that the Web server aliases /search/html to /~search, with web as the Web server document root. The command to run the Verity Spider would look like the following:


     % vspider -collection /users/colls/pbhtml -start /search/html -abspath -prefixmap /verity/ctrlhtml.txt

Note that each record in the control file must be on a single line. The contents of ctrlhtml.txt would look like:

     # For each document, all lines are considered in order.
     # 
     # SourceField  SourcePrefix  DestField  DestPrefix
     VdkVgwKey      /search/html  URL        http://web/~search

With this mapping, using the SEARCHScript variable $$doc.URL in a view template for Information Server will correctly retrieve a document from /search/html by using the Web server alias http://web/~search.
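
The effect of this record can be sketched in Python as a simple prefix replacement. This is only an illustration of the documented behavior, not the Verity Spider's implementation. The function name map_prefix and the document path reports/index.html are hypothetical.

```python
def map_prefix(value: str, source_prefix: str, dest_prefix: str) -> str:
    """Replace a leading source_prefix in value with dest_prefix."""
    if not value.startswith(source_prefix):
        return value  # the record does not apply; leave the value unchanged
    return dest_prefix + value[len(source_prefix):]


# VdkVgwKey value for a hypothetical document under /search/html.
key = "/search/html/reports/index.html"
url = map_prefix(key, "/search/html", "http://web/~search")
print(url)  # http://web/~search/reports/index.html
```

Because neither prefix ends with a slash, the slash that follows /search/html in the key carries over into the URL unchanged.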

Example 2

In this example, the control file ctrldoc.txt maps Microsoft Word documents, indexed through the file system, to a Web server alias. It is assumed that the Web server aliases d:\docs\isrel to /reldocs, with web/ as the Web server document root. The Web server must also be configured with a MIME type for Microsoft Word documents. The command to run the Verity Spider would look like the following:

     c:\>vspider -collection d:\colls\docs -start d:\docs\isrel\ -abspath -prefixmap d:\verity\ctrldoc.txt

Note that each record in the control file must be on a single line. The contents of ctrldoc.txt would look like:

     # For each document, all lines are considered in order.
     #
     #SourceField  SourcePrefix    DestField  DestPrefix           Flag
     VdkVgwKey     d:\docs\isrel\  URL        http://web/reldocs/   /

Note that because SourcePrefix ends with a trailing slash, DestPrefix must, too. Flag is also specified here because subdirectories beneath isrel may need their backslashes translated to forward slashes to be served properly by the Web server. With this mapping, using the SEARCHScript variable $$doc.URL in a view template for Information Server will correctly retrieve a document from d:\docs\isrel\ by using the Web server alias http://web/reldocs/.
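
To show why Flag matters for documents in subdirectories, the following Python sketch applies the record with and without it. It is an illustration under stated assumptions rather than the Verity Spider's actual logic. The function name map_prefix_with_flag and the document path v2\notes.doc are hypothetical, and the case-sensitive prefix match is an assumption.

```python
from typing import Optional


def map_prefix_with_flag(value: str, source_prefix: str, dest_prefix: str,
                         flag: Optional[str] = None) -> str:
    """Replace source_prefix with dest_prefix; if flag is "/", also convert
    backslashes in the rest of the path to forward slashes."""
    if not value.startswith(source_prefix):  # assumption: case-sensitive match
        return value
    remainder = value[len(source_prefix):]
    if flag == "/":
        remainder = remainder.replace("\\", "/")
    return dest_prefix + remainder


# VdkVgwKey value for a hypothetical document in a subdirectory of isrel.
key = r"d:\docs\isrel\v2\notes.doc"
source = "d:\\docs\\isrel\\"
dest = "http://web/reldocs/"

print(map_prefix_with_flag(key, source, dest))       # http://web/reldocs/v2\notes.doc
print(map_prefix_with_flag(key, source, dest, "/"))  # http://web/reldocs/v2/notes.doc
```

Without Flag, the backslash that separates the subdirectory from the file name survives into the URL, which a Web server is unlikely to resolve.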

Example 3

In this example, PowerPoint slides have been converted to HTML. The conversion created two files for each slide: a text-only version, tsld*.htm, and a text and graphics version, sld*.htm, where "*" is an incrementing number. We want to index the text-only files, and then use the text and graphics files for links in a view template.

The control file, ppview.txt, first substitutes part of the document file path, and then maps the new path to a Web server alias. It is assumed that the Web server aliases c:\dev\slides to /~dev/slides, with web/ as the Web server document root. The command to run the Verity Spider would look like the following:

     c:\>vspider -collection d:\colls\docs -start c:\dev\slides -abspath -prefixmap d:\verity\ppview.txt

Note that each record in the control file must be on a single line. The contents of ppview.txt would look like:

     # For each document, all lines are considered in order.
     #
     # SourceField  SourcePrefix    DestField  DestPrefix              Flag
     VdkVgwKey      \dev\slides\t   URL        /dev/slides/             /
     VdkVgwKey      c:\dev\slides\  URL        http://web/~dev/slides/  /

The first record in ppview.txt performs the path substitution required so that the text and graphics files, sld*.htm, are used as links in a view template instead of the text-only files, tsld*.htm. The rule of matching trailing slashes for SourcePrefix and DestPrefix does not have to be followed here, because this DestPrefix is not a Web URL. The Web URL is mapped in the second record.

With ppview.txt, using $$doc.URL in a view template will correctly retrieve the text and graphics document from c:\dev\slides\ by using the Web server alias http://web/~dev/slides/.





Copyright © 1998, Verity, Inc. All rights reserved.