Prefix Mapping
To retrieve documents that the Verity Spider indexed by file system path,
Information Server needs Web-based URLs that correspond to those file
system paths. These Web-based URLs must also be defined as aliases in
your Web server. Verity Spider provides two options that help create
the necessary Web-based URLs.
Verity Spider Indexing Options
Using the Control File
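The control file, specified with the -prefixmap option, is a text file
that contains the information needed to map a file system URL to a
Web-based URL. Each record in a control file consists of the following
columns, as used in the examples below:
SourceField: the field whose value is matched (VdkVgwKey in the examples)
SourcePrefix: the prefix matched at the start of the SourceField value
DestField: the field that receives the mapped value (URL in the examples)
DestPrefix: the prefix that replaces SourcePrefix in the mapped value
Flag: optional; / translates backslashes to forward slashes (see Example 2)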
Prefix Mapping Examples
Example 1
In this example, the control file ctrlhtml.txt
maps a user's Web documents, indexed through the file system, to a Web
server alias for that user. It assumes that the Web server aliases
/search/html to /~search and uses web as its document root.
The command to run the Verity Spider looks like this:
    % vspider -collection /users/colls/pbhtml -start /search/html -abspath -prefixmap /verity/ctrlhtml.txt
Note that each record in the control file must be on a
single line. The contents of ctrlhtml.txt would look like:
# For each document, all lines are considered in order.
#
# SourceField SourcePrefix DestField DestPrefix
VdkVgwKey /search/html URL http://web/~search
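The record layout above (whitespace-separated columns, # comment lines, an optional fifth Flag column) can be read with a short parser. This is a minimal Python sketch for experimenting with control files, not Verity's own parser; the names PrefixRecord and parse_control_file are hypothetical, and it assumes no column value contains spaces.

```python
from typing import List, NamedTuple, Optional


class PrefixRecord(NamedTuple):
    """One control-file record (hypothetical in-memory model)."""
    source_field: str
    source_prefix: str
    dest_field: str
    dest_prefix: str
    flag: Optional[str] = None  # optional fifth column, e.g. "/"


def parse_control_file(text: str) -> List[PrefixRecord]:
    """Return records in file order, skipping blank lines and '#' comments.

    Assumes columns are separated by whitespace and contain no spaces.
    """
    records = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) not in (4, 5):
            raise ValueError(f"expected 4 or 5 columns: {line!r}")
        records.append(PrefixRecord(*fields))
    return records


CTRLHTML = """\
# For each document, all lines are considered in order.
#
# SourceField SourcePrefix DestField DestPrefix
VdkVgwKey /search/html URL http://web/~search
"""

for record in parse_control_file(CTRLHTML):
    print(record)
```

In this sketch, a record wrapped onto two lines always fails the column-count check on at least one of the lines, which mirrors the single-line requirement for control file records.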
With this mapping, using the SEARCHScript variable
$$doc.URL in a view template for Information Server will
correctly retrieve a document from /search/html by using
the Web server alias http://web/~search.
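A minimal sketch of the substitution this record is assumed to perform: when the indexed path starts with SourcePrefix, that prefix is replaced by DestPrefix and the rest of the path is kept. The function name map_prefix and the sample path /search/html/docs/intro.html are hypothetical; this models the behavior described above, not Verity Spider's actual code.

```python
from typing import Optional


def map_prefix(value: str, source_prefix: str, dest_prefix: str) -> Optional[str]:
    """Replace source_prefix with dest_prefix; return None if it does not match.

    Hypothetical model of one control-file record without a Flag column.
    """
    if not value.startswith(source_prefix):
        return None
    return dest_prefix + value[len(source_prefix):]


# Example 1 record: VdkVgwKey /search/html URL http://web/~search
key = "/search/html/docs/intro.html"  # hypothetical indexed path
url = map_prefix(key, "/search/html", "http://web/~search")
print(url)  # http://web/~search/docs/intro.html
```

Because neither prefix ends in a slash, the slash that follows html in the indexed path carries over into the URL.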
Example 2
In this example, the control file ctrldoc.txt
maps Microsoft Word documents, indexed through the file system, to a Web
server alias. It assumes that the Web server aliases d:\docs\isrel
to /reldocs and uses web/ as its document root. The Web server must
also be configured with a MIME type for Microsoft Word documents. The
command to run the Verity Spider looks like this:
    c:\>vspider -collection d:\colls\docs -start d:\docs\isrel\ -abspath -prefixmap d:\verity\ctrldoc.txt
Note that each record in the control file must be on a
single line. The contents of ctrldoc.txt would look like:
# For each document, all lines are considered in order.
#
#SourceField SourcePrefix DestField DestPrefix Flag
VdkVgwKey d:\docs\isrel\ URL http://web/reldocs/ /
Because SourcePrefix ends with a trailing backslash, DestPrefix also
ends with a trailing slash. Flag is specified here because there may be
subdirectories beneath isrel whose backslashes must be translated to
forward slashes for the Web server to serve them properly. With this
mapping, using the SEARCHScript variable $$doc.URL in a view template
for Information Server will correctly retrieve a document from
d:\docs\isrel\ by using the Web server alias http://web/reldocs/.
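The effect of the Flag column can be sketched by extending a simple prefix substitution. This Python sketch assumes that a Flag of / means "translate backslashes in the unmatched remainder of the path to forward slashes"; that reading follows the explanation above, but Verity Spider's exact behavior is not specified here. The function name map_prefix and the sample document specs\guide.doc are hypothetical.

```python
from typing import Optional


def map_prefix(value: str, source_prefix: str, dest_prefix: str,
               flag: Optional[str] = None) -> Optional[str]:
    """Hypothetical model of a record with an optional Flag column.

    Assumes flag "/" translates backslashes in the remainder of the path
    (the part after source_prefix) to forward slashes.
    """
    if not value.startswith(source_prefix):
        return None
    remainder = value[len(source_prefix):]
    if flag == "/":
        remainder = remainder.replace("\\", "/")
    return dest_prefix + remainder


# Example 2 record: VdkVgwKey d:\docs\isrel\ URL http://web/reldocs/ /
key = r"d:\docs\isrel\specs\guide.doc"  # hypothetical file in a subdirectory
print(map_prefix(key, "d:\\docs\\isrel\\", "http://web/reldocs/", "/"))
print(map_prefix(key, "d:\\docs\\isrel\\", "http://web/reldocs/"))  # no Flag
```

Without the flag, the backslash between specs and guide.doc survives into the URL, which is what the Flag column avoids. The trailing backslash on SourcePrefix is consumed by the match, so DestPrefix has to supply its own trailing slash.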
Example 3
In this example, PowerPoint slides have been converted
to HTML. The conversion created two files for each slide: a text-only
tsld*.htm and a text-and-graphics sld*.htm,
where "*" is an incrementing number. We want to index the text-only
files and then use the text-and-graphics files for links in a view
template.
The control file ppview.txt substitutes a
document's file path and then maps the new path to a Web server alias.
It assumes that the Web server aliases c:\dev\slides to /~dev/slides
and uses web/ as its document root. The command to run the Verity
Spider looks like this:
    c:\>vspider -collection d:\colls\docs -start c:\dev\slides -abspath -prefixmap d:\verity\ppview.txt
Note that each record in the control file must be on a
single line. The contents of ppview.txt would look like:
# For each document, all lines are considered in order.
#
# SourceField SourcePrefix DestField DestPrefix Flag
VdkVgwKey \dev\slides\t URL /dev/slides/ /
VdkVgwKey c:\dev\slides\ URL http://web/~dev/slides/ /
The first record in ppview.txt performs
the path substitution that makes a view template link to the
text-and-graphics files, sld*.htm, instead of the text-only files,
tsld*.htm: its SourcePrefix ends with the t of tsld, so replacing that
prefix with DestPrefix drops the letter. The rule of equal trailing
slashes for SourcePrefix and DestPrefix can be relaxed here because
DestPrefix is not a Web URL; the Web URL is mapped in the second
record.
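The trick in the first record of ppview.txt, a SourcePrefix that ends partway into a file name, can be sketched as a plain prefix substitution. This Python sketch applies the record to drive-less paths of the form \dev\slides\tsld1.htm so that the prefix matches literally; how Verity Spider treats the c: drive letter for this record is not stated above. The helper text_to_graphics_path is hypothetical, and the / Flag is modeled as translating the remaining backslashes.

```python
from typing import Optional

# First record of ppview.txt: VdkVgwKey \dev\slides\t URL /dev/slides/ /
SOURCE_PREFIX = "\\dev\\slides\\t"  # ends with the "t" of "tsld"
DEST_PREFIX = "/dev/slides/"


def text_to_graphics_path(key: str) -> Optional[str]:
    """Map \\dev\\slides\\tsldN.htm to /dev/slides/sldN.htm (hypothetical helper)."""
    if not key.startswith(SOURCE_PREFIX):
        return None  # record does not apply, e.g. to sld*.htm files
    # Removing the matched prefix also removes the "t", so tsldN.htm
    # becomes sldN.htm; remaining backslashes are translated (Flag "/").
    remainder = key[len(SOURCE_PREFIX):].replace("\\", "/")
    return DEST_PREFIX + remainder


for n in (1, 2, 3):
    key = f"\\dev\\slides\\tsld{n}.htm"
    print(key, "->", text_to_graphics_path(key))
```

Because DestPrefix here is a path rather than a URL, dropping the t is harmless; mapping to http://web/~dev/slides/ is left to the second record.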
With ppview.txt, using $$doc.URL
in a view template will correctly retrieve the text and graphics
document from c:\dev\slides\ by using the Web server alias
http://web/~dev/slides/.
Copyright © 1998, Verity, Inc. All rights reserved.