Reference of Command-line Options


The Verity Spider V3.6 options are described below. Note that option names are case-sensitive.

Initialization Options

Option
Description
-start
Details The starting point for the indexing job. When you run an indexing job from the command line without a command file (see -cmdfile), you must URL-escape any special characters in the starting point. To URL-escape a character, replace it with a percent sign followed by the character's two-digit hexadecimal ASCII code. For example, use /time%26/ instead of /time&/. This allows the operating system to process the command string properly. For web crawling, it also ensures that the web server can process the string.
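
The escaping rule above can be sketched in Python. This is only an illustration of percent-encoding, not part of Verity Spider: the helper name escape_start_point is hypothetical, and it relies on the standard library function urllib.parse.quote. It is meant for the path portion of a starting point, since applying it to a whole URL would also escape the colon after the scheme.

```python
from urllib.parse import quote

def escape_start_point(path: str) -> str:
    """Percent-encode special characters in a starting-point path.

    Slashes are kept so the path structure survives; characters such as
    '&' or a space are replaced by '%' plus their hexadecimal ASCII code.
    (Hypothetical helper, for illustration only.)
    """
    return quote(path, safe="/")

print(escape_start_point("/time&/"))  # /time%26/
```

The escaped value is what you would then pass to -start on the command line.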

Web Crawling (web site indexing) - The starting URL or URLs for the Verity Spider to follow. URLs outside of the starting point will not be followed unless you specify -unlimited or -jumps.

Directory Walking (filesystem indexing) - The starting directory or directories in which the Verity Spider will start indexing. All subdirectories beneath the starting point will be indexed unless you specify values for -pathlen, -prunedir, or any of the inclusion or exclusion criteria.

Note
If you use -start, you cannot use -restart or -refresh.

-refresh
Details Updates a collection by checking the last modified date of every document in the collection that has a status of "done," as reported by the collection reporting utility, vsdb. These documents are retrieved again, and those with a newer modification date are reparsed and reindexed. New documents may also be found and indexed along the way.

For further control over which documents are checked, see -refreshtime.

Keep in mind that in a refresh, only "done" documents are processed. If you need to process documents in other states, such as "fail," then you must use -restart.

Note
If you use -refresh, you cannot use -start or -restart, and you must use at least one of -host, -domain, -nofollow or -unlimited.

-restart
Details Resubmits an unfinished indexing job after a halt. The persistent store for the specified collection is read, and only those candidate URLs that were in the queue but not yet processed are parsed. Candidate URLs for -restart are those with one of the following statuses, as reported by vsdb:

cand, used, inse, upda, dele, fail

For more information about vsdb, see "Verity Spider Reporting" in Chapter 3.

During an indexing job that does not use -nostorage, the persistent store is updated after each URL is extracted from a page. If a halt occurs during a parse cycle, the parsed document and any parsed URLs will be in the candidate queue. When you run with -restart, the original document will be reparsed and URLs will be checked for duplication.

Warning!
This option is not available for jobs where -nostorage is used as there is no state recorded in the persistent store.

Note
If you use -restart, you cannot use -start, -refresh, -resync or -nostorage, and for web crawling you must use at least one of -host, -domain, -nofollow or -unlimited. For filesystem indexing, the default behavior is to limit to the host. In addition, you must specify a collection from which to read the persistent store.

-resync
Details Creates a persistent store for a collection. Since all information is taken from the collection, this should be fairly quick (no URLs are actually accessed). If the persistent store exists, it is wiped and rebuilt. This can be very useful for imported collections and for collections which have been updated with mkvdk. It is a good idea to run a job with -resync after you restart so the persistent store is updated.

Note
You can specify a collection using -collection, but you cannot use any other options.

Core Options

Option
Description
-collection path
Details The full path to the collection you want to create or update.

Warning!
You will receive an error if you specify a filename with an extension of .clm. Support for meta collections was discontinued as of Verity Spider V3.5. Verity Spider V3.6 supports the creation and maintenance of collections built with the universal filter.

-cmdfile path
Details Specifies that Verity Spider reads command-line syntax from a file in addition to the options passed in the command-line. This option includes the path name to the file containing the command-line syntax. The -cmdfile option circumvents command-line length limits.

Note
It is highly recommended that you take advantage of the abstraction this option offers. Keeping options in a command file greatly reduces the risk of mistakenly including or omitting options in subsequent indexing jobs.

-help
Displays Verity Spider syntax options.

Processing Options

Option
Description
-abspath
Type Filesystem only

Details
Forces the Verity Spider to use the absolute path to filesystem documents when indexing. This option is necessary to generate document paths that can be understood by the Web server, which would otherwise try to reconcile the Verity Spider's generated relative document paths using the Web server's document root path as the starting point.

Typically, you will use -abspath in conjunction with -prefixmap where you would want a canonical file path to be used in substituting an URL prefix. For example, you would want your indexed document paths to be "C:\docs\documents," so that a simple substitution in the -prefixmap file would map them to an alias such as /mydocs which you create in your web server.

See also
-prefixmap

-indexers num_indexers
Details Specifies the maximum number of indexing threads to run on a collection.

The default value is 2. Note that increasing the value for -indexers requires additional CPU and memory resources. You may want to experiment to determine the optimum value for your system.

See also
-maxindmem

-license
Details Specifies the license file to use. By default, the file named ind.lic is used, and it is stored in the directory:

installdir/platform/admin/

where installdir represents the product installation directory, and platform represents the platform directory.

-maxindmem kilobytes
Details Specifies the maximum amount of memory, in kilobytes, used by each indexing thread. The number of threads is specified with -indexers.

By default, each indexing thread uses as much memory as is available from the system.

-noflock
Details Disables file locking in the collection while indexing to improve performance. You should only use this option when you are absolutely sure yours is the only indexing process operating on the collection.

Warning!
You cannot use this option when other processes are also indexing into the collection. Doing so will likely cause data corruption in the collection.

-noindex
Details Specifies that the Verity Spider collects URLs while not indexing them. This option is typically used in conjunction with mkvdk -persist or collection servicers (collsvc).

Warning!
When you execute an indexing job for a collection and you use -noindex, the persistent store for the collection is not updated. You cannot use -refresh on a collection where -noindex has been used, or which you indexed using other tools, until you synchronize the persistent store using -resync.

See also
For more information on mkvdk, see the Verity Collection Building Guide. For more information on collection servicers, see "Collection Servicers" in Chapter 3 of this guide.

-nostorage
Details Disables the persistent store for the current indexing job, forcing system memory to hold all state information for the indexing job. You should only use this option for small jobs of 10,000 documents or fewer.

Warning!
This option is being maintained for use at the discretion of Verity Technical Support only. The use of this option may cause indexing problems.

-prefixmap file
Type Filesystem only

Details
Specifies a control file (simple ASCII text) that maps filesystem paths to Web aliases. In conjunction with -abspath, typically used to create an URL field that is the Web equivalent of a filesystem path. Filesystem indexing is faster than web crawling over the network. If you use -prefixmap to replace the filesystem path with the web URL, relative hyperlinks in the HTML pages are kept intact when viewed through Information Server.

The format for the control file is:

src_field src_prefix dest_field dest_prefix

For example, to map the filepath,

/usr/pub/docs


to

http://web/~verity

use the following:

vdkvgwkey /usr/pub URL http://web/~verity

Note
You should always also use -abspath, especially on Windows NT, because the Verity Spider format of relative paths to the filesystem directory is not always what is expected.

See Also
"Using the Control File" in Chapter 3 of this user's guide.

-style path
Details Specifies the path to the style files to use when creating a new collection. Specifying -style is optional as the spider style files are always automatically picked up from:

installdir/common/style

where installdir represents the product installation directory.

Note
If you specify -style when submitting a job, you can safely omit it when resubmitting an indexing job as the style information will already be part of the collection. If you are using -cmdfile, you can leave it there.

-submitsize num_documents
Details Specifies the number of documents submitted for indexing at one time.

The default value is 128. The upper limit is 64,000.

Note
Although larger values mean more efficient processing by the indexer, smaller values will allow more parallelism on multi-CPU systems. Furthermore, in the event of a halt during indexing, a smaller value means fewer documents will be lost.

If a halt occurs during indexing, the chunk of documents specified by -submitsize is lost because there is no transactional rollback for indexing and the documents are no longer in the queue for indexing. Remember that -restart can only continue with URLs and documents which are enqueued.

-summary path
Details Specifies that a summary of spidering work be written to a file. Remember to include the target filename in the path.

-temp path
Details Specifies the directory for temporary files (disk cache). If you do not specify a value for this option, Verity Spider will first try to use your system's environment variable setting. If an environment variable cannot be determined, /tmp will be used for UNIX, and C:\tmp will be used for Windows NT.

Note
Make sure the location you specify has enough disk space to hold the documents that are downloaded and held before indexing. The documents are deleted from the hard disk after they are indexed. For more information

Networking Options

Option
Description
-connections num_connections
Details Specifies the maximum number of simultaneous socket connections. Each connection implies a separate thread.

The default value is 4.

Note
Increasing this value does not necessarily improve throughput, which also depends on factors such as your network connection and the capacity of the remote hosts.

-delay time
Type Web crawling only

Details
Specifies the minimum time between HTTP requests in milliseconds.

There is no default value.

-header string
Type Web crawling only

Details
Specifies an HTTP header to be added to the spidering request. For example:

-header "Referer: http://www.verity.com/"

Verity Spider sends some predefined headers, such as Accept and User-Agent among others, by default. Special headers are sometimes necessary to correctly index a site.

For example, previous versions of Verity Spider did not support the "Host" header, which is needed for Virtual Host indexing. Also, a "Proxy-authentication" header was needed to pass a username and password to a proxy server.

In Verity Spider 3.6, the "Host" header is supported by default, and the -proxyauth option is available for proxy server authentication. Therefore the -header option is maintained only for backwards compatibility and possible future enhancements.

Note
Misuse of this option will cause spider failure. In the event that this happens, use the -restart option with modified -header values.

-hostcache num_hostnames
Details Specifies the number of hostnames to cache to avoid DNS lookups. Without this option, the host cache will continue to grow.

The default value is 256.

-noproxy host_1 ...host_n
Type Web crawling only

Details
Used in conjunction with -proxy, -noproxy specifies that the Verity Spider directly access the hosts whose names match those specified. You can use regular expressions. By default, when -proxy is specified, the Verity Spider first tries to access every host with the proxy information. To improve performance, use -noproxy for those hosts you know can be accessed without a proxy host. For the host variable, you can include the wildcards * and ? in your value, such as:

'*.verity.com'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.
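
To see how a wildcard value such as '*.verity.com' selects hosts, the following Python sketch uses the standard library's fnmatch module as a stand-in. It is an assumption that shell-style matching approximates the Verity Spider matcher, which is not specified here and may differ (fnmatch, for example, also accepts [seq] character classes). The function name host_matches is hypothetical.

```python
from fnmatch import fnmatchcase

def host_matches(host: str, patterns: list[str]) -> bool:
    """Return True if host matches any shell-style wildcard pattern.

    '*' matches any run of characters and '?' matches exactly one.
    Host names are compared in lower case, since DNS names are
    case-insensitive. (Illustrative stand-in, not the spider's matcher.)
    """
    host = host.lower()
    return any(fnmatchcase(host, p.lower()) for p in patterns)

no_proxy = ["*.verity.com"]
print(host_matches("web.verity.com", no_proxy))   # True
print(host_matches("www.example.org", no_proxy))  # False
```

Under this model the bare name verity.com does not match '*.verity.com', because the pattern requires a dot before "verity".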

Note
You must have the appropriate Verity Spider licensing capability to use this option.

-proxy proxyhost:port
Type Web crawling only

Details
Specifies host and port for proxy server.

Note
You must have the appropriate Verity Spider licensing capability to use this option.

See Also
-proxyauth for proxy servers that require authentication, and -noproxy for hosts which you know are accessible without having to go through a proxy server.

-proxyauth login:password
Type Web crawling only

Details
Specifies login information for proxy server connections that require authorization to get outside the firewall. Used in conjunction with -proxy.

Note
You must have the appropriate Verity Spider licensing capability to use this option.

-retry num_retries
Details Specifies the number of times the Verity Spider should attempt to access an URL. You should use -retry when it is likely that an unstable network connection will give false rejections.

The default value is 4.

-timeout integer
Details Specifies the time period, in seconds, that the Verity Spider should wait before timing out on a network connection and on accessing data. The data access value is automatically twice the value you specify for the network connection timeout.

The default value for the network connection timeout is 30 seconds, and therefore the value for the data access timeout is 60 seconds.

Paths and URLs Options

Option
Description
-auth file
Details Specifies an authorization file to support authentication for secure paths.

Note
There must be a corresponding "Authfile=" entry in the Information Server configuration file, inetsrch.ini, so that documents can be accessed for viewing. Both -auth and Authfile= must point to the same file. For more information, see "Authenticating Secure Paths" in Chapter 3 of this guide.

-cgiok
Type Web crawling only

Details
Allows indexing of URLs containing the ? symbol. This typically means the URL leads to a CGI or other such processing program.

The return document produced by the web server is indexed and parsed for document links which are followed and in turn indexed and parsed. However, if the web server does not return a page, perhaps because the URL is missing parameters which are required for processing in order to produce a page, then nothing happens. There is no page to index and parse.

Example
An example of a blank URL without parameters is: http://server.com/cgi-bin/program?

If you include parameters in the URL you are indexing, as specified with the -start option, then those parameters are processed and any resulting pages are indexed and parsed.

By default, URLs with ? symbols are skipped.

-domain name_1 [name_n] [...]
Type Web crawling only

Details
Limits indexing to the specified domain(s). You can use regular expressions. URLs not in the specified domain(s) will not be downloaded or parsed.

You may list multiple domains by separating each one with a single space.

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Note
You must have the appropriate Verity Spider licensing capability to use this option.

-host host_1 [host_n] [...]
Type Web crawling only

Details
Limits indexing to the specified host or hosts. You can use regular expressions. Note that you may list multiple hosts by separating each one with a single space. URLs not on the specified host(s) will not be downloaded or parsed.

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

-jumps num_jumps
Type Web crawling only

Details
Specifies the maximum number of levels deep an indexing job can go from the starting URL. Specify a number between 0 and 254.

The default value is unlimited. If you continue to see extremely large numbers of documents in a collection where you do not expect them, you should consider experimenting with this option, in conjunction with the Content options, to pare down your collection.

-nofollow
Type Web crawling only

Details
Disables the following of HTML hyperlinks (hrefs). You would typically use this option to test indexing of a site by trying a single page.

-norobo
Type Web crawling only

Details
Specifies that any robots.txt files encountered are ignored. The default is to honor any robots.txt files.

The robots.txt file is used on many web sites to specify which parts of the site indexers should avoid. If you are re-indexing a site and robots.txt has changed, the indexer will delete documents that have been newly disallowed by robots.txt.

You can use the -norobo option to specify that the indexer ignore any robots.txt files it encounters. This option should, of course, be used with discretion.

See Also
http://info.webcrawler.com/mak/projects/robots/norobots.html

-pathlen num_paths
Type Web crawling only

Details
Limits indexing to the specified number of path segments in the URL.

The path length is determined as follows:
a) The Web host name is not included. For example, www.spider.com:80/ would not be included in determining the path length.
b) All elements following the host name are included.
c) The actual file name, if present in the URL, is included. For example, /world.html would be included in determining the path length.

Any directory paths between the Web host and the actual file name are included.

Example
For example, for the following URL, the path length would be 4:

http://www.spider:80/comics/fun/funny/world.html
                      <-1-> <2> <-3-> <--4-->

The default value is 100 path segments.
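
The counting rules above can be checked with a short Python sketch. It is an illustration only, assuming the path length is simply the number of non-empty path segments after the host; the function name path_length is hypothetical and the code uses urllib.parse.urlsplit from the standard library.

```python
from urllib.parse import urlsplit

def path_length(url: str) -> int:
    """Count the path segments of a URL.

    The host name and port are excluded; every directory after the host
    and the file name, if present, each count as one segment.
    """
    path = urlsplit(url).path
    return len([segment for segment in path.split("/") if segment])

print(path_length("http://www.spider:80/comics/fun/funny/world.html"))  # 4
```

A URL with nothing after the host, such as http://www.spider:80/, has a path length of 0 under this model.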

-prunedir dir_1 ... dir_n
Type Filesystem only

Details
Specifies that the spider will skip filesystem directories whose names match those specified. You can use regular expressions. For the dir variable, you can include the wildcards * and ? in your value, such as:

'/my_doc*/'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

-refreshtime timeunits
Details Specifies that any documents which have been indexed since the timeunits value are not to be refreshed.

The syntax for timeunits is: n day n hour n min n sec, where n is a positive integer. Note that there must be spaces between the values, and since only the first three letters of each time unit are parsed, you can use the singular or plural form.

Example
For example, if you specify:

-refreshtime 1 day 6 hours

then only those documents which were last indexed at least 30 hours and 1 second ago will be refreshed.
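
The timeunits syntax can be modeled with a small Python parser. This is a sketch under stated assumptions, not the spider's own parser: the function name parse_timeunits and the error handling are hypothetical, and only the rule that the first three letters of each unit are significant is taken from the description above.

```python
from datetime import timedelta

# Seconds per unit, keyed by the first three letters of the unit name.
_UNIT_SECONDS = {"day": 86400, "hou": 3600, "min": 60, "sec": 1}

def parse_timeunits(text: str) -> timedelta:
    """Parse a value such as '1 day 6 hours' into a timedelta."""
    tokens = text.split()
    if not tokens or len(tokens) % 2 != 0:
        raise ValueError(f"expected 'n unit' pairs, got {text!r}")
    total = 0
    for count, unit in zip(tokens[0::2], tokens[1::2]):
        key = unit[:3].lower()  # 'hours' and 'hour' both become 'hou'
        if key not in _UNIT_SECONDS or not count.isdecimal() or int(count) <= 0:
            raise ValueError(f"bad time unit pair: {count} {unit}")
        total += int(count) * _UNIT_SECONDS[key]
    return timedelta(seconds=total)

print(parse_timeunits("1 day 6 hours"))  # 1 day, 6:00:00
```

The result for "1 day 6 hours" equals 30 hours, which is the threshold used in the example above.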

Note
This option is valid only with the -refresh option. When you perform a resync with -resync, the last indexed date is cleared.

-reparse
Type Web crawling only

Details
Forces parsing of all HTML documents already in the collection. This option can be used with any of the Initialization Options.

You can use -reparse when you know there are HTML documents which are updated often with new links, or when you want to include paths and documents which were previously skipped due to any of the exclusion options. In the latter case, remember to remove the exclusion criteria; otherwise there will be little for the Verity Spider to do. This can be easy to overlook when you are using -cmdfile.

-unlimited
Details Specifies that no limits be placed on Verity Spider if neither -host nor -domain is specified. The default is to limit based on the host of the first URL listed. Also removes the -maxdocsize default of 1024 kilobytes (1 megabyte) if -maxdocsize is not specified.

-virtualhost host_1 [host_n] [...]
Details Specifies that DNS lookups are avoided for the hosts listed. This allows you to index by alias, such as when multiple web servers are running on the same host.

Normally, when Verity Spider resolves host names, it uses DNS lookups to convert the names to canonical names, of which there can be only one per machine. This allows for the detection of duplicate documents, to prevent results from being diluted. In the case of multiple aliased hosts, however, duplication is not a barrier as documents can be referred to by more than one alias, and yet remain distinct because of the different alias names.

Example
For example, you may have both marketing.verity.com and sales.verity.com running on the same host. Each alias has a different document root, although document names such as index.htm may occur for both. With -virtualhost, both server aliases can be indexed as distinct sites. Without -virtualhost, they would both be resolved to the same host name and only the first document encountered from any duplicate pair would be indexed.

Content Options

Option
Description
-casesen
Details Makes processing case-sensitive, so that the spider treats keys that differ only in case as separate keys. Use only for indexing UNIX servers.

-exclude exp_1 [exp_n] [...]
Details Files and paths in URLs matching the specified expression(s) will not be followed. For document types, use -mimeexclude instead. For example, specify -mimeexclude application/pdf rather than -exclude *.pdf. For the exp variable, you can include the wildcards (*) and (?) in your value, such as:

'/my_doc*/year199?'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Note
When specifying an URL, you must use a canonical host name which includes the actual server name (and not an alias), the domain name and the port number, if applicable.

Example

http://alter.verity.com:80

-include exp_1 [exp_n] [...]
Details Includes only files and paths in URLs matching the specified expression or expressions. For document types, use -mimeinclude instead. For example, specify -mimeinclude text/html rather than -include *.htm *.html. For the exp variable, you can include the wildcards * and ? in your value, such as:

'/my_doc*/year199?'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Note
When specifying an URL, you must use a canonical host name which includes the actual server name (and not an alias), the domain name and the port number, if applicable.

Example

http://alter.verity.com:80

-indexclude exp_1 [exp_n] [...]
Details Specifies that only those files and paths in URLs which match the expressions be followed but not indexed. For document types, use -indmimeexclude instead. You would use this option to gather some documents, such as HTML tables of contents, to gain access to other documents for indexing. The -exclude option, on the other hand, prevents specified documents from being followed at all. For the exp variable, you can include the wildcards * and ? in your value, such as:

'/my_doc*/year199?'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Note
When specifying an URL, you must use a canonical host name which includes the actual server name (and not an alias), the domain name and the port number, if applicable.

Example

http://alter.verity.com:80

-indinclude exp_1 [exp_n] [...]
Details Specifies that only those files and paths in URLs which match the expressions be followed and indexed. For the exp variable, you can include the wildcards * and ? in your value, such as:

'/my_doc*/year199?'

Previously, the -include option would not allow you to index desired documents if the starting URL was not followed.

Example
For example, if you want to index all documents that include "search" in the URL at http://web.verity.com, you cannot use:

vspider -collection collname -start http://web.verity.com -include '*search*'

This is because the starting point does not match the -include criteria. Now, you can use -indinclude to follow all documents (unless, of course, you have specified any of the exclude options) and index only those documents that match your criteria. Simply replace -include with -indinclude in the above example.
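
The difference between following and indexing can be made concrete with a simplified Python model. It is an assumption-laden sketch, not the spider's actual decision logic: the function classify, the order in which the options are checked, and the use of fnmatch for wildcard matching are all hypothetical. It only encodes the behavior described for -exclude, -include, -indexclude and -indinclude in this section.

```python
from fnmatch import fnmatchcase

def _matches(url, patterns):
    """True if the URL matches any shell-style wildcard pattern."""
    return any(fnmatchcase(url, p) for p in patterns)

def classify(url, include=(), exclude=(), indinclude=(), indexclude=()):
    """Return (followed, indexed) for a URL under a simplified model."""
    if _matches(url, exclude):
        return (False, False)          # -exclude: never followed
    if include and not _matches(url, include):
        return (False, False)          # -include: only matches are followed
    indexed = True
    if _matches(url, indexclude):
        indexed = False                # -indexclude: followed, not indexed
    if indinclude and not _matches(url, indinclude):
        indexed = False                # -indinclude: index only matches
    return (True, indexed)

start = "http://web.verity.com"
print(classify(start, include=["*search*"]))     # (False, False)
print(classify(start, indinclude=["*search*"]))  # (True, False)
print(classify("http://web.verity.com/search/faq.html",
               indinclude=["*search*"]))         # (True, True)
```

With -include the starting URL itself is rejected, so the crawl never begins. With -indinclude the starting URL is followed without being indexed, and matching documents reached from it are indexed.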

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Note
When specifying an URL, you must use a canonical host name which includes the actual server name (and not an alias), the domain name and the port number, if applicable.

Example

http://alter.verity.com:80

-indmimeexclude mime_1 [mime_n] [...]
Details Specifies that only those MIME types which match the expressions be followed but not indexed. Use this option to gather some documents, such as HTML tables of contents, to gain access to other documents for indexing. The -mimeexclude option, on the other hand, prevents specified documents from being followed at all. For the mime variable, you can include the wildcards * and ? in your value, such as:

'text/*'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

See Also
"Setting MIME Types" later in this chapter.

-indmimeinclude mime_1 [mime_n] [...]
Details Specifies that only those MIME types which match the expressions be followed and indexed. The -mimeinclude option would not allow you to index desired documents if the starting URL is not followed. For the mime variable, you can include the wildcards * and ? in your value, such as:

'text/*'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Example
For example, if you want to index all Word documents at http://web.verity.com, you cannot use:

vspider -collection collname -style style_dir -start http://web.verity.com -mimeinclude 'application/msword'

This is because the starting point does not match the -mimeinclude criteria. Now, you can use -indmimeinclude to follow all documents (unless, of course, you have specified any of the exclude options) and index only those documents that match your criteria. Simply replace -mimeinclude with -indmimeinclude in the above example.

See Also
"Setting MIME Types" later in this chapter.

-maxdocsize integer
Details Specifies the maximum size, in kilobytes, for documents to be indexed. Any documents larger than the value specified by -maxdocsize will be ignored. If you use the -unlimited option and do not specify a value for -maxdocsize, then the default value of 1024 kilobytes does not apply and documents of all sizes will be indexed.

The default value is 1024 kilobytes (1 megabyte).

-metafile filename
Type Web crawling only

Details
Allows you to use a text file to map custom meta tags to valid HTTP header fields. This means you are able to use your own meta tag, in the document, to replace what is returned by the web server, or to insert it if nothing is returned. Currently, the only header fields of real value are "Last-Modified" and "Content-Length." Note, however, that future enhancements could allow for much greater variety.

The syntax for entries in the text file is:

name Last-Modified y|n
or
name Content-Length y|n

where y|n is an override flag which can be either yes or no.

Example
An example mapping file for -metafile might include:
Doc_Last_Touched Last-Modified n
Doc_Size Content-Length y

If you use the y override flag, the value for the custom meta tag overrides the value for the valid field, even if both values are present and differ. This can be useful when the valid field value is always sent, but you want to specify your own value with a custom meta tag.

If you use the n override flag, then the value for the custom meta tag will be used only if there is no value for the valid field returned by the server. If a value for the valid field exists, then that is given precedence.

Warning!
If you have several entries mapping to the same valid field, only the last entry will take effect.
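
The override rules can be summarized in a short Python sketch. It is a model of the behavior described above, not the spider's implementation: the functions parse_metafile and resolve, the header and meta tag dictionaries, and the handling of malformed lines are hypothetical.

```python
def parse_metafile(lines):
    """Map each valid header field to (meta_tag_name, override_flag)."""
    mapping = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # assumption: blank or malformed lines are skipped
        name, field, flag = parts
        # A later entry for the same field replaces an earlier one.
        mapping[field] = (name, flag.lower() in ("y", "yes"))
    return mapping

def resolve(field, server_headers, meta_tags, mapping):
    """Pick the effective value for a header field."""
    server_value = server_headers.get(field)
    if field not in mapping:
        return server_value
    name, override = mapping[field]
    meta_value = meta_tags.get(name)
    if meta_value is None:
        return server_value
    if override or server_value is None:
        return meta_value   # flag y, or the server sent nothing
    return server_value     # flag n and the server sent a value

mapping = parse_metafile([
    "Doc_Last_Touched Last-Modified n",
    "Doc_Size Content-Length y",
])
server = {"Last-Modified": "Mon, 01 Jun 1998 10:00:00 GMT", "Content-Length": "2048"}
meta = {"Doc_Last_Touched": "Tue, 02 Jun 1998 09:00:00 GMT", "Doc_Size": "4096"}
print(resolve("Last-Modified", server, meta, mapping))   # server value is kept (flag n)
print(resolve("Content-Length", server, meta, mapping))  # 4096 (flag y)
```

Because the mapping is keyed by the valid field, a second entry for the same field silently replaces the first, which mirrors the warning above.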

-mimeexclude mime_1 [mime_n] [...]
Details Specifies MIME types which are neither followed nor indexed. The default is to include all MIME types. For the mime variable, you can include the wildcards * and ? in your value, such as:

'text/*'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

Use -indmimeexclude to allow the Verity Spider to follow documents, without indexing them, to gain access to other desirable document types.

See Also
"Setting MIME Types" later in this chapter.

-mimeinclude mime_1 [mime_n] [...]
Details Specifies MIME types to be included. The default is to include all MIME types. For the mime variable, you can include the wildcards * and ? in your value, such as:

'text/*'

On UNIX, you should include single quotes around the argument to protect the special characters such as (*). On Windows NT, you should use double quotes.

See Also
"Setting MIME Types" later in this chapter.

Locale Options

Option
Description
-charmap name
Details Specifies the character map to use. Valid values are 8859 or 850.

The default value is 8859.

-common
Details Specifies the path to the Verity home directory (installdir/common).

Note
This option is typically not needed, as long as the PATH environment variable is set correctly.

-datefmt format
Details Specifies the Verity import date format to use. Valid values are MDY, DMY, YMD, USA and EUR.

The default value is MDY.

-language name
Details Specifies the Verity locale to use in indexing. This option is being replaced by the semantically consistent -locale, and is still supported for backwards compatibility.

-locale name
Details Specifies the Verity locale to use in indexing, for example German (deutsch) or French (francais). The default is English (english). This option is identical to -language.

-msgdb path
Details Specifies the path to the ind.msg message database file. If the Verity Spider was installed properly, this option should be unnecessary.

Note
By default, the ind.msg message database is read from:

installdir/platform/admin

Logging Options

Option
Description
-verbose
Details Specifies the lowest level of verbosity. Displays information for each URL or file accessed, indicating whether it was indexed, ignored, or failed.

-debug
Details Specifies the medium level of verbosity. Displays debugging-level messages, and implies -verbose. Produces more output than -verbose.

-trace
Details Specifies the highest level of verbosity. Displays trace-level messages, and implies -debug. Produces more output than -debug.

Maintenance Options

Option
Description
-nooptimize
Details Prevents the Verity Spider from optimizing the collection, thus reducing processing overhead during the indexing job. You should use this option sparingly, as it leaves the collection in less than optimum shape. Some examples of when you might want to use this option are:

You want to manually perform custom optimization of the collection, using mkvdk. By default the Verity Spider optimization mimics the mkvdk actions of maxmerge and vdbopt. For more information on mkvdk, see the Verity Collection Building Guide.

You are running multiple indexing jobs against a collection, and want to wait until they are all finished to optimize.

Generally, you should not leave a collection unoptimized for too long, as search times can slow significantly.

In brief, optimizing a collection means creating a small number of large partitions, which can greatly reduce search times.

See Also Verity Collection Building Guide.

-purge
Details Deletes document tables and index files in the collection, and cleans up the collection's persistent store. The collection is then "fresh" with its original style files, and is not deleted from the file system.

-repair
Details Specifies a failure-recovery mode for the collection, where the goal is to determine the causes of any errors, repair the errors (if possible), and bring a collection back up.

Although the Verity indexing engine always leaves the collection in a consistent, usable state, and no data can be lost or corrupted due to machine failures, it is possible for a process or event external to the Verity engine to corrupt one or more collections.

You can use -repair for constant failure-recovery operation, or you can run it selectively on collections that are "down."





Copyright © 1998, Verity, Inc. All rights reserved.