-restart
|
Details Used after a halt to
resubmit an unfinished indexing job. Specifies that the persistent
store for the specified collection be read and that only those candidate
URLs that were in the queue but not yet processed are parsed.
Candidate URLs for -restart are URLs with one of the
following statuses, as reported by vsdb:
cand, used, inse, upda, dele, fail
For more information about vsdb, see "Verity
Spider Reporting" in Chapter 3.
During an indexing job that does not use -nostorage, the
persistent store is updated after each URL is extracted from a page. If
a halt occurs during a parse cycle, the parsed document and any parsed
URLs will be in the candidate queue. When you run with -restart,
the original document will be reparsed and URLs will be checked for
duplication.
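The restart behavior described above can be illustrated with a minimal sketch. It is not Verity Spider code: the (url, status) record layout, the function name restart_queue, and the placeholder status "other" are assumptions made for illustration. Only the six status codes come from the text.

```python
# Statuses that the text lists as restart candidates (as reported by vsdb).
RESTART_STATUSES = {"cand", "used", "inse", "upda", "dele", "fail"}


def restart_queue(records):
    """Return the URLs to parse on restart, in order, without duplicates.

    `records` is a hypothetical list of (url, status) pairs standing in
    for the persistent store; the real store format is not shown here.
    """
    seen = set()
    queue = []
    for url, status in records:
        # Keep only candidate statuses, and skip URLs already queued
        # (the duplication check mentioned in the text).
        if status in RESTART_STATUSES and url not in seen:
            seen.add(url)
            queue.append(url)
    return queue


records = [
    ("http://example.com/a.html", "cand"),
    ("http://example.com/a.html", "cand"),   # duplicate, dropped
    ("http://example.com/b.html", "other"),  # hypothetical non-candidate status
    ("http://example.com/c.html", "fail"),
]
print(restart_queue(records))
```

The set-based check keeps the first occurrence of each URL, so a document that was reparsed after a halt does not enqueue its links twice.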
Warning! This option is not available for jobs that use -nostorage,
because no state is recorded in the persistent store.
Note If you use -restart, you cannot use -start,
-refresh, -resync, or -nostorage.
For web crawling, you must also use at least one of -host,
-domain, -nofollow, or -unlimited.
For filesystem indexing, the default behavior is to limit indexing to the host.
In addition, you must specify a collection from which to read the
persistent store.
|
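The option rules for -restart can be expressed as a small pre-flight check. This is a sketch under stated assumptions, not part of the product: the function name check_restart_args and the web_crawl parameter are hypothetical, and the option name -collection is assumed to be how the collection is specified, since the text says only that a collection must be given.

```python
# Options the text says cannot be combined with -restart.
CONFLICTING = {"-start", "-refresh", "-resync", "-nostorage"}
# Scope options, at least one of which is required for web crawling.
SCOPE = {"-host", "-domain", "-nofollow", "-unlimited"}


def check_restart_args(args, web_crawl=True):
    """Return a list of problems with a -restart command line.

    `args` is a list of command-line tokens. An empty result means the
    rules described in the text are satisfied.
    """
    present = set(args)
    problems = []
    if "-restart" not in present:
        return problems  # rules below apply only to -restart jobs
    for opt in sorted(CONFLICTING & present):
        problems.append(f"{opt} cannot be combined with -restart")
    if web_crawl and not (SCOPE & present):
        problems.append(
            "web crawling with -restart needs one of "
            "-host, -domain, -nofollow, -unlimited"
        )
    # Assumption: the collection is named with a -collection option.
    if "-collection" not in present:
        problems.append("-restart needs a collection to read the persistent store from")
    return problems


print(check_restart_args(["-restart", "-collection", "docs", "-host", "example.com"]))
print(check_restart_args(["-restart", "-start", "http://example.com/"]))
```

The first call reports no problems. The second reports the -start conflict, the missing scope option, and the missing collection.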