Working with Proxy Servers
The Verity Spider options -proxy and -proxyauth allow you to index documents on web sites that must be accessed through a proxy server. The -noproxy option allows you to specify hosts that must be accessed directly; it provides exceptions to the proxy-access rule established by -proxy. For Information Server to retrieve such documents so that searching users can view them, Information Server must be configured with corresponding settings. These settings are made in the Information Server administration pages and in CGI environment variables.
NOTE: The method and syntax for specifying CGI environment variables will vary depending on the web server with which Information Server is running. See the documentation for your web server for instructions on specifying CGI environment variables.
Specifying a Proxy Server
If you use the Verity Spider option -proxy to index sites only accessible through a proxy server, you must also specify the proxy server in Information Server so that documents can be retrieved for viewing.
To specify a proxy server in Information Server, do the following:
1. Open the Information Server administration page in your web browser.
2. Click the Server Setup icon on the Information Server main menu bar.
3. Click Network Options on the Server Setup menu.
4. Under HTTP Proxy, specify the host name and port number, if applicable, for the proxy server.
Authenticating Proxy Servers
If the proxy server through which you index web sites also requires authentication, you must specify all of the required information in the CGI environment variable HTTP_PROXY. Because the CGI environment variable overrides the Information Server proxy information (in Network Defaults), you should use only HTTP_PROXY and not specify anything in Information Server. For more information, see "Specifying Authentication Information."
Specifying Authentication Information
If you use the Verity Spider option -proxyauth to index web sites through a proxy server that requires authentication, Information Server must also use the authentication information to retrieve the documents for viewing.
In order to use the username and password you provided for -proxyauth, Information Server can read the necessary information from a CGI environment variable. This variable is:
- HTTP_PROXY
and must be set according to the instructions provided by your web server. The syntax for this variable is:
- host:port:username:password
If you omit any item, you must still include its separator colon (":"). For example, to omit the port:
- host::username:password
NOTE: Since the CGI environment variable overrides the Information Server proxy information (in Network Defaults), you should use only HTTP_PROXY, and not specify anything in Information Server.
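The HTTP_PROXY syntax described above can be illustrated with a short Python sketch. This is not code from Information Server; it only shows how a four-field, colon-separated value with optional empty fields can be built and split. The function names, the ProxySettings type, and the sample host, user name, and password are hypothetical, and the handling of a password that contains a colon is an assumption that the syntax description does not address.

```python
from typing import NamedTuple, Optional


class ProxySettings(NamedTuple):
    """Hypothetical container for the four HTTP_PROXY fields."""
    host: Optional[str]
    port: Optional[int]
    username: Optional[str]
    password: Optional[str]


def format_http_proxy(host="", port="", username="", password=""):
    """Build a host:port:username:password string.

    Omitted items become empty fields, so every separator colon is kept.
    """
    return ":".join(str(item) for item in (host, port, username, password))


def parse_http_proxy(value: str) -> ProxySettings:
    """Split a host:port:username:password string into its four fields."""
    # Split at most three times so that a colon inside the password stays
    # in the password field (an assumption, not documented behavior).
    parts = value.split(":", 3)
    if len(parts) != 4:
        raise ValueError("expected four colon-separated fields")
    # Empty fields (omitted items) are reported as None.
    host, port, username, password = (part or None for part in parts)
    return ProxySettings(host, int(port) if port else None, username, password)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(format_http_proxy("proxy.example.com", "", "spider", "secret"))
    print(parse_http_proxy("proxy.example.com:8080:spider:secret"))
```

The resulting string is the value you would assign to HTTP_PROXY using whatever mechanism your web server provides for CGI environment variables.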
Specifying Hosts for Direct Access
The Verity Spider option -noproxy allows you to specify a list of hosts for which only direct access should be used. When retrieving documents for viewing, Information Server can be configured to try proxy server access first and, if that fails, to try direct access. However, the second access method is used only after the first one times out. By providing a list of direct-access hosts, you can avoid waiting for the timeout. For more information, see "Controlling Access to a Proxy Server" in Chapter 11 of the Information Server User's Guide.
To provide Information Server with the same list of direct-access hosts for viewing as the one Verity Spider uses for indexing, you must specify the information in a CGI environment variable. This variable is:
- HTTP_NOPROXY
and must be set according to the instructions provided by your web server. The syntax for HTTP_NOPROXY is:
- host,host
where the list of hosts is separated by commas and must not contain any spaces.
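As a rough illustration of the HTTP_NOPROXY syntax, the following Python sketch builds a comma-separated host list with no spaces and checks whether a given host appears in it. It is not taken from Information Server. The function names and sample host names are hypothetical, and the exact, case-insensitive host matching is an assumption; the syntax description does not say whether partial names or wildcards are accepted.

```python
def format_http_noproxy(hosts):
    """Join host names into a comma-separated list with no spaces."""
    # Trim surrounding whitespace and drop empty entries.
    cleaned = [host.strip() for host in hosts if host.strip()]
    for host in cleaned:
        # Reject names that would break the list syntax.
        if "," in host or any(ch.isspace() for ch in host):
            raise ValueError(f"invalid host name: {host!r}")
    return ",".join(cleaned)


def is_direct_access(host, noproxy_value):
    """Return True if host is listed in a host,host style value.

    Matching is exact and case-insensitive (an assumption, not
    documented behavior).
    """
    direct_hosts = {entry.lower() for entry in noproxy_value.split(",") if entry}
    return host.lower() in direct_hosts


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    value = format_http_noproxy(["intranet.example.com", "docs.example.com"])
    print(value)
    print(is_direct_access("docs.example.com", value))
```

Generating the value from one list helps keep the HTTP_NOPROXY setting consistent with the hosts given to the Verity Spider -noproxy option.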
Copyright © 1998, Verity, Inc. All rights reserved.