Most caching proxy servers cache a file only after a user requests it. Caching Proxy has a cache agent that provides automatic cache preloading. You can specify that the cache agent automatically retrieves specified URLs, the most popular URLs, or both, and places them in the cache before they are requested.
In some cases, you need to set the host name of the proxy server and identify the cache access log before the cache is preloaded. To configure the cache agent, in the Configuration and Administration forms, select Cache Configuration and use the Cache Preload and Cache Refresh forms. Note that files representing query results (that is, files whose URLs include the question mark character, ?) are cached only if query caching is enabled.
Automatic cache refreshing and preloading provide the following advantages:
Disadvantages include the following:
For optimal efficiency, set the cache agent to run when server activity is low and before the server becomes busy with client requests. Then the files are ready in the cache to provide fast service the first time a user requests them. By default, the cache agent is started every night at 3 a.m. local time.
Special considerations for reverse proxy configurations:
For security reasons, when you use a reverse proxy configuration, the Proxy http:* rule is disabled by default. (That is, the rule is commented out in the ibmproxy.conf file.) However, when the rule is disabled, the cache agent cannot successfully send requests to Caching Proxy or refresh its cache content. A "403 Forbidden By Rule Error" is written to the error log, and the cache refresh does not complete.
To avoid this problem, use cacheAgentService, which is an internal service provided by Caching Proxy. To enable the service, put the following Service directive before any other mapping rules in the ibmproxy.conf file:
Service /any-valid-string* INTERNAL:cacheAgentService
The variable any-valid-string is any string that is valid and that does not conflict with other mapping rules in the ibmproxy.conf file.
Both Caching Proxy and the cache agent parse the URI based on this Service directive. Instead of sending the URI directly to Caching Proxy, the cache agent prepends the /any-valid-string pattern from the Service directive to the URI.
For example, the cache agent transforms the following URI:
http://www.ibm.com/
to
/any-valid-string/http://www.ibm.com/
The cache agent sends the prefixed URI to Caching Proxy. When Caching Proxy receives the request, it removes the prefix /any-valid-string/. If the remaining URI is a fully qualified URL, Caching Proxy serves the request directly without mapping the URI against other rules.
Additionally, the cache agent can send a relative URI to Caching Proxy. For example, if the ibmproxy.conf file contains LoadURL /abc/ together with the Service directive shown previously, the cache agent transforms the URI into /any-valid-string/abc/ and sends it to Caching Proxy. Caching Proxy receives the URL, removes the prefix, maps /abc/ against the other mapping rules, and handles the request if there is a match.
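The prefix handling described above can be sketched in a few lines of Python. This is only an illustration of the string transformation, not the product's implementation: the function names add_prefix and strip_prefix are hypothetical, and the prefix value simply reuses the /any-valid-string placeholder from the Service directive.

```python
from urllib.parse import urlsplit

# Hypothetical value; it must match the pattern used in the Service directive.
PREFIX = "/any-valid-string"


def add_prefix(uri: str) -> str:
    """What the cache agent does: put the Service pattern in front of the URI."""
    if uri.startswith("/"):
        return PREFIX + uri          # relative URI such as /abc/
    return PREFIX + "/" + uri        # fully qualified URL


def strip_prefix(request_uri: str):
    """What the proxy does: remove the prefix and classify what remains.

    Returns (uri, is_fully_qualified). A fully qualified URL is served
    directly; a relative URI still has to match another mapping rule.
    """
    if not request_uri.startswith(PREFIX + "/"):
        raise ValueError("request does not carry the cache agent prefix")
    rest = request_uri[len(PREFIX):]         # begins with "/"
    candidate = rest[1:]
    parts = urlsplit(candidate)
    if parts.scheme and parts.netloc:
        return candidate, True
    return rest, False


print(add_prefix("http://www.ibm.com/"))
print(strip_prefix("/any-valid-string/abc/"))
```

The first call reproduces the transformation shown in the example above; the second shows that a relative URI keeps its leading slash so that it can be matched against the remaining mapping rules.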
For information on the Service directive, see Service -- Customize the Service step.
On Linux and UNIX platforms, specify the host name of the proxy server whose cache is being preloaded or refreshed. On Windows platforms, specify the host name only if the proxy server being refreshed is not on the local machine. (Note that refreshing a remote server's cache based on its most frequently accessed files is not possible because the local cache agent does not have access to a remote server's cache access log.)
To set the host name of the proxy server, in the Configuration and Administration forms, select Cache Configuration -> Cache Refresh: Identify cache destination server.
To preload the cache with the content stored at specific URLs, in the Configuration and Administration forms, use Cache Configuration -> Cache Preload. In this form, you can specify URLs for the cache agent to load. The proxy retrieves those pages when the cache agent starts, regardless of whether they were in the cache previously. (These URLs are specified in the proxy configuration file by the LoadURL directive.) This form can also be used to define URLs whose content is never cached. Access to a cache access log is not required for this type of cache preloading.
Use the Cache Preload form to configure the following options:
To preload the most frequently accessed pages automatically, use the Cache Configuration -> Cache Refresh form. This function requires a Cache Access Log for the proxy server. (The log's location and name can be changed; refer to Monitoring Caching Proxy for information.) The most popular URLs are determined automatically from the Cache Access Log. The administrator can also specify the number of popular pages to preload in the cache. (This number is specified in the proxy configuration file by the LoadTopCached directive.)
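As a rough illustration of how "most popular" can be derived from an access log, the following Python sketch counts requested URLs and returns the top entries. It is not the cache agent's actual algorithm. The log line layout (a quoted request such as "GET url HTTP/1.0", in the style of the Common Log Format) and the function name top_cached are assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: each log line contains a quoted request line,
# for example "GET http://host/page.html HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*"')


def top_cached(log_lines, count):
    """Return the `count` most frequently requested URLs, most popular first."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return [url for url, _ in hits.most_common(count)]


sample = [
    '10.0.0.1 - - [01/Jan/2024:03:00:00 +0000] "GET http://www.example.com/a.html HTTP/1.0" 200 512',
    '10.0.0.2 - - [01/Jan/2024:03:00:01 +0000] "GET http://www.example.com/b.html HTTP/1.0" 200 128',
    '10.0.0.3 - - [01/Jan/2024:03:00:02 +0000] "GET http://www.example.com/a.html HTTP/1.0" 200 512',
]
print(top_cached(sample, 1))
```

Here the count argument plays the role of the number given to the LoadTopCached directive.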
Use the Cache Refresh form to configure the following options:
Delving is an optional part of the automatic cache refresh feature. Most Web pages have links to other pages with related information, and users often follow the path linking from one page to another and from one site to another. Delving is a way to cache these logical information paths. In delving, the cache agent follows a specified level of hypertext (HTML) links on the pages it is loading, and also caches all of those linked pages. The linked pages can reside on the same host as the source page or on other hosts. An illustration is shown in Figure 1.
To control the delving process, the administrator specifies to the cache agent a maximum number of URLs that it can load (the default setting is 2000), a maximum length of time it can run (the default setting is two hours), and a maximum number of threads it can use (the default setting is four). The administrator can also configure additional controls. By default, delving is enabled for two levels of hierarchy and is not allowed across hosts. Additionally, a delay is inserted between requests. To change these settings, see Related proxy configuration file directives.
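To make the delving controls more concrete, here is a minimal Python sketch that walks an in-memory link graph instead of fetching real pages. It assumes a breadth-first traversal and treats "two levels" as links up to two hops away from a start page; the cache agent's real traversal order is not stated here, and the time and thread limits are left out. The function name delve and its parameters are hypothetical.

```python
from collections import deque
from urllib.parse import urlsplit


def delve(start_urls, links, max_depth=2, max_urls=2000, across_hosts=False):
    """Return URLs in the order this sketch would load them.

    `links` maps a URL to the list of URLs that its page links to.
    Start pages are always loaded; linked pages are added only while
    fewer than `max_urls` pages have been loaded.
    """
    loaded = list(dict.fromkeys(start_urls))      # deduplicate, keep order
    seen = set(loaded)
    queue = deque((url, 0) for url in loaded)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue                              # do not follow links any deeper
        for target in links.get(url, []):
            if len(loaded) >= max_urls:
                return loaded                     # page limit reached
            if target in seen:
                continue
            if not across_hosts and urlsplit(target).netloc != urlsplit(url).netloc:
                continue                          # link leaves the current host
            seen.add(target)
            loaded.append(target)
            queue.append((target, depth + 1))
    return loaded


graph = {
    "http://a.example/": ["http://a.example/1", "http://b.example/x"],
    "http://a.example/1": ["http://a.example/2"],
    "http://a.example/2": ["http://a.example/3"],
}
print(delve(["http://a.example/"], graph))
```

With the default arguments, the link to b.example is skipped because it crosses hosts, and http://a.example/3 is skipped because it lies three levels away from the start page.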
The cache agent loads and then refreshes the cache in this order:
Note that the cache agent does not check whether the maximum number of pages has been reached until it starts delving across links. If the value for the maximum number of pages (called MaxURLs in the proxy configuration file) is lower than the number of pages retrieved in steps 1 and 2, no linked pages are retrieved.
The following examples show how the cache agent handles cache refresh priorities and delving, relative to the maximum number of URLs that are specified (assume that delving is configured for all of these examples).
Configuration file setting | Result |
---|---|
LoadURL http://www.getthis.com/main.html<br>LoadURL http://www.getmetoo.com/welcome.htm<br>LoadTopCached 30<br>MaxURLs 50 | If the Cache Access Log has more than 30 unique URLs, the cache agent retrieves main.html, welcome.htm, and the top 30 requested URLs based on the cache access log. Because it has not reached the MaxURLs value, it retrieves and loads up to 18 linked URLs from pages already cached. |
LoadURL http://ww.joesmith.edu/favorites.html<br>LoadURL http://www.janesmith.edu/dislikes.html<br>LoadTopCached 30<br>MaxURLs 25 | If the cache access log has more than 30 unique URLs, the cache agent retrieves favorites.html, dislikes.html, and the top 30 requested URLs from the cache access log. No other files are retrieved because the value in MaxURLs has been exceeded. |
LoadURL http://www.hello.com/hi.htm<br>LoadURL http://www.ballyhoo.com/index.html<br>LoadTopCached 20<br>MaxURLs 25 | If the cache access log has more than 20 unique URLs, the cache agent retrieves hi.htm, index.html, the top 20 requested URLs from the cache access log, and up to 3 linked URLs from the earlier pages. No other files are retrieved because the value in MaxURLs has been reached. |
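The arithmetic behind these three examples can be checked with a short Python sketch. The function name delving_budget is hypothetical. The sketch assumes that the cache access log contains at least as many unique URLs as LoadTopCached requests and that none of them duplicate a LoadURL entry.

```python
def delving_budget(load_urls, load_top_cached, max_urls):
    """Number of linked URLs that can still be retrieved by delving.

    The LoadURL pages and the top cached pages are always retrieved,
    because MaxURLs is checked only when delving starts.
    """
    retrieved_first = len(load_urls) + load_top_cached
    return max(0, max_urls - retrieved_first)


examples = [
    (["http://www.getthis.com/main.html", "http://www.getmetoo.com/welcome.htm"], 30, 50),
    (["http://ww.joesmith.edu/favorites.html", "http://www.janesmith.edu/dislikes.html"], 30, 25),
    (["http://www.hello.com/hi.htm", "http://www.ballyhoo.com/index.html"], 20, 25),
]
for urls, top, limit in examples:
    print(delving_budget(urls, top, limit))   # prints 18, then 0, then 3
```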
The cache agent can also be configured by directly editing the appropriate directives in the proxy configuration file. For proxy configuration file directives relating to the cache agent, see the following reference pages in Appendix B. Configuration file directives:
If automatic cache refreshing is enabled, the cache agent automatically runs a refresh operation at the specified time. However, you can also run the cache agent at any time from a command line.
The executable file is as follows:
Where server_root is the drive and directory where you installed Caching Proxy (for example, C:\Program Files\IBM\edge\cachingproxy\cp).
On Linux and UNIX platforms, you can run the cache agent automatically at various times by using the cron daemon. Jobs controlled by cron are specified by adding a line to the system crontab file. An example crontab entry on Linux and UNIX platforms is:
45 16 * * * /usr/sbin/cacheagt
This command example starts the cache agent every day at 4:45 p.m. local time. You can use multiple entries to run the cache agent more than once, if desired. For more information, see your operating system's documentation about the cron daemon.
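The field order of such an entry (minute first, then hour, followed by three wildcards for day of month, month, and day of week) is easy to get wrong. The following Python sketch builds entries in the same layout as the example above; the helper name crontab_entry is hypothetical, and the command path is taken from that example.

```python
def crontab_entry(hour, minute, command="/usr/sbin/cacheagt"):
    """Build a daily crontab line in the layout used in the example entry."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    # cron field order: minute, hour, day of month, month, day of week
    return f"{minute} {hour} * * * {command}"


# Two runs per day: 4:45 p.m. and 3:00 a.m. local time.
for hour, minute in [(16, 45), (3, 0)]:
    print(crontab_entry(hour, minute))
```

Each printed line is one crontab entry; adding both lines runs the cache agent twice a day.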
When using a cron daemon to run the cache agent, remember to turn off the automatic refresh option, either by using the Cache Configuration -> Cache Refresh configuration form or by editing the proxy configuration file. Otherwise, the cache agent runs more than once each day.