Delving is an optional part of the automatic cache refresh feature. Most Web pages have links to other pages with related information, and users often follow the path linking from one page to another and from one site to another. Delving is a way to cache these logical information paths. In delving, the cache agent follows the hypertext (HTML) links on the pages it is loading, down to a specified number of levels, and caches those linked pages as well. The linked pages can reside on the same host as the source page or on other hosts. An illustration is shown in Figure 1.
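The link-following behavior described above can be modeled as a breadth-first walk with a depth limit. The following Python sketch is only an illustration under stated assumptions: the function name `delve` and the in-memory `links` mapping are hypothetical stand-ins for fetching and parsing real pages, and the sketch does not represent the cache agent's actual implementation.

```python
from collections import deque


def delve(start_pages, links, depth):
    """Return the pages reached from start_pages by following links
    at most `depth` levels deep, in the order they are discovered.

    `links` is a hypothetical mapping from a page URL to the URLs it
    links to; a real agent would fetch and parse each page instead.
    """
    order = list(dict.fromkeys(start_pages))  # drop duplicates, keep order
    seen = set(order)
    queue = deque((page, 0) for page in order)
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue  # depth limit reached; do not follow this page's links
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, level + 1))
    return order


# Hypothetical link graph: a -> b, c; b -> d; d -> e
example_links = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
pages_two_levels = delve(["a"], example_links, 2)
```

With a depth of two, page `e` is not reached because it sits three links away from the starting page.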
To control the delving process, the administrator gives the cache agent three limits: a maximum number of URLs that it can load (the default setting is 2000), a maximum length of time it can run (the default setting is two hours), and a maximum number of threads it can use (the default setting is four). The administrator can also configure additional controls. By default, delving is enabled for two levels of hierarchy and is not allowed across hosts. Additionally, a delay is inserted between requests. To change these settings, see Related proxy configuration file directives.
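As a rough illustration of these controls, the Python sketch below collects the defaults stated above in one place and shows how a same-host restriction could be checked. The names `DelveLimits` and `allowed_links` are hypothetical and are not proxy configuration directives; the sketch also assumes that all URLs are absolute.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class DelveLimits:
    """Hypothetical container for the defaults described in the text."""
    max_urls: int = 2000         # maximum number of URLs to load
    max_runtime_hours: int = 2   # maximum length of time the agent can run
    max_threads: int = 4         # maximum number of threads
    depth: int = 2               # levels of links to follow
    across_hosts: bool = False   # delving across hosts is off by default


def allowed_links(source_url, candidate_urls, across_hosts=False):
    """Keep only the links that may be followed from source_url.

    When across_hosts is False, a link is kept only if its host name
    matches the host name of the source page. Relative URLs have no
    host name and are dropped in this simplified sketch.
    """
    source_host = urlsplit(source_url).hostname
    return [
        url for url in candidate_urls
        if across_hosts or urlsplit(url).hostname == source_host
    ]


candidates = [
    "http://www.getthis.com/news.html",
    "http://www.getmetoo.com/welcome.htm",
]
same_host_only = allowed_links("http://www.getthis.com/main.html", candidates)
```

With the default setting, only the link on `www.getthis.com` is kept; passing `across_hosts=True` keeps both.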
The cache agent loads and then refreshes the cache in this order:
Note that the cache agent does not check whether the maximum number of pages has been reached until it starts delving across links. If the value for the maximum number of pages (called MaxURLs in the proxy configuration file) is lower than the number of pages retrieved in steps 1 and 2, no linked pages are retrieved.
The following examples show how the cache agent handles cache refresh priorities and delving, relative to the maximum number of URLs that is specified. Assume that delving is enabled in all of these examples.
Configuration file setting | Result |
---|---|
LoadURL http://www.getthis.com/main.html LoadURL http://www.getmetoo.com/welcome.htm LoadTopCached 30 MaxURLs 50 | If the cache access log has more than 30 unique URLs, the cache agent retrieves main.html, welcome.htm, and the top 30 requested URLs from the cache access log. Because these 32 URLs do not reach the MaxURLs value of 50, it retrieves up to 18 linked URLs from the pages already cached. |
LoadURL http://www.joesmith.edu/favorites.html LoadURL http://www.janesmith.edu/dislikes.html LoadTopCached 30 MaxURLs 25 | If the cache access log has more than 30 unique URLs, the cache agent retrieves favorites.html, dislikes.html, and the top 30 requested URLs from the cache access log. No linked pages are retrieved because these 32 URLs already exceed the MaxURLs value of 25. |
LoadURL http://www.hello.com/hi.htm LoadURL http://www.ballyhoo.com/index.html LoadTopCached 20 MaxURLs 25 | If the cache access log has more than 20 unique URLs, the cache agent retrieves hi.htm, index.html, the top 20 requested URLs from the cache access log, and up to 3 linked URLs from the pages already retrieved. No other files are retrieved because the MaxURLs value of 25 has then been reached. |
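
The arithmetic behind these three examples can be checked with a short Python sketch. It is a simplified model, not the proxy's own configuration parser: the function name `linked_url_budget` is hypothetical, each directive is assumed to sit on its own line, and the cache access log is assumed to contain at least as many unique URLs as LoadTopCached requests.

```python
def linked_url_budget(config_text):
    """Return how many linked URLs could still be retrieved by delving.

    Counts the LoadURL directives and the LoadTopCached value, then
    subtracts their sum from MaxURLs. The result is never negative,
    because the limit is only checked once delving starts.
    """
    load_urls = 0
    top_cached = 0
    max_urls = 2000  # default value stated in the text
    for line in config_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "LoadURL":
            load_urls += 1
        elif parts[0] == "LoadTopCached":
            top_cached = int(parts[1])
        elif parts[0] == "MaxURLs":
            max_urls = int(parts[1])
    return max(0, max_urls - (load_urls + top_cached))


first_example = """\
LoadURL http://www.getthis.com/main.html
LoadURL http://www.getmetoo.com/welcome.htm
LoadTopCached 30
MaxURLs 50
"""
budget = linked_url_budget(first_example)  # 50 - (2 + 30) = 18
```

Changing the last two lines to `LoadTopCached 30` with `MaxURLs 25` gives a budget of 0, and `LoadTopCached 20` with `MaxURLs 25` gives 3, matching the second and third rows of the table.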