Maintaining cache content

Because caching involves making and saving a copy of the served file, some routine maintenance is required for the cache to function properly.

Cached files must be checked for freshness and invalidated when they are no longer consistent with the files on the origin server. Invalidated or unused files must be removed from the cache to make room for new files.

File expiration

Keeping cached objects consistent with the original object on the content server is known as maintaining cache freshness. For each document or other object that it caches, Caching Proxy computes a time at which the object expires.

For HTTP pages, the header of the document, generated by the content server, contains the expiration information.

Because the FTP protocol does not include equivalent expiration information, Caching Proxy generates its own Last-Modified: header for each FTP file, based on the FTP directory information for that file, and uses this header to compute the expiry time. If the proxy server cannot obtain directory information for the file from the FTP server, the default expiry value that matches the FTP URL is used. In addition, because there is no standard date format for FTP servers, Caching Proxy might be unable to understand the date and time that is sent by some FTP servers. In that case, the proxy server's default expiry time value is also used. This procedure allows the proxy to manage the caching of HTTP pages and FTP files in a similar manner.

Expiration can be specified by a content server in one of several ways (in order of preference):
  1. The content server specifies a header saying Cache-control: s-maxage=n. This tells the proxy that the object is fresh for n seconds after it is received. The s-maxage directive applies only to shared caches such as proxies, which is why it is preferred over max-age.
  2. The content server specifies a header saying Cache-control: max-age=n. This tells the proxy that the object is fresh for n seconds after it is received.
  3. The content server specifies a header saying Expires: n. This tells the proxy that the object is fresh until the date and time specified by n.
  4. The content server indicates when the document was last modified, by using a Last-Modified: n header. The proxy server computes the length of time since the document was last modified, multiplies this by the Cache Last Modified factor set in the proxy configuration file, and assumes that the document is valid for that length of time. For example, if the content server indicates that the document was last modified one week (seven days) ago, and the Cache Last Modified factor is 0.14, then the proxy server assumes that the document is valid for about one day. See Configuring cache freshness for instructions on setting the Cache Last Modified factor.
  5. If none of the above information is specified by the content server, Caching Proxy looks for the Cache Default Expiry setting that matches the current URL and uses that for the expiry time.
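
The order of preference above can be sketched in Python as follows. This is only an illustration of the documented precedence, not Caching Proxy's actual code; the function name compute_expiry and its parameters are hypothetical, and the 0.14 factor is the value used in the example in item 4.

```python
from datetime import datetime, timedelta, timezone

def compute_expiry(received, s_maxage=None, max_age=None, expires=None,
                   last_modified=None, lm_factor=0.14,
                   default_expiry=timedelta(0)):
    """Return the expiry time of an object, trying each source in order.

    All names here are hypothetical; the order mirrors the numbered list.
    """
    if s_maxage is not None:                 # 1. Cache-control: s-maxage=n
        return received + timedelta(seconds=s_maxage)
    if max_age is not None:                  # 2. Cache-control: max-age=n
        return received + timedelta(seconds=max_age)
    if expires is not None:                  # 3. Expires: n
        return expires
    if last_modified is not None:            # 4. Last-Modified times the factor
        age = received - last_modified
        return received + age * lm_factor
    return received + default_expiry         # 5. Cache Default Expiry setting

# A document last modified seven days ago, with a factor of 0.14,
# is treated as valid for 0.98 days, that is, about one day.
now = datetime(2018, 3, 23, tzinfo=timezone.utc)
lifetime = compute_expiry(now, last_modified=now - timedelta(days=7)) - now
```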

After the expiry time is computed as described, Caching Proxy checks whether a Minimum Hold value applies to this URL. If one does, and the time it specifies is longer than the computed expiry time, then the time that is specified by the Minimum Hold value is used as the object's expiry time. This is true even if Caching Proxy computes an expiry time of 0 minutes for a document. Therefore, to avoid serving stale content, be cautious about using the Minimum Hold setting. (To set the Minimum Hold value, use the CacheMinHold directive or the Cache Configuration –> Cache Expiry Settings: URL Expiration setting.)

The final expiry time value is checked against the time that is specified in the Time Margin setting. If the expiry time is greater than the Time Margin value, the document is cached; otherwise, it is not added to the cache.
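
As a rough sketch of these two final checks, the following Python function first raises the computed lifetime to the Minimum Hold value and then compares the result with the Time Margin. The function name and parameters are hypothetical, and the 10-minute margin used as a default is only an illustrative assumption.

```python
from datetime import timedelta

def final_lifetime(computed, min_hold=None, time_margin=timedelta(minutes=10)):
    """Apply Minimum Hold, then Time Margin (hypothetical helper).

    Returns (lifetime, cacheable).
    """
    lifetime = computed
    # Minimum Hold wins whenever it is longer than the computed lifetime,
    # even when the computed lifetime is zero.
    if min_hold is not None and min_hold > lifetime:
        lifetime = min_hold
    # Only objects that stay fresh longer than the Time Margin are cached.
    cacheable = lifetime > time_margin
    return lifetime, cacheable
```

For example, a document with a computed lifetime of 0 minutes and a Minimum Hold of one hour is cached for one hour, which shows how Minimum Hold can cause stale content to be served.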

If a document is found in the cache but has expired, Caching Proxy issues a special request known as an if-modified-since request to the content server. This request causes the content server to send the document only if it has been modified since the proxy last received it. If the document has not been modified, the content server sends a message indicating that, and does not resend the page. In that case, the proxy serves the cached document. For FTP files, the proxy server simulates this if-modified-since process: if it determines that the file has not changed at the FTP server, it serves the file from the cache; otherwise, it gets the newer version from the FTP server.
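
The HTTP side of this exchange can be illustrated with a short Python sketch. It only builds the conditional request header and chooses which body to serve; it does not open a network connection. The helper names are hypothetical. The 304 (Not Modified) status code is the standard HTTP response that indicates the document has not changed.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def revalidation_headers(last_modified):
    """Build the header for an if-modified-since request.

    last_modified must be a timezone-aware datetime; HTTP dates are in GMT.
    """
    as_gmt = last_modified.astimezone(timezone.utc)
    return {"If-Modified-Since": format_datetime(as_gmt, usegmt=True)}

def choose_body(status, cached_body, response_body):
    """Serve the cached copy on 304 Not Modified, else the new response."""
    if status == 304:
        return cached_body
    return response_body
```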

When an FTP file expires from the cache, the proxy simulates the HTTP if-modified-since revalidation process for the FTP file. It does this by reissuing the FTP LIST command for the requested file, parsing the file date from the response that is returned by the FTP server, and comparing this date with the date that the proxy server generated for the Last-Modified header when the file was initially retrieved. If the file date has not changed, then the proxy server marks the cached FTP file as revalidated, sets a new expiration time for the file, and serves the file from the cache rather than retrieving it again from the FTP server. If the two file dates do not match, then the proxy retrieves the file from the FTP server again and caches the new copy with the new file date.
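
The comparison step can be sketched as follows. This Python fragment assumes that the file date has already been parsed from the LIST response; the class and function names are hypothetical and do not reflect Caching Proxy's internal structures. A cached entry without a file date is always retrieved again.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class CachedFtpFile:
    last_modified: Optional[datetime]  # None if no file date could be obtained
    expires: datetime

def revalidate_ftp(entry, listed_date, now, lifetime):
    """Simulate if-modified-since for an expired FTP file.

    Returns "serve-cached" when the dates match, otherwise "refetch".
    """
    if entry.last_modified is not None and listed_date == entry.last_modified:
        entry.expires = now + lifetime   # mark as revalidated with a new expiry
        return "serve-cached"
    return "refetch"                     # date changed or unknown: fetch again
```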

It is not always possible to obtain the directory information for the file from the FTP server. If the proxy is unable to determine the file date for the FTP file, it does not generate a Last-Modified header for the file. Instead, it uses the value that is specified for the CacheDefaultExpiry directive that matches the URL to determine the length of time to keep the file in the cache. When this time period expires, the proxy always retrieves the file from the FTP server again. If specific FTP files in your cache seem to be using the CacheDefaultExpiry directive often and are frequently being retrieved (generating a high volume of network traffic), consider specifying a more granular CacheDefaultExpiry value for those specific files. Doing this holds them in the cache for a longer period of time.

To specify cache expiration settings in the Configuration and Administration forms, use the Cache Configuration –> Cache Expiry Settings –> Time Limit for Cached Files form.

Additional information about cache freshness

  • Almost all static web documents (as opposed to dynamically generated documents) include a Last-Modified header. This header is the most common way that proxies compute expiry times for documents and the first method that Caching Proxy tries for FTP files. If this fails, the proxy refers to the Default Expiry values.
  • Few documents use a Cache-control: s-maxage, Cache-control: max-age, or Expires: header.
  • Dynamically generated pages, which frequently are not cacheable, can include a header that says Expires: 0 or Cache-control: no-cache, either of which means that the document expires immediately.
  • Be cautious when setting the Default Expiry value to anything other than 0 minutes for URLs that use the HTTP: syntax. Many dynamically generated pages include none of the expiration headers and are therefore subject to the Default Expiry value. Setting Default Expiry to more than 0 minutes allows the proxy to cache those objects, but this might mean that users get out-of-date content (or unexpected results from CGI programs or servlets).
  • In the following circumstances, the proxy server revalidates documents with the server for every request, regardless of whether the cached document is expired:
    • The document includes one of the following headers:
      • Cache-control: s-maxage
      • Cache-control: must-revalidate
      • Cache-control: proxy-revalidate
    • The document requires user credentials but is allowed to be cached by the server.
    • The document contains a Cache-control: no-cache header but is cached anyway (due to aggressive caching).
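
The conditions in the last item can be expressed as a simple predicate. The following Python sketch is illustrative only; the function name and the two boolean parameters are hypothetical stand-ins for state that the proxy tracks internally.

```python
def revalidates_every_request(cache_control="", requires_credentials=False,
                              cached_aggressively=False):
    """Return True if the cached document is revalidated on every request."""
    # Reduce "max-age=60, must-revalidate" to {"max-age", "must-revalidate"}.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    if directives & {"s-maxage", "must-revalidate", "proxy-revalidate"}:
        return True
    if requires_credentials:
        return True
    # no-cache documents are present only when aggressive caching stored them.
    return "no-cache" in directives and cached_aggressively
```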

Dates in FTP

This applies to forward proxy configurations only.

Because the FTP protocol does not define dates and times as strictly as the HTTP protocol does, several factors can cause the Last-Modified header that is generated by the proxy for FTP files to be slightly different from the actual file date. These factors include the following:
  • Unlike the HTTP protocol, the FTP protocol does not specify that returned dates must be in Greenwich Mean Time (GMT). The date that is returned by the FTP server is likely to be in the FTP server's local time. Because the proxy has no way of determining which time zone the FTP server is running in, it interprets the time as being in the proxy's own time zone. An exception to this is the Windows FTP server, which returns dates in GMT. If the proxy detects that the FTP server is running on a Windows system, it assumes that the directory date is in GMT.
  • Some FTP servers specify the date in the returned directory information in the format of Month Day Year only, and do not include the actual hours or minutes information for the date specified. If the FTP server does not return hour and minute information for the file, the proxy assumes that the file was last modified on the latest possible hour and minute of the date that is returned by the FTP server. For example, if the FTP server returns directory information for a file indicating that the file was last modified on October 13, 1998, but does not include information on the hours or minutes, the proxy assumes that the file was modified at 11:59:59 p.m. on October 13, 1998. Then, if the FTP server is not a Windows FTP server, the proxy converts this date from its own local time zone to the corresponding GMT.
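
The two rules above can be combined in a short Python sketch. The function name and parameters are hypothetical, and the proxy's time zone is passed in explicitly so that the example is reproducible.

```python
from datetime import date, datetime, time, timedelta, timezone

def ftp_date_to_gmt(day, time_of_day=None, proxy_tz=timezone.utc,
                    server_is_windows=False):
    """Interpret an FTP directory date and return it in GMT."""
    if time_of_day is None:
        # No hour or minute given: assume the latest possible time that day.
        time_of_day = time(23, 59, 59)
    naive = datetime.combine(day, time_of_day)
    # Windows FTP servers are assumed to report GMT; all other servers are
    # interpreted in the proxy's own time zone.
    source_tz = timezone.utc if server_is_windows else proxy_tz
    return naive.replace(tzinfo=source_tz).astimezone(timezone.utc)

# A proxy running five hours behind GMT sees "October 13, 1998" with no time.
eastern = timezone(timedelta(hours=-5))
stamp = ftp_date_to_gmt(date(1998, 10, 13), proxy_tz=eastern)
```

In this example, stamp is 04:59:59 GMT on October 14, 1998, because 11:59:59 p.m. in the proxy's time zone is converted to GMT.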

Configuring cache freshness

To specify the expiration times for cached files, in the Configuration and Administration forms, select Cache Configuration –> Cache Expiry Settings. The following forms are useful.

URL-based expiration

Use this form to set the minimum length of time that files are held in the cache, based on their URLs. You can specify different caching behavior for different URL request templates.

To set URL-based file expiration by editing the proxy configuration file, see the reference sections in Configuration file directives for the following directives:

Default expiration settings

Use the Cache Expiration Settings form to specify the default expiration settings for used or unused files. You can set different values for HTTP, FTP, and Gopher files, and you can set different values for used or unused files.

This form also contains more file-expiration options:
  • Enable cached file expiration checking. This check box is selected by default. Generally, it is desirable to select this option so that the server does not send stale content.
  • Disable retrieval of files from remote servers. Select this option if you do not want the server to retrieve files from remote servers.
  • Do not cache files that will expire within. To prevent caching files that expire in a short time, specify the time period with this option. By default, files that expire within 10 minutes are not cached.

Last Modified Factor settings

Use the Last Modified Factor form to set the value that the proxy uses to calculate an expiration date for cached files with no expiration dates in their headers. You can set different values for files matching different request templates. The first matching template is used to calculate the expiration date.
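
First-match selection can be sketched in Python as shown below. The sketch assumes shell-style wildcards, matched with the standard fnmatch module; the product's actual request-template syntax might differ, and the function name, rule list, and factor values are hypothetical.

```python
from fnmatch import fnmatchcase

def first_matching_value(url, rules, default=None):
    """Return the value of the first rule whose template matches the URL.

    rules is an ordered list of (template, value) pairs.
    """
    for template, value in rules:
        if fnmatchcase(url, template):
            return value
    return default

# Order matters: the specific template must precede the general one.
factors = [
    ("http://example.com/news/*", 0.05),
    ("http://*", 0.14),
]
```

With these rules, a URL under http://example.com/news/ gets the factor 0.05, and any other HTTP URL gets 0.14. If the two rules were swapped, the general template would always match first.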

Cache time limit

Use the Time Limit for Cached Files configuration form to set the maximum time that a file can remain in the cache. Time limits are set based on request templates, and you can specify that files are discarded or revalidated when the time limit expires. These settings can be used to maintain files whose expiration dates are invalid or files with long expiration times.

To set the maximum expiration time limit for cached files by editing the proxy configuration file, see the following:

Garbage collection

As part of the effort to keep popular URLs cached and minimize usage of system resources, Caching Proxy performs the cleanup process known as garbage collection, in which old or unused files are removed from the cache to make room for more-current files.

The garbage collection process examines the files in the cache directory and attempts to eliminate expired files to reduce the size of the cache and make room for new files. Garbage collection is done automatically, but some settings can be configured to tailor the process to your needs.

Configuring garbage collection

To configure garbage collection, in the Configuration and Administration forms, select Cache Configuration –> Garbage Collection Settings. Use this form to set the high water mark and low water mark, which determine when garbage collection is started and stopped. When the amount of space that is used in the cache reaches or exceeds the percentage set for the high water mark, garbage collection begins. Garbage collection continues until the percentage of used space in the cache is equal to or less than the value set for the low water mark.
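
The interaction of the two water marks can be sketched in Python as follows. The function is a simplified model, not the product's implementation: the name is hypothetical, the 90 and 60 percent values are arbitrary examples, and the files are assumed to arrive already sorted in removal order.

```python
def collect_garbage(files, capacity, high_water=90, low_water=60):
    """Return the names of files removed in one garbage collection pass.

    files is a list of (name, size) pairs in removal order; capacity is the
    cache size in the same units as the file sizes.
    """
    used = sum(size for _, size in files)
    removed = []
    # Collection starts only when usage reaches the high water mark.
    if used * 100 < high_water * capacity:
        return removed
    for name, size in files:
        # Collection stops once usage is at or below the low water mark.
        if used * 100 <= low_water * capacity:
            break
        used -= size
        removed.append(name)
    return removed
```

For a cache of capacity 100 that holds files of sizes 30, 30, and 35, usage is 95 percent, so collection starts. Removing the first two files brings usage down to 35 percent, which is below the low water mark, so the third file is kept.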

You can choose between two garbage-collection algorithms. The responsetime algorithm optimizes the time that is required to respond to users by preferentially removing large files from the cache. The bandwidth algorithm optimizes the use of network bandwidth by preferentially removing smaller files from the cache. Choose either, or a blend of the two.
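
The difference between the two algorithms can be illustrated by the order in which they would consider files for removal. This Python sketch looks at file size only, which is a deliberate simplification; the real algorithms presumably weigh other factors such as expiry and usage, the blend option is not modeled, and the function name is hypothetical.

```python
def removal_order(files, algorithm="bandwidth"):
    """Order (name, size) pairs by removal preference, using size alone."""
    if algorithm == "responsetime":
        # Prefer removing large files.
        return sorted(files, key=lambda f: f[1], reverse=True)
    if algorithm == "bandwidth":
        # Prefer removing small files.
        return sorted(files, key=lambda f: f[1])
    raise ValueError("unknown algorithm: " + algorithm)
```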

To configure garbage collection by editing the proxy configuration file, see the reference sections for the following directives:




Last updated: March 23, 2018 0:18
File name: cache-maint.html