HTTPCache Class Reference

#include <HTTPCache.h>

Public Member Functions

bool cache_response (const string &url, time_t request_time, const vector< string > &headers, const FILE *body)
FILE * get_cached_response (const string &url, vector< string > &headers)
FILE * get_cached_response_body (const string &url)
vector< string > get_conditional_request_headers (const string &url)
bool is_url_in_cache (const string &url)
bool is_url_valid (const string &url)
void purge_cache ()
void release_cached_response (FILE *response)
void update_response (const string &url, time_t request_time, const vector< string > &headers)
virtual ~HTTPCache ()
Accessors and Mutators for various properties.
bool get_always_validate () const
vector< string > get_cache_control ()
CacheDisconnectedMode get_cache_disconnected () const
string get_cache_root () const
int get_default_expiration () const
unsigned long get_max_entry_size () const
unsigned long get_max_size () const
bool is_cache_enabled () const
bool is_cache_protected () const
bool is_expire_ignored () const
void set_always_validate (bool validate)
void set_cache_control (const vector< string > &cc)
void set_cache_disconnected (CacheDisconnectedMode mode)
void set_cache_enabled (bool mode)
void set_cache_protected (bool mode)
void set_default_expiration (int exp_time)
void set_expire_ignored (bool mode)
void set_max_entry_size (unsigned long size)
void set_max_size (unsigned long size)

Static Public Member Functions

static HTTPCache * instance (const string &cache_root, bool force=false)

Friends

class DeleteByHits
class DeleteCacheEntry
class DeleteExpired
class DeleteUnlockedCacheEntry
class HTTPCacheInterruptHandler
class HTTPCacheTest
class WriteOneCacheEntry

Classes

struct  CacheEntry

Detailed Description

Implements a single-user MT-safe HTTP 1.1 compliant (mostly) cache.

Clients that run as users lacking a writable HOME directory MUST disable this cache. Use Connect::set_cache_enabled(false).

The design of this class was taken from the W3C libwww software. That code was originally written by Henrik Frystyk Nielsen, Copyright MIT 1995. See the file MIT_COPYRIGHT. This software is a complete rewrite in C++ with additional features useful to the DODS and OPeNDAP projects.

This cache does not implement range checking, so partial responses should not be cached. (HFN's version cached them, but that does not mesh well with the DAP, for which this cache was written.)

The cache uses the local file system to store responses. If it is used in an MT (multithreaded) application, care should be taken that the number of available file descriptors is not exceeded.

In addition, when the cache is used in an MT program, only one thread should use the mutators to set property values. Even though these methods are MT-safe, having several threads change the values of the cache's properties will lead to odd behavior on the part of the cache. Many of the public methods lock access to the class' interface; this is noted in the documentation for those methods.

Even though the public interface to the cache is typically locked when accessed, an extra locking mechanism is in place for entries that are being accessed. If a thread accesses an entry, that entry must be locked to prevent it from being updated until the thread tells the cache that it is no longer using it. The methods get_cached_response() and get_cached_response_body() both lock an entry; use release_cached_response() to release the lock. Entries are locked using a combination of a counter and a mutex. The following methods block when called on a locked entry: is_url_valid(), get_conditional_request_headers(), update_response(). (The locking scheme could be modified to distinguish between reading from and writing to an entry. In that case is_url_valid() and get_conditional_request_headers() would only lock when an entry is in use for writing. But I haven't done that.)
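
The counter-plus-mutex entry lock described above can be sketched as follows. This is a minimal illustration under stated assumptions, not HTTPCache's implementation: the class EntryLock and its methods are hypothetical, and a std::condition_variable is added so that an updater can block until the use count drops to zero.

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical per-entry lock: a use counter protected by a mutex, plus a
// condition variable so that an updater can wait until no reader holds the entry.
class EntryLock {
public:
    // Called when a reader obtains the entry (cf. get_cached_response()).
    void acquire() {
        std::lock_guard<std::mutex> guard(m_);
        ++uses_;
    }

    // Called when the reader is done (cf. release_cached_response()).
    void release() {
        std::lock_guard<std::mutex> guard(m_);
        if (uses_ == 0)
            throw std::logic_error("entry was already released");
        if (--uses_ == 0)
            unused_.notify_all();
    }

    bool in_use() const {
        std::lock_guard<std::mutex> guard(m_);
        return uses_ > 0;
    }

    // Blocks until no reader holds the entry (cf. update_response()).
    // A real updater would also have to keep new readers out while it works;
    // this sketch does not.
    void wait_until_unused() {
        std::unique_lock<std::mutex> lock(m_);
        unused_.wait(lock, [this] { return uses_ == 0; });
    }

private:
    mutable std::mutex m_;
    std::condition_variable unused_;
    unsigned uses_ = 0;
};
```

Releasing an entry twice throws here, matching the documented behavior of release_cached_response() for an entry that was already released.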

Todo:
Because is_url_in_cache() and is_url_valid() are separate calls, an interface that combines the two might be easier to use. Alternatively, is_url_valid() could throw a special exception if the entry is missing. Either would help clients deal with URLs that are removed from the cache between calls to the two methods.
Todo:
Change the entry locking scheme to distinguish between entries accessed for reading and for writing.
Todo:
Test in MT software. Is the entry locking scheme good enough? The current software throws an exception if there's an attempt to modify an entry that is locked by another thread. Maybe it should block instead? Maybe we should provide tests to see if an update would block (one that returns right away and one that blocks). Note: Rob Morris added tests for MT-safety. 02/06/03 jhrg
Author:
James Gallagher <jgallagher@gso.uri.edu>

Definition at line 131 of file HTTPCache.h.


Constructor & Destructor Documentation

HTTPCache::~HTTPCache (  )  [virtual]

Destroy an instance of HTTPCache. This writes the cache index and frees the in-memory cache table structure. The persistent cache (the response headers and bodies and the index file) is not removed. To remove it, either erase the directory that contains the cache using a file system command or use the purge_cache() method (which leaves the cache directory structure in place but removes all the cached information).

This class uses the singleton pattern. Clients should never call this method. The HTTPCache::instance() method arranges for HTTPCache::delete_instance() to be called using atexit(). If delete is called more than once, the result will likely be a corrupt index file.
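
The singleton arrangement described above, where instance() creates the object once and registers a cleanup function with atexit(), might look roughly like this. CacheSingleton is a hypothetical stand-in for HTTPCache. Making the destructor private is one way (an assumption, not necessarily what HTTPCache does) to stop clients from calling delete themselves, and the mutex makes instance() safe to call from several threads.

```cpp
#include <cstdlib>
#include <mutex>
#include <string>

// Hypothetical stand-in for a singleton cache such as HTTPCache.
class CacheSingleton {
public:
    // Creates the single instance on first use; later calls return it unchanged.
    static CacheSingleton *instance(const std::string &cache_root) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (!instance_) {
            instance_ = new CacheSingleton(cache_root);
            // Arrange for cleanup at normal program exit instead of
            // relying on clients to call delete.
            std::atexit(&CacheSingleton::delete_instance);
        }
        return instance_;
    }

    const std::string &root() const { return root_; }

    CacheSingleton(const CacheSingleton &) = delete;
    CacheSingleton &operator=(const CacheSingleton &) = delete;

private:
    explicit CacheSingleton(const std::string &cache_root) : root_(cache_root) {}
    ~CacheSingleton() = default;  // private: clients cannot delete the instance

    // Runs once, from atexit(); a real cache would write its index here.
    static void delete_instance() {
        delete instance_;
        instance_ = nullptr;
    }

    static CacheSingleton *instance_;
    static std::mutex mutex_;
    std::string root_;
};

CacheSingleton *CacheSingleton::instance_ = nullptr;
std::mutex CacheSingleton::mutex_;
```

In this sketch a second call to instance() returns the existing object, so the cache_root passed on later calls has no effect.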

Definition at line 346 of file HTTPCache.cc.


Member Function Documentation

bool HTTPCache::cache_response ( const string &  url,
time_t  request_time,
const vector< string > &  headers,
const FILE *  body 
)

Add a new response to the cache, or replace an existing cached response with new data. This method returns True if the information for url was added to the cache. A response might not be cache-able; in that case this method returns false. (For example, the response might contain the 'Cache-Control: no-cache' header.)
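
As an illustration of the kind of check that makes a response uncacheable, here is a hypothetical helper that scans the response headers for a Cache-Control header carrying no-cache (the directive named above) or no-store (an assumption added for the example). The real cache_response() may apply other rules as well.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: false if any Cache-Control header contains
// no-cache or no-store, true otherwise.
bool is_cacheable(const std::vector<std::string> &headers) {
    for (std::string h : headers) {  // copy so it can be lower-cased
        std::transform(h.begin(), h.end(), h.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (h.rfind("cache-control:", 0) == 0 &&
            (h.find("no-cache") != std::string::npos ||
             h.find("no-store") != std::string::npos))
            return false;
    }
    return true;
}
```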

Note that the FILE *body is rewound so that the caller can re-read it without using fseek or rewind.

If a response for url is already present in the cache, it will be replaced by the new headers and body. To update a response in the cache with new meta data, use update_response().

This method locks the class' interface.

Parameters:
url A string which holds the request URL.
request_time The time when the request was made, in seconds since 1 Jan 1970.
headers A vector of strings which hold the response headers.
body A FILE * to a file which holds the response body.
Returns:
True if the response was cached, False if the response could not be cached.
Exceptions:
InternalErr Thrown if there was an I/O error while writing to the persistent store.

Definition at line 1816 of file HTTPCache.cc.

bool HTTPCache::get_always_validate (  )  const

Should every cache entry be validated before each use?

Returns:
True if all cache entries require validation.

Definition at line 1325 of file HTTPCache.cc.

References parse_time().

vector< string > HTTPCache::get_cache_control (  ) 

Get the Cache-Control headers.

This method locks the class' interface.

Returns:
A vector of strings, one string for each header.

Definition at line 1400 of file HTTPCache.cc.

CacheDisconnectedMode HTTPCache::get_cache_disconnected (  )  const

Get the cache's disconnected mode property.

Definition at line 1141 of file HTTPCache.cc.

string HTTPCache::get_cache_root (  )  const

Get the current cache root directory.

Returns:
A string that contains the cache root directory.

Definition at line 1049 of file HTTPCache.cc.

References DBG, and LOCK.

FILE * HTTPCache::get_cached_response ( const string &  url,
vector< string > &  headers 
)

Get information from the cache. For a given URL, get the headers and body stored in the cache. Note that this method increments the hit counter for url's entry and locks that entry. To release the lock, the method release_cached_response() must be called. Methods that block on a locked entry are: get_conditional_request_headers(), update_response() and is_url_valid(). In addition, purge_cache() throws Error if it's called and any entries are locked. The garbage collection system will not reclaim locked entries (but works fine when some entries are locked).

This method locks the class' interface.

This method does not check to see that the response is valid, just that it is in the cache. To see if a cached response is valid, use is_url_valid(). The FILE* returned can be used for both reading and writing. The latter allows a client to update the body of a cached response without having to first dump it all to a separate file and then copy it into the cache (using cache_response()).

Parameters:
url Get response information for this URL.
headers Return the response headers in this parameter
Returns:
A FILE * to the response body.
Exceptions:
Error Thrown if the URL's response is not in the cache.
InternalErr Thrown if the persistent store cannot be opened.

Definition at line 2174 of file HTTPCache.cc.

FILE * HTTPCache::get_cached_response_body ( const string &  url  ) 

Get a pointer to a cached response body. For a given URL, find the cached response body and return a FILE * to it. This updates the hit counter and it locks the entry. To release the lock, call release_cached_response(). Methods that block on a locked entry are: get_conditional_request_headers(), update_response() and is_url_valid(). In addition, purge_cache() throws Error if it's called and any entries are locked. The garbage collection system will not reclaim locked entries (but works fine when some entries are locked).

NB: This method does not check to see that the response is valid, just that it is in the cache. To see if a cached response is valid, use is_url_valid().

This method locks the class' interface.

Parameters:
url Find the body associated with this URL.
Returns:
A FILE* that points to the response body.
Exceptions:
Error Thrown if the URL is not in the cache.
InternalErr Thrown if an I/O error is detected.

Definition at line 2237 of file HTTPCache.cc.
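
Because every successful call to get_cached_response_body() must be paired with release_cached_response(), a scope guard can make the release automatic. (The reference from release_cached_response() to HTTPCacheResponse::~HTTPCacheResponse() suggests libdap does something similar.) The CachedBody template below is a hypothetical sketch; it works with any class that provides those two methods, which keeps it independent of libdap.

```cpp
#include <cstdio>
#include <string>

// Hypothetical RAII guard. Cache is any type providing
//   FILE *get_cached_response_body(const std::string &url);
//   void release_cached_response(FILE *body);
template <typename Cache>
class CachedBody {
public:
    // Locks the entry; if the lookup throws, no lock is held and nothing is released.
    CachedBody(Cache &cache, const std::string &url)
        : cache_(cache), body_(cache.get_cached_response_body(url)) {}

    // Releases the entry lock when the guard goes out of scope.
    ~CachedBody() {
        if (body_)
            cache_.release_cached_response(body_);
    }

    CachedBody(const CachedBody &) = delete;
    CachedBody &operator=(const CachedBody &) = delete;

    FILE *get() const { return body_; }

private:
    Cache &cache_;
    FILE *body_;
};
```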

vector< string > HTTPCache::get_conditional_request_headers ( const string &  url  ) 

Build the headers to send along with a GET request to make that request conditional. This method examines the headers for a given response in the cache and formulates the correct headers for a valid HTTP 1.1 conditional GET request. See RFC 2616, Section 13.3.4.

Rules: If an ETag is present, it must be used, in an If-None-Match header. If a Last-Modified header is present, use it in an If-Modified-Since header. If both are present, use both (this makes it more likely that HTTP 1.0 servers will work). If a Last-Modified header is not present, use the value of the Cache-Control max-age or Expires header(s). Note that a 'Cache-Control: max-age' header overrides an Expires header (Sec 14.9.3).
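
The first three rules can be sketched as a small function that takes the cached ETag and Last-Modified values and returns the conditional request headers. The function name and the convention that an empty string means "header absent" are assumptions; the max-age/Expires fallback is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: etag and last_modified are the cached response's
// header values, or "" if the header was absent.
std::vector<std::string> conditional_headers(const std::string &etag,
                                             const std::string &last_modified) {
    std::vector<std::string> out;
    if (!etag.empty())
        out.push_back("If-None-Match: " + etag);            // ETag must be used
    if (!last_modified.empty())
        out.push_back("If-Modified-Since: " + last_modified); // also helps HTTP 1.0 servers
    return out;
}
```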

This method locks the cache interface and the cache entry.

Parameters:
url Get the CacheEntry for this URL.
Returns:
A vector of strings, one request header per string.
Exceptions:
Error Thrown if the url is not in the cache.

Definition at line 1918 of file HTTPCache.cc.

int HTTPCache::get_default_expiration (  )  const

Get the default expiration time used by the cache.

Definition at line 1306 of file HTTPCache.cc.

unsigned long HTTPCache::get_max_entry_size (  )  const

Get the maximum size of an individual entry in the cache.

Returns:
The maximum size in megabytes.

Definition at line 1276 of file HTTPCache.cc.

unsigned long HTTPCache::get_max_size (  )  const

How big is the cache? The value returned is the size in megabytes.

Definition at line 1228 of file HTTPCache.cc.

HTTPCache * HTTPCache::instance ( const string &  cache_root,
bool  force = false 
) [static]

Get a pointer to the HTTP 1.1 compliant cache. If not already instantiated, this creates an instance of the HTTP cache object and initializes it to use cache_root as the location of the persistent store. If there's an index (.index) file in that directory, it is read as part of the initialization. If the cache has already been initialized, this method returns a pointer to that instance. Note that HTTPCache uses the singleton pattern; a process may have only one instance of this object. Also note that HTTPCache is MT-safe. However, if the force parameter is set to true, it may be possible for two or more processes to access the persistent store at the same time, resulting in undefined behavior.

Default values: is_cache_enabled(): true, is_cache_protected(): false, is_expire_ignored(): false. The total size of the cache is 20M, of which 2M is reserved for response headers; during garbage collection the cache is reduced to 18M or less (the total size minus 10% of the total size); and the maximum size for an individual entry is 3M. It is possible to change the size of the cache, but not to make it smaller than 5M. If expiration information is not sent with a response, it is assumed to expire in 24 hours.
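
The default sizes above follow from the total size: 10% of it is reserved for response headers, and garbage collection shrinks the cache to 10% below the total. The sketch below reproduces that arithmetic. The constant names and the choice to clamp (rather than reject) sizes below 5M are assumptions, not libdap's code.

```cpp
// Hypothetical constants chosen to reproduce the documented defaults.
constexpr unsigned long kMega = 1024UL * 1024UL;
constexpr unsigned long kMinTotalMB = 5;  // cache may not be smaller than 5M
constexpr unsigned long kHeaderPct = 10;  // share reserved for response headers
constexpr unsigned long kGcPct = 10;      // GC shrinks the cache this far below the total

struct CacheSizes {
    unsigned long total;           // bytes
    unsigned long header_reserve;  // bytes
    unsigned long gc_target;       // bytes; GC stops once usage is at or below this
};

CacheSizes derive_sizes(unsigned long total_mb) {
    if (total_mb < kMinTotalMB)
        total_mb = kMinTotalMB;
    const unsigned long total = total_mb * kMega;
    return CacheSizes{total, total * kHeaderPct / 100, total - total * kGcPct / 100};
}
```

With the default total of 20M this yields a 2M header reserve and an 18M garbage-collection target.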

Parameters:
cache_root The fully qualified pathname of the directory which will hold the cache data (i.e., the persistent store).
force Force access to the persistent store if true. By default false. Use this only if you're sure no one else is using the same cache root! This is included so that programs may use a cache that was left in an inconsistent state.
Returns:
A pointer to the HTTPCache object.
Exceptions:
Error Thrown if the cache root cannot be set.

Definition at line 245 of file HTTPCache.cc.

Referenced by HTTPConnect::HTTPConnect().

bool HTTPCache::is_cache_enabled (  )  const

Is the cache currently enabled?

Definition at line 1080 of file HTTPCache.cc.

Referenced by HTTPConnect::fetch_url(), and HTTPConnect::is_cache_enabled().

bool HTTPCache::is_cache_protected (  )  const

Should we cache protected responses?

Definition at line 1111 of file HTTPCache.cc.

bool HTTPCache::is_expire_ignored (  )  const

Is the Expires header of responses ignored?

Definition at line 1170 of file HTTPCache.cc.

bool HTTPCache::is_url_in_cache ( const string &  url  ) 

Look in the cache for the given url. Is it in the cache table?

This method locks the class' interface.

Parameters:
url The url to look for.
Returns:
True if url is found, otherwise False.

Definition at line 1600 of file HTTPCache.cc.

References CACHE_META, and DBG.

bool HTTPCache::is_url_valid ( const string &  url  ) 

Look in the cache and return the status (validity) of the cached response. This method should be used to determine if a cached response requires validation.

This method locks the class' interface and the cache entry.

Parameters:
url Find the cached response associated with this URL.
Returns:
True indicates that the response can be used, False indicates that it must first be validated.
Exceptions:
Error Thrown if the URL's response is not in the cache.

Definition at line 2067 of file HTTPCache.cc.

void HTTPCache::purge_cache (  ) 

Purge both the in-memory cache table and the contents of the cache on disk. This method deletes every entry in the persistent store but leaves the structure intact. The client of HTTPCache is responsible for making sure that all threads have released any responses they pulled from the cache. If this method is called when a response is still in use, it will throw an Error object and not purge the cache.

This method locks the class' interface.

Exceptions:
Error Thrown if an attempt is made to purge the cache when an entry is still in use.

Definition at line 2355 of file HTTPCache.cc.

void HTTPCache::release_cached_response ( FILE *  body  ) 

Call this method to inform the cache that a particular response is no longer in use. When a response is accessed using get_cached_response() or get_cached_response_body(), it is locked so that updates and removal (e.g., by the garbage collector) are not possible. Calling this method frees that lock.

This method locks the class' interface.

Parameters:
body Release the lock on the response information associated with this FILE *.
Exceptions:
Error Thrown if body does not belong to an entry in the cache or if the entry was already released.

Definition at line 2291 of file HTTPCache.cc.

Referenced by HTTPCacheResponse::~HTTPCacheResponse().

void HTTPCache::set_always_validate ( bool  validate  ) 

Should every cache entry be validated?

Parameters:
validate True if every cache entry should be validated before being used.

Definition at line 1316 of file HTTPCache.cc.

void HTTPCache::set_cache_control ( const vector< string > &  cc  ) 

Set the request Cache-Control headers. If a request must be satisfied using HTTP, these headers should be included in the request since they might be pertinent to a proxy cache.

Ignored headers: no-transform, only-if-cached. These headers are not used by HTTPCache and are not recorded. However, if present in the vector passed to this method, they will be present in the vector returned by get_cache_control.

This method locks the class' interface.

Parameters:
cc A vector of strings, each string holds one Cache-Control header.
Exceptions:
InternalErr Thrown if one of the strings in cc does not start with 'Cache-Control: '.
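
The precondition on cc can be illustrated with a hypothetical validator that rejects any string not beginning with 'Cache-Control: '. It throws std::invalid_argument to stay self-contained; per the description above, set_cache_control() itself throws InternalErr.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical check mirroring the documented precondition of set_cache_control().
void check_cache_control(const std::vector<std::string> &cc) {
    const std::string prefix = "Cache-Control: ";
    for (const std::string &h : cc)
        if (h.compare(0, prefix.size(), prefix) != 0)
            throw std::invalid_argument("not a Cache-Control header: " + h);
}
```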

Definition at line 1347 of file HTTPCache.cc.

void HTTPCache::set_cache_disconnected ( CacheDisconnectedMode  mode  ) 

Set the cache's disconnected property. The cache can operate either disconnected from the network or through a proxy cache (while telling that proxy not to use the network).

This method locks the class' interface.

Parameters:
mode One of DISCONNECT_NONE, DISCONNECT_NORMAL or DISCONNECT_EXTERNAL.
See also:
CacheDisconnectedMode

Definition at line 1127 of file HTTPCache.cc.

void HTTPCache::set_cache_enabled ( bool  mode  ) 

Enable or disable the cache. The cache can be temporarily suspended using the enable/disable property. Disabling the cache does not prevent it from being enabled again later.

Default: yes

This method locks the class' interface.

Parameters:
mode True if the cache should be enabled, False if it should be disabled.

Definition at line 1066 of file HTTPCache.cc.

Referenced by HTTPConnect::set_cache_enabled().

void HTTPCache::set_cache_protected ( bool  mode  ) 

Should we cache protected responses? A protected response is one that comes from a server/site that requires authorization.

Default: no

This method locks the class' interface.

Parameters:
mode True if protected responses should be cached.

Definition at line 1097 of file HTTPCache.cc.

void HTTPCache::set_default_expiration ( int  exp_time  ) 

Set the default expiration time. The default expiration property is used to determine when a cached response becomes stale if the response lacks the information necessary to compute a specific value.

Default: 24 hours (86,400 seconds)
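
To show where the default expiration comes into play, here is a sketch of the freshness-lifetime rule from RFC 2616, Section 13.2.4: max-age wins over Expires, Expires is measured against Date, and when neither is usable the default applies. The Freshness struct, the -1 "absent" convention, and the way the expire_ignored and always_validate flags are folded in are assumptions for illustration; HTTPCache's own computation may differ.

```cpp
#include <ctime>

// Hypothetical description of a cached response's expiration headers.
// A value of -1 means the header was absent.
struct Freshness {
    long max_age = -1;         // Cache-Control: max-age=N, in seconds
    std::time_t date = -1;     // Date header, seconds since 1 Jan 1970
    std::time_t expires = -1;  // Expires header, seconds since 1 Jan 1970
};

// Freshness lifetime in seconds, after RFC 2616, Section 13.2.4.
long freshness_lifetime(const Freshness &f, bool expire_ignored,
                        long default_expiration = 86400) {
    if (f.max_age >= 0)
        return f.max_age;  // max-age overrides Expires (Sec. 14.9.3)
    if (!expire_ignored && f.expires >= 0 && f.date >= 0)
        return static_cast<long>(f.expires - f.date);
    return default_expiration;  // no usable information: fall back to the default
}

// True if the cached copy may be used without revalidation.
bool is_fresh(const Freshness &f, long current_age, bool expire_ignored,
              bool always_validate) {
    if (always_validate)
        return false;
    return freshness_lifetime(f, expire_ignored) > current_age;
}
```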

This method locks the class' interface.

Parameters:
exp_time The time in seconds.

Definition at line 1292 of file HTTPCache.cc.

References DBG, and LOCK.

void HTTPCache::set_expire_ignored ( bool  mode  ) 

Should the cache ignore the Expires header? Default: no

This method locks the class' interface.

Parameters:
mode True if a response's Expires header should be ignored, False otherwise.

Definition at line 1155 of file HTTPCache.cc.

References CACHE_FOLDER_PCT, CACHE_GC_PCT, DBG, LOCK, MEGA, and MIN_CACHE_TOTAL_SIZE.

void HTTPCache::set_max_entry_size ( unsigned long  size  ) 

Set the maximum size for a single entry in the cache.

Default: 3M

This method locks the class' interface.

Parameters:
size The size in megabytes.

Definition at line 1242 of file HTTPCache.cc.

void HTTPCache::set_max_size ( unsigned long  size  ) 

Cache size management. The default cache size is 20M. The minimum size is 5M, to avoid problems while writing the cache. The size is given in megabytes. Note that reducing the size of the cache may trigger a garbage collection operation.

Note:
The maximum cache size is UINT_MAX bytes (usually 4294967295 on 32-bit computers). If size is larger, the value will be clamped to that constant. It seems pretty unlikely that will happen given that the parameter is an unsigned long. This is a fix for bug 689, which was reported when the parameter type was signed.

This method locks the class' interface.

Parameters:
size The maximum size of the cache in megabytes.
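
The note about UINT_MAX can be made concrete. The hypothetical function below converts a size in megabytes to bytes, checking for overflow before multiplying, clamping the result to UINT_MAX, and raising it to the 5M minimum. It is a sketch under those assumptions, not the body of set_max_size().

```cpp
#include <climits>

// Hypothetical conversion of a cache size in megabytes to bytes, clamped to
// the range [5M, UINT_MAX].
unsigned long size_mb_to_bytes(unsigned long size_mb) {
    const unsigned long mega = 1024UL * 1024UL;
    const unsigned long min_bytes = 5 * mega;
    const unsigned long max_bytes = UINT_MAX;
    // Compare before multiplying so that size_mb * mega can never overflow.
    unsigned long bytes = (size_mb > max_bytes / mega) ? max_bytes : size_mb * mega;
    return bytes < min_bytes ? min_bytes : bytes;
}
```

Since UINT_MAX / 2^20 is 4095 (rounded down), any request of 4096 megabytes or more is clamped.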

Definition at line 1191 of file HTTPCache.cc.

References MEGA.

void HTTPCache::update_response ( const string &  url,
time_t  request_time,
const vector< string > &  headers 
)

Update the metadata for a response already in the cache. This method provides a way to merge the response headers returned by a conditional GET request for the given URL with those already present.
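
One plausible way to merge the headers from a conditional GET into the stored ones follows RFC 2616, Section 13.5.3: a header received in the new response replaces the stored header with the same (case-insensitive) name, and the other stored headers are kept. merge_headers() is hypothetical, and this sketch replaces only the first stored header of a given name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-cased header name: the text before the first ':'.
static std::string header_name(const std::string &header) {
    std::string name = header.substr(0, header.find(':'));
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical merge: each received header replaces the first stored header
// with the same name; headers with new names are appended.
std::vector<std::string> merge_headers(std::vector<std::string> stored,
                                       const std::vector<std::string> &received) {
    for (const std::string &h : received) {
        const std::string name = header_name(h);
        auto it = std::find_if(stored.begin(), stored.end(),
                               [&name](const std::string &s) { return header_name(s) == name; });
        if (it != stored.end())
            *it = h;
        else
            stored.push_back(h);
    }
    return stored;
}
```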

This method locks the class' interface and the cache entry.

Parameters:
url Update the meta data for this cache entry.
request_time The time (Unix time, seconds since 1 Jan 1970) that the conditional request was made.
headers New headers, one header per string, returned in the response.
Exceptions:
Error Thrown if the url is not in the cache.

Definition at line 1990 of file HTTPCache.cc.


Friends And Related Function Documentation

friend class DeleteByHits [friend]

Definition at line 260 of file HTTPCache.h.

friend class DeleteCacheEntry [friend]

Definition at line 261 of file HTTPCache.h.

friend class DeleteExpired [friend]

Definition at line 259 of file HTTPCache.h.

friend class DeleteUnlockedCacheEntry [friend]

Definition at line 262 of file HTTPCache.h.

friend class HTTPCacheInterruptHandler [friend]

Definition at line 255 of file HTTPCache.h.

friend class HTTPCacheTest [friend]

Definition at line 254 of file HTTPCache.h.

friend class WriteOneCacheEntry [friend]

Definition at line 263 of file HTTPCache.h.


The documentation for this class was generated from the following files:
HTTPCache.h
HTTPCache.cc
Generated on Wed Jun 27 12:58:00 2007 for libdap++ by doxygen 1.4.7