One challenge I've experienced recently is I can't figure out how to hint to the browser that it should refresh a particular cached page. (Without appending ?time=1634851491 to the URL.)
For example, let's say I've already cached the page /new.html
Now, I click a button which triggers a change to the page, and I am redirected back to it.
Even though the page has changed, and the browser would see a new timestamp in the headers if it pinged the server, that just doesn't seem to happen.
Has anyone dealt with this before? I tried to ask on StackOverflow, but lately my questions don't seem to get any attention, and I've run out of reputation to spend on bounties.
This is what ETags are for. Upon a user's first visit the server should return an ETag uniquely representing the current version of the page. The browser will cache both the page and the tag. Upon subsequent page visits the browser will send an If-None-Match header containing the tag for the version of the page it has cached. The server should compare the incoming tag with the tag for the current version and return a "304 Not Modified" response if the tags match or a full response with the newer tag in the ETag header if they don't.
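Here's a minimal sketch of that flow using Node's built-in http module; the page content and the hashing scheme are just placeholders, not anything from the thread:

```typescript
import { createServer } from "node:http";
import { createHash } from "node:crypto";

// Placeholder for the current version of /new.html; in practice this
// would come from a file read or a template render.
let pageBody = "<html><body>version 1</body></html>";

createServer((req, res) => {
  // Derive the ETag from the content so it changes whenever the page does.
  const etag = `"${createHash("sha1").update(pageBody).digest("hex")}"`;

  if (req.headers["if-none-match"] === etag) {
    // The browser's cached copy is still current: headers only, no body.
    res.writeHead(304, { ETag: etag });
    res.end();
    return;
  }

  // Cached copy is stale or absent: full response carrying the new tag.
  res.writeHead(200, { "Content-Type": "text/html", ETag: etag });
  res.end(pageBody);
}).listen(8080);
```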
A drawback of relying on ETags is that if a page is visited frequently, the cache-validation If-None-Match request is still sent every time, which costs bandwidth, latency, server computation, etc. I also suspect that if the connection is broken, or the server responds with a 503/504, the cached page isn't shown either.
My understanding is that he wants to refresh the page only if it's known to have changed, and always use the cached version otherwise.
It's a combination of different headers that's hard to sum up in a short comment. A good article on the subject should cover all of these: Expires, Cache-Control, ETag, Pragma, Vary, Last-Modified.
KeyCDN has an article on it; they certainly have experience and expertise there. I didn't read the whole thing, but it seems to have it covered: https://www.keycdn.com/blog/http-cache-headers
There's also some interesting exceptions where rules aren't followed. Like browsers typically have a completely separate cache for favicons. I suppose because they use the icons in funny/different ways, like bookmarks.
There are also sometimes proxies (especially corporate MITM ones) that don't follow the rules. Hence the popularity of cache-busting parameters like you described.
There's no standard way for one page to invalidate another. I've seen some private patches to do it in squid, but that doesn't help because you want to do it for browsers.
Your options are probably:
a) redirect to a different URL as you've done by appending stuff to it
b) require revalidation on each request, recipes shown by other posters
c) POST to the URL you want refreshed; POST isn't cacheable. Note that a redirect can't make the browser POST somewhere else, but you can do it with JavaScript (see the sketch after this list).
d) use XHR to force a request as another poster mentioned.
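For (c), a hedged sketch of the JavaScript version, using fetch rather than a form; per RFC 9111 §4.4 a non-error response to an unsafe method like POST should also invalidate the cache entry for that URL:

```typescript
// Option (c): POST to the URL whose cached copy you want dropped.
// POST responses aren't cached, and a successful POST is supposed to
// invalidate any cached GET response for the same URL (RFC 9111 §4.4).
async function refreshViaPost(url: string): Promise<void> {
  const response = await fetch(url, { method: "POST" });
  if (!response.ok) {
    throw new Error(`POST failed: ${response.status}`);
  }
  // Navigating back should now miss the stale cache entry.
  window.location.href = url;
}

refreshViaPost("/new.html");
```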
There's no easy way. One option is `max-age=0, must-revalidate`, but then your origin server should be optimized for conditional GET requests.
It's a very tricky balance between origin server load and consistency. By deciding to use HTTP cache you agree to eventual consistency and this decision comes with its upsides and downsides.
There was a proposal in 2007 for a thing called cache channels. It defined a mechanism for an origin server to expose a feed that caches would poll at an interval; the feed would list resources that had gone stale since the last query. In conjunction with conditional GET requests, this would have solved part of the problem of hinting to browsers that they should invalidate their local copies.
That directive says: cache, but ask the web server on every use whether the page has changed. If the server responds with 304 Not Modified, the browser uses its cached version.
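A rough sketch of that arrangement on the server, this time with Last-Modified / If-Modified-Since as the validator (the content and port are made up):

```typescript
import { createServer } from "node:http";

// When the content last changed; truncated to whole seconds because
// HTTP dates have one-second resolution. The button handler from the
// original question would bump this along with the content.
let lastModified = new Date(Math.floor(Date.now() / 1000) * 1000);

createServer((req, res) => {
  // max-age=0 makes the cached copy immediately stale; must-revalidate
  // forbids the browser from using it without asking the server first.
  const headers = {
    "Cache-Control": "max-age=0, must-revalidate",
    "Last-Modified": lastModified.toUTCString(),
  };

  const since = req.headers["if-modified-since"];
  if (since && new Date(since) >= lastModified) {
    res.writeHead(304, headers); // unchanged: conditional GET, no body
    res.end();
    return;
  }

  res.writeHead(200, { ...headers, "Content-Type": "text/html" });
  res.end("<html><body>current content</body></html>");
}).listen(8080);
```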
From a performance perspective, though, users on a good connection may be dominated by round-trip time, so a 304 can be almost as expensive as a full 200.
I have this same issue and have been working around it by appending to the URL. I'd like to believe there's a better way, but I don't know what it is. Alternatively, I could just disable caching but that would defeat the point.
As I understand the original design of HTTP, each resource may state in its response headers how long it can be cached, and the client (browser, proxy, etc.) does not have to re-request the resource before that expiry. That's the standard, so there is no standard way to hint that a resource has to be revalidated early.

Obviously, several tricks have emerged since then, like the timestamped-URL approach you mentioned. However, I'm not sure to what extent clients have standardized on understanding that "/path?query" is somehow related to "/path"; originally the request string (path and URL parameters) was opaque to the HTTP client, so the two should be cached independently. Things have obviously changed since then.

The method I use is to fire a request to the URL that has to be refreshed via Ajax (XHR) with a Cache-Control header (yes, it's a request header too), then display the response content or redirect to it.
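A sketch of that last approach with fetch instead of raw XHR; `cache: "reload"` makes the browser skip its cache for this request but store the fresh response in it, roughly in the spirit of the Cache-Control request header the poster mentions:

```typescript
// Fetch /new.html fresh, bypassing (but updating) the browser cache,
// then show the result. The URL is just the example from the thread.
async function refreshAndShow(url: string): Promise<void> {
  const response = await fetch(url, { cache: "reload" });
  document.documentElement.innerHTML = await response.text();
  // Alternatively, navigate: the cache now holds the fresh copy, so a
  // plain window.location.href = url should pick it up.
}

refreshAndShow("/new.html");
```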
> However, I'm not sure to what extent clients have standardized on understanding that "/path?query" is somehow related to "/path"; originally the request string (path and URL parameters) was opaque to the HTTP client, so the two should be cached independently. Things have obviously changed since then.
It hasn't changed. Those two URLs are still cached completely independently by the user agent. The ?time=... cache busting trick is meant to produce a cache key that's never been used before, thus requiring a fresh request. The new request doesn't clean up the cache entries for the old URLs; it just doesn't use them. That's one reason it's better to use etag and such to make the caches work properly, rather than fight them with this trick.
On many servers, if new.html is a static file, the same entity is produced regardless of parameters. But the user agent doesn't know this.
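For concreteness, the busting trick is just manufacturing a cache key the browser has never seen before (the parameter name is arbitrary):

```typescript
// Each call produces a URL that isn't in the cache yet, forcing a
// fresh request. Old /new.html?time=... entries are left behind untouched.
window.location.href = `/new.html?time=${Date.now()}`;
```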
Yes, thanks for the clarification.
My impression that /path and /path?parameter were handled in relation to each other comes from a proxy that added an option to do so. Good to know that user agents don't.