One challenge I've experienced recently is I can't figure out how to hint to the browser that it should refresh a particular cached page. (Without appending ?time=1634851491 to the URL.)
For example, let's say I've already cached the page /new.html
Now, I click a button which triggers a change to the page, and I am redirected back to it.
Even though the page has changed, and the browser would see a new timestamp in the headers if it pinged the server, that just doesn't seem to happen.
Has anyone dealt with this before? I tried to ask on StackOverflow, but lately my questions don't seem to get any attention, and I've run out of reputation to spend on bounties.
This is what ETags are for. Upon a user's first visit the server should return an ETag uniquely representing the current version of the page. The browser will cache both the page and the tag. Upon subsequent page visits the browser will send an If-None-Match header containing the tag for the version of the page it has cached. The server should compare the incoming tag with the tag for the current version and return a "304 Not Modified" response if the tags match or a full response with the newer tag in the ETag header if they don't.
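Here's a minimal sketch of that flow using Node's built-in http module; the page content and the hashing scheme are just placeholders, not anything from the thread:

```typescript
import { createServer } from "node:http";
import { createHash } from "node:crypto";

// Placeholder for the current version of /new.html; in practice this
// would come from a file read or a template render.
let pageBody = "<html><body>version 1</body></html>";

createServer((req, res) => {
  // Derive the ETag from the content so it changes whenever the page does.
  const etag = `"${createHash("sha1").update(pageBody).digest("hex")}"`;

  if (req.headers["if-none-match"] === etag) {
    // The browser's cached copy is still current: headers only, no body.
    res.writeHead(304, { ETag: etag });
    res.end();
    return;
  }

  // Cached copy is stale or absent: full response carrying the new tag.
  res.writeHead(200, { "Content-Type": "text/html", ETag: etag });
  res.end(pageBody);
}).listen(8080);
```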
A drawback of relying on ETags is that if a page is visited frequently, the cache-validation If-None-Match request is still sent every time, which costs bandwidth, latency, server computation, etc. I also suspect that if the connection is broken, or the server responds with a 503/504, the cached page isn't shown either.
My understanding is that he wants to refresh the page only if it's known to have changed, and always use the cached version otherwise.
It's a combination of different headers that's hard to sum up in a short comment. A good article on the subject should cover all of these: Expires, Cache-Control, ETag, Pragma, Vary, Last-Modified.
KeyCDN has an article on it; they certainly have experience and expertise there. I didn't read the whole thing, but it seems to have it covered: https://www.keycdn.com/blog/http-cache-headers
There's also some interesting exceptions where rules aren't followed. Like browsers typically have a completely separate cache for favicons. I suppose because they use the icons in funny/different ways, like bookmarks.
There are also sometimes proxies (especially corporate MITM ones) that don't follow the rules. Hence the popularity of cache-busting parameters like you described.
There's no standard way for one page to invalidate another. I've seen some private patches to do it in squid, but that doesn't help because you want to do it for browsers.
Your options are probably:
a) redirect to a different URL as you've done by appending stuff to it
b) require revalidation on each request, recipes shown by other posters
c) POST to the URL you want refreshed; POST isn't cacheable. Note that a redirect can't make the browser POST somewhere else, but you can do it with JavaScript (see the sketch after this list).
d) use XHR to force a request as another poster mentioned.
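For (c), a hedged sketch of the JavaScript version, using fetch rather than a form; per RFC 9111 §4.4 a non-error response to an unsafe method like POST should also invalidate the cache entry for that URL:

```typescript
// Option (c): POST to the URL whose cached copy you want dropped.
// POST responses aren't cached, and a successful POST is supposed to
// invalidate any cached GET response for the same URL (RFC 9111 §4.4).
async function refreshViaPost(url: string): Promise<void> {
  const response = await fetch(url, { method: "POST" });
  if (!response.ok) {
    throw new Error(`POST failed: ${response.status}`);
  }
  // Navigating back should now miss the stale cache entry.
  window.location.href = url;
}

refreshViaPost("/new.html");
```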
There's no easy way. One option is `max-age=0, must-revalidate`, but then your origin server should be optimized for conditional GET requests.
It's a very tricky balance between origin server load and consistency. By deciding to use HTTP cache you agree to eventual consistency and this decision comes with its upsides and downsides.
There was a proposal in 2007 for a thing called cache channels. It defined a mechanism for an origin server to expose a feed that caches would poll at an interval; the feed would list resources that had gone stale since the last query. In conjunction with conditional GET requests, this would have solved part of the problem of hinting to browsers that they should invalidate their local copies.
That directive says: cache, but ask the web server on every use whether the page has changed. If the server responds with 304 Not Modified, the browser uses its cached version.
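A rough sketch of that arrangement on the server, this time with Last-Modified / If-Modified-Since as the validator (the content and port are made up):

```typescript
import { createServer } from "node:http";

// When the content last changed; truncated to whole seconds because
// HTTP dates have one-second resolution. The button handler from the
// original question would bump this along with the content.
let lastModified = new Date(Math.floor(Date.now() / 1000) * 1000);

createServer((req, res) => {
  // max-age=0 makes the cached copy immediately stale; must-revalidate
  // forbids the browser from using it without asking the server first.
  const headers = {
    "Cache-Control": "max-age=0, must-revalidate",
    "Last-Modified": lastModified.toUTCString(),
  };

  const since = req.headers["if-modified-since"];
  if (since && new Date(since) >= lastModified) {
    res.writeHead(304, headers); // unchanged: conditional GET, no body
    res.end();
    return;
  }

  res.writeHead(200, { ...headers, "Content-Type": "text/html" });
  res.end("<html><body>current content</body></html>");
}).listen(8080);
```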
From a performance perspective, though, users on a good connection may be dominated by round-trip time, so a 304 can be almost as expensive as a full 200.
I have this same issue and have been working around it by appending to the URL. I'd like to believe there's a better way, but I don't know what it is. Alternatively, I could just disable caching but that would defeat the point.
As I understand the original design of HTTP, each resource may state in its response headers how long it can be cached, and the client (browser, proxy, etc.) does not have to re-request the resource before that expiry. That's the standard, so there is no standard way to hint that a resource has to be revalidated early.

Obviously, several tricks have emerged since then, like the timestamped-URL approach you mentioned. However, I'm not sure to what extent clients have standardized on understanding that "/path?query" is somehow related to "/path"; originally the request string (path and URL parameters) was opaque to the HTTP client, so the two should be cached independently. Things have obviously changed since then.

The method I use is to fire a request to the URL that has to be refreshed via Ajax (XHR) with a Cache-Control header (yes, it's a request header too), then display the response content or redirect to it.
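A sketch of that last approach with fetch instead of raw XHR; `cache: "reload"` makes the browser skip its cache for this request but store the fresh response in it, roughly in the spirit of the Cache-Control request header the poster mentions:

```typescript
// Fetch /new.html fresh, bypassing (but updating) the browser cache,
// then show the result. The URL is just the example from the thread.
async function refreshAndShow(url: string): Promise<void> {
  const response = await fetch(url, { cache: "reload" });
  document.documentElement.innerHTML = await response.text();
  // Alternatively, navigate: the cache now holds the fresh copy, so a
  // plain window.location.href = url should pick it up.
}

refreshAndShow("/new.html");
```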
> However, I'm not sure to what extent clients have standardized on understanding that "/path?query" is somehow related to "/path"; originally the request string (path and URL parameters) was opaque to the HTTP client, so the two should be cached independently. Things have obviously changed since then.
It hasn't changed. Those two URLs are still cached completely independently by the user agent. The ?time=... cache busting trick is meant to produce a cache key that's never been used before, thus requiring a fresh request. The new request doesn't clean up the cache entries for the old URLs; it just doesn't use them. That's one reason it's better to use etag and such to make the caches work properly, rather than fight them with this trick.
On many servers, if new.html is a static file, the same entity is produced regardless of parameters. But the user agent doesn't know this.
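For concreteness, the busting trick is just manufacturing a cache key the browser has never seen before (the parameter name is arbitrary):

```typescript
// Each call produces a URL that isn't in the cache yet, forcing a
// fresh request. Old /new.html?time=... entries are left behind untouched.
window.location.href = `/new.html?time=${Date.now()}`;
```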
Yes, thanks for the clarification.
My impression that /path and /path?parameter were handled in relation to each other comes from a proxy that added an option to do so. Good to know that user agents don't.