Subject: Re: Even more fine-grained ETag support when querying views?
From: Robert Newson
To: user@couchdb.apache.org
Date: Tue, 13 Sep 2011 09:46:31 +0100

My joke about bloom filters was apparently misunderstood, but the notion
above, which sounds a lot like a Merkle tree, seems lucid to me. As for the
strong vs. weak ETag variants, I think views need strong ETags in all
cases, given the semantics declared for them in section 13.3.3 of RFC 2616.

On 12 September 2011 23:28, Paul Davis wrote:
> In general the idea is intriguing. Using a combining hash would allow
> you to get a specific hash value for a given range. Unfortunately,
> bloom filters are not a good solution here because they require an a
> priori guess of the number of keys that are going to be stored. On the
> other hand, CRC32 appears to be combinable. There are a couple of issues,
> though. The first is whether this is a strong enough hash to use for an
> ETag. There are two types of ETags with slightly different semantics, so
> we'd have to figure out what we can do and where this falls on that
> spectrum. Secondly, computing the range ETag would require the equivalent
> of a reduce=false view call in addition to streaming the output if
> validation matched, which has performance implications as well.
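On Paul's point that CRC32 is combinable: the zlib C library provides
crc32_combine() for joining two independently computed checksums, though
Python's zlib module only exposes the running-value form used below. A
minimal sketch of a range ETag along those lines; the row encoding and
function name are purely illustrative, not CouchDB internals:

    import zlib

    # Fold each row in the requested range through a running CRC32, so the
    # resulting tag depends on exactly the rows in that range.
    def range_etag(rows):
        crc = 0
        for key, value in rows:
            crc = zlib.crc32(repr((key, value)).encode("utf-8"), crc)
        return '"%08x"' % (crc & 0xFFFFFFFF)

    rows = [("alice", 1), ("bob", 2), ("carol", 3)]
    print(range_etag(rows))      # tag for the whole range
    print(range_etag(rows[1:]))  # a narrower range gets a different tag

Whether a 32-bit checksum is collision-resistant enough to serve as a
strong validator is exactly the open question raised above.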
> On Mon, Sep 12, 2011 at 6:50 PM, Alon Keren wrote:
>> Disclosure: I don't know much about e-tags, CouchDB internals (or bloom
>> filters).
>>
>> How about maintaining an e-tag for each sub-tree in the view, similar to
>> the way (I think) reduce works? When a row gets updated, its e-tag would
>> be recalculated, and then its parent's e-tag would be recalculated, and
>> so on. The e-tag of an internal node could be the hash of all its
>> children's hashes. The actual e-tag that a view query receives: the
>> e-tag of the common ancestor of all involved rows.
>>
>> The next time you query the same keys, you would supply the e-tag you've
>> just received.
>>
>> Alon
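Alon's scheme is essentially a Merkle tree maintained the way an
incremental reduce is. A minimal sketch, with every name illustrative
(this is not CouchDB's B-tree code):

    import hashlib

    class Node:
        def __init__(self, row=None, children=()):
            self.row = row                  # leaf payload: a (key, value) pair
            self.children = list(children)  # internal nodes hold children
            self.hash = b""
            self.rehash()

        def rehash(self):
            # A leaf hashes its row; an internal node hashes its
            # children's hashes, Merkle-style.
            h = hashlib.sha1()
            if self.row is not None:
                h.update(repr(self.row).encode("utf-8"))
            for child in self.children:
                h.update(child.hash)
            self.hash = h.digest()

    def update_leaf(path_to_leaf, new_row):
        # On an update, rehash the leaf and every ancestor back to the
        # root, just as an incremental reduce recomputes along one path.
        path_to_leaf[-1].row = new_row
        for node in reversed(path_to_leaf):
            node.rehash()

    leaf_a, leaf_b = Node(row=("a", 1)), Node(row=("b", 2))
    subtree = Node(children=[leaf_a, leaf_b])
    root = Node(children=[subtree, Node(row=("z", 9))])
    etag = subtree.hash.hex()  # tag for a query covered by this subtree

A query's ETag would be the hash of the deepest node covering every row it
touches, so updates outside that subtree cannot invalidate it.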
>> On 10 September 2011 16:41, Andreas Lind Petersen <
>> andreaslindpetersen@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> Background: I'm working on a web app that uses a single CouchDB
>>> database for storing data belonging to 400000+ users. Each user has an
>>> average of about 40 documents that need to be fetched in one go when
>>> the frontend is launched. I have accomplished this by querying a simple
>>> view with ?key=ownerID (with a fallback to
>>> /_all_docs?startkey=_...&endkey=~ if the view isn't built). Since the
>>> data for each user rarely changes, there's a potential to save
>>> resources by supporting conditional GET with If-None-Match, which would
>>> amount to having the web app backend copy the CouchDB-generated ETag
>>> into the response sent to the browser.
>>>
>>> However, I just learned that CouchDB only maintains a single ETag for
>>> the entire view, so every time one of my users changes something, the
>>> ETag for everyone else's query result also changes. This makes
>>> conditional GETs useless with this usage pattern.
>>>
>>> I asked about this on #couchdb and had a brief talk with rnewson, who
>>> was sympathetic to the idea. Unfortunately we weren't able to come up
>>> with an idea that didn't involve traversing all docs in the result just
>>> to compute the ETag (my suggestion was a hash of the _revs of all docs
>>> contributing to the result). That would be a bad default, but might
>>> still work as an opt-in thing per request, e.g. slowetag=true.
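The _rev-hash idea can be tried from the application side today, at the
cost of always reading the rows. A sketch using Python's requests library;
the database, design document, and view names are assumed for
illustration:

    import hashlib
    import requests

    def fetch_with_rev_etag(owner_id):
        # Fetch the rows plus their docs, then derive a tag from the
        # contributing _revs, as suggested above.
        resp = requests.get(
            "http://localhost:5984/mydb/_design/app/_view/by_owner",
            params={"key": '"%s"' % owner_id, "include_docs": "true"},
        )
        rows = resp.json()["rows"]
        h = hashlib.sha1()
        for row in rows:
            h.update(row["doc"]["_rev"].encode("utf-8"))
        return rows, '"%s"' % h.hexdigest()

The backend compares the derived tag against the browser's If-None-Match
header and answers 304 itself. CouchDB still pays for the full read, which
is the cost an opt-in server-side flag like slowetag=true would carry too.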
>>> Newson said I should try raising the discussion here in case someone
>>> else had an idea for a cheaper way to calculate a good ETag. So what
>>> does everyone else think about this? Is my use case too rare, or would
>>> it be worthwhile to implement it?
>>>
>>> Best regards,
>>> Andreas Lind Petersen (papandreou)
>>>
>>> Here's our chat transcript:
>>>
>>> [11:46] <papandreou> Does anyone know if there are plans for issuing
>>> even more granular etags for view lookups? When you only look up a
>>> small range or a specific key it would be really great if the ETag only
>>> changed when that subset changes rather than the entire view.
>>> [11:47] <papandreou> In the application I'm working on I'll hardly ever
>>> be able to get a 304 response because of this.
>>> [...]
>>> [13:51] <+rnewson> papandreou: unlikely.
>>> [13:52] <papandreou> rnewson: So the best thing I can do is to fetch
>>> the data and compute a better etag myself? (My use case is a backend
>>> for a web app)
>>> [13:53] <+rnewson> papandreou: You might be able to set ETag in a list
>>> function? If you can't, I'll gladly change CouchDB so you can.
>>> [13:54] <papandreou> rnewson: I thought about that, too, but that would
>>> cause a big overhead for every request, right?
>>> [13:55] <papandreou> rnewson: (Last time I tried views were slooow)
>>> [13:55] <papandreou> I mean lists
>>> [13:55] <+rnewson> papandreou: slower, yes, because couch needs to
>>> evaluate the javascript in an external process.
>>> [13:55] <+rnewson> how will you calculate the fine-grained ETag?
>>> [13:56] <+rnewson> Also we did recently make it slightly finer, before
>>> it was view group scope and now it's the view itself (I think)
>>> [13:56] <papandreou> rnewson: Maybe something like a hash of the _revs
>>> of all the documents contributing to the result?
>>> [13:56] <+rnewson> hm, that makes no sense actually. but we did refine
>>> it recently.
>>> [13:57] <+rnewson> papandreou: that doesn't sound cheap at all, and it
>>> would need to be cheaper than doing the view query itself to make
>>> sense.
>>> [13:58] <papandreou> rnewson: There's still the bandwidth thing
>>> [13:58] <+rnewson> oh, you're working with restricted bandwidth and/or
>>> have huge view responses?
>>> [13:59] <papandreou> rnewson: And it would be really nice to have
>>> something like this completely handled by the database instead of
>>> inventing a bunch of workarounds.
>>> [14:01] <+rnewson> If there's a correct and efficient algorithm for
>>> doing it, I'm sure it would be applied.
>>> [14:02] <papandreou> rnewson: I guess it depends on the use case. If
>>> the database is rarely updated I suppose the current tradeoff is
>>> better.
>>> [14:03] <+rnewson> I'm sure the only reason we have ETags at the
>>> current granularity is because it's very quick to calculate. A finer
>>> grain would be committed if a viable approach was proposed.
>>> [14:04] <papandreou> rnewson: I have a huge database with data
>>> belonging to 400000+ different users, and I'm using a view to enable a
>>> lookup-by-owner thing. But every time a single piece of data is
>>> inserted, the ETag for the view changes
>>> [14:04] <+rnewson> yes, I've completely understood the problem you
>>> stated earlier.
>>> [14:05] <+rnewson> I can't think of a way to improve this right now but
>>> I would spend the time to implement it if you had one.
>>> [14:06] <papandreou> rnewson: So right now the code path that sends a
>>> 304 only needs to look at a single piece of metadata for the view to
>>> make its decision? That'll be hard to beat :)
>>> [14:07] <+rnewson> doesn't need to beat it, it just needs to be fast.
>>> [14:07] <+rnewson> but I don't see any current possible solutions, let
>>> alone fast ones.
>>> [14:07] <papandreou> rnewson: Well, thanks anyway for considering my
>>> suggestion. I'll let you know if I get an idea :)
>>> [14:08] <+rnewson> and it is now per-view and not per-viewgroup. so
>>> it's what I said first before I thought it was silly
>>> [14:08] <+benoitc> query + last seq returned maybe ....
>>> [14:08] <+rnewson> but obviously a change could affect one view in a
>>> group but not others
>>> [14:09] <papandreou> benoitc: The query is already sort of included
>>> since it's in the url.
>>> [14:09] <+rnewson> benoitc: ?
>>> [14:10] <+benoitc> i was meaning last committed seq, but it won't
>>> change anything ...
>>> [14:10] <papandreou> benoitc: I guess you'd also need to make sure that
>>> the ETag changes if a document is deleted?
>>> [14:10] <papandreou> ah
>>> [14:10] <+rnewson> benoitc: we already use the update_seq of the #view,
>>> which is finer-grained than the db's last committed seq
>>> [14:11] <+benoitc> rnewson: committed seq in the view group but anyway
>>> it won't work
>>> [14:12] <+rnewson> benoitc: right, that would be the pre-1.1.0
>>> behavior, I think.
>>> [14:12] <+rnewson> which is coarser
>>> [14:12] <+rnewson> we simply don't record the info that papandreou's
>>> suggestion would need to work.
>>> [14:12] <+benoitc> papandreou: easier solution would be to request each
>>> time on a stale view
>>> [14:13] <papandreou> rnewson: Another reason why my suggestion sucks is
>>> that it would require two traversals of the range, right? I'm guessing
>>> it starts streaming as soon as it has found the first doc now?
>>> [14:13] <+benoitc> and update after, think it would work. except if you
>>> want something strict
>>> [14:13] <+rnewson> papandreou: yes, we stream the results as we read
>>> them, we don't buffer.
>>> [14:14] <papandreou> benoitc: Hmm, so the theory is that stale=ok would
>>> increase the percentage of 304 responses?
>>> [14:14] <papandreou> rnewson: Right, yes, then it would take a serious
>>> hit.
>>> [14:14] <+rnewson> papandreou: but we could add an option that reads
>>> the thing, builds an etag, and then streams the result. it would be
>>> slower, but for the times that we can send 304 we'd save bandwidth. It
>>> sounds a bit too niche to me, but you could raise it on user@
>>> [14:15] <papandreou> rnewson: Would be awesome to have that as a
>>> configuration option
>>> [14:15] <+rnewson> papandreou: the view would not change, so neither
>>> would the ETag (with stale=ok)
>>> [14:15] <+rnewson> papandreou: I think it would be a runtime option
>>> ?slow_etag=true
>>> [14:15] <papandreou> rnewson: That would also be fine
>>> [14:16] <+rnewson> a better solution would not require two passes,
>>> though.
>>> [14:16] <+benoitc> papandreou: i would use stale=ok, then query the
>>> view async, save new etag & ...
>>> [14:16] <papandreou> rnewson: I really don't think it's that niche :).
>>> But maybe ETag-nerds are rarer than I think, hehe
>>> [14:16] <+benoitc> rnewson: that could encourage pretty dangerous
>>> things
>>> [14:16] <+rnewson> benoitc: ?
>>> [14:17] <+benoitc> rnewson: cpu intensive tasks each time the call is
>>> done,
>>> [14:17] <+benoitc> rather than encouraging something async
>>> [14:18] <+benoitc> rahh I hate osx, it introduces bad unicode chars in
>>> vim :@
>>> [14:23] <papandreou> benoitc: I'm not sure exactly how that would work?
>>> I'm working on the backend for a web app, so the requests will be
>>> coming from multiple machines
>>> [14:24] <+benoitc> papandreou: call with stale=ok and have a process
>>> asking your deb for refresh from time to time
>>> [14:24] <+benoitc> s/deb/view
>>> [14:25] <+rnewson> benoitc: not sure I follow. doubling the number of
>>> view requests to achieve a finer etag is an ok solution, but shouldn't
>>> be the default, but I do think we'd need a better solution than that.
>>> [14:25] <+rnewson> benoitc: and you might be forgetting all the md5
>>> verification we do all the time.
>>> [14:27] <+benoitc> rnewson: you don't need to call each views though
>>> [14:27] <+benoitc> I don't see the arg about last one
>>> [14:27] <papandreou> benoitc: Ah, ok, I understand now. Won't work very
>>> well for me, though, the web app is a single page thing that only asks
>>> for this particular chunk of data once per session, so the ETag will
>>> probably have changed anyway unless we accept day-old data.
>>> [14:27] <+benoitc> anyway enotime to discuss about that, i'm on another
>>> thing
>>> [14:32] <papandreou> rnewson: But next step would be for me to raise
>>> the issue on the user mailing list?
>>> [14:33] <+rnewson> papandreou: on reflection, it's more a dev@ thing,
>>> but yes.
>>> [14:33] <+rnewson> post the suggestion about calculating an etag over
>>> the results and then streaming them, with the caveat that a better
>>> solution should be found.
>>> [14:34] <papandreou> rnewson: Ok, I will, thanks :). Btw. do you think
>>> there's a chance that this will be easier for key=... queries than
>>> arbitrary startkey=...&endkey=... ones?
>>> [14:35] <+rnewson> papandreou: yes. for key= we could use a bloom
>>> filter.
>>> [14:38] <papandreou> rnewson: Man, I've got some reading up to do :).
>>> Thanks! So dev@ it is?
>>> [14:39] <+rnewson> papandreou: yes.
>>> [14:40] <+rnewson> papandreou: 'bloom filter' is just how we handwave
>>> solutions these days, it just sounds vaguely plausible for the keys=
>>> variant
>>> [14:40] <+rnewson> but doesn't make sense at all for startkey/endkey
>>> [14:40] <+jan____> haha, I'm sitting in an "HTTP Architecture" session,
>>> and all the two speakers do is tell the audience how CouchDB gets it
>>> all right.
>>> [14:41] <+rnewson> at base, we'd want some cheap way to invalidate a
>>> range of keys in memory.
>>> [14:49] <+jan____> the answer must include bloom filters.
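For reference, the two-pass option rnewson floats at 14:14 in the
transcript (read the range, build an ETag, then stream) reduces to
something like the following sketch, with read_range and send as
hypothetical stand-ins for the real view machinery:

    import hashlib

    def respond(read_range, if_none_match, send):
        rows = list(read_range())          # pass 1: read the whole range
        h = hashlib.sha1()
        for row in rows:
            h.update(repr(row).encode("utf-8"))
        etag = '"%s"' % h.hexdigest()
        if etag == if_none_match:
            return send(304, etag, [])     # validator matched: no body
        return send(200, etag, rows)       # pass 2: stream the rows

It saves bandwidth whenever the validator matches but always pays the full
read, which is why a one-pass scheme like the subtree hashes above would be
preferable.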