incubator-river-dev mailing list archives

From jgr...@simulexinc.com
Subject Re: Space/outrigger suggestions (remote iterator vs. collection)
Date Wed, 22 Dec 2010 18:57:11 GMT
Regarding the complications pointed out for returning a remote iterator from takeMultiple:
--
(1) One can size the batch to best balance network bandwidth and
latency.
--

That's currently done by a combination of the server-side takeMultipleLimit and the client-side maxEntries.
If we use a remote iterator, I assume we'd include a takeMultipleBatchSize and retain the client-side
maxEntries. So I don't see this as being substantially different.
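To make the sizing point concrete, here is a minimal sketch that models a space as an in-memory deque. It assumes, without checking Outrigger's source, that one batch take returns at most min(maxEntries, takeMultipleLimit) entries. Only maxEntries and takeMultipleLimit come from the thread; the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical in-memory stand-in for a space. This is NOT Outrigger's code;
 * it only shows how a client-side maxEntries and a server-side
 * takeMultipleLimit together bound a single batch take.
 */
public class BatchTakeSketch {
    private final Deque<String> entries = new ArrayDeque<>();
    private final long takeMultipleLimit; // server-side cap per call (assumed value)

    public BatchTakeSketch(long takeMultipleLimit, List<String> initial) {
        this.takeMultipleLimit = takeMultipleLimit;
        this.entries.addAll(initial);
    }

    /** Removes and returns at most min(maxEntries, takeMultipleLimit) entries. */
    public List<String> takeMultiple(long maxEntries) {
        long cap = Math.min(maxEntries, takeMultipleLimit);
        List<String> batch = new ArrayList<>();
        while (batch.size() < cap && !entries.isEmpty()) {
            batch.add(entries.pollFirst());
        }
        return batch;
    }

    public int remaining() {
        return entries.size();
    }

    public static void main(String[] args) {
        List<String> initial = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            initial.add("entry-" + i);
        }
        BatchTakeSketch space = new BatchTakeSketch(100, initial);
        System.out.println(space.takeMultiple(Long.MAX_VALUE).size()); // 100: server cap applies
        System.out.println(space.takeMultiple(30).size());             // 30: client cap applies
        System.out.println(space.remaining());                          // 120
    }
}
```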

--
(2) One can limit the time a collection of exclusive locks is held under a
transaction by virtue of the timeout.
--

Hmm, why would this not be the case under a remote iterator? I would think that the correct
behavior would be to release locks after a timeout expires, regardless of whether the return
type was an iterator or a collection.
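A minimal sketch of the behavior argued for here, assuming a simple model in which a transaction holds exclusive locks on everything taken under it until it commits or its deadline passes (the deadline stands in for whatever timeout or lease governs the transaction). The class, its methods, and the explicit `now` parameter are hypothetical; the point is that the release rule is attached to the transaction, not to the shape of the returned value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch, not Outrigger's code: entries taken under a transaction
 * stay locked until commit or until the deadline passes, at which point the
 * transaction aborts and the entries return to the space. Whether the client
 * received them as a Collection or via an iterator's fetches makes no difference.
 * Time is passed in explicitly so the demo is deterministic.
 */
public class TxTimeoutSketch {
    private final Deque<String> space = new ArrayDeque<>();
    private final List<String> locked = new ArrayList<>();
    private final long deadline; // absolute time at which the transaction aborts
    private boolean finished = false;

    public TxTimeoutSketch(List<String> initial, long deadline) {
        space.addAll(initial);
        this.deadline = deadline;
    }

    /** Takes up to n entries under the transaction at time `now`. */
    public List<String> take(int n, long now) {
        abortIfExpired(now);
        if (finished) {
            throw new IllegalStateException("transaction is no longer active");
        }
        List<String> batch = new ArrayList<>();
        while (batch.size() < n && !space.isEmpty()) {
            batch.add(space.pollFirst());
        }
        locked.addAll(batch); // exclusive locks now held by this transaction
        return batch;
    }

    /** Commits: taken entries are removed for good and the locks are dropped. */
    public void commit(long now) {
        abortIfExpired(now);
        if (finished) {
            throw new IllegalStateException("transaction is no longer active");
        }
        locked.clear();
        finished = true;
    }

    /** On expiry, releases every lock by putting entries back in their original order. */
    public void abortIfExpired(long now) {
        if (!finished && now >= deadline) {
            for (int i = locked.size() - 1; i >= 0; i--) {
                space.addFirst(locked.get(i));
            }
            locked.clear();
            finished = true;
        }
    }

    public int lockedCount() { return locked.size(); }

    public int availableCount() { return space.size(); }

    public static void main(String[] args) {
        TxTimeoutSketch tx = new TxTimeoutSketch(List.of("a", "b", "c", "d"), 1_000);
        tx.take(2, 100); // first batch (or first iterator fetch) at t=100
        tx.take(1, 500); // second batch at t=500
        System.out.println(tx.lockedCount() + " locked, " + tx.availableCount() + " free"); // 3 locked, 1 free
        tx.abortIfExpired(1_000); // deadline reached: all locks released
        System.out.println(tx.lockedCount() + " locked, " + tx.availableCount() + " free"); // 0 locked, 4 free
    }
}
```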

--
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an (unbounded or no entry limit)
remote iterator.
--

Users would still be free to make multiple calls with small values for maxEntries if they
so chose. They would also gain the ability to make an unbounded request, which is currently
possible only through repeated calls.
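A minimal sketch of the "repeated calls" workaround, modelling the space as a local deque; the class and method names are hypothetical. It counts calls because, against a real space, each one would be a network round trip.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch: an "unbounded" take built from repeated bounded takes,
 * stopping when a call comes back empty.
 */
public class RepeatedTakeSketch {
    /** Bounded batch take against a local stand-in for the space. */
    public static List<Integer> takeMultiple(Deque<Integer> space, int maxEntries) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < maxEntries && !space.isEmpty()) {
            batch.add(space.pollFirst());
        }
        return batch;
    }

    /** Drains the space with repeated bounded takes; returns {entries taken, calls made}. */
    public static int[] drain(Deque<Integer> space, int maxEntries) {
        int taken = 0;
        int calls = 0;
        while (true) {
            List<Integer> batch = takeMultiple(space, maxEntries);
            calls++;
            if (batch.isEmpty()) {
                // Against a real space this only means nothing matched at that moment.
                break;
            }
            taken += batch.size();
        }
        return new int[] {taken, calls};
    }

    public static void main(String[] args) {
        Deque<Integer> space = new ArrayDeque<>();
        for (int i = 0; i < 250; i++) {
            space.add(i);
        }
        int[] result = drain(space, 100);
        System.out.println(result[0] + " entries in " + result[1] + " calls"); // 250 entries in 4 calls
    }
}
```

The final empty call is the price of not knowing in advance how many entries match.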

--
(4) [gleaned from text] More bookkeeping is necessary.
--

Certainly. Also, we'd have to work out the precise semantics under which the iterator operates
and make them clear in the documentation.
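One possible set of semantics that would need documenting, written as a minimal sketch. Everything here is an assumption for illustration: takeMultipleBatchSize is only the setting proposed above, the space is a local deque, and removing entries when a batch is fetched (rather than when next() returns them) is just one of the available choices.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical iterator-style take with these assumed semantics:
 *  - entries are fetched lazily in batches of batchSize (the proposed takeMultipleBatchSize);
 *  - an entry leaves the space when its batch is fetched, not when next() returns it;
 *  - at most maxEntries entries are taken in total.
 */
public class TakeIteratorSketch implements Iterator<String> {
    private final Deque<String> space;   // local stand-in for the remote space
    private final int batchSize;         // hypothetical takeMultipleBatchSize
    private long remaining;              // how many more entries maxEntries allows
    private final Deque<String> buffer = new ArrayDeque<>();
    private boolean exhausted = false;   // set once a fetch returns nothing
    private int fetches = 0;             // number of simulated round trips

    public TakeIteratorSketch(Deque<String> space, int batchSize, long maxEntries) {
        this.space = space;
        this.batchSize = batchSize;
        this.remaining = maxEntries;
    }

    private void fetchIfNeeded() {
        if (!buffer.isEmpty() || exhausted || remaining <= 0) {
            return;
        }
        long n = Math.min(batchSize, remaining);
        for (long i = 0; i < n && !space.isEmpty(); i++) {
            buffer.addLast(space.pollFirst());
        }
        fetches++;
        remaining -= buffer.size();
        if (buffer.isEmpty()) {
            exhausted = true;
        }
    }

    @Override
    public boolean hasNext() {
        fetchIfNeeded();
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.pollFirst();
    }

    public int fetches() {
        return fetches;
    }

    public static void main(String[] args) {
        Deque<String> space = new ArrayDeque<>();
        for (int i = 0; i < 25; i++) {
            space.add("entry-" + i);
        }
        TakeIteratorSketch it = new TakeIteratorSketch(space, 10, 22);
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        // prints "22 taken in 3 fetches, 3 left"
        System.out.println(count + " taken in " + it.fetches() + " fetches, " + space.size() + " left");
    }
}
```

Other choices (for example, what happens to buffered entries if the client abandons the iterator) would have to be spelled out just as explicitly.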

--
(5) [gleaned from text] A remote iterator would certainly be less performant than a straight
batch take.
--

This is the biggest concern, I think. As such, I'd be interested in seeing performance runs
to back up the intuition. Then, at least, we'd know precisely what trade-off we're talking
about.

The tests would need to cover both small and large batches, with entry counts both at exact multiples
of the batch size/takeMultipleLimit and off those multiples, with and without transactions.
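A minimal sketch of that test matrix. The batch size of 100 and the specific entry counts are illustrative assumptions, not values proposed in the thread, and the class is hypothetical; it only enumerates the cases and does not run a benchmark.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical enumeration of benchmark cases: size x alignment x transaction. */
public class TakeBenchmarkMatrix {
    static final class Scenario {
        final int entryCount;
        final boolean transactional;
        final boolean multipleOfBatch;

        Scenario(int entryCount, boolean transactional, int batch) {
            this.entryCount = entryCount;
            this.transactional = transactional;
            this.multipleOfBatch = entryCount % batch == 0;
        }

        @Override
        public String toString() {
            return String.format("%6d entries, %s, %s", entryCount,
                    transactional ? "txn" : "no txn",
                    multipleOfBatch ? "multiple" : "off-multiple");
        }
    }

    /** `batch` stands for the batch size or takeMultipleLimit under test. */
    static List<Scenario> scenarios(int batch) {
        int[] counts = {
            batch / 2,              // small: less than one batch
            batch,                  // small: exactly one batch
            batch + 1,              // small: just past a multiple
            10 * batch,             // large: exact multiple
            10 * batch + batch / 3  // large: off a multiple
        };
        List<Scenario> out = new ArrayList<>();
        for (int count : counts) {
            for (boolean txn : new boolean[] {false, true}) {
                out.add(new Scenario(count, txn, batch));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Scenario s : scenarios(100)) {
            System.out.println(s);
        }
    }
}
```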

jamesG

-----Original Message-----
From: "Dan Creswell" <dan.creswell@gmail.com>
Sent: Wednesday, December 22, 2010 5:23am
To: river-dev@incubator.apache.org
Subject: Re: Space/outrigger suggestions (remote iterator vs. collection)

Hey,

So the below means you are indeed following my explanation. As to your
question:

Yes, you could use a remote-iterator style of thing, but for take it's quite
a heavyweight construct, especially once you have transactions in the way.
The core implementation itself is very similar to contents and would, for
the most part, have similar performance. However, it'd certainly be less
performant than a straight batch take.

More of a concern, though, is the impact on other clients of the space
implementation: lots of book-keeping, the most exclusive kind of lock on
entries, and long-running transactions that inflict delays on other
clients, leading to poor scaling. Contents, by virtue of its read nature, is
a little less painful performance-wise, and for a lot of applications you'd
pass no transaction, which reduces the performance pain further.

So I'd say that batch take is probably a better tradeoff than a take/remote
iterator combo because:

(1) One can size the batch to best balance network bandwidth and
latency.
(2) One can limit the time a collection of exclusive locks is held under a
transaction by virtue of the timeout.
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an (unbounded or no entry limit)
remote iterator.
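A minimal sketch of point (3), assuming a shared pool guarded by a single lock as a stand-in for the space's own locking (this is not how Outrigger is implemented; the class and method names are hypothetical). Each client repeatedly takes a bounded batch until the pool is empty; because one call removes at most batchSize entries, no client can tie up more than one batch at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: several clients draining one pool with bounded batch takes. */
public class ConcurrentBatchTakers {
    private final Deque<Integer> pool = new ArrayDeque<>();

    public ConcurrentBatchTakers(int entries) {
        for (int i = 0; i < entries; i++) {
            pool.add(i);
        }
    }

    /** Atomically removes and returns up to `max` entries. */
    public synchronized List<Integer> takeBatch(int max) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < max && !pool.isEmpty()) {
            batch.add(pool.pollFirst());
        }
        return batch;
    }

    /** Runs `clients` workers that drain the pool; returns how many entries each took. */
    public static int[] run(int entries, int clients, int batchSize) throws Exception {
        ConcurrentBatchTakers space = new ConcurrentBatchTakers(entries);
        ExecutorService exec = Executors.newFixedThreadPool(clients);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int c = 0; c < clients; c++) {
                results.add(exec.submit(() -> {
                    int taken = 0;
                    List<Integer> batch;
                    while (!(batch = space.takeBatch(batchSize)).isEmpty()) {
                        taken += batch.size(); // "process" the batch
                    }
                    return taken;
                }));
            }
            int[] perClient = new int[clients];
            for (int c = 0; c < clients; c++) {
                perClient[c] = results.get(c).get(); // waits for that client to finish
            }
            return perClient;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] perClient = run(10_000, 4, 50);
        System.out.println(Arrays.toString(perClient) + " total=" + Arrays.stream(perClient).sum());
    }
}
```

The per-client split varies from run to run; only the total is fixed.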

In essence, one puts the control squarely with the user so they can get
what they want, albeit at the price of some API asymmetry, as you correctly
point out.

As an implementer, I could reduce my codebase a little if we did takes with
a remote iterator, but, being completely honest, not by enough that I'd
support a spec change for that reason alone.

HTH,

Dan.

