Mailing-List: contact river-dev-help@incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: river-dev@incubator.apache.org
MIME-Version: 1.0
In-Reply-To: <1293009549.81923193@192.168.2.228>
References: <1293009549.81923193@192.168.2.228>
Date: Wed, 22 Dec 2010 10:23:54 +0000
Message-ID: <AANLkTi=xPCtAT9R-_-p0g6a8X6nB9y+XYAUPrbGaDneM@mail.gmail.com>
Subject: Re: Space/outrigger suggestions (remote iterator vs. collection)
From: Dan Creswell <dan.creswell@gmail.com>
To: river-dev@incubator.apache.org
Content-Type: multipart/alternative; boundary=001636284a3477fe7b0497fd2618

--001636284a3477fe7b0497fd2618
Content-Type: text/plain; charset=ISO-8859-1

Hey,

So the below means you are indeed following my explanation. To your
question:

Yes, you could use a remote iterator style of thing, but for take it's
quite a heavyweight construct, especially once you have transactions in the
way. The core implementation itself is very similar to contents and would
have, for the most part, similar performance. However, it'd certainly be
less performant than a straight batch take.

More of a concern, though, is the impact on other clients of the space
implementation: a take-style remote iterator means lots of book-keeping,
the most exclusive locks on entries and long-running transactions, all of
which inflict delays on other clients and lead to poor scaling. Contents,
by virtue of its read nature, is a little less painful performance-wise,
and for a lot of applications you'd pass no transaction, which reduces the
performance pain further.

So I'd say that batch take is probably a better tradeoff than a take/remote
iterator combo because:

(1) One can size the batch to best balance network bandwidth and latency.
(2) One can limit the time a collection of exclusive locks is held under a
transaction by virtue of the timeout.
(3) Batching in this way allows multiple clients to remove and process
entries in a more scalable fashion than with an unbounded (no entry limit)
remote iterator.
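
To make points (1) to (3) a bit more concrete, here is a minimal sketch of
clients draining a space with bounded batch takes. Everything in it is an
assumption for illustration: ToySpace, takeBatch and BatchTakeSketch are
hypothetical names, the space is just an in-memory queue, and transactions
and timeouts are left out entirely. It is not the JavaSpace05 API, nor how
Outrigger does it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;

// Hypothetical in-memory stand-in for a space; NOT the JavaSpace05 API.
class ToySpace {
    private final Queue<Integer> entries = new ArrayDeque<>();

    ToySpace(int count) {
        for (int i = 0; i < count; i++) {
            entries.add(i);
        }
    }

    // Destructive batch take: removes and returns at most maxEntries entries.
    // The lock is held only for the duration of one bounded batch.
    synchronized List<Integer> takeBatch(int maxEntries) {
        List<Integer> batch = new ArrayList<>();
        while (batch.size() < maxEntries && !entries.isEmpty()) {
            batch.add(entries.poll());
        }
        return batch;
    }
}

class BatchTakeSketch {
    // Drains the space with nWorkers threads, each taking batchSize entries
    // per call, and returns everything that was taken.
    static List<Integer> drain(ToySpace space, int nWorkers, int batchSize) {
        List<Integer> taken = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < nWorkers; w++) {
            Thread t = new Thread(() -> {
                List<Integer> batch;
                // An empty batch means the space has been exhausted.
                while (!(batch = space.takeBatch(batchSize)).isEmpty()) {
                    taken.addAll(batch); // stand-in for real processing
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while draining", e);
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        List<Integer> taken = drain(new ToySpace(100), 2, 10);
        System.out.println(taken.size() + " entries taken, "
                + new HashSet<>(taken).size() + " distinct");
    }
}
```

Running main with 100 entries, 2 workers and a batch size of 10 should
report 100 entries taken, all of them distinct. Each call holds the lock
for one bounded batch only, so neither worker can hold up the other for
long, which is the property a real batch take gets from maxEntries plus the
timeout.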

In essence one puts the control squarely with the user so's they can get
what they want albeit at the price of some API asymmetry as you correctly
point out.

As an implementer, I could reduce my codebase a little if we did takes with
a remote iterator but being completely honest, not by enough that I'd
support a spec change for that reason alone.

HTH,

Dan.

On 22 December 2010 09:19, <jgrahn@simulexinc.com> wrote:

> My current email client is not advanced enough to do inline, but I think
> I'm following your explanation.
>
> Successive calls of contents may retrieve the same objects, so merely
> calling contents multiple times wouldn't provide the functionality of
> running through a space.   Thus, the remote iterator was introduced in order
> to provide the ability to exhaustively read space on an iterative-type
> basis.
>
> Meanwhile, the takeMultipleLimit in Outrigger that limits the returned
> collection size isn't a practical hindrance because successive takeMultiples
> will eventually grab everything from space, whether it happens all at once
> or not.   The same could be said of a client "sipping" from the space a
> couple entries at a time via maxEntries.
>
> The case for the remote iterator stands reasonably well-made, then: it
> keeps memory overhead fairly low (beholden to the size of actual entries),
> and at minimal network cost.   It could only be reasonably replaced with a
> collection of all matching entries, which would not be satisfactory for
> underpowered clients.
>
> So my next question would be: why not use a remote iterator for the
> takeMultiple?
>
> Using a remote iterator would presumably eliminate things like
> takeMultipleLimit, removing the case where the client receives fewer than
> the maxEntries requested when they are available.   Indeed,
> takeMultipleLimit would effectively be replaced with
> "takeMultipleBatchSize", largely transparent to the end user.   We'd gain a
> uniform return type for multiple entry fetches.
>
> Remote iterator usage with takeMultiple would require more network use, but
> perhaps (wild speculation) not much more than a call to contents with a
> transaction.   (Would also need to compare remote iterator to successive
> calls to "take" in evaluating network cost.)   Any pitfalls I'm missing?
>
> jamesG
>
> PS: Apparently I need to study up on read lock semantics; please excuse the
> confusion.
>
> -----Original Message-----
> From: "Dan Creswell" <dan.creswell@gmail.com>
> Sent: Monday, December 20, 2010 12:15pm
> To: river-dev@incubator.apache.org
> Subject: Re: Space/outrigger suggestions
>
> K, so inline.....
>
> On 20 December 2010 16:54, <jgrahn@simulexinc.com> wrote:
>
> > Glad to explain.
> >
> > My argument is a bit simplistic; as a matter of API design, it's
> > preferable to have a single return mechanism for multiple returns.
> >
> > I realize there were likely technical reasons for the decision, but it
> > makes for a less uniform API and in particular becomes a greater concern
> > if we elect to add new method signatures returning multiple items.
> >
> > I'm not clear on what you mean by the "non-destructive" nature of
> > contents() requiring a remote iterator to be useful.   At my company, we
> > actually wrapped the method so that we'd ultimately get a collection
> > (by exhausting the iterator).
> >
> >
> Non-destructive:
>
> If I have one hundred entries in a space and I do a batch take of 10 at a
> time, then assuming there are no other operations, I will empty the space
> after 10 batch takes.
>
> The same scenario for a batch read does not work. You will never (as the
> spec is now) exhaustively search the entries. It's entirely acceptable for
> the space to return the same 10 entries each time you call batch read.
> Hence the need for contents, which does some continuous book-keeping that
> ensures you can exhaust the space contents.
>
>
> > Also, contents() presumably sets 'read' locks if a transaction is used,
> > creating reservations for future takes, so doesn't the level of
> > 'destructiveness' depend on usage?
> >
> >
> If a transaction is used, locks are set. However it's possible to not pass
> a transaction in which case read locks are not asserted. Note also that a
> read lock doesn't prevent other read locks thus reservation for a take
> doesn't simply follow.
>
>
> > Now, that's not to say I'm deadset against the remote iterator approach.
> > Remote iterators might save some memory/cpu overhead for truly massive
> > requests, particularly if the user does not necessarily want every entry
> > (though were that the case, maxEntries should have been used).
> >
> >
> How many entries can you knowingly take/read as a batch without exhausting
> client memory? Difficult to say given one doesn't know how big marshalled
> entries will be or indeed the amount of free space on the client or indeed
> the server. The result is that large batch takes or indeed reads are
> somewhat undesirable.
>
> Decent remote iterator implementations, incidentally, don't transfer all
> matches in one go - they parcel them out in batches. Large batches
> obviously take a long time to transfer and are problematic for clients
> that want to be somewhat responsive to their users. Imagine asking for
> contents of a large number of entries and waiting whilst all of them are
> transferred (e.g. because you want to browse a space).
>
>
> > On the other hand, returning a collection would spare network costs of
> > sustained remote iterator interactions and the mild timing uncertainties
> > its usage entails.   And the remote iterator is more complex by its
> > nature.
> >
> >
> Can you explain more about the network costs you envision?
>
> Most remote iterator impls leave the connection open so the window and
> handshake issues suffered by e.g. TCP are eliminated. The same number of
> packets will be transferred give or take the odd frame that is only
> half-full due to the end of a batch being reached.
>
>
> > In any case, I think it would be best to standardize on one or the other.
> >
> > Perhaps as someone involved with JavaSpace05, you can illuminate some of
> > the decision making surrounding the current usage of both?
> >
> >
> Some of that is above so I'll stop for now and see what else you ask for
> details of, okay?
>
> Thanks for the explanation, definitely helps....
>
>
>
> > jamesG
> >
> > -----Original Message-----
> > From: "Dan Creswell" <dan.creswell@gmail.com>
> > Sent: Monday, December 20, 2010 4:19am
> > To: river-dev@incubator.apache.org
> > Subject: Re: Space/outrigger suggestions
> >
> > James G,
> >
> > Can you explain some more about this statement please?
> >
> > "3) Collections or remote iterators, not both.
> >
> > "contents" returns a remote iterator named "MatchSet", while "take (with
> > collection)" returns a collection.   I can understand the argument
> > behind both use cases, but not necessarily the argument for using both
> > simultaneously.
> >
> > "
> >
> > This has been heavily discussed in the past and contents(), by virtue of
> > its non-destructive nature (unlike take), needs something akin to a
> > remote iterator to be practical/useful. Multiple takes allow you to
> > eventually exhaust a space's contents, multiple reads won't do similarly.
> >
> > So, given I'm scarred with the previous efforts of space implementation
> > including JavaSpace05 I fear my past is colouring my thinking so I'd like
> > to understand more.
> >
> > Cheers,
> >
> > Dan.
> >
> >
> >
>
>
>
>

--001636284a3477fe7b0497fd2618--