river-dev mailing list archives

From jgr...@simulexinc.com
Subject Space/outrigger suggestions (remote iterator vs. collection)
Date Wed, 22 Dec 2010 09:19:09 GMT
My current email client is not advanced enough to do inline, but I think I'm following your

Successive calls of contents may retrieve the same objects, so merely calling contents multiple
times wouldn't provide the functionality of running through a space.   Thus, the remote iterator
was introduced in order to provide the ability to exhaustively read space on an iterative-type

Meanwhile, the takeMultipleLimit in Outrigger that limits the returned collection size isn't
a practical hindrance because successive takeMultiples will eventually grab everything from
space, whether it happens all at once or not.   The same could be said of a client "sipping"
from the space a couple entries at a time via maxEntries.

The case for the remote iterator stands reasonably well-made, then: it keeps memory overhead
fairly low (beholden to the size of actual entries), and at minimal network cost.   It could
only be reasonably replaced with a collection of all matching entries, which would not be
satisfactory for underpowered clients.

So my next question would be: why not use a remote iterator for the takeMultiple?

Using a remote iterator would presumably eliminate things like takeMultipleLimit, removing
the case where the client receives fewer than the maxEntries requested when they are available.
  Indeed, takeMultipleLimit would effectively be replaced with "takeMultipleBatchSize", largely
transparent to the end user.   We'd gain a uniform return type for multiple entry fetches.
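
A rough sketch of what I have in mind, using a toy in-memory space rather than Outrigger. CappedSpace, TakeIterator and their method shapes are invented for illustration and are not the JavaSpace05 API; the only idea borrowed from the real thing is a server-side cap on batch size. The iterator keeps asking for batches until the caller's maxEntries is met or the space runs dry, so the cap turns into a batch size the client never sees.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy in-memory space (hypothetical, not Outrigger): each batch take is
// capped by a server-side limit, in the spirit of takeMultipleLimit.
class CappedSpace {
    private final Deque<String> entries = new ArrayDeque<>();
    private final int takeMultipleLimit;

    CappedSpace(int count, int takeMultipleLimit) {
        this.takeMultipleLimit = takeMultipleLimit;
        for (int i = 0; i < count; i++) {
            entries.add("entry-" + i);
        }
    }

    // Removes and returns at most min(maxEntries, takeMultipleLimit) entries.
    List<String> takeMultiple(int maxEntries) {
        int n = Math.min(Math.min(maxEntries, takeMultipleLimit), entries.size());
        List<String> batch = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            batch.add(entries.removeFirst());
        }
        return batch;
    }

    int size() {
        return entries.size();
    }
}

// Hypothetical take iterator: fetches capped batches on demand until the
// caller's maxEntries is reached or the space has nothing left.
class TakeIterator implements Iterator<String> {
    private final CappedSpace space;
    private final Deque<String> buffer = new ArrayDeque<>();
    private int remaining; // entries the caller is still owed

    TakeIterator(CappedSpace space, int maxEntries) {
        this.space = space;
        this.remaining = maxEntries;
    }

    @Override
    public boolean hasNext() {
        if (buffer.isEmpty() && remaining > 0) {
            // Entries in this batch are already removed from the space.
            List<String> batch = space.takeMultiple(remaining);
            buffer.addAll(batch);
            // An empty batch means the space has run dry; stop asking.
            remaining = batch.isEmpty() ? 0 : remaining - batch.size();
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}

class TakeIteratorDemo {
    // Drains the iterator and returns how many entries it produced.
    static int drain(Iterator<String> it) {
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        CappedSpace space = new CappedSpace(100, 7);
        int taken = drain(new TakeIterator(space, 30));
        System.out.println(taken + " taken, " + space.size() + " left");
    }
}
```

With 100 entries, a cap of 7 and maxEntries of 30, the caller ends up with all 30 entries even though no single batch carried more than 7.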

Remote iterator usage with takeMultiple would require more network use, but perhaps (wild
speculation) not much more than a call to contents with a transaction.   (Would also need
to compare remote iterator to successive calls to "take" in evaluating network cost.)   Any
pitfalls I'm missing?


PS: Apparently I need to study up on read lock semantics; please excuse the confusion.

-----Original Message-----
From: "Dan Creswell" <dan.creswell@gmail.com>
Sent: Monday, December 20, 2010 12:15pm
To: river-dev@incubator.apache.org
Subject: Re: Space/outrigger suggestions

K, so inline.....

On 20 December 2010 16:54, <jgrahn@simulexinc.com> wrote:

> Glad to explain.
> My argument is a bit simplistic; as a matter of API design, it's preferable
> to have a single return mechanism for multiple returns.
> I realize there were likely technical reasons for the decision, but it
> makes for a less uniform API and in particular becomes a greater concern if
> we elect to add new method signatures returning multiple items.
> I'm not clear on what you mean by the "non-destructive" nature of
> contents() requiring a remote iterator to be useful.   At my company, we
> actually wrapped the method so that we'd ultimately get a collection (by
> exhausting the iterator).

If I have one hundred entries in a space and I do a batch take of 10 at a
time, assuming there are no other operations, I will empty the space after 10
batch takes.

The same scenario for a batch read does not work. You will never (as the
spec is now) exhaustively search the entries. It's entirely acceptable for
the space to return the same 10 entries each time you call batch read. Hence
the need for contents which does some continuous book-keeping that ensures
you can exhaust the space contents.
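
To make the difference concrete, here is a minimal sketch with a toy in-memory space. ToySpace, takeBatch and readBatch are made-up names for illustration, not the JavaSpace05 API, and readBatch deliberately models the worst case the spec permits: the same entries come back on every call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy in-memory stand-in for a space; hypothetical, not the JavaSpace05 API.
class ToySpace {
    private final List<String> entries = new ArrayList<>();

    ToySpace(int count) {
        for (int i = 0; i < count; i++) {
            entries.add("entry-" + i);
        }
    }

    // Destructive: removes and returns up to max entries.
    List<String> takeBatch(int max) {
        int n = Math.min(max, entries.size());
        List<String> batch = new ArrayList<>(entries.subList(0, n));
        entries.subList(0, n).clear();
        return batch;
    }

    // Non-destructive: nothing stops it returning the same entries each call.
    List<String> readBatch(int max) {
        int n = Math.min(max, entries.size());
        return new ArrayList<>(entries.subList(0, n));
    }

    int size() {
        return entries.size();
    }
}

class BatchTakeVersusRead {
    // Number of takeBatch calls needed to empty the space.
    static int takesToEmpty(ToySpace space, int batchSize) {
        int calls = 0;
        while (space.size() > 0) {
            space.takeBatch(batchSize);
            calls++;
        }
        return calls;
    }

    // Distinct entries seen across a fixed number of readBatch calls.
    static int distinctSeenByReads(ToySpace space, int batchSize, int calls) {
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < calls; i++) {
            seen.addAll(space.readBatch(batchSize));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("takes to empty: "
                + takesToEmpty(new ToySpace(100), 10));
        System.out.println("distinct entries seen by 10 reads: "
                + distinctSeenByReads(new ToySpace(100), 10, 10));
    }
}
```

Ten batch takes drain all one hundred entries, while ten batch reads against this toy space only ever see the same ten.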

> Also, contents() presumably sets 'read' locks if a transaction is used,
> creating reservations for future takes, so doesn't the level of
> 'destructiveness' depend on usage?
If a transaction is used, locks are set. However, it's possible to not pass a
transaction, in which case read locks are not asserted. Note also that a read
lock doesn't prevent other read locks, so a reservation for a take doesn't
simply follow.

> Now, that's not to say I'm deadset against the remote iterator approach.
> Remote iterators might save some memory/cpu overhead for truly massive
> requests, particularly if the user does not necessarily want every entry
> (though were that the case, maxEntries should have been used).
How many entries can you knowingly take/read as a batch without exhausting
client memory? Difficult to say given one doesn't know how big marshalled
entries will be or indeed the amount of free space on the client or indeed
the server. The result is that large batch takes or indeed reads are
somewhat undesirable.

Decent remote iterator implementations, incidentally, don't transfer all
matches in one go - they parcel them out in batches. Large batches obviously
take a long time to transfer and are problematic for clients that want to be
somewhat responsive to their users. Imagine asking for contents of a large
number of entries and waiting whilst all of them are transferred (e.g.
because you want to browse a space).
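
As a sketch of the batching idea only (not how Outrigger or any real MatchSet is implemented), the client side can hold a cursor and a small buffer and go back to the server when the buffer runs out. ToyMatchServer, BatchedMatchIterator and the fetch signature are assumptions made up for this example; fetchCalls just counts simulated round trips.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical server side: hands out matches in batches from a cursor.
class ToyMatchServer {
    private final List<String> matches;
    int fetchCalls = 0; // counts simulated round trips

    ToyMatchServer(List<String> matches) {
        this.matches = new ArrayList<>(matches);
    }

    // Returns up to batchSize matches starting at cursor, without removing them.
    List<String> fetch(int cursor, int batchSize) {
        fetchCalls++;
        int end = Math.min(cursor + batchSize, matches.size());
        if (cursor >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(matches.subList(cursor, end));
    }
}

// Hypothetical client side: buffers at most one batch at a time, so the
// client never holds the full match set in memory.
class BatchedMatchIterator implements Iterator<String> {
    private final ToyMatchServer server;
    private final int batchSize;
    private final Deque<String> buffer = new ArrayDeque<>();
    private int cursor = 0;          // bookkeeping: how far we have read
    private boolean exhausted = false;

    BatchedMatchIterator(ToyMatchServer server, int batchSize) {
        this.server = server;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        if (buffer.isEmpty() && !exhausted) {
            List<String> batch = server.fetch(cursor, batchSize);
            cursor += batch.size();
            buffer.addAll(batch);
            // A short batch means there is nothing further to fetch.
            if (batch.size() < batchSize) {
                exhausted = true;
            }
        }
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}

class BatchedIteratorDemo {
    public static void main(String[] args) {
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            data.add("entry-" + i);
        }
        ToyMatchServer server = new ToyMatchServer(data);
        BatchedMatchIterator it = new BatchedMatchIterator(server, 10);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen + " entries in " + server.fetchCalls + " fetches");
    }
}
```

With 25 matches and a batch size of 10, the client sees every entry exactly once over three fetches, and the first entries are available after the first fetch rather than after the whole transfer.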

> On the other hand, returning a collection would spare network costs of
> sustained remote iterator interactions and the mild timing uncertainties its
> usage entails.   And the remote iterator is more complex by its nature.
Can you explain more about the network costs you envision?

Most remote iterator impls leave the connection open so the window and
handshake issues suffered by e.g. TCP are eliminated. The same number of
packets will be transferred give or take the odd frame that is only
half-full due to the end of a batch being reached.

> In any case, I think it would be best to standardize on one or the other.
> Perhaps as someone involved with JavaSpace05, you can illuminate some of
> the decision making surrounding the current usage of both?
Some of that is above, so I'll stop for now and see what else you'd like
details on, okay?

Thanks for the explanation, definitely helps....

> jamesG
> -----Original Message-----
> From: "Dan Creswell" <dan.creswell@gmail.com>
> Sent: Monday, December 20, 2010 4:19am
> To: river-dev@incubator.apache.org
> Subject: Re: Space/outrigger suggestions
> James G,
> Can you explain some more about this statement please?
> "3) Collections or remote iterators, not both.
> "contents" returns a remote iterator named "MatchSet", while "take (with
> collection)" returns a collection.   I can understand the argument
> behind both use cases, but not necessarily the argument for using both
> simultaneously.
> "
> This has been heavily discussed in the past and contents(), by virtue of
> its non-destructive nature (unlike take), needs something akin to a remote
> iterator to be practical/useful. Multiple takes allow you to eventually
> exhaust a space's contents, multiple reads won't do similarly.
> So, given I'm scarred with the previous efforts of space implementation
> including JavaSpace05 I fear my past is colouring my thinking so I'd like to
> understand more.
> Cheers,
> Dan.
