accumulo-dev mailing list archives

From Dylan Hutchison
Subject Re: Sorted RowId suffix retrieval using Server Side Iterators
Date Mon, 10 Jul 2017 10:15:24 GMT
You might be able to take a batched approach, using server-side iterators
to gather as many S's from POS rows as possible at each tablet server up to
a memory budget, and then querying the SPO table from inside those
iterators.  (With some caution to be mindful of tablet server thread
limits, you can scan another table from inside a server-side iterator.)
This will likely query the same SPO data multiple times, which may or may
not be acceptable.
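
The batching step can be sketched in plain Java. This is only an
illustration under stated assumptions, not Accumulo iterator code: the
class name SubjectBatcher, the method forEachBatch, and the per-entry
size estimate are all hypothetical. In a real server-side iterator the
S values would come from the POS row keys, and the handler is where the
lookup against the SPO table would happen.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

final class SubjectBatcher {
    // Hypothetical rough overhead per buffered String; tune for your JVM.
    static final int PER_ENTRY_OVERHEAD_BYTES = 48;

    /**
     * Buffers subjects until the estimated memory budget would be exceeded,
     * then hands the full batch to the handler and starts a new one.
     * Only one batch is held in memory at a time.
     */
    static void forEachBatch(Iterator<String> subjects, long budgetBytes,
                             Consumer<List<String>> handler) {
        List<String> current = new ArrayList<>();
        long used = 0;
        while (subjects.hasNext()) {
            String s = subjects.next();
            // Estimate: fixed overhead plus two bytes per UTF-16 char.
            long cost = PER_ENTRY_OVERHEAD_BYTES + 2L * s.length();
            // Flush before adding if this entry would exceed the budget.
            // A single oversized entry still gets a batch of its own.
            if (!current.isEmpty() && used + cost > budgetBytes) {
                handler.accept(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(s);
            used += cost;
        }
        if (!current.isEmpty()) {
            handler.accept(current); // final partial batch
        }
    }
}
```

Because each batch is handled independently, two batches that contain
the same S will look it up twice, which is the repeated SPO querying
mentioned above.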

Another alternative is a MapReduce job.

By the way, you don't necessarily need to sort the S's in order to query
the SPO table.  It depends on how you do the query, such as by providing a
collection of ranges to a Scanner / BatchScanner or doing server-side
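
To illustrate the collection-of-ranges option, here is a minimal
sketch that uses a TreeMap as a stand-in for the sorted SPO table; it
does not call Accumulo. The names PrefixLookup, followingPrefix and
rowsForSubjects are hypothetical, and the "|" separator is taken from
the (Subject|Predicate|Object) layout described in the quoted message.
With Accumulo itself, the equivalent would be building one
Range.prefix(...) per subject and passing the collection to
BatchScanner.setRanges(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

final class PrefixLookup {
    /**
     * Smallest string that sorts after every string starting with prefix.
     * Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
     */
    static String followingPrefix(String prefix) {
        char last = prefix.charAt(prefix.length() - 1);
        return prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
    }

    /**
     * Returns the row keys of all "S|P|O" rows for the given subjects.
     * The subjects do not have to be sorted: each one becomes its own
     * prefix range over the sorted table.
     */
    static List<String> rowsForSubjects(NavigableMap<String, String> spo,
                                        List<String> subjects) {
        List<String> rows = new ArrayList<>();
        for (String s : subjects) {
            String prefix = s + "|"; // separator keeps "a" from matching "ab"
            rows.addAll(spo.subMap(prefix, true,
                                   followingPrefix(prefix), false).keySet());
        }
        return rows;
    }
}
```

Here the results come back grouped in the order the subjects were
supplied. A BatchScanner makes no ordering guarantee at all, so the
caller should not depend on sorted output either way.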

Cheers, Dylan

On Thu, Jul 6, 2017 at 3:05 AM, <> wrote:

> Thanks for your reply Dylan.
> *Are your range queries *small enough to fit in memory*?* Not likely,
> because the given condition on the POS table might return a few hundred
> thousand results, given that my table would be 100M. Hence, I might not be
> able to hold them in memory for sorting, and I might end up with memory
> issues.
> My tables are built with POS in the RowId rather than in the column
> family, since I am mapping each cell of my relational data to a single row
> in Accumulo.
> The 'S values' will be used to query the SPO table with a prefix filter on
> S, which is stored as (Subject|Predicate|Object). If my subjects are in
> sorted order, then I would not need much effort when querying with a "List
> of Order Set of Subjects".
