accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Sorted RowId suffix retrieval using Server Side Iterators
Date Tue, 11 Jul 2017 02:15:17 GMT
On Mon, Jul 10, 2017 at 6:15 AM, Dylan Hutchison
<dhutchis@cs.washington.edu> wrote:
> You might be able to take a batched approach, using server-side iterators
> to gather as many S's from POS rows as possible at each tablet server up to
> a memory budget, and then querying the SPO table from inside those
> iterators.  (With some caution to be mindful of tablet server thread
> limits, you can scan another table from inside a server-side iterator.)
>  This likely has the effect of querying the same SPO data multiple times,
> which may or may not be acceptable.
>
> Another alternative is a MapReduce job.
>
> By the way, you don't necessarily need to sort the S's in order to query
> the SPO table.  It depends on how you do the query, such as by providing a
> collection of ranges to a Scanner / BatchScanner or doing server-side
> filtering.

+1 to that. Dropping the requirement to get a sorted list of subjects
for some pair P-O would make a server-side filter much easier. You can
also play tricks like doing a "limited" deduplication server-side. You
can hold up to N subjects server-side to avoid running out of memory,
and then perform a final deduplication client-side.
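
A minimal sketch of that "limited" deduplication, in plain Java with no Accumulo dependency. Everything here is hypothetical and for illustration only: the class name LimitedDedup, the methods serverSideFilter and clientSideDedup, and the maxHeld parameter (the "N" above). In a real deployment the first method's logic would sit inside a server-side iterator; here a list of subject strings stands in for the scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LimitedDedup {

    // Stand-in for the server side: remembers at most maxHeld subjects.
    // Repeats of a remembered subject are dropped; once the budget is
    // reached the memory is cleared, so some duplicates may still pass.
    static List<String> serverSideFilter(Iterable<String> subjects, int maxHeld) {
        Set<String> held = new HashSet<>();
        List<String> emitted = new ArrayList<>();
        for (String s : subjects) {
            if (held.contains(s)) {
                continue; // already emitted while still remembered
            }
            if (held.size() >= maxHeld) {
                held.clear(); // budget reached: forget and start over
            }
            held.add(s);
            emitted.add(s);
        }
        return emitted;
    }

    // Stand-in for the client side: removes whatever duplicates the
    // bounded server-side pass let through, across all servers.
    static Set<String> clientSideDedup(List<List<String>> perServerResults) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> results : perServerResults) {
            unique.addAll(results);
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> scanned = Arrays.asList("s1", "s2", "s1", "s3", "s1", "s2");
        List<String> fromServer = serverSideFilter(scanned, 2);
        System.out.println(fromServer); // [s1, s2, s3, s1, s2]
        System.out.println(clientSideDedup(Arrays.asList(fromServer))); // [s1, s2, s3]
    }
}
```

The server-side pass only reduces the number of duplicates sent over the wire; correctness comes from the client-side pass, which still has to hold every distinct subject it sees.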

> Cheers, Dylan
>
> On Thu, Jul 6, 2017 at 3:05 AM, damodaram.sundaram@harman.com <
> damodaram.sundaram@harman.com> wrote:
>
>> Thanks for your reply Dylan.
>>
>> *Are your range queries small enough to fit in memory?* Not likely. The
>> given condition on the POS table might return a few hundred thousand
>> results, since the table I am talking about would be 100M. Hence, I might
>> not be able to hold them in memory for sorting, and I might end up with
>> memory issues.
>>
>> My tables are built with POS in the RowId rather than in the column
>> family, because I am mapping each cell of my relational data to a single
>> row in Accumulo.
>>
>> The 'S values' will be used to query the SPO table with a prefix filter on
>> S; its rows are stored as (Subject|Predicate|Object). If my subjects are in
>> sorted order, then I would not need much effort when querying with an
>> ordered list of subjects.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-accumulo.1065345.n5.nabble.com/Sorted-RowId-suffix-retrieval-using-Server-Side-Iterators-tp21787p21791.html
>> Sent from the Developers mailing list archive at Nabble.com.
>>
