accumulo-dev mailing list archives

From Dylan Hutchison <dhutc...@cs.washington.edu>
Subject Re: Sorted RowId suffix retrieval using Server Side Iterators
Date Thu, 06 Jul 2017 07:53:57 GMT
Let's see if I understand your question.  The queries are range queries on
P over the POS table.  Within each range, you would like to sort the S
values (a suffix of the Key) retrieved.

Are your range queries *small enough to fit in memory*?  If so, you could
gather all the entries in the range together, either at a client or in a
server-side iterator, and sort the S values.  The server-side iterator
approach will only work if your S values are stored in the Column portion
of the key (not the Row), because if they are stored in the Row then the
range query may hit multiple tablets which could be stored on separate
tablet servers.  Of course, you could still construct a sorted partial list
of the S values seen in each tablet and merge those lists at the client.
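
To illustrate the in-memory approach, here is a minimal Python sketch.  It
does not use the Accumulo API: the table is a plain sorted list of
"p|o|s" strings, and the names POS_TABLE, scan, subject, sorted_subjects and
merged_subjects are all made up for this example.  It shows (a) gathering a
range and sorting the S suffixes, and (b) merging sorted partial lists, one
per tablet.

```python
import bisect
import heapq

# Hypothetical stand-in for the POS table: a list kept in key order,
# the way a sorted key-value store would return it.
POS_TABLE = sorted(["p1|o1|s1", "p1|o1|s5", "p1|o2|s3", "p1|o2|s2", "p2|o1|s4"])


def scan(table, start, end):
    """Return all keys k in the sorted list with start <= k < end."""
    lo = bisect.bisect_left(table, start)
    hi = bisect.bisect_left(table, end)
    return table[lo:hi]


def subject(key):
    """Extract the S suffix from a 'p|o|s' key."""
    return key.rsplit("|", 1)[1]


def sorted_subjects(table, start, end):
    """Gather the whole range in memory, then sort the S values."""
    return sorted(subject(k) for k in scan(table, start, end))


def merged_subjects(tablets, start, end):
    """Sort S values per tablet, then merge the sorted partial lists."""
    partials = [sorted_subjects(t, start, end) for t in tablets]
    return list(heapq.merge(*partials))


if __name__ == "__main__":
    # '~' sorts after '|', so "p2|o1~" acts as an end bound just past
    # every key with the prefix "p2|o1|".
    print(sorted_subjects(POS_TABLE, "p1|o1", "p2|o1~"))
    # Pretend the table is split into two tablets.
    print(merged_subjects([POS_TABLE[:2], POS_TABLE[2:]], "p1|o1", "p2|o1~"))
```

Note that the S values are compared as strings here, so "s10" would sort
before "s2"; zero-padding or a numeric sort key would be needed if numeric
order matters.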

If your range queries exceed memory, then you might try an external sorting
method or create an index on S.
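
As one possible shape for the external-sorting option, here is a rough
Python sketch of a chunked merge sort that spills sorted runs to temporary
files.  It is only an illustration under assumptions: external_sort,
_write_run, _read_run and the chunk_size parameter are hypothetical names,
the S values are assumed to be strings without newlines, and nothing here
touches Accumulo itself.

```python
import heapq
import os
import tempfile


def _write_run(sorted_values):
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for v in sorted_values:
            f.write(v + "\n")
    return path


def _read_run(path):
    """Lazily yield the values stored in one run file, in order."""
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(values, chunk_size):
    """Yield values in sorted order, holding about chunk_size in memory."""
    run_paths = []
    chunk = []
    for v in values:
        chunk.append(v)
        if len(chunk) >= chunk_size:
            run_paths.append(_write_run(sorted(chunk)))
            chunk = []
    if chunk:
        run_paths.append(_write_run(sorted(chunk)))
    try:
        # k-way merge of the sorted runs; only one line per run is in memory.
        yield from heapq.merge(*(_read_run(p) for p in run_paths))
    finally:
        for p in run_paths:
            os.remove(p)


if __name__ == "__main__":
    for s in external_sort(["s1", "s5", "s3", "s2", "s4"], chunk_size=2):
        print(s)
```

In practice the input would be the stream of S values coming back from the
scanner, and chunk_size would be chosen to fit the available memory.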

The right choice depends on what you would like to do with the S values.

On Wed, Jul 5, 2017 at 11:39 PM, damodaram.sundaram@harman.com <
damodaram.sundaram@harman.com> wrote:

> We are storing the RDF statement data to Accumulo in the
> POS(Predicate,Object, Subject) fashion. The table is designed to store 100
> million records.
>
> Ex:
> p1|o1|s1
> p1|o1|s5
> p1|o2|s3
> p1|o2|s2
> p2|o1|s4
>
> The data is sorted based on the first two parts of the key (p1 & o1 etc).
>
> When I apply a prefix range with (p1|o1  to p2|o1), I could get the
> subjects
> in the order [s1, s5, s3, s2, s4].
>
> But with this order my scan would have to go back and forth on the table,
> and I would like to get the list of subjects as [s1, s2, s3, s4, s5] while
> reading through the iterators.
>
> Is there any way I can get the above result?
>
> Also, on the same table, if I apply a Range filter I get several
> separately ordered sets, such as [s2, s3, s5] and [s200, s150, s500]. Even
> in this case, how can I make the scanner read the data in a single
> sorted order?
