accumulo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: Scan-time iterators returning out-of-order rows
Date Thu, 02 Apr 2015 17:52:05 GMT
Thanks for your response, Christopher.

Yes, I see what you mean by promoting the CQ to the CF. I thought that
would simplify things but if not I could definitely return (k,v) pairs like,

((p0, d, (r15, prop_b)), "just testing"),
((p0, d, (r8, prop_b)), "hello, world")

(leaving the timestamps and visibilities out for clarity, and assuming that
r15 and r8 are encoded such that their lexical order matches their numeric
order)

Leaving the existing schema intact is not a problem for me but it doesn't
get around the fact that the (k,v) pairs would be returned out-of-order by
the iterator. I guess another option would be to embed the "data" kv pairs
into the "index" kv pairs somehow, such as:

((p0, i, ((prop_a, 7, r15), prop_b)), "just testing")
((p0, i, ((prop_a, 8, r8), prop_b)), "hello, world")

I'm not too keen on that solution but if you're telling me that I shouldn't
rely on scan-time iterators being able to emit data out of order... I think
it's the least bad option.

Regards,
-Russ

On Thu, Apr 2, 2015 at 12:04 AM, Christopher <ctubbsii@apache.org> wrote:

> So in your example, you actually did return then in order (lexically, not
> numerically), but I grok the idea that they might not be.
>
> The problem is that your transformation promotes a portion of the cq to the
> cf. That's fine if what your iterator is returning includes only that from
> a single cf (day, the 'data' cf). But otherwise, you could get duplicates
> or out of order results, which can mess up the client's expectations when
> retrieving batches from the servers. It could work in some limited cases,
> but I'd avoid it.
>
> Instead, why not preserve order by preserving the existing schema, and just
> ignore the unused cf in the client?
>
> On Thu, Apr 2, 2015, 00:28 Russ Weeks <rweeks@newbrightidea.com> wrote:
>
> > Thanks, Christopher. It's nice to hear an unambiguous point of view :)
> >
> > Do you see any alternative way of implementing a range scan on a
> > partitioned index? The problem does not exist for exact-match scans
> because
> > the row ID in the index entry CQ provides the correct ordering.
> >
> > Thanks,
> > -Russ
> >
> > On Wed, Apr 1, 2015 at 9:11 PM, Christopher <ctubbsii@apache.org> wrote:
> >
> > > You should definitely not rely on this behavior. It goes against best
> > > practices and is prone to error. It is not recommended.
> > >
> > > On Wed, Apr 1, 2015, 20:03 Russ Weeks <rweeks@newbrightidea.com>
> wrote:
> > >
> > > > A wonderful property of scan-time iterators is that they can emit row
> > IDs
> > > > in arbitrary order. Before I go off and build an index that relies on
> > > this
> > > > behaviour, I'd like to get a sense of how likely it is to exist in
> > future
> > > > versions of Accumulo.
> > > >
> > > > I'd like to build an index like this (hopefully the ascii comes
> > through,
> > > if
> > > > not check here <
> https://gist.github.com/anonymous/1a64114da4b68a2ec822
> > > >):
> > > >
> > > >
> > > >  row   | cf  | cq                | val
> > > > -------------------------------------------------
> > > >  p0    | i   | (prop_a, 7, r15)  | 1
> > > >  p0    | i   | (prop_a, 8, r8)   | 1
> > > >  p0    | i   | (prop_a, 9, r19)  | 1
> > > > [...snip...]
> > > >  p0    | d   | (r8, prop_a)      | 8
> > > >  p0    | d   | (r8, prop_b)      | hello, world
> > > >  p0    | d   | (r15, prop_a)     | 7
> > > >  p0    | d   | (r15, prop_b)     | just testing
> > > >  p0    | d   | (r19, prop_a)     | 9
> > > >  p0    | d   | (r19, prop_b)     | something else
> > > >
> > > > Which is a pretty conventional partitioned index. I'd like to be able
> > to
> > > > issue a query like, "Tell me about prop_b for all documents where
> > prop_a
> > > <
> > > > 9" but I'm pretty sure that the only way this could work at scale is
> if
> > > > it's OK for the iterator to return (p0, r15, prop_b, "just testing")
> > > > followed by (p0, r8, prop_b, "hello, world").
> > > >
> > > > This works today - if you folks see any flaws in my reasoning please
> > let
> > > me
> > > > know - my question is, do you see this as functionality that should
> be
> > > > preserved in the future?
> > > >
> > > > Thanks,
> > > > -Russ
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message