accumulo-dev mailing list archives

From Josh Elser <>
Subject Re: Implementing Index Table for Accumulo Hive Queries
Date Mon, 09 Jan 2017 20:06:16 GMT
If you have that info, yeah I think you could.

The lifecycle of those queries is a bit strange (and, IIRC, different 
depending on the execution engine Hive uses).

Experimentation is definitely the way forward :). Let me know if you 
need any help -- I'm happy to at least try to help. If you come up with 
something generic enough, it'd be great to contribute it back to Hive 
(which I can also help with).

Fagan, Michael wrote:
> Josh,
> Thanks. It looks like if I can override getRanges() in AccumuloPredicateHandler,
> I might be able to build correct ranges based on matching index rows.
> Does this sound feasible?
> Regards,
> Mike Fagan
> On 1/9/17, 12:38 PM, "Josh Elser"<>  wrote:
>      Hi Mike,
>      As far as I understand it, the Hive storage handler APIs (which is how
>      the Accumulo integration is implemented) don't expose any ability to
>      use index tables to answer a query.
>      This means that the only thing you can do to make queries faster would
>      be to create a number of tables, pivoted on the columns you care about,
>      putting the important columns in the rowId. Then, you would have to know
>      which table to use at the application layer.
>      Admittedly, this is pretty lacking. I'd have to go look at the Hive
>      community to see if this is something that's been built there.
>      - Josh
>      Fagan, Michael wrote:
>      >  Hi,
>      >
>      >  I am looking to use an index table to avoid full table scans and speed
>      >  up Hive queries against an external Accumulo table.
>      >
>      >  Has anyone done this yet? Can someone point me in the right direction?
>      >
>      >  Regards,
>      >  Mike Fagan
>      >
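
The idea discussed in the thread (look up matching rows in an index table, then hand only those rows to the scan as ranges) can be sketched roughly as below. This is a minimal, self-contained illustration and not Hive or Accumulo code: IndexRangeSketch, RowRange, buildIndex, matchingRowIds and toRanges are hypothetical names, the index table is simulated with an in-memory sorted map, and the layout (indexed value mapping to data-table row IDs) is an assumption. In a real getRanges() override the lookup would presumably be a scan of the index table and each result an Accumulo Range.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in: no Hive or Accumulo classes are used here.
class IndexRangeSketch {

    /** Inclusive range over data-table row IDs (stand-in for an Accumulo Range). */
    static final class RowRange {
        final String startRow;
        final String endRow;

        RowRange(String startRow, String endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + "," + endRow + "]";
        }
    }

    /** Simulated index table: indexed column value -> sorted data-table row IDs. */
    static NavigableMap<String, SortedSet<String>> buildIndex(Map<String, String> rowIdToValue) {
        NavigableMap<String, SortedSet<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> e : rowIdToValue.entrySet()) {
            index.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return index;
    }

    /**
     * Row IDs whose indexed value lies in [low, high]; pass low == high for an
     * equality predicate. Assumes low <= high (subMap throws otherwise).
     */
    static SortedSet<String> matchingRowIds(
            NavigableMap<String, SortedSet<String>> index, String low, String high) {
        SortedSet<String> rowIds = new TreeSet<>();
        for (SortedSet<String> ids : index.subMap(low, true, high, true).values()) {
            rowIds.addAll(ids);
        }
        return rowIds;
    }

    /** One exact-row range per matching row ID, in sorted order. */
    static List<RowRange> toRanges(SortedSet<String> rowIds) {
        List<RowRange> ranges = new ArrayList<>();
        for (String id : rowIds) {
            ranges.add(new RowRange(id, id));
        }
        return ranges;
    }

    public static void main(String[] args) {
        Map<String, String> cityByRow = new HashMap<>();
        cityByRow.put("row1", "denver");
        cityByRow.put("row2", "boston");
        cityByRow.put("row3", "denver");

        NavigableMap<String, SortedSet<String>> index = buildIndex(cityByRow);

        // Equivalent of: WHERE city = 'denver'
        System.out.println(toRanges(matchingRowIds(index, "denver", "denver")));
        // prints [[row1,row1], [row3,row3]]
    }
}
```

Emitting one exact-row range per match keeps the sketch simple. With many matches it would probably be worth merging neighbouring rows into wider ranges, and falling back to a full scan when the predicate does not touch an indexed column.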
