accumulo-dev mailing list archives

From "Fagan, Michael" <>
Subject Re: Implementing Index Table for Accumulo Hive Queries
Date Thu, 12 Jan 2017 19:21:26 GMT

Thanks. I have a working solution by enhancing AccumuloRangeGenerator. Have started the process
internally to contribute back. 
I will definitely appreciate and need your help in getting the code into the community.

Mike Fagan

On 1/9/17, 1:06 PM, "Josh Elser" <> wrote:

    If you have that info, yeah I think you could.
    The lifecycle of those queries is a bit strange (and, IIRC, different 
    depending on the execution engine Hive uses).
    Experimentation is definitely the way forward :). Let me know if you 
    need any help -- I'm happy to at least try to help. If you come up with 
    something generic enough, it'd be great to contribute it back to Hive 
    (which I can also help with).
    Fagan, Michael wrote:
    > Josh,
    > Thanks, it looks like if I can override getRanges() from the AccumuloPredicateHandler,
    > I might be able to build correct ranges based on matching index rows.
    > Does this sound feasible?
    > Regards,
    > Mike Fagan
    > On 1/9/17, 12:38 PM, "Josh Elser"<>  wrote:
    >      Hi Mike,
    >      As far as I understand it, the Hive storage handler APIs (which is how
    >      the Accumulo integration is implemented) don't expose any ability to
    >      use index tables to answer a query.
    >      This means that the only thing you can do to make queries faster would
    >      be to create a number of tables, pivoted on the columns you care about,
    >      putting the important columns in the rowId. Then, you would have to know
    >      which table to use at the application layer.
    >      Admittedly, this is pretty lacking. I'd have to go look at the Hive
    >      community to see if this is something that's been built there.
    >      - Josh
    >      Fagan, Michael wrote:
    >      >  Hi,
    >      >
    >      >  I am looking to utilize an index table to avoid full table scans and speed
    >      >  up Hive queries against an external Accumulo table.
    >      >
    >      >  Has anyone done this yet? Can someone point me in the right direction?
    >      >
    >      >  Regards,
    >      >  Mike Fagan
    >      >
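
The idea discussed in this thread (look up matching rows in an index table, then build ranges over the data table from them instead of scanning everything) can be illustrated with a small sketch. This is not the AccumuloRangeGenerator change mentioned above and it does not use the Accumulo or Hive APIs; the tables are plain in-memory Python structures, and every name here (DATA, build_index, lookup_row_ids, ranges_for, scan) is hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-in for the data table: rowId -> {column: value}.
DATA = {
    "row1": {"name": "alice", "city": "denver"},
    "row2": {"name": "bob", "city": "boston"},
    "row3": {"name": "carol", "city": "denver"},
}


def build_index(data, column):
    """Build sorted (value, rowId) entries for one column, like an index table."""
    return sorted(
        (cols[column], row_id) for row_id, cols in data.items() if column in cols
    )


def lookup_row_ids(index, value):
    """Binary-search the sorted index and collect rowIds whose value matches."""
    start = bisect_left(index, (value, ""))
    row_ids = []
    for indexed_value, row_id in index[start:]:
        if indexed_value != value:
            break
        row_ids.append(row_id)
    return row_ids


def ranges_for(row_ids):
    """Turn matching rowIds into inclusive (start, end) ranges, one per row."""
    return [(r, r) for r in sorted(set(row_ids))]


def scan(data, ranges):
    """Read only the rows covered by the given ranges from the sorted keys."""
    keys = sorted(data)
    out = {}
    for start, end in ranges:
        lo = bisect_left(keys, start)
        hi = bisect_right(keys, end)
        for key in keys[lo:hi]:
            out[key] = data[key]
    return out


if __name__ == "__main__":
    city_index = build_index(DATA, "city")
    ranges = ranges_for(lookup_row_ids(city_index, "denver"))
    print(scan(DATA, ranges))
```

In a real deployment the index lookup and the range scan would each be a scan against a separate table, and the range-building step is roughly where a getRanges() override would plug in, as suggested earlier in the thread.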
