accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Implementing Index Table for Accumulo Hive Queries
Date Fri, 13 Jan 2017 15:59:59 GMT
Awesome! Sounds great. When you have your internal 'ducks in a row', 
feel free to ping me via email or on JIRA directly :)

Fagan, Michael wrote:
> Josh,
>
> Thanks. I have a working solution by enhancing AccumuloRangeGenerator. Have started the
> process internally to contribute back.
> I will definitely appreciate and need your help in getting the code into the community.
>
> Regards,
> Mike Fagan
>
>
>
> On 1/9/17, 1:06 PM, "Josh Elser"<josh.elser@gmail.com>  wrote:
>
>      If you have that info, yeah I think you could.
>
>      The lifecycle of those queries is a bit strange (and, IIRC, different
>      depending on the execution engine Hive uses).
>
>      Experimentation is definitely the way forward :). Let me know if you
>      need any help -- I'm happy to at least try to help. If you come up with
>      something generic enough, it'd be great to contribute it back to Hive
>      (which I can also help with).
>
>      Fagan, Michael wrote:
>      >  Josh,
>      >
>      >  Thanks. It looks like if I can override getRanges() from the AccumuloPredicateHandler,
>      >  I might be able to build correct ranges based on matching index rows.
>      >  Does this sound feasible?
>      >
>      >  Regards,
>      >  Mike Fagan
>      >
>      >  On 1/9/17, 12:38 PM, "Josh Elser"<josh.elser@gmail.com>   wrote:
>      >
>      >       Hi Mike,
>      >
>      >       As far as I understand it, the Hive storage handler APIs (which is how
>      >       the Accumulo integration is implemented) don't expose any ability to
>      >       use index tables to answer a query.
>      >
>      >       This means that the only thing you can do to make queries faster would
>      >       be to create a number of tables, pivoted on the columns you care about,
>      >       putting the important columns in the rowId. Then, you would have to know
>      >       which table to use at the application layer.
>      >
>      >       Admittedly, this is pretty lacking. I'd have to go look at the Hive
>      >       community to see if this is something that's been built there.
>      >
>      >       - Josh
>      >
>      >       Fagan, Michael wrote:
>      >       >   Hi,
>      >       >
>      >       >   I am looking to utilize an index table to avoid full table scans
>      >       >   and speed up Hive queries against an external Accumulo table.
>      >       >
>      >       >   Has anyone done this yet? Can someone point me in the right direction?
>      >       >
>      >       >   Regards,
>      >       >   Mike Fagan
>      >       >
>      >
>      >
>      >
>
>
>
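
The idea discussed in the thread, building scan ranges from matching index rows instead of scanning the whole table, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the change Mike describes: the class `IndexRangeSketch`, the method `rangesFor`, and the in-memory map standing in for the index table are all hypothetical, and no real Accumulo or Hive API is called. It assumes the index maps a column value to the row IDs of the data table that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: plain Java collections stand in for the Accumulo
// index table and for Accumulo ranges. No real Accumulo or Hive API is used.
class IndexRangeSketch {

    // Returns one inclusive (startRow, endRow) pair per data-table row whose
    // indexed column equals `value`, in sorted row order with duplicates removed.
    // An empty list means the index found no match, so nothing needs scanning.
    static List<Map.Entry<String, String>> rangesFor(
            Map<String, List<String>> index, String value) {
        TreeSet<String> rowIds = new TreeSet<>(index.getOrDefault(value, List.of()));
        List<Map.Entry<String, String>> ranges = new ArrayList<>();
        for (String rowId : rowIds) {
            // Single-row range: start and end are the same row ID.
            ranges.add(Map.entry(rowId, rowId));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Assumed index layout: indexed column value -> row IDs in the data table.
        Map<String, List<String>> index = new TreeMap<>();
        index.put("denver", List.of("row7", "row2", "row7"));
        index.put("boston", List.of("row5"));

        System.out.println(rangesFor(index, "denver"));
        System.out.println(rangesFor(index, "austin"));
    }
}
```

In a real integration the returned pairs would presumably be turned into Accumulo ranges and handed to the scan in place of a single full-table range; how that hooks into the predicate handler is not shown in this thread.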
