crunch-user mailing list archives

From Chad Urso McDaniel <>
Subject Re: HBase & Crunch: multiple scans for a single PTable
Date Mon, 10 Mar 2014 18:43:16 GMT
On Mon, Apr 8, 2013 at 2:09 PM, Micah Whitacre <> wrote:

> We have a hack of a MultiScanTableInputFormat based off of one of the
> earlier patches.  It is nice because it gives us the functionality we
> wanted but does have issues such as not honoring filters per scan object,
> a limit on the number of scans that can be serialized, and some overhead
> cost kicking off the multiple scans.
> Based on that we actually took the approach of trying to get HBASE-3996
> resolved so Crunch could have a first class Source which utilizes the new
> input format.  Of course that is dependent on you coding against that API
> and us being able to upgrade to 0.94.5.  So I was asking from a "when would
> this fit onto Crunch's roadmap?" perspective.
> We actually found that a custom filter with good hints for jumping
> sections can be as performant as our forked custom
> MultiScanTableInputFormat.

Could you share the changes for the custom MultiScanTableInputFormat and
the custom filter implementation, or anything new?
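
For context, here is how I understand the "filter with good hints for
jumping sections" idea, as a minimal sketch only. It does not use the HBase
or Crunch APIs: a TreeMap stands in for the sorted table, and the class and
method names (SeekHintSketch, Range, scan) are made up for illustration. It
assumes the ranges are sorted and do not overlap. When the current row key
falls before the next wanted range, the scan seeks straight to that range's
start instead of reading every row in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Illustration only: a TreeMap stands in for a sorted HBase table.
// None of these names come from HBase or Crunch.
public class SeekHintSketch {

    /** Half-open row-key range [start, stop). */
    static final class Range {
        final String start;
        final String stop;

        Range(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /**
     * Returns the row keys that fall inside any of the given ranges.
     * Ranges are assumed to be sorted by start key and non-overlapping.
     */
    static List<String> scan(TreeMap<String, String> table, List<Range> ranges) {
        List<String> hits = new ArrayList<>();
        int r = 0;
        String key = table.isEmpty() ? null : table.firstKey();
        while (key != null && r < ranges.size()) {
            Range range = ranges.get(r);
            if (key.compareTo(range.start) < 0) {
                // Before the wanted range: jump to the first key >= start
                // (the "seek hint") rather than stepping row by row.
                key = table.ceilingKey(range.start);
            } else if (key.compareTo(range.stop) >= 0) {
                // Past this range: check the same key against the next range.
                r++;
            } else {
                // Inside the range: keep the row and step to the next key.
                hits.add(key);
                key = table.higherKey(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String k : new String[] {"a1", "a2", "b1", "b2", "c1", "c2", "d1"}) {
            table.put(k, "value");
        }
        List<Range> ranges = Arrays.asList(new Range("a2", "b1"), new Range("c1", "c3"));
        System.out.println(scan(table, ranges)); // prints [a2, c1, c2]
    }
}
```

In this toy run the row b2 is never visited, which is the saving I would
expect the real filter to get from its hints. How that compares with running
one scan per range is exactly what I would like to see from your
implementation.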

