hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Row Filters in TableInputFormatBase
Date Sat, 07 Feb 2009 21:21:16 GMT
On Wed, Feb 4, 2009 at 4:09 PM, Dave Latham <latham@davelink.net> wrote:

> In order to speed up a map reduce job operating on HBase input data, we
> recently added a RowFilter to the input format.  However, when trying to
> execute it, map tasks (one per region) that used to take 1-2 minutes began
> timing out after 10 minutes.  So I dug in to TableInputFormatBase to see
> how it handles a row filter, and it appears to take our filter and combine
> it with a StopRowFilter in order to scan the proper split, since there is
> no getScanner method that can accept both a stop row and a row filter.
> Digging further in to the scanning / filtering, it looks like it continues
> scanning until filterAllRemaining returns true.  However,
> StopRowFilter.filterAllRemaining() always returns false.  So if my
> understanding is correct, every split in this task will end up scanning to
> the end of the table and testing every row with the filter instead of
> simply stopping at the end of its given split.  That would explain why my
> map tasks began taking longer (instead of shorter).

> 1. Is my understanding correct?  (aka is this a bug?  If so, I don't see an
> existing JIRA issue for it -- I can open one if no one else does.)

Sounds like a bug (and an explanation for long-running jobs) but, IIUC, the
stop row filter is supposed to have a 'stop row' embedded, and once the
filter passes it, we stop filtering?  If that's not going on, let's fix it.
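
To make the suspected behaviour concrete, here is a rough sketch of the
difference. It is a simplified stand-alone model, not the actual HBase
classes: the RowFilter interface, both filter classes, and rowsExamined are
hypothetical names made up for illustration, and row keys are plain Strings.
It only assumes what is described above, namely that the scan loop keeps
going until filterAllRemaining() returns true.

```java
import java.util.Arrays;
import java.util.List;

public class StopRowSketch {
    // Hypothetical, cut-down stand-in for a row filter (not the HBase interface).
    interface RowFilter {
        boolean filterRowKey(String rowKey);  // true means "skip this row"
        boolean filterAllRemaining();         // true means "the scan can stop"
    }

    // Models the behaviour described in the report: rows at or past the stop
    // row are filtered out, but the filter never tells the scanner to stop.
    static class NeverDoneStopFilter implements RowFilter {
        private final String stopRow;
        NeverDoneStopFilter(String stopRow) { this.stopRow = stopRow; }
        public boolean filterRowKey(String rowKey) { return rowKey.compareTo(stopRow) >= 0; }
        public boolean filterAllRemaining() { return false; }
    }

    // Models the expected behaviour: once the stop row has been seen,
    // filterAllRemaining() reports true and the scan can end early.
    static class TrackingStopFilter implements RowFilter {
        private final String stopRow;
        private boolean passedStopRow = false;
        TrackingStopFilter(String stopRow) { this.stopRow = stopRow; }
        public boolean filterRowKey(String rowKey) {
            if (rowKey.compareTo(stopRow) >= 0) {
                passedStopRow = true;
            }
            return passedStopRow;
        }
        public boolean filterAllRemaining() { return passedStopRow; }
    }

    // Toy scan loop: counts how many rows are tested against the filter
    // before the filter says that everything remaining can be skipped.
    static int rowsExamined(List<String> sortedRows, RowFilter filter) {
        int examined = 0;
        for (String row : sortedRows) {
            if (filter.filterAllRemaining()) {
                break;
            }
            examined++;
            filter.filterRowKey(row);
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Split ends at "c"; the table continues to "f".
        System.out.println(rowsExamined(rows, new NeverDoneStopFilter("c"))); // prints 6
        System.out.println(rowsExamined(rows, new TrackingStopFilter("c")));  // prints 3
    }
}
```

In this toy model the first filter makes the loop walk all six rows even
though the split ends at "c", while the second stops after testing the stop
row itself. Whether the real scanner behaves this way still needs checking
against the code.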

P.S. Thanks for digging in.
