hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: missing rows in MR process
Date Mon, 08 Sep 2008 17:59:33 GMT
Dru Jensen wrote:
> Hi StAck,
> No, I don't think I'm hitting this.  The first MR process uses in: 
> SequenceFileInputFormat, out: TableReduce.  The second uses in: 
> TableMap, out: TableReduce.  I don't think the out-of-the-box TableMap 
> is using a filter, correct?
It looks like it is.

The TableMap job makes one task per region by default.  Each task then 
runs a scanner whose range is bounded by the region's start/end row.  
When you ask for a scanner with a start/end row, the call eventually 
reaches the following method:

  public Scanner getScanner(final byte [][] columns,
    final byte [] startRow, final byte [] stopRow, final long timestamp)
  throws IOException {
    // The stop row is enforced by a filter, not by the scanner itself.
    return getScanner(columns, startRow, timestamp,
      new WhileMatchRowFilter(new StopRowFilter(stopRow)));
  }

... i.e. put in place a StopRowFilter.
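
To make the effect of that filter pair concrete, here is a rough sketch in plain Java.  It is not HBase code: the class StopRowScanSketch and its scan method are made up for illustration, and it assumes the stop row is exclusive and that an empty stop row means "scan to the end of the table".  It only models the idea that the scan ends at the first row that sorts at or after the stop row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified model of a bounded scan; not part of HBase.
class StopRowScanSketch {

    // Returns the row keys visited by a scan over [startRow, stopRow).
    // Assumption: a null or empty stopRow means "no upper bound".
    static List<String> scan(SortedMap<String, String> table,
                             String startRow, String stopRow) {
        List<String> seen = new ArrayList<>();
        // tailMap yields keys >= startRow in sorted order.
        for (String row : table.tailMap(startRow).keySet()) {
            boolean pastStop = stopRow != null && !stopRow.isEmpty()
                && row.compareTo(stopRow) >= 0;
            if (pastStop) {
                // "While match" behaviour: the first rejected row ends the scan.
                break;
            }
            seen.add(row);
        }
        return seen;
    }

    public static void main(String[] args) {
        SortedMap<String, String> table = new TreeMap<>();
        table.put("row-a", "1");
        table.put("row-b", "2");
        table.put("row-c", "3");
        table.put("row-d", "4");
        // One "region" covering [row-b, row-d).
        System.out.println(scan(table, "row-b", "row-d")); // prints [row-b, row-c]
    }
}
```

If the stop test fires earlier than it should, every row after that point in the region is silently skipped, which is the kind of symptom a filter bug would produce.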

So, maybe you are tripping over HBASE-856.


> Dru
> On Sep 5, 2008, at 3:59 PM, stack wrote:
>> This is odd, Dru.  Do you think you are seeing 
>> https://issues.apache.org/jira/browse/HBASE-856?  Are you using filters?
>> St.Ack
>> Dru Jensen wrote:
>>> hbase-users,
>>> I have two MR processes that run one right after the other in a 
>>> script.  The first reads from a file and populates a table.  The 
>>> second uses a TableMap over that table that was just populated.
>>> The first MR process inserted 1950 rows successfully and everything 
>>> looked correct.  For some reason the second MR process only got 76 
>>> rows as input.  I ran the exact same MR process and the second time 
>>> it got all 1950 rows.
>>> Is there some time delay between the MR batch update of the first 
>>> process and the scan of the second?  How can I make sure this commit 
>>> is complete before launching the second MR process?
>>> This is using the Release Candidate 0.2.1 running on Hadoop
>>> thanks,
>>> Dru
