hbase-user mailing list archives

From Dru Jensen <drujen...@gmail.com>
Subject Re: missing rows in MR process
Date Mon, 08 Sep 2008 18:08:59 GMT
Aaahh Yes. Thanks.

On Sep 8, 2008, at 10:59 AM, stack wrote:

> Dru Jensen wrote:
>> Hi St.Ack,
>>
>> No, I don't think I'm hitting this.  The first MR process is using  
>> in: SequenceFileInputFormat, out: TableReduce.  The second is using  
>> in: TableMap, out: TableReduce.  I don't think the out-of-the-box  
>> TableMap is using a filter, correct?
>>
> It looks like it is.
>
> The TableMap job makes one task per region by default.  Each task then  
> runs a scanner whose range is bounded by the region's start/end  
> row.  When you get a scanner specifying a start/end row, it  
> eventually does the following:
>
> public Scanner getScanner(final byte [][] columns,
>   final byte [] startRow, final byte [] stopRow, final long timestamp)
> throws IOException {
>   return getScanner(columns, startRow, timestamp,
>     new WhileMatchRowFilter(new StopRowFilter(stopRow)));
> }
>
> ... i.e. it puts a StopRowFilter in place.
>
> So maybe you are tripping over HBASE-856.
>
> St.Ack
>
>
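To make the mechanism St.Ack describes concrete, here is a minimal, self-contained Java model. It is not HBase code: the class and methods (`RegionScanSketch`, `scan`, `pastStopRow`) and the region boundaries are hypothetical. It treats the table as a sorted map, runs one scan per region over [startRow, stopRow), and ends each scan at the first row the stop-row check rejects, which mirrors the intent of `WhileMatchRowFilter(new StopRowFilter(stopRow))`. When every per-region scan behaves correctly, the counts add up to the whole table; a filter bug that ends a scan early would show up as a shortfall like the 76 of 1950 rows reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of per-region scans bounded by a stop row (not the HBase API).
public class RegionScanSketch {

    // Stand-in for a stop-row filter: true once the row has reached stopRow.
    // An empty stopRow means "scan to the end of the table".
    static boolean pastStopRow(String row, String stopRow) {
        return !stopRow.isEmpty() && row.compareTo(stopRow) >= 0;
    }

    // Scan rows in sorted order starting at startRow (inclusive). Like a
    // while-match filter, the first rejected row ends the whole scan.
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        List<String> out = new ArrayList<>();
        for (String row : table.tailMap(startRow, true).keySet()) {
            if (pastStopRow(row, stopRow)) {
                break;
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical table with 1950 rows, keyed so that string order matches numeric order.
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1950; i++) {
            table.put(String.format("row%05d", i), "v" + i);
        }
        // Hypothetical region boundaries; "" marks the first and last region edges.
        String[] bounds = {"", "row00500", "row01200", ""};
        int total = 0;
        for (int r = 0; r + 1 < bounds.length; r++) {
            // One "task" per region, each scanning [start, stop).
            List<String> rows = scan(table, bounds[r], bounds[r + 1]);
            System.out.println("region " + r + ": " + rows.size() + " rows");
            total += rows.size();
        }
        System.out.println("total: " + total);
    }
}
```

Running `main` prints 500, 700 and 750 rows for the three regions and a total of 1950, so the region scans together cover the table exactly once.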
>> Dru
>>
>> On Sep 5, 2008, at 3:59 PM, stack wrote:
>>
>>> This is odd, Dru.  Do you think you are seeing https://issues.apache.org/jira/browse/HBASE-856?
>>> Are you using filters?
>>> St.Ack
>>>
>>>
>>> Dru Jensen wrote:
>>>> hbase-users,
>>>>
>>>> I have two MR processes that run one right after the other in a  
>>>> script.  The first reads from a file and populates a table.  The  
>>>> second uses a TableMap over that table that was just populated.
>>>>
>>>> The first MR process inserted 1950 rows successfully and  
>>>> everything looked correct.  For some reason the second MR process  
>>>> got only 76 rows as input.  When I reran the exact same MR process,  
>>>> it got all 1950 rows.
>>>>
>>>> Is there some time delay between the batch update of the first  
>>>> MR process and the scan of the second?  How can I make sure the  
>>>> commit is complete before launching the second MR process?
>>>>
>>>> This is using the Release Candidate 0.2.1 running on Hadoop  
>>>> 0.17.2.1.
>>>>
>>>> thanks,
>>>> Dru
>>>>
>>>>
>>>
>>
>

