hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: MR missing lines
Date Thu, 20 Dec 2012 00:39:14 GMT
Hi Anoop,

Thanks for the hint! Even if it's not fixing my issue, at least my
tests are going to be faster.

I will take a look at the documentation to understand what
deleteColumn was doing.

JM

2012/12/19, Anoop Sam John <anoopsj@huawei.com>:
> Jean: just one thought after seeing the description and the code. Not
> related to the missing rows as such.
>
> You want to delete the row fully right?
>>My table is only one CF with one C with one version
> And your code is like
>>  Delete delete_entry_proposed = new Delete(key);
>>  delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(), KVs.get(0).getQualifier());
>
> deleteColumn() is useful when you want to delete a specific version of a
> specific column in a row. In your case this is probably not needed. Just
> Delete delete_entry_proposed = new Delete(key);  may be enough, so that the
> delete type is a ROW delete.
>
> The javadoc of the deleteColumn() API clearly says it is an expensive op:
> on the server side it first needs a Get call to find the timestamp of the
> latest version.
> In your case this is really unwanted overhead, I think.
>
> -Anoop-
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Tuesday, December 18, 2012 7:07 PM
> To: user@hbase.apache.org
> Subject: Re: MR missing lines
>
> I faced the issue again today...
>
> RowCounter gave me 104313 lines
> Here is the output of the job counters:
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_ADDED=81594
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_SIMILAR=434
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_NO_CHANGES=14250
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_DUPLICATE=428
> 12/12/17 22:32:52 INFO mapred.JobClient:     NON_DELETED_ROWS=0
> 12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_EXISTING=7605
> 12/12/17 22:32:52 INFO mapred.JobClient:     ROWS_PARSED=104311
>
> There is a 2-line difference between ROWS_PARSED (104311) and the
> RowCounter result (104313).
> ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
> ENTRY_EXISTING are the 5 states an entry can have. The total of all those
> counters is equal to the ROWS_PARSED value, so it's aligned. The code is
> handling all the possibilities.
>
> The ROWS_PARSED counter is incremented right at the beginning, like
> this (I removed the comments and javadoc for readability):
>     /**
>      * The comments ...
>      */
>     @Override
>     public void map(ImmutableBytesWritable row__, Result values, Context context) throws IOException
>     {
>         context.getCounter(Counters.ROWS_PARSED).increment(1);
>         List<KeyValue> KVs = values.list();
>         try
>         {
>             // Get the current row.
>             byte[] key = values.getRow();
>
>             // First thing we do, we mark this line to be deleted.
>             Delete delete_entry_proposed = new Delete(key);
>             delete_entry_proposed.deleteColumn(KVs.get(0).getFamily(), KVs.get(0).getQualifier());
>             deletes_entry_proposed.add(delete_entry_proposed);
>
>
> deletes_entry_proposed is a list of rows to delete. After each
> call to the delete method, the number of lines remaining in this
> list is added to NON_DELETED_ROWS, which is 0 at the end, so all lines
> should be deleted correctly.
>
> I re-ran the rowcounter after the job, and I still have ROWS=5971
> lines in the table. I checked all my "feeding processes" and they are
> all closed.
>
> My table is only one CF with one C with one version.
>
> I can guess that the remaining 5971 lines in the table are an error
> on my side, but I'm not able to find where since all the counters are
> matching. I will add one counter which will count all the entries in the
> delete list before calling the delete method. This should match the
> number of rows.
>
> Again, I will re-feed the table today with fresh data and re-run the job...
>
> JM
>
> 2012/12/17, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>> The job ran this morning, and of course, this time, all the rows got
>> processed ;)
>>
>> So I will give it a few more tries and will keep you posted if I'm able
>> to reproduce that again.
>>
>> Thanks,
>>
>> JM
>>
>> 2012/12/16, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>>> Thanks for the suggestions.
>>>
>>> I already have logs to display all the exceptions and there is
>>> nothing. I can't display the work done, there is too much :(
>>>
>>> I have counters "counting" the rows processed and they match what is
>>> done, minus what is not processed. I have just added a few other
>>> counters. One right at the beginning, and one to count the records
>>> remaining on the delete list, as suggested.
>>>
>>> I will run the job again tomorrow, see the result and keep you posted.
>>>
>>> JM
>>>
>>>
>>> 2012/12/16, Asaf Mesika <asaf.mesika@gmail.com>:
>>>> Did you check the returned array of the delete method to make sure all
>>>> records sent for delete have been deleted?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 16 Dec 2012, at 14:52, Jean-Marc Spaggiari
>>>> <jean-marc@spaggiari.org>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have a table on which I run an MR job each time it exceeds 100 000
>>>>> rows.
>>>>>
>>>>> When the target is reached, all the feeding processes are stopped.
>>>>>
>>>>> Yesterday it reached 123608 rows. So I stopped the feeding process,
>>>>> and ran the MR.
>>>>>
>>>>> For each line, the MR creates a delete. The delete is placed on a
>>>>> list, and when the list reaches 10 elements, it's sent to the table.
>>>>> In the clean method, the list is sent to the table if there is any
>>>>> element left in it.
>>>>>
>>>>> So at the end of the MR, I should have an empty table.
>>>>>
>>>>> The table is split over 128 regions, and I have 8 region servers.
>>>>>
>>>>> What is disturbing me is that after the MR, I had 38 lines remaining
>>>>> in the table. The MR took 348 minutes to run. So I ran the MR again,
>>>>> which this time took 2 minutes, and now I have 1 row remaining in the
>>>>> table.
>>>>>
>>>>> I looked at the logs (for the 38 lines run) and there is nothing in
>>>>> them. There are some scanner timeout exceptions for the run of the
>>>>> 100K rows.
>>>>>
>>>>> I'm running HBase 0.94.3.
>>>>>
>>>>> I will have another 100K rows today, so I will re-run the job. I will
>>>>> increase the timeout to make sure I get no exceptions, but even when I
>>>>> ran the 38 lines with no exception, one row was remaining...
>>>>>
>>>>> Any idea why, and where I can search? It's not really an issue for me
>>>>> since I can just re-run the job, but this might be an issue for some
>>>>> others.
>>>>>
>>>>> JM
>>>>
>>>
>>
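
A minimal sketch of the batching described in the first message of the thread: deletes go on a list, the list is sent to the table every 10 elements, and whatever is left is sent in the clean method. This is not the code from the job. DeleteBatcher, DeleteSink, rowsQueued and nonDeleted are hypothetical names, row keys are plain strings rather than HBase Delete objects, and the sink is assumed to leave in the list any entry it could not delete, which is what the NON_DELETED_ROWS counter in the thread relies on.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteBatcher {
    /** Hypothetical stand-in for the table. Assumption: entries that could
     *  not be deleted are left in the list when the call returns. */
    interface DeleteSink {
        void delete(List<String> rowKeys);
    }

    private static final int BATCH_SIZE = 10; // flush threshold quoted in the thread
    private final DeleteSink sink;
    private final List<String> pending = new ArrayList<>();
    private long rowsQueued = 0; // every row handed to add()
    private long nonDeleted = 0; // rows still in the list after a flush

    DeleteBatcher(DeleteSink sink) {
        this.sink = sink;
    }

    /** Called once per mapped row. */
    void add(String rowKey) {
        pending.add(rowKey);
        rowsQueued++;
        if (pending.size() >= BATCH_SIZE) {
            flush();
        }
    }

    /** Called once at the end of the task to send the last partial batch. */
    void cleanup() {
        if (!pending.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        sink.delete(pending);
        // Whatever the sink left behind is counted, then dropped.
        // A real job might retry these rows instead of dropping them.
        nonDeleted += pending.size();
        pending.clear();
    }

    long getRowsQueued() { return rowsQueued; }

    long getNonDeleted() { return nonDeleted; }

    public static void main(String[] args) {
        List<String> deleted = new ArrayList<>();
        // This sink deletes everything: it records the keys and empties the list.
        DeleteBatcher batcher = new DeleteBatcher(keys -> {
            deleted.addAll(keys);
            keys.clear();
        });
        for (int i = 0; i < 25; i++) {
            batcher.add("row-" + i);
        }
        batcher.cleanup();
        System.out.println("queued=" + batcher.getRowsQueued()
                + " deleted=" + deleted.size()
                + " nonDeleted=" + batcher.getNonDeleted());
    }
}
```

Comparing rowsQueued with the number of rows the mapper saw is the extra check proposed in the thread: if both match and nonDeleted stays at 0, any rows still in the table were never handed to the mapper in the first place.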

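The counter arithmetic from the 18 Dec message can be checked with a few lines of plain Java. The figures are the ones quoted in the thread; CounterCheck and sum are hypothetical names used only for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CounterCheck {
    /** Adds up the per-state counters. */
    static long sum(Map<String, Long> states) {
        long total = 0;
        for (long value : states.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // The five states an entry can have, with the values from the job output.
        Map<String, Long> states = new LinkedHashMap<>();
        states.put("ENTRY_ADDED", 81594L);
        states.put("ENTRY_SIMILAR", 434L);
        states.put("ENTRY_NO_CHANGES", 14250L);
        states.put("ENTRY_DUPLICATE", 428L);
        states.put("ENTRY_EXISTING", 7605L);

        long rowsParsed = 104311L; // ROWS_PARSED from the job
        long rowCounter = 104313L; // RowCounter run before the job

        long stateTotal = sum(states);
        System.out.println("state total    = " + stateTotal);                  // 104311
        System.out.println("matches parsed = " + (stateTotal == rowsParsed));  // true
        System.out.println("unseen by map  = " + (rowCounter - rowsParsed));   // 2
    }
}
```

The state counters account for every row the mapper received, so the 2-row gap sits between the RowCounter scan and the rows the job was actually given, not inside the map logic.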