hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: MR missing lines
Date Tue, 18 Dec 2012 13:37:23 GMT
I faced the issue again today...

RowCounter gave me 104313 lines
Here is the output of the job counters:
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_ADDED=81594
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_SIMILAR=434
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_NO_CHANGES=14250
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_DUPLICATE=428
12/12/17 22:32:52 INFO mapred.JobClient:     NON_DELETED_ROWS=0
12/12/17 22:32:52 INFO mapred.JobClient:     ENTRY_EXISTING=7605
12/12/17 22:32:52 INFO mapred.JobClient:     ROWS_PARSED=104311

There is a 2-row difference between ROWS_PARSED and the RowCounter result.
ENTRY_ADDED, ENTRY_SIMILAR, ENTRY_NO_CHANGES, ENTRY_DUPLICATE and
ENTRY_EXISTING are the 5 states an entry can have. The total of those
counters is equal to the ROWS_PARSED value, so they are aligned: the code is
handling all the possibilities.
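
The arithmetic behind that statement can be re-checked with a small standalone sketch. The class CounterCheck and its method names are hypothetical and not part of the job; the values are copied from the counter log above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the MR job: re-adds the counters from the log.
class CounterCheck {
    // The five per-entry state counters, copied from the job output.
    static Map<String, Long> stateCounters() {
        Map<String, Long> c = new LinkedHashMap<>();
        c.put("ENTRY_ADDED", 81594L);
        c.put("ENTRY_SIMILAR", 434L);
        c.put("ENTRY_NO_CHANGES", 14250L);
        c.put("ENTRY_DUPLICATE", 428L);
        c.put("ENTRY_EXISTING", 7605L);
        return c;
    }

    // Sum of all counter values in the map.
    static long sum(Map<String, Long> counters) {
        long total = 0;
        for (long v : counters.values()) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        long rowsParsed = 104311L; // ROWS_PARSED from the log
        long rowCounter = 104313L; // RowCounter result before the job
        long total = sum(stateCounters());
        System.out.println("sum of states            = " + total);
        System.out.println("equals ROWS_PARSED       = " + (total == rowsParsed));
        System.out.println("RowCounter - ROWS_PARSED = " + (rowCounter - rowsParsed));
    }
}
```

The five states add up to 104311, the same as ROWS_PARSED, so every parsed row ended in exactly one state. The 2 rows of difference against RowCounter are rows the mapper never received.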

The ROWS_PARSED counter is incremented right at the beginning, like
this (I removed the comments and javadoc for readability):
		/* The comments ... */
		public void map(ImmutableBytesWritable row__, Result values, Context context) throws IOException
			List<KeyValue> KVs = values.list();

				// Get the current row.
				byte[] key = values.getRow();

				// First thing we do, we mark this line to be deleted.
				Delete delete_entry_proposed = new Delete(key);

deletes_entry_proposed is the list of rows to delete. After each
call to the delete method, the number of entries remaining in this
list is added to NON_DELETED_ROWS, which is 0 at the end, so all rows
should have been deleted correctly.

I re-ran RowCounter after the job, and I still have ROWS=5971
rows in the table. I checked all my "feeding processes" and they are
all stopped.

My table has only one CF, with one column and one version.

I guess the remaining 5971 rows in the table are due to an error
on my side, but I'm not able to find where, since all the counters
match. I will add one counter that sums all the entries in the
delete list before calling the delete method. This should match the
number of rows.
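
A minimal sketch of that bookkeeping, under stated assumptions: DeleteSink and DeleteBuffer are hypothetical names, and the sink only imitates the contract this message relies on, namely that after a call to delete the list holds only the entries that were not deleted. It uses plain strings instead of HBase Delete objects so it runs without a cluster. It is not the actual job code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the table. Assumption: an implementation removes
// every key it deleted from the list and leaves the failed ones in it.
interface DeleteSink {
    void delete(List<String> keys);
}

// Hypothetical buffer: batches deletes and keeps the two counters discussed here.
class DeleteBuffer {
    private final DeleteSink sink;
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    final List<String> failed = new ArrayList<>(); // kept for inspection
    long submitted = 0;  // proposed counter: entries handed to delete()
    long nonDeleted = 0; // plays the role of NON_DELETED_ROWS

    DeleteBuffer(DeleteSink sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    // Called once per row; flushes when the batch is full.
    void add(String key) {
        pending.add(key);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Called when a batch is full and once more at the end of the task.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        submitted += pending.size();  // count before the call
        sink.delete(pending);
        nonDeleted += pending.size(); // whatever is left was not deleted
        failed.addAll(pending);
        pending.clear();
    }

    public static void main(String[] args) {
        // Fake sink that "fails" on one key and deletes everything else.
        DeleteSink sink = keys -> keys.removeIf(k -> !k.equals("row-7"));
        DeleteBuffer buffer = new DeleteBuffer(sink, 10);
        for (int i = 0; i < 25; i++) {
            buffer.add("row-" + i);
        }
        buffer.flush(); // final flush for the partial batch
        System.out.println("submitted  = " + buffer.submitted);
        System.out.println("nonDeleted = " + buffer.nonDeleted);
        System.out.println("failed     = " + buffer.failed);
    }
}
```

If the submitted total matches ROWS_PARSED and the non-deleted total stays at 0, every row the mapper saw was handed to the table for deletion, and the rows still present would have to come from somewhere else.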

Again, I will re-feed the table today with fresh data and re-run the job...


2012/12/17, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
> The job ran this morning, and of course, this time, all the rows got
> processed ;)
> So I will give it a few more tries and will keep you posted if I'm able
> to reproduce it.
> Thanks,
> JM
> 2012/12/16, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>> Thanks for the suggestions.
>> I already have logs to display all the exceptions and there is
>> nothing. I can't display the work done, there is too much :(
>> I have counters "counting" the rows processed and they match what is
>> done, minus what is not processed. I have just added a few other
>> counters: one right at the beginning, and one to count the
>> records remaining in the delete list, as suggested.
>> I will run the job again tomorrow, see the result and keep you posted.
>> JM
>> 2012/12/16, Asaf Mesika <asaf.mesika@gmail.com>:
>>> Did you check the returned array of the delete method to make sure all
>>> records sent for delete have been deleted?
>>> Sent from my iPhone
>>> On 16 Dec 2012, at 14:52, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
>>> wrote:
>>>> Hi,
>>>> I have a table on which I run an MR job each time it exceeds 100 000 rows.
>>>> When the target is reached, all the feeding processes are stopped.
>>>> Yesterday it reached 123608 rows. So I stopped the feeding process,
>>>> and ran the MR.
>>>> For each line, the MR creates a delete. The delete is placed on a
>>>> list, and when the list reaches 10 elements, it's sent to the table.
>>>> In the clean method, the list is sent to the table if there is any
>>>> element in it.
>>>> So at the end of the MR, I should have an empty table.
>>>> The table is split over 128 regions, and I have 8 region servers.
>>>> What is disturbing me is that after the MR, I had 38 lines remaining
>>>> in the table. The MR took 348 minutes to run. So I ran the MR again,
>>>> which this time took 2 minutes, and now I have 1 row remaining in the
>>>> table.
>>>> I looked at the logs (for the 38 lines run) and there is nothing in
>>>> them. There are some scanner timeout exceptions for the run of the 100K
>>>> rows.
>>>> I'm running HBase 0.94.3.
>>>> I will have another 100K rows today, so I will re-run the job. I will
>>>> increase the timeout to make sure I get no exception, but even when I
>>>> ran the 38 lines with no exception, one was remaining...
>>>> Any idea why, and where I can search? It's not really an issue for me
>>>> since I can just re-run the job, but this might be an issue for some
>>>> others.
>>>> JM
