hbase-user mailing list archives

From Sean Sechrist <ssechr...@gmail.com>
Subject Re: Scan isn't processing all rows
Date Tue, 22 Mar 2011 18:27:23 GMT
Okay, so I figured out what was going wrong:

The property hbase.regionserver.lease.period was 120s on the machine I
submitted the job from, but was only 60s on the RegionServer.

This caused the scanner to time out on the region server. But when the next
HTable.ClientScanner.next() call got the UnknownScannerException sent back
from the region server, the client didn't think the scanner had actually timed
out, since the elapsed time was less than 120s. So instead of a
ScannerTimeoutException being thrown, the exception was treated as if it were
an NSRE (NotServingRegionException).
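
For anyone hitting the same thing, the fix was just making the two configs
agree. As a quick sanity check, something like the following will print the
value the job-submitting machine actually sees (a rough sketch against the
0.89/0.90-era API; the class name is made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LeaseConfigCheck {
  public static void main(String[] args) {
    // The client reads this value from its local hbase-site.xml; each region
    // server reads its own copy. If they disagree (120s vs 60s in our case),
    // the server can expire a scanner while the client still believes the
    // lease is valid, so the UnknownScannerException isn't recognized as a
    // scanner timeout.
    Configuration conf = HBaseConfiguration.create();
    long lease = conf.getLong("hbase.regionserver.lease.period", 60000L);
    System.out.println("client-side hbase.regionserver.lease.period = " + lease + " ms");
  }
}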

I haven't been able to trace through the code to see *exactly* why that
causes rows to be skipped, but I have a very simple job that consistently
reproduces the problem; see here: http://pastebin.com/H5Ymq9UJ
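
For context, the job itself is essentially a straight table copy. A rough
sketch of that kind of job is below (this is not the actual pastebin
contents; the table names, the caching value, and setCacheBlocks(false) are
just illustrative):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class CopyTableSketch {

  // Map-only job: re-emit every row of the source table as a Put on the target table.
  static class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
      Put put = new Put(row.get());
      for (KeyValue kv : value.raw()) {
        put.add(kv);
      }
      context.write(row, put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "copy-table-sketch");
    job.setJarByClass(CopyTableSketch.class);

    Scan scan = new Scan();
    scan.setCaching(1000);      // same caching as in the thread; a caching of 1 avoided the problem
    scan.setCacheBlocks(false); // usual setting for full-table MR scans

    TableMapReduceUtil.initTableMapperJob("source", scan, CopyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    // null reducer: TableOutputFormat writes the Puts emitted by the mapper
    TableMapReduceUtil.initTableReducerJob("target", null, job);
    job.setNumReduceTasks(0);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}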

-Sean

On Mon, Mar 21, 2011 at 2:28 PM, Sean Sechrist <ssechrist@gmail.com> wrote:

> The missed rows are not found near the beginning or end of a task (or
> region). It's also a *read* problem - all of the writes that it tries to do
> are fine. There is just not the correct number of input records from the
> scan.
>
> Thanks,
> Sean
>
>
> On Mon, Mar 21, 2011 at 2:13 PM, Michael Segel <michael_segel@hotmail.com>wrote:
>
>>  Which release?
>>
>> I thought that when you issue the flush() command it is a 'blocking' call,
>> in that it doesn't return control back until after the flush() completes?
>>
>> In Sean's response, the losses were in some of the threads and not all of
>> them. He didn't indicate where the records were in each region.
>> If, as you say, the flush() command is too close to the close() statement,
>> then these records would be at the end of the region.
>>
>> Another area to look into is what is happening when the table's regions
>> split during the write.
>>
>>
>> HTH
>>
>> -Mike
>>
>> PS. When I tried writing to user@hbase.apache.org, the e-mails bounced.
>> Not sure why...
>>
>> > Date: Mon, 21 Mar 2011 13:55:02 -0400
>> > From: javamann@cox.net
>> > To: user@hbase.apache.org; ssechrist@gmail.com
>> > Subject: Re: FW: Scan isn't processing all rows
>>
>> >
>> > I had a problem of lost rows when the flush was right before the close
>> statement.
>> > ---- Sean Sechrist <ssechrist@gmail.com> wrote:
>> >
>> > =============
>> > Accidentally dropped the user list from this email exchange. Anyone have
>> > any other ideas here?
>> >
>> > But using scanner caching of 1 fixes the problem, as suspected. So now I'll
>> > investigate why the scanner cache is being lost.
>> >
>> > Thanks,
>> > Sean
>> >
>> > On Mon, Mar 21, 2011 at 11:06 AM, Sean Sechrist <ssechrist@gmail.com>
>> wrote:
>> >
>> > > Hey Mike, thanks for the response.
>> > >
>> > > > This would mean that you have 184 mappers, right?
>> > >
>> > > We actually had 43 mappers (43 regions in the source table).
>> > >
>> > > > If this is correct, then it appears that you are losing only the records
>> > > > cached once per mapper task.
>> > > > It would be interesting to see if this happened in the first set of
>> > > > cached rows, or if it happens in the last set of cached rows.
>> > >
>> > > So it actually happens (possibly) more than once per task. For example,
>> > > for the first 10 tasks, here are the numbers of missed records:
>> > >
>> > > 0, 0, 3996, 4995, 0, 0, 999, 1998, 3996, 999
>> > >
>> > > > My next suggestion is to turn off the scan caching.
>> > >
>> > > Good idea, I'll see if that works.
>> > >
>> > > Thanks,
>> > > Sean
>> > >
>> > > On Mon, Mar 21, 2011 at 10:39 AM, Michael Segel <
>> michael_segel@hotmail.com
>> > > > wrote:
>> > >
>> > >> For some reason my e-mail to the hbase list failed....
>> > >>
>> > >>
>> > >> ------------------------------
>> > >> From: michael_segel@hotmail.com
>> > >>
>> > >> To: user@hbase.apache.org
>> > >> Subject: RE: Scan isn't processing all rows
>> > >> Date: Mon, 21 Mar 2011 09:37:06 -0500
>> > >>
>> > >> Sean,
>> > >> Ok...
>> > >>
>> > >> Let's think about this...
>> > >>
>> > >> You're saying that without the actual put, your application is reading
>> > >> all of the rows and they are being processed correctly.
>> > >> You said that when you add the put() to the second table, it appears that
>> > >> rows that were scanned into the cache are lost, so you are missing
>> > >> multiples of 999 rows.
>> > >> Based on your example...
>> > >>
>> > >> > To get a sense of how many we are missing, the latest run missed
>> > >> > 183,816 out of 29,572,075 rows in the source table.
>> > >>
>> > >> This would mean that you have 184 mappers, right?
>> > >>
>> > >> If this is correct, then it appears that you are losing only the records
>> > >> cached once per mapper task.
>> > >> It would be interesting to see if this happened in the first set of cached
>> > >> rows, or if it happens in the last set of cached rows.
>> > >> (You can see this by seeing which rows are missing and where they are in
>> > >> the HTable region based on their row key.)
>> > >>
>> > >> My next suggestion is to turn off the scan caching.
>> > >> You will obviously take a little performance hit, but that should clean
>> > >> up the problem.
>> > >>
>> > >> If that works, then you should be able to start to look at your code to
>> > >> see what's causing the failure.
>> > >>
>> > >> HTH
>> > >>
>> > >> -Mike
>> > >>
>> > >> > From: ssechrist@gmail.com
>> > >> > Date: Mon, 21 Mar 2011 09:01:32 -0400
>> > >> > Subject: Re: Scan isn't processing all rows
>> > >>
>> > >> > To: user@hbase.apache.org
>> > >> >
>> > >> > Okay, I've tried that test, as well as making sure speculative
>> > >> > execution is turned off. Neither made a difference. It's not only a
>> > >> > problem with writing to the target table - the number of map input
>> > >> > records for the job is wrong as well. But it's correct when we run
>> > >> > jobs that do not write to HBase, such as a row count.
>> > >> >
>> > >> > I ran another job to calculate the number of missed rows per region of
>> > >> > the source table (which is not consistent between runs), by comparing
>> > >> > the source table with the target table.
>> > >> >
>> > >> > An interesting thing I found is that the number of skipped rows is
>> > >> > always a multiple of 999. This is especially interesting because our
>> > >> > scanner caching is 1000. So I think we're skipping over the scanner
>> > >> > cache sometimes.
>> > >> >
>> > >> > To get a sense of how many we are missing, the latest run missed
>> > >> > 183,816 out of 29,572,075 rows in the source table.
>> > >> >
>> > >> > Any ideas?
>> > >> >
>> > >> > Thanks,
>> > >> > Sean
>> > >> >
>> > >> > On Fri, Mar 18, 2011 at 9:58 AM, Michael Segel <
>> > >> michael_segel@hotmail.com>wrote:
>> > >> >
>> > >> > >
>> > >> > > Sean,
>> > >> > >
>> > >> > > Here's a simple test.
>> > >> > >
>> > >> > > Modify your code so that you aren't using the TableOutputFormat class,
>> > >> > > but a null writable, and inside the map() method you actually do the
>> > >> > > write yourself.
>> > >> > >
>> > >> > > Also make sure to explicitly flush and close your HTable connection
>> > >> > > when your mapper ends.
>> > >> > >
>> > >> > >
>> > >> > >
>> > >> > > > From: ssechrist@gmail.com
>> > >> > > > Date: Fri, 18 Mar 2011 09:50:47 -0400
>> > >> > > > Subject: Scan isn't processing all rows
>> > >> > > > To: user@hbase.apache.org
>> > >> > > >
>> > >> > > > Hi all,
>> > >> > > >
>> > >> > > > We're experiencing a problem where a map-only job using TableInputFormat
>> > >> > > > and TableOutputFormat to export data from one table into another is not
>> > >> > > > reading all of the rows in the source table. That is, # map input
>> > >> > > > records != # records in the table. Anyone have any clue how that could
>> > >> > > > happen?
>> > >> > > >
>> > >> > > > Some more detail:
>> > >> > > >
>> > >> > > > It appears to only happen when we are writing results to the destination
>> > >> > > > table. If I comment out the lines where data is written from the mapper
>> > >> > > > (context.write), then the number of input records is correct.
>> > >> > > >
>> > >> > > > I verified that the missing rows did not get written to the output table,
>> > >> > > > so it's not just a counter problem. We aren't using any filter or
>> > >> > > > anything, just a straight-up scan to try to read everything in the table.
>> > >> > > >
>> > >> > > > We're on hbase-0.89.20100924.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Sean
>> > >> > >
>> > >>
>> > >
>> > >
>> >
>> > --
>> >
>> > 1. If a man is standing in the middle of the forest talking, and there
>> is no woman around to hear him, is he still wrong?
>> >
>> > 2. Behind every great woman... Is a man checking out her ass
>> >
>> > 3. I am not a member of any organized political party. I am a Democrat.*
>> >
>> > 4. Diplomacy is the art of saying "Nice doggie" until you can find a
>> rock.*
>> >
>> > 5. A process is what you need when all your good people have left.
>> >
>> >
>> > *Will Rogers
>> >
>> >
>>
>
>
