hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: multiple scanners on same table will cause problem? Scan results change among different tries.
Date Thu, 22 Apr 2010 22:29:08 GMT
Hi,

How row atomicity is implemented for concurrent reads and
writes has changed substantially in HBase 0.20.4, a major
improvement in both correctness and speed.  In 0.20.3 there
is an attempt to make scans more correct and not return
half-written rows.

So for these test cases and problems, I will probably have to ask
people to try the new 0.20.4 RC (which will hopefully be released
soon).

HBase promises row-level atomicity.  There are no transactions in
base HBase and there is no such thing as 'dirty reads': all reads
will only return complete puts to a row.  But as per the spec Stack
points out, we will return data that existed at the time of the scan,
and possibly newer data, but not necessarily.  This is not strict
"transaction isolation", but given the lack of transactions this is
ultimately necessary.
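
To make the guarantee above concrete, here is a minimal Python sketch of
row-level atomicity without a table-wide snapshot. It is a toy in-memory
model, not HBase code and not a description of how 0.20.4 implements it:
`ToyTable`, `put`, `scan`, `writer` and `run_demo` are hypothetical names
invented for illustration. Each put publishes a complete row in one step,
so a concurrent scan should never see a half-written row, but different
rows in the same scan may reflect different points in time.

```python
import threading


class ToyTable:
    """In-memory stand-in for a table with row-level atomic puts (not HBase)."""

    def __init__(self):
        self._rows = {}                # row key -> dict that is never mutated
        self._lock = threading.Lock()  # guards publishing and reading a row

    def put(self, key, cells):
        # Build the complete new row, then publish it in a single step, so a
        # reader sees either the old row or the new one, never a mix.
        with self._lock:
            merged = dict(self._rows.get(key, {}))
            merged.update(cells)
            self._rows[key] = merged

    def scan(self):
        # No table-wide snapshot: each row is fetched on its own, so rows
        # returned later in a scan may reflect newer puts than earlier rows.
        with self._lock:
            keys = sorted(self._rows)
        for key in keys:
            with self._lock:
                row = self._rows[key]
            yield key, row


KEYS = ("row1", "row2", "row3")


def writer(table, rounds):
    # Every put writes columns "a" and "b" together with the same version.
    for version in range(1, rounds + 1):
        for key in KEYS:
            table.put(key, {"a": version, "b": version})


def run_demo(rounds=2000, scans=200):
    """Scan while a writer is active; return how many torn rows were seen."""
    table = ToyTable()
    for key in KEYS:
        table.put(key, {"a": 0, "b": 0})
    thread = threading.Thread(target=writer, args=(table, rounds))
    thread.start()
    torn_rows = 0
    for _ in range(scans):
        for _key, row in table.scan():
            if row["a"] != row["b"]:  # would indicate a half-written row
                torn_rows += 1
    thread.join()
    return torn_rows
```

In this model `run_demo()` should always report zero torn rows, even
though the versions seen across different rows within one scan can differ.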

On Thu, Apr 22, 2010 at 1:11 PM, Michael Segel
<michael_segel@hotmail.com> wrote:
>
> Thanks Tim,
>
> I suspect that it should work unless you get so many connections trying to hit the same
> region that you overwhelm its ability to handle the scans properly.
> (Or there was a problem in the OP's code)
>
> Scans should be 'dirty reads' imho.
>
>
> -Mike
>
>> Date: Thu, 22 Apr 2010 18:57:01 +0200
>> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
>> From: timrobertson100@gmail.com
>> To: hbase-user@hadoop.apache.org
>>
>> Attached is a quickly hacked test for parallel scanning threads.  You
>> might want to increase the amount of data in the test though to test
>> properly.
>> It seems to pass consistently for me.
>>
>> Note it uses a shared HTable object across threads, but the API states:
>> "Used to communicate with a single HBase table. This class is not
>> thread safe for writes. Gets, puts, and deletes take out a row lock
>> for the duration of their operation. Scans (currently) do not respect
>> row locking."
>>
>> But I am not doing any writes in the test.
>>
>> Cheers,
>> Tim
>>
>>
>>
>> On Thu, Apr 22, 2010 at 4:22 PM, Michael Segel
>> <michael_segel@hotmail.com> wrote:
>> >
>> >
>> > Tim,
>> >
>> > Even without his code, this should be pretty straightforward to duplicate.
>> >
>> > Create the table with a sequence as a column in a column family.
>> > Then write a non-m/r job that has multiple threads that connect to
>> > HBase and see what they get when they hit the small table in a single region.
>> >
>> > If you can duplicate the problem, that would be the test code for the jira.
>> >
>> > -Mike
>> >
>> >> Date: Thu, 22 Apr 2010 16:13:31 +0200
>> >> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
>> >> From: timrobertson100@gmail.com
>> >> To: hbase-user@hadoop.apache.org
>> >>
>> >> Could you please post your code that is doing the scanning, Steven?
>> >>
>> >>
>> >>
>> >> On Thu, Apr 22, 2010 at 3:50 PM, Michael Segel
>> >> <michael_segel@hotmail.com> wrote:
>> >> >
>> >> > Ok...
>> >> >
>> >> > This is something that I think we'll need input from a major contributor...
>> >> >
>> >> > It looks like there may be an issue with respect to row locking...
>> >> >
>> >> > I guess the questions to ask are:
>> >> >
>> >> > - How does HBase handle row level locking?
>> >> > - Concurrent reads/fetches of the same row?
>> >> >
>> >> > To be honest and fair, HBase is still an immature product when compared
>> >> > to databases and there are going to be some issues that need to be fleshed out.  (Let's see where
>> >> > we are in 20+ years ;-)
>> >> >
>> >> > I wish I knew more about the internals of HBase, but there are only
>> >> > so many hours in the day and my wife forces me to work so I can keep up with her spending.
>> >> > ;-) (And if any of you happen to ever meet her, please don't bring this up, she'll kill me.
>> >> > :-D   )
>> >> >
>> >> > Let's see what St.Ack or Andrew have to say. This might be a JIRA issue.
>> >> >
>> >> > Thx
>> >> >
>> >> > -Mike
>> >> >
>> >> >
>> >> >
>> >> >> Date: Thu, 22 Apr 2010 20:17:12 +0800
>> >> >> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
>> >> >> From: steven.zhuang.1984@gmail.com
>> >> >> To: hbase-user@hadoop.apache.org
>> >> >>
>> >> >> hi, Michael,
>> >> >>
>> >> >> Sorry for not making the question clear. There are multiple
>> >> >> scanners scanning a single table, and there might be the case that multiple
>> >> >> scanners are reading from a single region.
>> >> >> Please see answers inline.
>> >> >>
>> >> >> On Thu, Apr 22, 2010 at 8:08 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>> >> >>
>> >> >> >
>> >> >> > I'm sorry, but are you trying to say that you have multiple scanners trying
>> >> >> > to read from a single region and the result sets do not match?
>> >> >> >
>> >> >> Yes, the result sets do not match.
>> >> >>
>> >> >> > I guess it would be an easy test: enter a bunch of rows into a region and
>> >> >> > have a unique integer for each row. (1,2,3,...)
>> >> >> > Then run a bunch of unfiltered scans in parallel, and generate a sum from
>> >> >> > the scan. If any of the sums do not match, then you have a potential issue
>> >> >> > with concurrency/row locking and row isolation level.  How does HBase handle
>> >> >> > row level locking and isolation levels?
>> >> >> >
>> >> >> I have iterated over the rows/column families/cells and printed the content of
>> >> >> each cell, and found that some cells are missing from some scan result sets.
>> >> >>
>> >> >> > -Mike
>> >> >> >
>> >> >> > > Date: Thu, 22 Apr 2010 17:07:47 +0800
>> >> >> > > Subject: multiple scanners on same table will cause problem? Scan results change among different tries.
>> >> >> > > From: steven.zhuang.1984@gmail.com
>> >> >> > > To: hbase-user@hadoop.apache.org
>> >> >> > >
>> >> >> > > hi, All,
>> >> >> > > Has anybody scanned one table using multiple scanners at the
>> >> >> > > same time and found inconsistent results?
>> >> >> > > I am querying a table using dozens (20-120) of scanners in
>> >> >> > > parallel (multiple threads), trying to take advantage of the multiple cores.
>> >> >> > > But I found the scan results are not consistent across several runs. I have
>> >> >> > > checked my code, and there seems to be no bug in it. So I guess the problem may
>> >> >> > > come from HBase itself.
>> >> >> > > My HBase version is 0.20.3.
>> >> >> >
>> >> >
>> >
>
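
The test Mike describes in the quoted thread (one unique integer per row,
many unfiltered scans in parallel, compare the sums) can be sketched
roughly as follows. This is only a harness outline in Python under loud
assumptions: the table is replaced by an in-memory dict, and `make_rows`,
`scan_sum`, `parallel_scan_sums` and `check` are hypothetical helpers, not
HBase client calls. To reproduce the original report, `scan_sum` would
have to be swapped for a real scan against the table.

```python
from concurrent.futures import ThreadPoolExecutor


def make_rows(n):
    # Stand-in for a small single-region table: row key -> unique integer.
    return {f"row{i:05d}": i for i in range(1, n + 1)}


def scan_sum(rows):
    # Stand-in for one unfiltered scan: visit every row in key order and
    # add up the integer stored in each row.
    return sum(rows[key] for key in sorted(rows))


def parallel_scan_sums(rows, scanners):
    # Run `scanners` scans at the same time, one per worker thread.
    with ThreadPoolExecutor(max_workers=scanners) as pool:
        futures = [pool.submit(scan_sum, rows) for _ in range(scanners)]
        return [future.result() for future in futures]


def check(n=1000, scanners=20):
    """Return the expected sum and the list of sums that differ from it."""
    rows = make_rows(n)
    sums = parallel_scan_sums(rows, scanners)
    expected = n * (n + 1) // 2  # 1 + 2 + ... + n
    mismatches = [s for s in sums if s != expected]
    return expected, mismatches
```

With no concurrent writers, every scanner should report the same sum, so
a non-empty `mismatches` list would be the evidence to attach to a JIRA.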
