hbase-user mailing list archives

From Peter Wolf <opus...@gmail.com>
Subject Re: Confirming a Bug
Date Mon, 19 Mar 2012 18:24:22 GMT
Hello Lars and Lars,

Thank you for your help and attention.

I wrote a standalone test that exhibits the bug.

http://dl.dropbox.com/u/68001072/HBaseScanCacheBug.java

Here is the output.  It shows how the number of results and key-value
pairs varies as the caching is changed and families are included.  It
shows the bug starting with 3 families and 5000 caching.  It also shows a
new bug, where the query always fails with an IOException with 4 families.

CacheSize FamilyCount ResultCount KeyValueCount
1000 1 10000 10
5000 1 10000 10
10000 1 10000 10
1000 2 10000 20
5000 2 10000 20
10000 2 10000 20
1000 3 10000 30
5000 3 5000 30
10000 3 0 -1
Exception in thread "main" java.lang.RuntimeException: 
org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to 
contact region server domu-12-31-39-05-6d-02.compute-1.internal:60020 
for region bug,,1332174647830.ef906b7bd8eea8482c84edd906df24fd., row 
'\x00\x00\x00{\x00\x00\x00\x00\x00\x00\x00\x00', but failed after 10 
attempts.
Exceptions:
java.io.IOException: java.io.IOException: Call to ... failed on local 
exception: java.io.IOException: Unexpected exception receiving call 
responses
     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1231)
     at org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1170)
     at org.apache.hadoop.hbase.client.HTable$ClientScanner$1.hasNext(HTable.java:1275)
     ... 7 more
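
As a rough cross-check of the "multiple of the cache size" pattern, the
figures from this thread can be run through a small standalone sketch.  It
uses no HBase API at all; the class and method names are hypothetical, and
it assumes that 10000 is the correct result count for the test table (the
value returned with 1 and 2 families).

```java
public class ScanShortfallCheck {

    /**
     * Returns how many whole cache-fulls are missing from a scan, or -1 if
     * the shortfall is negative or not an exact multiple of the caching value.
     */
    static long missingBatches(long expected, long actual, long caching) {
        long shortfall = expected - actual;
        if (caching <= 0 || shortfall < 0 || shortfall % caching != 0) {
            return -1;
        }
        return shortfall / caching;
    }

    public static void main(String[] args) {
        // Each row is {caching, expected, actual}.
        long[][] observations = {
            {5000, 10000, 5000},    // test table, 3 families (expected count assumed)
            {10000, 10000, 0},      // test table, 3 families (expected count assumed)
            {10000, 24452, 4452},   // original dataset, all families
            {10000, 24452, 14452},  // original dataset, one family removed
        };
        for (long[] o : observations) {
            System.out.println("caching=" + o[0]
                    + " expected=" + o[1]
                    + " actual=" + o[2]
                    + " missingBatches=" + missingBatches(o[1], o[2], o[0]));
        }
    }
}
```

Under those assumptions, every failing case is short by a whole number of
cache-fulls, which is consistent with entire scanner fetches being dropped
rather than individual rows being filtered out.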


Here is the main().  Note that createTable() and createData() are 
commented out.  Uncomment these to populate the test table.

     public static void main(String[] args) {

         try {
             //createTable("bug");

             HBaseScanCacheBug bug = new HBaseScanCacheBug("bug");
             int id = 123;

             //bug.createData(id);

             System.out.println("CacheSize FamilyCount ResultCount KeyValueCount");
             for (int familyCount = 1; familyCount < 5; familyCount++) {
                 bug.scan(id, 1000, familyCount);
                 bug.scan(id, 5000, familyCount);
                 bug.scan(id, 10000, familyCount);
             }

         } catch (IOException e) {
             throw new Error(e);
         }

     }

     private static Configuration getConfiguration() {
         Configuration conf = HBaseConfiguration.create();
         conf.set("hbase.zookeeper.quorum", "Put Your Server Here");
         conf.setInt("hbase.client.prefetch.limit", 100);
         return conf;
     }
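
The test reports KeyValue counts as well as Result counts because of Lars
George's point in the quoted message below: with scanner batching, a Result
is returned per batch rather than per row, so the Result count alone can
mislead.  The following is only a plain-Java model of that arithmetic, not
HBase client code; the class, the method names, and the sample row sizes
are all hypothetical.

```java
public class BatchingModel {

    /**
     * Models how many Result objects a scan would yield when each row's
     * cells are split into chunks of at most `batch` cells.
     * A batch of 0 or less means "no batching": one Result per non-empty row.
     */
    static int resultCount(int[] cellsPerRow, int batch) {
        int results = 0;
        for (int cells : cellsPerRow) {
            if (cells <= 0) {
                continue; // a row with no matching cells yields no Result
            }
            // Ceiling division: a partly filled final chunk still counts.
            results += (batch <= 0) ? 1 : (cells + batch - 1) / batch;
        }
        return results;
    }

    /** Total number of cells; this does not depend on the batch size. */
    static int cellCount(int[] cellsPerRow) {
        int total = 0;
        for (int cells : cellsPerRow) {
            total += Math.max(cells, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] rows = {3, 3, 3, 3}; // four rows with three cells each
        System.out.println("no batching: results=" + resultCount(rows, 0)
                + " cells=" + cellCount(rows));
        System.out.println("batch of 2:  results=" + resultCount(rows, 2)
                + " cells=" + cellCount(rows));
    }
}
```

In this model the Result count changes with the batch size while the cell
total stays fixed, which is why counting KeyValues is the safer way to
compare two scans.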



On 3/19/12 5:58 AM, Lars George wrote:
> Hi Peter,
>
> Lars #1 here again :)
>
> That is fine, the caching is done transparently for you. But what I also
> suggest is counting the number of KeyValues you get back, just to confirm.
> In other words, iterate over the result and check how many actual KVs you
> get back. The reason I am asking is that, for example, scanner batching
> will change the behavior: you will get a Result instance per batch, not
> per row.
>
> Thanks for digging in!
>
> Lars
>
> On Mar 19, 2012, at 12:40 AM, Peter Wolf wrote:
>
>> Excellent!   Thank you very much (other) Lars.
>>
>> I have only tested this on one dataset, and only on a few values of
>> caching.  I certainly get different results with 10,000, 5,000, and 1,000
>> caching.  1,000 gives me the same results as default.  I also get
>> different results when I add families to the Scan.
>>
>> I seem to be surpassing some maximum buffer size.  The number of results
>> is always the correct value minus some multiple of the cache size.  For
>> example, the correct value was 24,452, but when caching was set to
>> 10,000, I got 4,452 results.  When I then removed a family from the scan,
>> I got 14,452 results.
>>
>> I'll try to write a standalone program to reproduce this.  I'll get back to you soon.
>>
>> P
>>
>> P.S.  I just want to check.  The following code counts the number of
>> results.  I don't need to do anything to "get the next cache" or
>> something, do I?
>>
>>          Iterator<Result>    it = scanner.iterator();
>>          while (it.hasNext()) {
>>              Result result = it.next();
>>              ...
>>          }
>>
>>
>>
>>
>> On 3/18/12 5:51 PM, lars hofhansl wrote:
>>> Hi Peter,
>>>
>>> (this is the other Lars)
>>>
>>>
>>> Does this depend on your dataset at all? Does it not also happen for
>>> smaller values of scanner caching?
>>>
>>>
>>> Any chance that you can reproduce this in a unittest and file a jira?
>>> If you do (specifically the test), I promise I'll look at it this week :)
>>>
>>>
>>> -- Lars (H)
>>>
>>>
>>>
>>> ________________________________
>>>   From: Peter Wolf<opus111@gmail.com>
>>> To: user@hbase.apache.org
>>> Sent: Sunday, March 18, 2012 7:13 AM
>>> Subject: Re: Confirming a Bug
>>>
>>> Hi Lars,
>>>
>>> I don't think so...  My behavior is definitely tied to the amount of
>>> data in each Result.  There definitely seems to be some sort of
>>> threshold.  Changing the caching amount produces a completely repeatable
>>> behavior.  10,000, 5,000, and 1000 each produce different repeatable
>>> results, and changing the families added also produces different reliable
>>> results. There is no "sometimes" or "occasional", and if there was a
>>> Major Compaction, it wouldn't happen that often.
>>>
>>> https://issues.apache.org/jira/browse/HBASE-5121
>>> https://issues.apache.org/jira/browse/HBASE-2856
>>>
>>> Note that with all my families added each result is a few 1000 bytes
>>> big.  Is that unusually large?
>>>
>>> Thanks
>>> P
>>>
>>>
>>>
>>> On 3/18/12 5:28 AM, Lars George wrote:
>>>> Hi Peter,
>>>>
>>>> Could you be hitting HBASE-5121? Or even HBASE-2856?
>>>>
>>>> Lars
>>>>
>>>> On Mar 17, 2012, at 20:46, Peter Wolf<opus111@gmail.com>    wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> A couple of days ago, I asked about strange behavior in my
>>>>> "Scan.addFamiliy reduces results" thread.
>>>>>
>>>>> I want to confirm that I did find a bug, and if so, how to submit a
>>>>> bug report.
>>>>>
>>>>> The basic strangeness is that changing the amount of caching changes
>>>>> the number of results.  In the original thread, this was confused by
>>>>> the fact that adding different families also changed the number of
>>>>> results.  We thought it was a filtering problem.
>>>>>
>>>>> However, changing nothing but the setCaching() value changes the
>>>>> number of results.  Furthermore, the result difference is a multiple
>>>>> of the setCaching() value.
>>>>>
>>>>> Here is the pseudo code:
>>>>>
>>>>>           Scan scan = new Scan(...);
>>>>>           scan.addFamily(...);
>>>>>           Filter filter = ...
>>>>>           scan.setFilter(filter);
>>>>>
>>>>> -->        scan.setCaching(10000);<--
>>>>>
>>>>>           scanner = hTable.getScanner(scan);
>>>>>           Iterator<Result>    it = scanner.iterator();
>>>>>           while (it.hasNext()) {
>>>>>               Result result = it.next();
>>>>>               ...
>>>>>           }
>>>>>
>>>>>
>>>>> Thank you
>>>>> Peter

