hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Merging regions issue
Date Thu, 06 Dec 2012 00:30:07 GMT
When a region merge fails because of HBASE-1212, it leaves the
system in an inconsistent state. I have created HBASE-7287 to fix that
until HBASE-1212 is resolved.
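
For anyone reading the log quoted below: both source regions were
compacted with seqid=75533929, and the merge then aborted with "Files
have same sequenceid: 75533929". Here is a minimal sketch of that kind
of check. It is a simplified model written for illustration, not the
actual HBase code, and the class and method names are hypothetical.

```java
import java.io.IOException;

// Simplified model of the failing precondition; NOT the HBase source.
// SequenceIdCheck and checkDistinctSequenceIds are hypothetical names.
class SequenceIdCheck {

    // Throws if the two files to be merged carry the same sequence id,
    // using the same message that appears in the quoted log.
    static void checkDistinctSequenceIds(long seqA, long seqB) throws IOException {
        if (seqA == seqB) {
            throw new IOException("Files have same sequenceid: " + seqA);
        }
    }

    public static void main(String[] args) {
        try {
            // Both regions in the quoted log were compacted with this seqid.
            checkDistinctSequenceIds(75533929L, 75533929L);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With two different ids the call simply returns, which is why the
failure only shows up for some pairs of regions.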

2012/12/5, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
> OK. It seems that I'm facing HBASE-1212...
>
> The only issue is that when the merge fails, hbck reports inconsistencies.
>
> JM
>
> 2012/12/5, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>> Hi everyone,
>>
>> Sorry, I did not realize I was only replying to Marcos ;)
>>
>> So here are more details about this issue.
>>
>> I'm using HBase 0.94.3 and Hadoop 1.0.3.
>>
>> The merge seems to fail when too many merges are done.
>>
>> I just gave it another try... Each time, I do a major_compact
>> before trying the merges, and I run hbck as well.
>>
>> I built the table with 4 regions and 1000 rows. Keys are 8 bytes long
>> and values are 512 bytes. Everything went well, and hbck reported
>> no additional errors.
>>
>> Another try with 16 regions and 10000 rows worked well too.
>>
>> Another try with 54 regions and 10000 rows did not work. I'm getting
>> some errors, and hbck reports the following:
>> ERROR: Orphan region in HDFS: Unable to load .regioninfo from table
>> testtable in hdfs dir
>> hdfs://node3:9000/hbase/testtable/88203ca27c9beedb02004d93e7181f94!
>> It may be an invalid format or version file.  Treating as an orphaned
>> regiondir.
>> ERROR: Orphan region in HDFS: Unable to load .regioninfo from table
>> testtable in hdfs dir
>> hdfs://node3:9000/hbase/testtable/c359655af1e7beb8138123e8aed4c382!
>> It may be an invalid format or version file.  Treating as an orphaned
>> regiondir.
>> ERROR: Orphan region in HDFS: Unable to load .regioninfo from table
>> testtable in hdfs dir
>> hdfs://node3:9000/hbase/testtable/e537a847f8c5a549993001b2bb9c0102!
>> It may be an invalid format or version file.  Treating as an orphaned
>> regiondir.
>> ERROR: Orphan region in HDFS: Unable to load .regioninfo from table
>> testtable in hdfs dir
>> hdfs://node3:9000/hbase/testtable/ef558e802a90b493677b5c07325b12fd!
>> It may be an invalid format or version file.  Treating as an orphaned
>> regiondir.
>>
>> ERROR: Region { meta => null, hdfs =>
>> hdfs://node3:9000/hbase/testtable/88203ca27c9beedb02004d93e7181f94,
>> deployed =>  } on HDFS, but not listed in META or deployed on any
>> region server
>> ERROR: Region { meta => null, hdfs =>
>> hdfs://node3:9000/hbase/testtable/c359655af1e7beb8138123e8aed4c382,
>> deployed =>  } on HDFS, but not listed in META or deployed on any
>> region server
>> ERROR: Region { meta => null, hdfs =>
>> hdfs://node3:9000/hbase/testtable/e537a847f8c5a549993001b2bb9c0102,
>> deployed =>  } on HDFS, but not listed in META or deployed on any
>> region server
>> ERROR: Region { meta => null, hdfs =>
>> hdfs://node3:9000/hbase/testtable/ef558e802a90b493677b5c07325b12fd,
>> deployed =>  } on HDFS, but not listed in META or deployed on any
>> region server
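
The orphan region directories in the hbck errors above can be pulled
out with a small parser. This is only a sketch: it assumes the raw hbck
output (without the mail quoting prefixes) and the exact message wording
shown above, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes the hbck message format quoted above.
class OrphanRegionParser {

    // Captures the region directory that precedes the '!' in each
    // "Orphan region in HDFS" error. DOTALL tolerates wrapped lines.
    private static final Pattern ORPHAN = Pattern.compile(
        "Orphan region in HDFS:.*?hdfs dir\\s+(\\S+?)!", Pattern.DOTALL);

    // Returns the HDFS directory of every orphan region reported.
    static List<String> orphanRegionDirs(String hbckOutput) {
        List<String> dirs = new ArrayList<>();
        Matcher m = ORPHAN.matcher(hbckOutput);
        while (m.find()) {
            dirs.add(m.group(1));
        }
        return dirs;
    }

    // The encoded region name is the last path segment of the directory.
    static String encodedName(String regionDir) {
        return regionDir.substring(regionDir.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String sample = "ERROR: Orphan region in HDFS: Unable to load .regioninfo "
            + "from table testtable in hdfs dir "
            + "hdfs://node3:9000/hbase/testtable/88203ca27c9beedb02004d93e7181f94! "
            + "It may be an invalid format or version file.";
        for (String dir : orphanRegionDirs(sample)) {
            System.out.println(encodedName(dir));
        }
    }
}
```

The encoded names it returns can then be compared with the ENCODED
value of the new region in the merge log further down.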
>>
>> And below is the log from the merge application. After that it's
>> listing all the regions in the server (a lot).
>>
>> I'm not sure if it's the first occurrence of the issue or not.
>>
>> Should I open a JIRA for that? It's difficult to reproduce because
>> it's not a fixed pattern, but I can still get it to fail easily.
>>
>> I will activate the DEBUG logs on the HRegion class and give it
>> another try, again and again ;)
>>
>> Thanks,
>>
>> JM
>>
>> Merging
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> with
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> 12/12/05 12:18:56 INFO util.Merge: Verifying that file system is
>> available...
>> 12/12/05 12:18:56 INFO util.Merge: Verifying that HBase is not running...
>> 12/12/05 12:18:56 INFO zookeeper.ZooKeeper: Initiating client
>> connection, connectString=latitude:2181,cube:2181,node3:2181
>> sessionTimeout=180000 watcher=hconnection
>> 12/12/05 12:18:56 INFO zookeeper.ClientCnxn: Opening socket connection
>> to server /192.168.23.1:2181
>> 12/12/05 12:18:56 INFO zookeeper.RecoverableZooKeeper: The identifier
>> of this process is 13131@node3
>> 12/12/05 12:18:56 INFO client.ZooKeeperSaslClient: Client will not
>> SASL-authenticate because the default JAAS configuration section
>> 'Client' could not be found. If you are not using SASL, you may ignore
>> this. On the other hand, if you expected SASL to work, please fix your
>> JAAS configuration.
>> 12/12/05 12:18:56 INFO zookeeper.ClientCnxn: Socket connection
>> established to cube/192.168.23.1:2181, initiating session
>> 12/12/05 12:18:56 INFO zookeeper.ClientCnxn: Session establishment
>> complete on server cube/192.168.23.1:2181, sessionid =
>> 0x13b6b25f2d900b6, negotiated timeout = 40000
>> 12/12/05 12:18:56 INFO
>> client.HConnectionManager$HConnectionImplementation: ZooKeeper
>> available but no active master location found
>> 12/12/05 12:18:56 INFO
>> client.HConnectionManager$HConnectionImplementation: getMaster attempt
>> 0 of 1 failed; no more retrying.
>> org.apache.hadoop.hbase.MasterNotRunningException
>> 	at
>> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:674)
>> 	at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:110)
>> 	at
>> org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1733)
>> 	at org.apache.hadoop.hbase.util.Merge.run(Merge.java:94)
>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> 	at MassMerger.mergeAllRegionsFromTable(MassMerger.java:81)
>> 	at MassMerger.main(MassMerger.java:120)
>> 12/12/05 12:18:56 INFO
>> client.HConnectionManager$HConnectionImplementation: Closed zookeeper
>> sessionid=0x13b6b25f2d900b6
>> 12/12/05 12:18:56 INFO zookeeper.ZooKeeper: Session: 0x13b6b25f2d900b6
>> closed
>> 12/12/05 12:18:56 INFO zookeeper.ClientCnxn: EventThread shut down
>> 12/12/05 12:18:56 INFO util.Merge: Merging regions
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> and
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> in table testtable
>> 12/12/05 12:18:56 INFO wal.HLog: FileSystem doesn't support
>> getDefaultBlockSize
>> 12/12/05 12:18:56 INFO wal.HLog: HLog configuration: blocksize=64 MB,
>> rollsize=60.8 MB, enabled=true, optionallogflushinternal=1000ms
>> 12/12/05 12:18:56 INFO wal.HLog:  for
>> /user/hbase/.logs_1354727936650/hlog.1354727936704
>> 12/12/05 12:18:56 INFO wal.HLog: Using getNumCurrentReplicas--HDFS-826
>> 12/12/05 12:18:56 INFO regionserver.HRegion: Setting up
>> tabledescriptor config now ...
>> 12/12/05 12:18:56 INFO regionserver.Store: time to purge deletes set
>> to 0ms in store null
>> 12/12/05 12:18:56 INFO regionserver.HRegion: Onlined
>> -ROOT-,,0.70236052; next sequenceid=75763280
>> 12/12/05 12:18:56 INFO util.Merge: Found meta for region1 .META.,,1,
>> meta for region2 .META.,,1
>> 12/12/05 12:18:56 INFO regionserver.HRegion: Setting up
>> tabledescriptor config now ...
>> 12/12/05 12:18:56 INFO regionserver.Store: time to purge deletes set
>> to 0ms in store null
>> 12/12/05 12:18:56 INFO regionserver.StoreFile$Reader: Loaded Delete
>> Family Bloom (CompoundBloomFilter) metadata for
>> d8b6e5c639c64854ac65893373e9632a
>> 12/12/05 12:18:56 INFO regionserver.StoreFile$Reader: Loaded Delete
>> Family Bloom (CompoundBloomFilter) metadata for
>> ebed1be1e7ff455e982fd2739cfc63f1
>> 12/12/05 12:18:56 INFO regionserver.HRegion: Onlined
>> .META.,,1.1028785192; next sequenceid=75764228
>> 12/12/05 12:18:56 INFO regionserver.HRegion: Starting compaction on
>> info in region .META.,,1.1028785192
>> 12/12/05 12:18:56 INFO regionserver.Store: Starting compaction of 3
>> file(s) in info of .META.,,1.1028785192 into
>> tmpdir=hdfs://node3:9000/hbase/.META./1028785192/.tmp, seqid=75764227,
>> totalSize=1,4m
>> 12/12/05 12:18:56 INFO regionserver.StoreFile: Delete Family Bloom
>> filter type for
>> hdfs://node3:9000/hbase/.META./1028785192/.tmp/fd817fe85b8347319ae085f3696f3d98:
>> CompoundBloomFilterWriter
>> 12/12/05 12:18:57 INFO regionserver.StoreFile: NO General Bloom and NO
>> DeleteFamily was added to HFile
>> (hdfs://node3:9000/hbase/.META./1028785192/.tmp/fd817fe85b8347319ae085f3696f3d98)
>> 12/12/05 12:18:57 INFO regionserver.Store: Renaming compacted file at
>> hdfs://node3:9000/hbase/.META./1028785192/.tmp/fd817fe85b8347319ae085f3696f3d98
>> to
>> hdfs://node3:9000/hbase/.META./1028785192/info/fd817fe85b8347319ae085f3696f3d98
>> 12/12/05 12:18:57 INFO regionserver.Store: Completed major compaction
>> of 3 file(s) in info of .META.,,1.1028785192 into
>> fd817fe85b8347319ae085f3696f3d98, size=1,4m; total size for store is
>> 1,4m
>> 12/12/05 12:18:57 INFO util.MetaUtils: OPENING META .META.,,1.1028785192
>> 12/12/05 12:18:57 INFO regionserver.HRegion: Setting up
>> tabledescriptor config now ...
>> 12/12/05 12:18:57 INFO regionserver.Store: time to purge deletes set
>> to 0ms in store null
>> 12/12/05 12:18:57 INFO regionserver.HRegion: Onlined
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.;
>> next sequenceid=75533930
>> 12/12/05 12:18:57 INFO regionserver.HRegion: Setting up
>> tabledescriptor config now ...
>> 12/12/05 12:18:57 INFO regionserver.Store: time to purge deletes set
>> to 0ms in store null
>> 12/12/05 12:18:57 INFO regionserver.HRegion: Onlined
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.;
>> next sequenceid=75533930
>> 12/12/05 12:18:57 INFO regionserver.HRegion: Starting compaction on f
>> in region
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> 12/12/05 12:18:57 INFO regionserver.Store: Starting compaction of 1
>> file(s) in f of
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> into
>> tmpdir=hdfs://node3:9000/hbase/testtable/3dca6fdbf95546ac71f47403a047fd10/.tmp,
>> seqid=75533929, totalSize=130,4k
>> 12/12/05 12:18:57 INFO regionserver.StoreFile: Delete Family Bloom
>> filter type for
>> hdfs://node3:9000/hbase/testtable/3dca6fdbf95546ac71f47403a047fd10/.tmp/5a4a63f5e5e64664bbf4bdf7b13f5593:
>> CompoundBloomFilterWriter
>> 12/12/05 12:18:57 INFO regionserver.StoreFile: NO General Bloom and NO
>> DeleteFamily was added to HFile
>> (hdfs://node3:9000/hbase/testtable/3dca6fdbf95546ac71f47403a047fd10/.tmp/5a4a63f5e5e64664bbf4bdf7b13f5593)
>> 12/12/05 12:18:57 INFO regionserver.Store: Renaming compacted file at
>> hdfs://node3:9000/hbase/testtable/3dca6fdbf95546ac71f47403a047fd10/.tmp/5a4a63f5e5e64664bbf4bdf7b13f5593
>> to
>> hdfs://node3:9000/hbase/testtable/3dca6fdbf95546ac71f47403a047fd10/f/5a4a63f5e5e64664bbf4bdf7b13f5593
>> 12/12/05 12:18:57 INFO regionserver.Store: Completed major compaction
>> of 1 file(s) in f of
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> into 5a4a63f5e5e64664bbf4bdf7b13f5593, size=130,4k; total size for
>> store is 130,4k
>> 12/12/05 12:18:57 INFO regionserver.HRegion: Starting compaction on f
>> in region
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> 12/12/05 12:18:57 INFO regionserver.Store: Starting compaction of 1
>> file(s) in f of
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> into
>> tmpdir=hdfs://node3:9000/hbase/testtable/a5a42e51f3414fc45d77c1759378b58e/.tmp,
>> seqid=75533929, totalSize=130,4k
>> 12/12/05 12:18:57 INFO regionserver.StoreFile: Delete Family Bloom
>> filter type for
>> hdfs://node3:9000/hbase/testtable/a5a42e51f3414fc45d77c1759378b58e/.tmp/93e719c402384573b6587d91099f3226:
>> CompoundBloomFilterWriter
>> 12/12/05 12:18:58 INFO regionserver.StoreFile: NO General Bloom and NO
>> DeleteFamily was added to HFile
>> (hdfs://node3:9000/hbase/testtable/a5a42e51f3414fc45d77c1759378b58e/.tmp/93e719c402384573b6587d91099f3226)
>> 12/12/05 12:18:58 INFO regionserver.Store: Renaming compacted file at
>> hdfs://node3:9000/hbase/testtable/a5a42e51f3414fc45d77c1759378b58e/.tmp/93e719c402384573b6587d91099f3226
>> to
>> hdfs://node3:9000/hbase/testtable/a5a42e51f3414fc45d77c1759378b58e/f/93e719c402384573b6587d91099f3226
>> 12/12/05 12:18:58 INFO regionserver.Store: Completed major compaction
>> of 1 file(s) in f of
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> into 93e719c402384573b6587d91099f3226, size=130,4k; total size for
>> store is 130,4k
>> 12/12/05 12:18:58 INFO regionserver.HRegion: Creating new region {NAME
>> =>
>> 'testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727938387.88203ca27c9beedb02004d93e7181f94.',
>> STARTKEY => '?\xEC\x1B\x90^\xDB\xC9\xA5', ENDKEY => '?\xED 3\x862\x89
>> ', ENCODED => 88203ca27c9beedb02004d93e7181f94,}
>> 12/12/05 12:18:58 INFO regionserver.HRegion: starting merge of
>> regions:
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> and
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> into new region {NAME =>
>> 'testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727938387.88203ca27c9beedb02004d93e7181f94.',
>> STARTKEY => '?\xEC\x1B\x90^\xDB\xC9\xA5', ENDKEY => '?\xED 3\x862\x89
>> ', ENCODED => 88203ca27c9beedb02004d93e7181f94,} with start key
>> <?\xEC\x1B\x90^\xDB\xC9\xA5> and end key <?\xED 3\x862\x89 >
>> 12/12/05 12:18:58 INFO regionserver.HRegion: Closed
>> testtable,?\xEC\x1B\x90^\xDB\xC9\xA5,1354727393540.3dca6fdbf95546ac71f47403a047fd10.
>> 12/12/05 12:18:58 INFO regionserver.HRegion: Closed
>> testtable,?\xEC\xA0_\xCE+\xB7),1354727402292.a5a42e51f3414fc45d77c1759378b58e.
>> 12/12/05 12:18:58 FATAL util.Merge: Merge failed
>> java.io.IOException: Files have same sequenceid: 75533929
>> 	at org.apache.hadoop.hbase.regionserver.HRegion.merge(HRegion.java:4080)
>> 	at org.apache.hadoop.hbase.util.Merge.merge(Merge.java:291)
>> 	at org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:242)
>> 	at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
>> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> 	at MassMerger.mergeAllRegionsFromTable(MassMerger.java:81)
>> 	at MassMerger.main(MassMerger.java:120)
>> 12/12/05 12:18:58 INFO regionserver.HRegion: Setting up
>> tabledescriptor config now ...
>> 12/12/05 12:18:58 INFO regionserver.Store: time to purge deletes set
>> to 0ms in store null
>>
>>
>> 2012/12/4, Marcos Ortiz <mlortiz@uci.cu>:
>>> One last question, Jean-Marc.
>>> Exactly what version of HBase are you using?
>>> What version of Hadoop are you using?
>>>
>>> On 12/04/2012 09:02 PM, Marcos Ortiz wrote:
>>>> Regards, Jean-Marc
>>>> On 12/04/2012 05:54 PM, Jean-Marc Spaggiari wrote:
>>>>> Sorry for replying so quickly to myself.
>>>>>
>>>>> So, here is what I did.
>>>>>
>>>>> I had a table with only a "few" lines, about 20 000.
>>>>>
>>>>> Table was split over 16 regions.
>>>>>
>>>>> I merged all the regions into one, then asked HBase via the HTML
>>>>> interface to split it until I got more than 64 regions.
>>>>>
>>>>> Then I tried to re-merge them all together again into a single one.
>>>>>
>>>>> Now, bin/hbase hbck reports "65 inconsistencies detected." for
>>>>> this table.
>>>>>
>>>>> All the inconsistencies are related to the table I played with.
>>>>>
>>>>> I don't know at what stage the issue happened, so it's a bit difficult
>>>>> to reproduce, but it seems something went wrong in the process.
>>>>>
>>>>> JM
>>>>>
>>>>> 2012/12/4, Jean-Marc Spaggiari <jean-marc@spaggiari.org>:
>>>>>> Hi,
>>>>>>
>>>>>> While merging many regions, I'm getting this error for some of them:
>>>>>>
>>>>>> 12/12/04 17:45:16 FATAL util.Merge: Merge failed
>>>>>> java.io.IOException: Files have same sequenceid: 75533866
>>>>>>     at
>>>>>> org.apache.hadoop.hbase.regionserver.HRegion.merge(HRegion.java:4080)
>>>>>>     at org.apache.hadoop.hbase.util.Merge.merge(Merge.java:291)
>>>>>>     at
>>>>>> org.apache.hadoop.hbase.util.Merge.mergeTwoRegions(Merge.java:242)
>>>>>>     at org.apache.hadoop.hbase.util.Merge.run(Merge.java:111)
>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>     at org.apache.hadoop.hbase.util.Merge.main(Merge.java:387)
>>>> It seems there's an error in the Merge process: it appears to be
>>>> reusing the same ID for files when you repeat the process of
>>>> merging regions.
>>>> Have you looked in HBase's JIRA for this problem?
>>>>
>>>>>>
>>>>>> Any idea why?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> JM
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>>
>>> Marcos Luis Ortíz Valmaseda
>>> about.me/marcosortiz <http://about.me/marcosortiz>
>>> @marcosluis2186 <http://twitter.com/marcosluis2186>
>>>
>>>
>>>
>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>> INFORMATICAS...
>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>
>>> http://www.uci.cu
>>> http://www.facebook.com/universidad.uci
>>> http://www.flickr.com/photos/universidad_uci
>>
>
