hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: HBase file encryption, inconsistencies observed and data loss
Date Sun, 27 Jul 2014 11:43:12 GMT
SecureProtobufLogReader can read encrypted as well as unencrypted files.

Anoop
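
A minimal sketch of the point above, for anyone who wants to check a config before restarting after a crash. It assumes the properties live in an hbase-site.xml style document; the helper names (load_properties, can_replay_encrypted_wals) and the SITE_XML sample are hypothetical and not part of HBase. It encodes only what this thread states: SecureProtobufLogReader reads both encrypted and unencrypted WALs, while the default reader cannot read encrypted ones.

```python
import xml.etree.ElementTree as ET

# Class and property names as quoted in this thread.
SECURE_READER = "org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader"
READER_KEY = "hbase.regionserver.hlog.reader.impl"


def load_properties(site_xml: str) -> dict:
    """Parse hbase-site.xml style text into a {name: value} dict (hypothetical helper)."""
    root = ET.fromstring(site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def can_replay_encrypted_wals(props: dict) -> bool:
    """True if the configured WAL reader is the secure one.

    The encryption flag is deliberately ignored: per the thread, the secure
    reader handles both encrypted and unencrypted files.
    """
    return props.get(READER_KEY) == SECURE_READER


# Sample config: encryption switched off, secure reader kept.
SITE_XML = """
<configuration>
  <property>
    <name>hbase.regionserver.hlog.reader.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
  </property>
  <property>
    <name>hbase.regionserver.wal.encryption</name>
    <value>false</value>
  </property>
</configuration>
"""

print(can_replay_encrypted_wals(load_properties(SITE_XML)))
```

With the reader property removed, the check returns False, which corresponds to the situation Shankar hit when the secure reader/writer settings were dropped while encrypted WALs still existed.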

On Sunday, July 27, 2014, ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com> wrote:
> I think in the above case, though encryption is disabled, we will need to
> use the SecureProtobufLogReader for the new files that will be created as
> well? I don't have the code with me now. But if that is the case we need to
> look at it, as I feel only the existing files should be read with
> SecureProtobufLogReader. The new WAL should be read using the log reader.
> Moving to the corrupt folder is fine unless we could bring it back to the
> main working folder.
> Sent from mobile, excuse any typos.
> On Jul 27, 2014 10:07 AM, "Anoop John" <anoop.hbase@gmail.com> wrote:
>
>> As per Shankar, he can get things working with the configs below:
>>
>> <property>
>>         <name>hbase.regionserver.hlog.reader.impl</name>
>>         <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> </property>
>> <property>
>>         <name>hbase.regionserver.hlog.writer.impl</name>
>>         <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> </property>
>> <property>
>>         <name>hbase.regionserver.wal.encryption</name>
>>         <value>false</value>
>> </property>
>>
>> After the RS crash, the config is kept as above. Note that WAL encryption
>> is now disabled, but the reader is still SecureProtobufLogReader. The
>> existing WAL files are encrypted and only SecureProtobufLogReader can read
>> them. If it is not configured, the default reader, ProtobufLogReader, is
>> used, and it cannot read them back correctly. This is the issue that
>> Shankar faced.
>>
>> Also, it is concerning that a file that cannot be read is not moved under
>> the corrupt logs. Need to look at that.
>>
>> -Anoop-
>>
>> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <andrew.purtell@gmail.com>
>> wrote:
>>
>> > My attempt to reproduce this issue:
>> >
>> > 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a dev
>> > box.
>> >
>> > 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver also on
>> > this dev box.
>> >
>> > 3. Set dfs.replication and hbase.regionserver.hlog.tolerable.lowreplication
>> > to 1. Set up a keystore and enabled WAL encryption.
>> >
>> > 4. Created a test table.
>> >
>> > 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
>> >
>> > 6. Used the shell to count the number of records in the test table.
>> > Count = 1000 rows
>> >
>> > 7. kill -9 the regionserver process.
>> >
>> > 8. Started a new regionserver process. Observed log splitting and replay in
>> > the regionserver log, no errors.
>> >
>> > 9. Used the shell to count the number of records in the test table.
>> > Count = 1000 rows
>> >
>> > Tried this a few times.
>> >
>> > Shankar, can you try running through the above and let us know if the
>> > outcome is different?
>> >
>> >
>> >
>> > On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <andrew.purtell@gmail.com>
>> > wrote:
>> >
>> > > Thanks for the detail. So to summarize:
>> > >
>> > > 0. HBase 0.98.3 and HDFS 2.4.1
>> > >
>> > > 1. All data before failure has not yet been flushed so only exists in the
>> > > WAL files.
>> > >
>> > > 2. During distributed splitting, the WAL has either not been written out
>> > > or is unreadable:
>> > >
>> > > 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
>> > >
>> > > 3. This file is still moved to oldWALs even though splitting failed.
>> > >
>> > > 4. Setting 'hbase.regionserver.wal.encryption' to false allows for data
>> > > recovery in your scenario.
>> > >
>> > > See https://issues.apache.org/jira/browse/HBASE-11595
>> > >
>> > >
>> > >
>> > >
>> > > On Jul 26, 2014, at 6:50 AM, Shankar hiremath <shankar.hiremath@huawei.com>
>> > > wrote:
>> > >
>> > >
>> > > Hi Andrew,
>> > >
>> > >
>> > > Please find the details below.
>> > >
>> > >
>> > > HBase 0.98.3 & Hadoop 2.4.1
>> > >
>> > > HBase root file system on HDFS
>> > >
>> > >
>> > > On the HMaster side there is no failure or error message in the log file.
>> > >
>> > > On the Region Server side the error message below is reported.
>> > >
>> > >
>> > > Region Server Log:
>> > >
>> > > 2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet:: clientPath:null serverPath:null finished:false header:: 172,4 replyHeader:: 172,4294988825,0 request:: '/hbase/table/hbase:acl,F response:: #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
>> > >
>> > > 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
>> > >
>> > > 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15] wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
>> > >
>> > >
>> > > 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
>> > >
>> > >
>> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Finishing writing output logs and closing down.
>> > >
>> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Waiting for split writer threads to finish
>> > >
>> > > 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Split writers finished
>> > >
>> > > 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] wal.HLogSplitter: Processed 0 edits across 0 regions; log file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta is corrupted = false progress failed = false
>> > >
>> > > 2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)] zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
>> > >
>> > >
>> > >
>> > > When I query the table, the data that was in the WAL files (before the
>> > > RegionServer machine went down) is not returned.
>> > >
>> > > One more thing I observed is that even when the WAL file is not
>> > > successfully processed, it is still moved to the /oldWALs folder.
>> > >
>> > > So when I revert to the 3 configuration settings below on the Region
>> > > Server side and restart, the WAL has already been moved to the /oldWALs
>> > > folder, so it will not get processed.
>> > >
>> > >
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.reader.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.writer.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.wal.encryption</name>
>> > >   <value>true</value>
>> > > </property>
>> > >
>> > >
>> > >
>> > >
>> > > ------------------------------------------------------------
>> > >
>> > >
>> > > One more scenario I tried (as Anoop suggested): instead of deleting
>> > > the 3 config parameters below, keep all of them but set only
>> > > 'hbase.regionserver.wal.encryption=false'. With this configuration the
>> > > encrypted WAL file is processed successfully, and querying the table
>> > > returns the WAL data (from before the RegionServer machine went down)
>> > > correctly.
>> > >
>> > >
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.reader.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.writer.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.wal.encryption</name>
>> > >   <value>false</value>
>> > > </property>
>> > >
>> > >
>> > >
>> > > Regards
>> > >
>> > > -Shankar
>> > >
>> > >
>> > > This e-mail and its attachments contain confidential information from
>> > > HUAWEI, which is intended only for the person or entity whose address is
>> > > listed above. Any use of the information contained herein in any way
>> > > (including, but not limited to, total or partial disclosure, reproduction,
>> > > or dissemination) by persons other than the intended recipient(s) is
>> > > prohibited. If you receive this e-mail in error, please notify the sender
>> > > by phone or email immediately and delete it!
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > -----Original Message-----
>> > >
>> > > From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com] On Behalf Of Andrew Purtell
>> > >
>> > > Sent: 26 July 2014 AM 02:21
>> > >
>> > > To: user@hbase.apache.org
>> > >
>> > > Subject: Re: HBase file encryption, inconsistencies observed and data loss
>> > >
>> > >
>> > > Encryption (or the lack of it) doesn't explain missing HFiles.
>> > >
>> > >
>> > > Most likely if you are having a problem with encryption, this will
>> > > manifest as follows: HFiles will be present. However, you will find many
>> > > IOExceptions in the regionserver logs as they attempt to open the HFiles
>> > > but fail because the data is unreadable.
>> > >
>> > >
>> > > We should start by looking at more basic issues. What could explain the
>> > > total disappearance of HFiles?
>> > >
>> > >
>> > > Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on
>> > > the local filesystem (fs URL starts with file://)?
>> > >
>> > >
>> > > In your email you provide only exceptions printed by the client. What kind
>> > > of exceptions appear in the regionserver logs? Or appear in the master log?
>> > >
>> > > If the logs are large your best bet is to pastebin them and then send the
>> > > URL to the paste in your response.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
>> > > shankar.hiremath@huawei.com> wrote:
>> > >
>> > >
>> > > HBase file encryption: some inconsistencies observed, and data loss
>> > > happens after running the hbck tool.
>> > >
>> > > The operation steps are as below. (One thing I observed is that on
>> > > startup of HMaster, if it is not able to process the WAL file, the file
>> > > is still moved to /oldWALs.)
>> > >
>> > > Procedure:
>> > >
>> > > 1. Start the HBase services (HMaster & Region Server)
>> > >
>> > > 2. Enable HFile encryption and WAL file encryption as below, and perform
>> > > 'table4-0' put operations (100 records added)
>> > >
>> > > <property>
>> > >   <name>hbase.crypto.keyprovider</name>
>> > >   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.crypto.keyprovider.parameters</name>
>> > >   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.crypto.master.key.name</name>
>> > >   <value>hdfs</value>
>> > > </property>
>> > > <property>
>> > >   <name>hfile.format.version</name>
>> > >   <value>3</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.reader.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.writer.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.wal.encryption</name>
>> > >   <value>true</value>
>> > > </property>
>> > >
>> > > 3. Machine went down, so all processes went down
>> > >
>> > > 4. We disabled WAL file encryption for performance reasons, and kept
>> > > encryption only for HFiles, as below
>> > >
>> > > <property>
>> > >   <name>hbase.crypto.keyprovider</name>
>> > >   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.crypto.keyprovider.parameters</name>
>> > >   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.crypto.master.key.name</name>
>> > >   <value>hdfs</value>
>> > > </property>
>> > > <property>
>> > >   <name>hfile.format.version</name>
>> > >   <value>3</value>
>> > > </property>
>> > >
>> > > 5. Start the Region Server and query the 'table4-0' data
>> > >
>> > > hbase(main):003:0> count 'table4-0'
>> > >
>> > > ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
>> > > at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>> > > at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>> > > at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>> > > at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> > > at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>> > > at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>> > > at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>> > > at java.lang.Thread.run(Thread.java:662)
>> > >
>> > > 6. Not able to read the data, so we decided to revert the configuration
>> > > (to the original)
>> > >
>> > > 7. Kill/Stop the Region Server, revert all the configurations to the
>> > > original, as below
>> > >
>> > > <property>
>> > >   <name>hbase.crypto.keyprovider</name>
>> > >   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.crypto.keyprovider.parameters</name>
>> > >   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.crypto.master.key.name</name>
>> > >   <value>hdfs</value>
>> > > </property>
>> > > <property>
>> > >   <name>hfile.format.version</name>
>> > >   <value>3</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.reader.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.hlog.writer.impl</name>
>> > >   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
>> > > </property>
>> > > <property>
>> > >   <name>hbase.regionserver.wal.encryption</name>
>> > >   <value>true</value>
>> > > </property>
>> > >
>> > > 7. Start the Region Server, and perform the 'table4-0' query
>> > >
>> > > hbase(main):003:0> count 'table4-0'
>> > >
>> > > ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
>> > > at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>> > > at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>> > > at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>> > > at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>> > > at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>> > > at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>> > > at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>> > > at java.lang.Thread.run(Thread.java:662)
>> > >
>> > > 8. Run the hbase hbck to repair, as below
>> > >
>> > > ./hbase hbck -details
>> > >
>> > > .........................
>> > >
>> > > Summary:
>> > >
>> > > table1-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table2-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table3-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table4-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table5-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table6-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table7-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table8-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > table9-0 is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > hbase:meta is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:acl is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > hbase:namespace is okay.
>> > >
>> > > Number of regions: 0
>> > >
>> > > Deployed on:
>> > >
>> > > 22 inconsistencies detected.
>> > >
>> > > Status: INCONSISTENT
>> > >
>> > > 2014-07-24 19:13:05,532 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
>> > >
>> > > 2014-07-24 19:13:05,533 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102074,0 request:: null response:: null
>> > >
>> > > 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x1475d1611611bcf
>> > >
>> > > 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x1475d1611611bcf : Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server has closed socket
>> > >
>> > > 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>> > >
>> > > 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
>> > >
>> > > shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
>> > >
>> > > 9. Fix the assignments as below
>> > >
>> > > ./hbase hbck -fixAssignments
>> > >
>> > > Summary:
>> > >
>> > > table1-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table2-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table3-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table4-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table5-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table6-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table7-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table8-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table9-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:meta is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:acl is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:namespace is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > 0 inconsistencies detected.
>> > >
>> > > Status: OK
>> > >
>> > > 2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
>> > >
>> > > 2014-07-24 19:44:55,194 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader:: 7,4295102377,0 request:: null response:: null
>> > >
>> > > 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x2475d15f7b31b73
>> > >
>> > > 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x2475d15f7b31b73 : Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server has closed socket
>> > >
>> > > 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
>> > >
>> > > 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>> > >
>> > > 10. Fix the assignments and meta as below
>> > >
>> > > ./hbase hbck -fixAssignments -fixMeta
>> > >
>> > > Summary:
>> > >
>> > > table1-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table2-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table3-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table4-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table5-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table6-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table7-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table8-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > table9-0 is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:meta is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:acl is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > hbase:namespace is okay.
>> > >
>> > > Number of regions: 1
>> > >
>> > > Deployed on: XX-XX-XX-XX,60020,1406209023146
>> > >
>> > > 0 inconsistencies detected.
>> > >
>> > > Status: OK
>> > >
>> > > 2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
>> > >
>> > > 2014-07-24 19:46:16,290 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader:: 6,4295102397,0 request:: null response:: null
>> > >
>> > > 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x3475d1605321be9
>> > >
>> > > 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x3475d1605321be9 : Unable to read additional data from server sessionid 0x3475d1605321be9, likely server has closed socket
>> > >
>> > > 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
>> > >
>> > > 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
>> > >
>> > > hbase(main):006:0> count 'table4-0'
>> > >
>> > > 0 row(s) in 0.0200 seconds
>> > >
>> > > => 0
>> > >
>> > > hbase(main):007:0>
>> > >
>> > > Complete data loss happened: WALs, oldWALs and
>> > > /hbase/data/default/table4-0/ do not have any data.
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > --
>> > >
>> > > Best regards,
>> > >
>> > >
>> > >  - Andy
>> > >
>> > >
>> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> > > (via Tom White)
>> > >
>> > >
>> >
>>
>
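
The split summary quoted in the thread ("Processed 0 edits across 0 regions ... is corrupted = false") is the symptom worth watching for: nothing was replayed, yet the WAL was not flagged as corrupt. A minimal sketch for scanning regionserver log lines for that pattern follows. It assumes each log entry sits on a single line (unlike the email-wrapped copies in the thread); the function name suspicious_splits and the SAMPLE list are hypothetical. Note that a genuinely empty WAL would produce the same summary, so a match is a hint to investigate, not proof of data loss.

```python
import re

# Matches the HLogSplitter summary line quoted in this thread (unwrapped form).
SPLIT_SUMMARY = re.compile(
    r"Processed (?P<edits>\d+) edits across (?P<regions>\d+) regions; "
    r"log file=(?P<path>\S+) is corrupted = (?P<corrupted>true|false)"
)


def suspicious_splits(log_lines):
    """Yield WAL paths whose split replayed zero edits but was not marked corrupt."""
    for line in log_lines:
        m = SPLIT_SUMMARY.search(line)
        if m and int(m.group("edits")) == 0 and m.group("corrupted") == "false":
            yield m.group("path")


# Sample input: the summary line from the thread, joined onto one line.
SAMPLE = [
    "2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] "
    "wal.HLogSplitter: Processed 0 edits across 0 regions; log "
    "file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/"
    "host1%2C60020%2C1406383007151.1406383069334.meta "
    "is corrupted = false progress failed = false",
]

for path in suspicious_splits(SAMPLE):
    print(path)
```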
