hbase-user mailing list archives

From "Kiran Kumar.M.R" <Kiran.Kumar...@huawei.com>
Subject RE: HBase file encryption, inconsistencies observed and data loss
Date Wed, 30 Jul 2014 07:08:59 GMT



This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it!




> -----Original Message-----
> From: Anoop John [mailto:anoop.hbase@gmail.com]
> Sent: Tuesday, July 29, 2014 07:36
> To: user@hbase.apache.org
> Subject: Re: HBase file encryption, inconsistencies observed and data
> loss
> 
> Yes, the config was changed between the crash and the restart; that was
> step 4 in the original report, where the WAL encryption config was set
> to false. That by itself is OK, but the reader cannot be changed, because
> we do not pick the reader by looking at the WAL file metadata to see
> whether the file is encrypted. WAL reading has always worked this way:
> the user has to configure the correct reader. So I am not sure whether
> any code change is needed. Once WAL encryption has been used, the reader
> should continue to be SecureProtobufLogReader even after encryption is
> switched back off (at least until all existing WALs are replayed).
> 
> Files being moved to old logs rather than to the corrupt folder is
> something to be checked. Any chance you could look into that and provide
> a patch, Shankar?

[Kiran]  Anoop, we are checking this issue. Will submit a patch if needed.
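
Anoop's point above is that the WAL reader is chosen from configuration, not from the WAL file itself, so the reader setting has to stay on SecureProtobufLogReader while encrypted WALs may still be waiting for replay. Below is a minimal sketch of a pre-restart check in Python, assuming a Hadoop-style hbase-site.xml. The function names and the two sample configurations are hypothetical illustrations, not part of HBase.

```python
import xml.etree.ElementTree as ET

READER_KEY = "hbase.regionserver.hlog.reader.impl"
SECURE_READER = "org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader"


def load_properties(xml_text):
    """Parse Hadoop-style <configuration><property> XML into a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def reader_can_replay_encrypted_wals(props):
    """True if the configured WAL reader is SecureProtobufLogReader.

    Per the thread, this must hold while encrypted WALs still await replay,
    whatever hbase.regionserver.wal.encryption is currently set to.
    """
    return props.get(READER_KEY) == SECURE_READER


# Hypothetical sample: encryption switched off, secure reader kept.
SAFE = """<configuration>
  <property>
    <name>hbase.regionserver.hlog.reader.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
  </property>
  <property>
    <name>hbase.regionserver.wal.encryption</name>
    <value>false</value>
  </property>
</configuration>"""

# Hypothetical sample: the WAL properties were removed altogether.
UNSAFE = """<configuration>
  <property>
    <name>hfile.format.version</name>
    <value>3</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for label, text in (("reader kept", SAFE), ("reader removed", UNSAFE)):
        ok = reader_can_replay_encrypted_wals(load_properties(text))
        print(label, "->", "ok" if ok else "encrypted WALs would not be readable")
```

The check looks only at the reader class, not at the encryption flag, because the flag affects newly written WALs while the reader decides whether existing ones can be replayed.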
> 
> Anoop
> 
> 
> On Sunday, July 27, 2014, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> > So the regionserver configuration was changed after it crashed but
> > before it was restarted ?
> >
> > The impression given by the initial report is that simply using
> > encrypted WALs will cause data loss. That's not the case as I have
> > confirmed. There could be an edge case somewhere but the original
> > reporter has left out important detail about how to reproduce the
> > problem. The below is not written in clear language either, so I'm not
> > following along. I'd be happy to help look at this more once clear
> > steps for reproducing the problem are available. Otherwise since you're
> > talking with Shankar somehow offline already I'll leave you to it Anoop.
> >
> >> Also when the file can not be read, this is not moved under corrupt
> >> logs is a concerning thing.  Need to look at that.
> >
> > Agreed.
> >
> >
> >> On Jul 27, 2014, at 1:07 AM, Anoop John <anoop.hbase@gmail.com> wrote:
> >>
> >> According to Shankar, he can get things to work with the configs below:
> >>
> >> <property>
> >>   <name>hbase.regionserver.hlog.reader.impl</name>
> >>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> >> </property>
> >> <property>
> >>   <name>hbase.regionserver.hlog.writer.impl</name>
> >>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> >> </property>
> >> <property>
> >>   <name>hbase.regionserver.wal.encryption</name>
> >>   <value>false</value>
> >> </property>
> >>
> >> After the RS crash, the config was kept as above. Note that WAL
> >> encryption is now disabled, yet the reader is still
> >> SecureProtobufLogReader. The existing WAL files are encrypted, and only
> >> SecureProtobufLogReader can read them. If it is not configured, the
> >> default reader, ProtobufLogReader, is used, and it cannot read them
> >> back correctly. This is the issue that Shankar faced.
> >>
> >> Also, it is concerning that a file that cannot be read is not moved
> >> under the corrupt logs. We need to look at that.
> >>
> >> -Anoop-
> >>
> >> On Sat, Jul 26, 2014 at 11:17 PM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>
> >>> My attempt to reproduce this issue:
> >>>
> >>> 1. Set up Hadoop 2.4.1 namenode, secondarynamenode, and datanode on a
> >>> dev box.
> >>>
> >>> 2. Set up HBase 0.98.5-SNAPSHOT hosted zk, master, and regionserver
> >>> also on this dev box.
> >>>
> >>> 3. Set dfs.replication and
> >>> hbase.regionserver.hlog.tolerable.lowreplication to 1. Set up a
> >>> keystore and enabled WAL encryption.
> >>>
> >>> 4. Created a test table.
> >>>
> >>> 5. Used YCSB to write 1000 rows to the test table. No flushes observed.
> >>>
> >>> 6. Used the shell to count the number of records in the test table.
> >>> Count = 1000 rows
> >>>
> >>> 7. kill -9 the regionserver process.
> >>>
> >>> 8. Started a new regionserver process. Observed log splitting and
> >>> replay in the regionserver log, no errors.
> >>>
> >>> 9. Used the shell to count the number of records in the test table.
> >>> Count = 1000 rows
> >>>
> >>> Tried this a few times.
> >>>
> >>> Shankar, can you try running through the above and let us know if
> >>> the outcome is different?
> >>>
> >>>
> >>>
> >>> On Sat, Jul 26, 2014 at 8:54 AM, Andrew Purtell <andrew.purtell@gmail.com> wrote:
> >>>
> >>>> Thanks for the detail. So to summarize:
> >>>>
> >>>> 0. HBase 0.98.3 and HDFS 2.4.1
> >>>>
> >>>> 1. All data before failure has not yet been flushed so only exists in
> >>>> the WAL files.
> >>>>
> >>>> 2. During distributed splitting, the WAL has either not been written
> >>>> out or is unreadable:
> >>>>
> >>>>
> >>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
> >>>> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
> >>>> Premature EOF from inputStream
> >>>>
> >>>>
> >>>> 3. This file is still moved to oldWALs even though splitting failed.
> >>>>
> >>>> 4. Setting 'hbase.regionserver.wal.encryption' to false allows for
> >>>> data recovery in your scenario.
> >>>>
> >>>> See https://issues.apache.org/jira/browse/HBASE-11595
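
The failure signature summarized above (a premature-EOF decode error followed by a split that processes 0 edits) can be searched for in a regionserver log. Below is a minimal Python sketch, assuming each log entry sits on one line and that the message texts match those quoted in this thread. The function name is hypothetical, and a real log may interleave entries from several splitter threads, which this sketch does not handle.

```python
import re

# Message fragments taken from the log lines quoted in this thread.
EOF_MARKER = "Premature EOF from inputStream"
SPLIT_SUMMARY = re.compile(
    r"Processed (?P<edits>\d+) edits across (?P<regions>\d+) regions"
)


def suspicious_splits(log_text):
    """Return split summary lines that report 0 edits right after an EOF error."""
    saw_eof = False
    hits = []
    for line in log_text.splitlines():
        if EOF_MARKER in line:
            saw_eof = True
        match = SPLIT_SUMMARY.search(line)
        if match:
            if saw_eof and int(match.group("edits")) == 0:
                hits.append(line.strip())
            saw_eof = False  # start fresh for the next WAL file
    return hits


SAMPLE = (
    "2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0] "
    "codec.BaseDecoder: Partial cell read caused by EOF: "
    "java.io.IOException: Premature EOF from inputStream\n"
    "2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0] "
    "wal.HLogSplitter: Processed 0 edits across 0 regions; log "
    "file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/"
    "host1%2C60020%2C1406383007151.1406383069334.meta "
    "is corrupted = false progress failed = false\n"
)

if __name__ == "__main__":
    for hit in suspicious_splits(SAMPLE):
        print("check this WAL before trusting the replay:", hit)
```

Requiring the EOF error before flagging a split avoids reporting WALs that are legitimately empty.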
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Jul 26, 2014, at 6:50 AM, Shankar hiremath <shankar.hiremath@huawei.com> wrote:
> >>>>
> >>>>
> >>>> Hi Andrew,
> >>>>
> >>>>
> >>>> Please find the details
> >>>>
> >>>>
> >>>> Hbase 0.98.3 & hadoop 2.4.1
> >>>>
> >>>> Hbase root file system on hdfs
> >>>>
> >>>>
> >>>> On the HMaster side there is no failure or error message in the log
> >>>> file.
> >>>>
> >>>> On the Region Server side the error messages below are reported.
> >>>>
> >>>>
> >>>> Region Server Log:
> >>>>
> >>>> 2014-07-26 19:29:15,904 DEBUG [regionserver60020-SendThread(host2:2181)]
> >>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1476d8c83e5012c, packet::
> >>>> clientPath:null serverPath:null finished:false header:: 172,4
> >>>> replyHeader:: 172,4294988825,0  request:: '/hbase/table/hbase:acl,F
> >>>> response::
> >>>> #ffffffff000146d61737465723a36303030303372ffffffeb39ffffffbbf15ffffffc15042554680,s{4294967476,4294967480,1406293600844,1406293601414,2,0,0,0,31,0,4294967476}
> >>>>
> >>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-0,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-1,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,905 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-2,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-3,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-4,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-5,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-6,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-7,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,906 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-8,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-9,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-10,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-11,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-12,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-13,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-14,5,main]: starting
> >>>>
> >>>> 2014-07-26 19:29:15,907 DEBUG [RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15]
> >>>> wal.HLogSplitter: Writer thread Thread[RS_LOG_REPLAY_OPS-host1:60020-0-Writer-15,5,main]: starting
> >>>>
> >>>>
> >>>> 2014-07-26 19:29:16,160 ERROR [RS_LOG_REPLAY_OPS-host1:60020-0]
> >>>> codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException:
> >>>> Premature EOF from inputStream
> >>>>
> >>>>
> >>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> >>>> wal.HLogSplitter: Finishing writing output logs and closing down.
> >>>>
> >>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> >>>> wal.HLogSplitter: Waiting for split writer threads to finish
> >>>>
> >>>> 2014-07-26 19:29:16,161 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> >>>> wal.HLogSplitter: Split writers finished
> >>>>
> >>>> 2014-07-26 19:29:16,162 INFO  [RS_LOG_REPLAY_OPS-host1:60020-0]
> >>>> wal.HLogSplitter: Processed 0 edits across 0 regions; log
> >>>> file=hdfs://hacluster/hbase/WALs/host1,60020,1406383007151-splitting/host1%2C60020%2C1406383007151.1406383069334.meta
> >>>> is corrupted = false progress failed = false
> >>>>
> >>>> 2014-07-26 19:29:16,184 DEBUG [regionserver60020-SendThread(host2:2181)]
> >>>> zookeeper.ClientCnxn: Got notification sessionid:0x1476d8c83e5012c
> >>>>
> >>>>
> >>>>
> >>>> When I query the table, the data that was in the WAL files (before the
> >>>> RegionServer machine went down) is not returned.
> >>>>
> >>>> One more thing I observed is that even when a WAL file is not
> >>>> successfully processed, it is still moved to the /oldWALs folder.
> >>>>
> >>>> So when I restore the 3 configuration settings below on the Region
> >>>> Server side and restart, the WAL is not processed, because it has
> >>>> already been moved to the oldWALs/ folder.
> >>>>
> >>>>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.reader.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.writer.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.wal.encryption</name>
> >>>>   <value>true</value>
> >>>> </property>
> >>>>
> >>>> ----------------------------------------------------------------------
> >>>>
> >>>>
> >>>> I also tried one more scenario that Anoop suggested: instead of
> >>>> deleting the 3 config parameters below, keep all of them and change
> >>>> only 'hbase.regionserver.wal.encryption' to false. With this
> >>>> configuration the encrypted WAL file is processed successfully, and
> >>>> querying the table correctly returns the WAL data (from before the
> >>>> RegionServer machine went down).
> >>>>
> >>>>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.reader.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.writer.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.wal.encryption</name>
> >>>>   <value>false</value>
> >>>> </property>
> >>>>
> >>>>
> >>>>
> >>>> Regards
> >>>>
> >>>> -Shankar
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>>
> >>>> From: andrew.purtell@gmail.com [mailto:andrew.purtell@gmail.com
> >>>> <andrew.purtell@gmail.com>] On Behalf Of Andrew Purtell
> >>>>
> >>>> Sent: 26 July 2014 AM 02:21
> >>>>
> >>>> To: user@hbase.apache.org
> >>>>
> >>>> Subject: Re: HBase file encryption, inconsistencies observed and data loss
> >>>>
> >>>>
> >>>> Encryption (or the lack of it) doesn't explain missing HFiles.
> >>>>
> >>>>
> >>>> Most likely if you are having a problem with encryption, this will
> >>>> manifest as follows: HFiles will be present. However, you will find
> >>>> many IOExceptions in the regionserver logs as they attempt to open the
> >>>> HFiles but fail because the data is unreadable.
> >>>>
> >>>>
> >>>> We should start by looking at more basic issues. What could explain
> >>>> the total disappearance of HFiles.
> >>>>
> >>>>
> >>>> Is the HBase root filesystem on HDFS (fs URL starts with hdfs://)
> >>>> or on the local filesystem (fs URL starts with file://)?
> >>>>
> >>>>
> >>>> In your email you provide only exceptions printed by the client. What
> >>>> kind of exceptions appear in the regionserver logs? Or appear in the
> >>>> master log?
> >>>>
> >>>> If the logs are large your best bet is to pastebin them and then send
> >>>> the URL to the paste in your response.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
> >>>> shankar.hiremath@huawei.com> wrote:
> >>>>
> >>>>
> >>>> With HBase file encryption, some inconsistencies were observed, and
> >>>> data loss happened after running the hbck tool. The operation steps
> >>>> are below. (One thing I observed is that on startup of HMaster, if it
> >>>> is not able to process a WAL file, the file is still moved to
> >>>> /oldWALs.)
> >>>>
> >>>> Procedure:
> >>>>
> >>>> 1. Start the Hbase services (HMaster & region Server)
> >>>>
> >>>> 2. Enable HFile encryption and WAL file encryption as below, and
> >>>> perform 'table4-0' put operations (100 records added)
> >>>>
> >>>> <property>
> >>>>   <name>hbase.crypto.keyprovider</name>
> >>>>   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.crypto.keyprovider.parameters</name>
> >>>>   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.crypto.master.key.name</name>
> >>>>   <value>hdfs</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hfile.format.version</name>
> >>>>   <value>3</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.reader.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.writer.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.wal.encryption</name>
> >>>>   <value>true</value>
> >>>> </property>
> >>>>
> >>>> 3. The machine went down, so all processes went down
> >>>>
> >>>> 4. We disabled WAL file encryption for performance reasons and kept
> >>>> encryption only for HFiles, as below
> >>>>
> >>>> <property>
> >>>>   <name>hbase.crypto.keyprovider</name>
> >>>>   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.crypto.keyprovider.parameters</name>
> >>>>   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.crypto.master.key.name</name>
> >>>>   <value>hdfs</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hfile.format.version</name>
> >>>>   <value>3</value>
> >>>> </property>
> >>>>
> >>>> 5. Start the Region Server and query the 'table4-0' data
> >>>>
> >>>> hbase(main):003:0> count 'table4-0'
> >>>>
> >>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> >>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
> >>>> online on XX-XX-XX-XX,60020,1406209023146
> >>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
> >>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
> >>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
> >>>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> >>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> >>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> >>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
> >>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
> >>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
> >>>> at java.lang.Thread.run(Thread.java:662)
> >>>>
> >>>> 6. Not able to read the data, so we decided to revert back the
> >>>> configuration (as original)
> >>>>
> >>>> 7. Kill/Stop the Region Server, revert all the configurations as
> >>>> original, as below
> >>>>
> >>>> <property>
> >>>>   <name>hbase.crypto.keyprovider</name>
> >>>>   <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.crypto.keyprovider.parameters</name>
> >>>>   <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.crypto.master.key.name</name>
> >>>>   <value>hdfs</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hfile.format.version</name>
> >>>>   <value>3</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.reader.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.hlog.writer.impl</name>
> >>>>   <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> >>>> </property>
> >>>> <property>
> >>>>   <name>hbase.regionserver.wal.encryption</name>
> >>>>   <value>true</value>
> >>>> </property>
> >>>>
> >>>> 7. Start the Region Server, and perform the 'table4-0' query
> >>>>
> >>>> hbase(main):003:0> count 'table4-0'
> >>>>
> >>>> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> >>>> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not
> >>>> online on XX-XX-XX-XX,60020,1406209023146
> >>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
> >>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
> >>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
> >>>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> >>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> >>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> >>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
> >>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
> >>>> at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
> >>>> at java.lang.Thread.run(Thread.java:662)
> >>>>
> >>>> 8. Run the hbase hbck to repair, as below
> >>>>
> >>>> ./hbase hbck -details
> >>>>
> >>>> .........................
> >>>>
> >>>> Summary:
> >>>>
> >>>> table1-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table2-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table3-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table4-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table5-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table6-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table7-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table8-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> table9-0 is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> hbase:meta is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:acl is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> hbase:namespace is okay.
> >>>>
> >>>> Number of regions: 0
> >>>>
> >>>> Deployed on:
> >>>>
> >>>> 22 inconsistencies detected.
> >>>>
> >>>> Status: INCONSISTENT
> >>>>
> >>>> 2014-07-24 19:13:05,532 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> master
> >>>>
> >>>> protocol: MasterService
> >>>>
> >>>> 2014-07-24 19:13:05,533 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> >>>> zookeeper
> >>>>
> >>>> sessionid=0x1475d1611611bcf
> >>>>
> >>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing
> >>> session:
> >>>>
> >>>> 0x1475d1611611bcf
> >>>>
> >>>> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing
> >>>>
> >>>> client for session: 0x1475d1611611bcf
> >>>>
> >>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf,
> packet::
> >>>>
> >>>> clientPath:null serverPath:null finished:false header:: 6,-11
> >>> replyHeader::
> >>>>
> >>>> 6,4295102074,0 request:: null response:: null
> >>>>
> >>>> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn:
> >>>>
> >>>> Disconnecting client for session: 0x1475d1611611bcf
> >>>>
> >>>> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: An exception was thrown while closing send
> >>>>
> >>>> thread for session 0x1475d1611611bcf : Unable to read additional
> >>>> data
> >>>>
> >>>> from server sessionid 0x1475d1611611bcf, likely server has closed
> >>>>
> >>>> socket
> >>>>
> >>>> 2014-07-24 19:13:05,546 INFO [main-EventThread]
> zookeeper.ClientCnxn:
> >>>>
> >>>> EventThread shut down
> >>>>
> >>>> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> >>>>
> >>>> 0x1475d1611611bcf closed
> >>>>
> >>>> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> >>>>
> >>>> 9. Fix the assignments as below
> >>>>
> >>>> ./hbase hbck -fixAssignments
> >>>>
> >>>> Summary:
> >>>>
> >>>> table1-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table2-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table3-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table4-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table5-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table6-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table7-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table8-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table9-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:meta is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:acl is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:namespace is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> 0 inconsistencies detected.
> >>>>
> >>>> Status: OK
> >>>>
> >>>> 2014-07-24 19:44:55,194 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> master
> >>>>
> >>>> protocol: MasterService
> >>>>
> >>>> 2014-07-24 19:44:55,194 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> >>>> zookeeper
> >>>>
> >>>> sessionid=0x2475d15f7b31b73
> >>>>
> >>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing
> >>> session:
> >>>>
> >>>> 0x2475d15f7b31b73
> >>>>
> >>>> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing
> >>>>
> >>>> client for session: 0x2475d15f7b31b73
> >>>>
> >>>> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73,
> packet::
> >>>>
> >>>> clientPath:null serverPath:null finished:false header:: 7,-11
> >>> replyHeader::
> >>>>
> >>>> 7,4295102377,0 request:: null response:: null
> >>>>
> >>>> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn:
> >>>>
> >>>> Disconnecting client for session: 0x2475d15f7b31b73
> >>>>
> >>>> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: An exception was thrown while closing send
> >>>>
> >>>> thread for session 0x2475d15f7b31b73 : Unable to read additional
> >>>> data
> >>>>
> >>>> from server sessionid 0x2475d15f7b31b73, likely server has closed
> >>>>
> >>>> socket
> >>>>
> >>>> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> >>>>
> >>>> 0x2475d15f7b31b73 closed
> >>>>
> >>>> 2014-07-24 19:44:55,204 INFO [main-EventThread]
> zookeeper.ClientCnxn:
> >>>>
> >>>> EventThread shut down
> >>>>
> >>>> 10. Fix the assignments as below
> >>>>
> >>>> ./hbase hbck -fixAssignments -fixMeta
> >>>>
> >>>> Summary:
> >>>>
> >>>> table1-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table2-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table3-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table4-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table5-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table6-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table7-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table8-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> table9-0 is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:meta is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:acl is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> hbase:namespace is okay.
> >>>>
> >>>> Number of regions: 1
> >>>>
> >>>> Deployed on: XX-XX-XX-XX,60020,1406209023146
> >>>>
> >>>> 0 inconsistencies detected.
> >>>>
> >>>> Status: OK
> >>>>
> >>>> 2014-07-24 19:46:16,290 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> master
> >>>>
> >>>> protocol: MasterService
> >>>>
> >>>> 2014-07-24 19:46:16,290 INFO [main]
> >>>>
> >>>> client.HConnectionManager$HConnectionImplementation: Closing
> >>>> zookeeper
> >>>>
> >>>> sessionid=0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing
> >>> session:
> >>>>
> >>>> 0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing
> >>>>
> >>>> client for session: 0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9,
> packet::
> >>>>
> >>>> clientPath:null serverPath:null finished:false header:: 6,-11
> >>> replyHeader::
> >>>>
> >>>> 6,4295102397,0 request:: null response:: null
> >>>>
> >>>> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn:
> >>>>
> >>>> Disconnecting client for session: 0x3475d1605321be9
> >>>>
> >>>> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> >>>>
> >>>> zookeeper.ClientCnxn: An exception was thrown while closing send
> >>>>
> >>>> thread for session 0x3475d1605321be9 : Unable to read additional
> >>>> data
> >>>>
> >>>> from server sessionid 0x3475d1605321be9, likely server has closed
> >>>>
> >>>> socket
> >>>>
> >>>> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> >>>>
> >>>> 0x3475d1605321be9 closed
> >>>>
> >>>> 2014-07-24 19:46:16,300 INFO [main-EventThread]
> zookeeper.ClientCnxn:
> >>>>
> >>>> EventThread shut down
> >>>>
> >>>> hbase(main):006:0> count 'table4-0'
> >>>>
> >>>> 0 row(s) in 0.0200 seconds
> >>>>
> >>>> => 0
> >>>>
> >>>> hbase(main):007:0>
> >>>>
> >>>> Complete data loss happened:
> >>>>
> >>>> WALs, oldWALs & /hbase/data/default/table4-0/ do not have any data
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Best regards,
> >>>>
> >>>>
> >>>> - Andy
> >>>>
> >>>>
> >>>> Problems worthy of attack prove their worth by hitting back. - Piet
> >>>> Hein (via Tom White)
> >>>
> >