hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase file encryption, inconsistencies observed and data loss
Date Fri, 25 Jul 2014 20:50:42 GMT
Encryption (or the lack of it) doesn't explain missing HFiles.

Most likely if you are having a problem with encryption, this will manifest
as follows: HFiles will be present. However, you will find many
IOExceptions in the regionserver logs as they attempt to open the HFiles
but fail because the data is unreadable.

We should start by looking at more basic issues. What could explain the
total disappearance of HFiles.

Is the HBase root filesystem on HDFS (fs URL starts with hdfs://) or on the
local filesystem (fs URL starts with file://)?

In your email you provide only exceptions printed by the client. What kind
of exceptions appear in the regionserver logs? Or appear in the master log?
If the logs are large your best bet is to pastebin them and then send the
URL to the paste in your response.




On Fri, Jul 25, 2014 at 7:08 AM, Shankar hiremath <
shankar.hiremath@huawei.com> wrote:

> HBase file encryption some inconsistencies observed and data loss happens
> after running the hbck tool,
> the operation steps are as below.    (one thing what I observed is, on
> startup of HMaster if it is not able to process the WAL file, then also it
> moved to /oldWALs)
>
> Procedure:
> 1. Start the Hbase services (HMaster & region Server)
> 2. Enable HFile encryption and WAL file encryption as below, and perform
> 'table4-0' put operations (100 records added)
> <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 3. Machine went down, so all process went down
>
> 4. We disabled the WAL file encryption for performance reason, and keep
> encryption only for Hfile, as below
> <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
> at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert back the
> configuration (as original)
> 7. Kill/Stop the Region Server, revert all the configurations as original,
> as below
> <property>
> <name>hbase.crypto.keyprovider</name>
> <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
> <name>hbase.crypto.keyprovider.parameters</name>
> <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234
> </value>
> </property>
> <property>
> <name>hbase.crypto.master.key.name</name>
> <value>hdfs</value>
> </property>
> <property>
> <name>hfile.format.version</name>
> <value>3</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.reader.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
> <name>hbase.regionserver.hlog.writer.impl</name>
>
> <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
> <name>hbase.regionserver.wal.encryption</name>
> <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region
> table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on
> XX-XX-XX-XX,60020,1406209023146
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
> at
> org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
> at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below
> ./hbase hbck -details
> .........................
> Summary:
> table1-0 is okay.
> Number of regions: 0
> Deployed on:
> table2-0 is okay.
> Number of regions: 0
> Deployed on:
> table3-0 is okay.
> Number of regions: 0
> Deployed on:
> table4-0 is okay.
> Number of regions: 0
> Deployed on:
> table5-0 is okay.
> Number of regions: 0
> Deployed on:
> table6-0 is okay.
> Number of regions: 0
> Deployed on:
> table7-0 is okay.
> Number of regions: 0
> Deployed on:
> table8-0 is okay.
> Number of regions: 0
> Deployed on:
> table9-0 is okay.
> Number of regions: 0
> Deployed on:
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
> Number of regions: 0
> Deployed on:
> hbase:namespace is okay.
> Number of regions: 0
> Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:13:05,533 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client
> for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102074,0 request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting
> client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread for
> session 0x1475d1611611bcf : Unable to read additional data from server
> sessionid 0x1475d1611611bcf, likely server has closed socket
> 2014-07-24 19:13:05,546 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 2014-07-24 19:13:05,546 INFO [main] zookeeper.ZooKeeper: Session:
> 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:44:55,194 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client
> for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet::
> clientPath:null serverPath:null finished:false header:: 7,-11 replyHeader::
> 7,4295102377,0 request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting
> client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread for
> session 0x2475d15f7b31b73 : Unable to read additional data from server
> sessionid 0x2475d15f7b31b73, likely server has closed socket
> 2014-07-24 19:44:55,204 INFO [main] zookeeper.ZooKeeper: Session:
> 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> 10. Fix the assignments as below
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
> table1-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table2-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table3-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table4-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table5-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table6-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table7-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table8-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> table9-0 is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:meta is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:acl is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> hbase:namespace is okay.
> Number of regions: 1
> Deployed on: XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing master
> protocol: MasterService
> 2014-07-24 19:46:16,290 INFO [main]
> client.HConnectionManager$HConnectionImplementation: Closing zookeeper
> sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session:
> 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client
> for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet::
> clientPath:null serverPath:null finished:false header:: 6,-11 replyHeader::
> 6,4295102397,0 request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting
> client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)]
> zookeeper.ClientCnxn: An exception was thrown while closing send thread for
> session 0x3475d1605321be9 : Unable to read additional data from server
> sessionid 0x3475d1605321be9, likely server has closed socket
> 2014-07-24 19:46:16,300 INFO [main] zookeeper.ZooKeeper: Session:
> 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO [main-EventThread] zookeeper.ClientCnxn:
> EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0>
> Complete data loss happened,
> WALs, oldWALs & /hbase/data/default/table4-0/ does not have any data
>
>
>
> [X]
> This e-mail and its attachments contain confidential information from
> HUAWEI, which is intended only for the person or entity whose address is
> listed above. Any use of the information contained herein in any way
> (including, but not limited to, total or partial disclosure, reproduction,
> or dissemination) by persons other than the intended recipient(s) is
> prohibited. If you receive this e-mail in error, please notify the sender
> by phone or email immediately and delete it!
> [X]
>
>
>
>
>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message