hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-11584) HBase file encryption, consistences observed and data loss
Date Thu, 24 Jul 2014 19:16:39 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-11584.
------------------------------------

    Resolution: Invalid

Please mail user@hbase.apache.org to report potential problems and to get community assistance
with troubleshooting. JIRA isn't the correct forum for reporting a problem until its cause is
known. Nothing reported on this issue indicates that encryption is more than an incidental detail.
We don't encrypt the META table. There could be many reasons why you lost your HFiles. For
example, did you keep the test configuration that puts the HBase root in /tmp? Anyway, please
don't reply here; take this to user@hbase.apache.org.
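
One quick way to rule out the /tmp case is to inspect hbase-site.xml. The following is a minimal sketch, assuming the file is available as a string: the helper names `load_properties` and `rootdir_is_temporary` and the sample value are hypothetical, and only `hbase.rootdir` is a real HBase property. When `hbase.rootdir` is unset, HBase falls back to a directory under `hbase.tmp.dir`, which typically resolves under /tmp, so the sketch treats a missing value as temporary too.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def load_properties(xml_text):
    """Return {name: value} for every <property> in an hbase-site.xml string."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def rootdir_is_temporary(props):
    """True if hbase.rootdir is unset or points at /tmp or below it."""
    rootdir = props.get("hbase.rootdir")
    if rootdir is None:
        # Assumption: the default root lives under hbase.tmp.dir, usually /tmp.
        return True
    # Strip any scheme and authority (file://, hdfs://host:port) before checking.
    path = urlparse(rootdir).path or rootdir
    return path == "/tmp" or path.startswith("/tmp/")


# Hypothetical sample resembling a leftover test configuration.
SAMPLE = """
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///tmp/hbase-test/hbase</value>
  </property>
</configuration>
"""

print(rootdir_is_temporary(load_properties(SAMPLE)))
```

A root directory flagged this way can be wiped by a reboot or by tmp cleanup, which would explain missing HFiles independently of any encryption setting.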

> HBase file encryption, consistences observed and data loss
> ----------------------------------------------------------
>
>                 Key: HBASE-11584
>                 URL: https://issues.apache.org/jira/browse/HBASE-11584
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, HFile
>    Affects Versions: 0.98.3
>         Environment: SuSE 11 SP3
>            Reporter: shankarlingayya
>            Priority: Critical
>
> HBase file encryption: some inconsistencies are observed, and data loss happens after running the hbck tool.
> The operation steps are as below.
> Procedure:
> 1. Start the HBase services (HMaster & Region Server)
> 2. Enable HFile encryption and WAL file encryption as below, and perform put operations on 'table4-0' (100 records added)
> <property>
>  <name>hbase.crypto.keyprovider</name>
>  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>  <name>hbase.crypto.keyprovider.parameters</name>
>  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>  <name>hbase.crypto.master.key.name</name>
>  <value>hdfs</value>
> </property>
> <property>
>  <name>hfile.format.version</name>
>  <value>3</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.reader.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.writer.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>  <name>hbase.regionserver.wal.encryption</name>
>  <value>true</value>
> </property>
>  
> 3. The machine went down, so all processes went down
> 4. We disabled WAL file encryption for performance reasons and kept encryption only for HFiles, as below
> <property>
>  <name>hbase.crypto.keyprovider</name>
>  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>  <name>hbase.crypto.keyprovider.parameters</name>
>  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>  <name>hbase.crypto.master.key.name</name>
>  <value>hdfs</value>
> </property>
> <property>
>  <name>hfile.format.version</name>
>  <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>         at java.lang.Thread.run(Thread.java:662)
> 6. We were not able to read the data, so we decided to revert the configuration to the original
> 7. Kill/stop the Region Server and revert all the configuration to the original, as below
> <property>
>  <name>hbase.crypto.keyprovider</name>
>  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>  <name>hbase.crypto.keyprovider.parameters</name>
>  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>  <name>hbase.crypto.master.key.name</name>
>  <value>hdfs</value>
> </property>
> <property>
>  <name>hfile.format.version</name>
>  <value>3</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.reader.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.writer.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>  <name>hbase.regionserver.wal.encryption</name>
>  <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query 
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>         at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below
> ./hbase hbck -details
> .........................
> Summary:
>   table1-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table2-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table3-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table4-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table5-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table6-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table7-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table8-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table9-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   hbase:meta is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:acl is okay.
>     Number of regions: 0
>     Deployed on:
>   hbase:namespace is okay.
>     Number of regions: 0
>     Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:13:05,533 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11  replyHeader:: 6,4295102074,0  request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x1475d1611611bcf : Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server has closed socket
> 2014-07-24 19:13:05,546 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> 2014-07-24 19:13:05,546 INFO  [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
>   table1-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table2-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table3-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table4-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table5-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table6-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table7-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table8-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table9-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:meta is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:acl is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:namespace is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:44:55,194 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11  replyHeader:: 7,4295102377,0  request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x2475d15f7b31b73 : Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server has closed socket
> 2014-07-24 19:44:55,204 INFO  [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> 10. Fix the assignments and meta as below
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
>   table1-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table2-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table3-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table4-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table5-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table6-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table7-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table8-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table9-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:meta is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:acl is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:namespace is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:46:16,290 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11  replyHeader:: 6,4295102397,0  request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x3475d1605321be9 : Unable to read additional data from server sessionid 0x3475d1605321be9, likely server has closed socket
> 2014-07-24 19:46:16,300 INFO  [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0> 
> Complete data loss happened:
> WALs, oldWALs & /hbase/data/default/table4-0/ do not have any data
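
The difference between the configuration in step 2 and the one in step 4 can be computed mechanically, which makes it easier to see exactly what changed between restarts. This is a minimal sketch only: the property names are copied from the report, while the helper name `removed_properties` and the two set names are hypothetical. It shows what was dropped and makes no claim about the cause of the data loss.

```python
# Property names from step 2 of the report (HFile and WAL encryption enabled).
STEP2_PROPERTIES = {
    "hbase.crypto.keyprovider",
    "hbase.crypto.keyprovider.parameters",
    "hbase.crypto.master.key.name",
    "hfile.format.version",
    "hbase.regionserver.hlog.reader.impl",
    "hbase.regionserver.hlog.writer.impl",
    "hbase.regionserver.wal.encryption",
}

# Property names from step 4 of the report (HFile encryption only).
STEP4_PROPERTIES = {
    "hbase.crypto.keyprovider",
    "hbase.crypto.keyprovider.parameters",
    "hbase.crypto.master.key.name",
    "hfile.format.version",
}


def removed_properties(before, after):
    """Return the property names present in `before` but absent from `after`."""
    return sorted(set(before) - set(after))


for name in removed_properties(STEP2_PROPERTIES, STEP4_PROPERTIES):
    print(name)
```

The three names printed are the WAL reader, WAL writer, and WAL encryption settings; the HFile-related properties are identical in both steps.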



--
This message was sent by Atlassian JIRA
(v6.2#6252)
