hbase-issues mailing list archives

From "shankarlingayya (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-11584) HBase file encryption, consistences observed and data loss
Date Thu, 24 Jul 2014 14:40:38 GMT

     [ https://issues.apache.org/jira/browse/HBASE-11584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

shankarlingayya updated HBASE-11584:
------------------------------------

    Description: 



Procedure:
1. Start the HBase services (HMaster and Region Server).
2. Enable HFile encryption and WAL encryption as below, then perform put operations on 'table4-0' (100 records added):
<property>
 <name>hbase.crypto.keyprovider</name>
 <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
 <name>hbase.crypto.keyprovider.parameters</name>
 <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
</property>
<property>
 <name>hbase.crypto.master.key.name</name>
 <value>hdfs</value>
</property>
<property>
 <name>hfile.format.version</name>
 <value>3</value>
</property>
<property>
 <name>hbase.regionserver.hlog.reader.impl</name>
 <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
 <name>hbase.regionserver.hlog.writer.impl</name>
 <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
 <name>hbase.regionserver.wal.encryption</name>
 <value>true</value>
</property>
 
3. The machine went down, so all processes went down.
4. We disabled WAL encryption for performance reasons and kept encryption only for HFiles, as below:
<property>
 <name>hbase.crypto.keyprovider</name>
 <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
 <name>hbase.crypto.keyprovider.parameters</name>
 <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
</property>
<property>
 <name>hbase.crypto.master.key.name</name>
 <value>hdfs</value>
</property>
<property>
 <name>hfile.format.version</name>
 <value>3</value>
</property>

5. Start the Region Server and query the 'table4-0' data
hbase(main):003:0> count 'table4-0'
ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
        at java.lang.Thread.run(Thread.java:662)

6. We were not able to read the data, so we killed/stopped the Region Server and reverted all the configuration to the original (HFile and WAL encryption both enabled), as below:

<property>
 <name>hbase.crypto.keyprovider</name>
 <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
 <name>hbase.crypto.keyprovider.parameters</name>
 <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
</property>
<property>
 <name>hbase.crypto.master.key.name</name>
 <value>hdfs</value>
</property>
<property>
 <name>hfile.format.version</name>
 <value>3</value>
</property>
<property>
 <name>hbase.regionserver.hlog.reader.impl</name>
 <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
</property>
<property>
 <name>hbase.regionserver.hlog.writer.impl</name>
 <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
</property>
<property>
 <name>hbase.regionserver.wal.encryption</name>
 <value>true</value>
</property>

7. Start the Region Server and run the 'table4-0' query again:
hbase(main):003:0> count 'table4-0'
ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332. is not online on XX-XX-XX-XX,60020,1406209023146
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
        at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
        at java.lang.Thread.run(Thread.java:662)

8. Run hbase hbck to check for inconsistencies, as below:
./hbase hbck -details
.........................
Summary:
  table1-0 is okay.
    Number of regions: 0
    Deployed on:
  table2-0 is okay.
    Number of regions: 0
    Deployed on:
  table3-0 is okay.
    Number of regions: 0
    Deployed on:
  table4-0 is okay.
    Number of regions: 0
    Deployed on:
  table5-0 is okay.
    Number of regions: 0
    Deployed on:
  table6-0 is okay.
    Number of regions: 0
    Deployed on:
  table7-0 is okay.
    Number of regions: 0
    Deployed on:
  table8-0 is okay.
    Number of regions: 0
    Deployed on:
  table9-0 is okay.
    Number of regions: 0
    Deployed on:
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:acl is okay.
    Number of regions: 0
    Deployed on:
  hbase:namespace is okay.
    Number of regions: 0
    Deployed on:
22 inconsistencies detected.
Status: INCONSISTENT
2014-07-24 19:13:05,532 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2014-07-24 19:13:05,533 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1475d1611611bcf
2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11  replyHeader:: 6,4295102074,0  request:: null response:: null
2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x1475d1611611bcf
2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x1475d1611611bcf : Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server has closed socket
2014-07-24 19:13:05,546 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2014-07-24 19:13:05,546 INFO  [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>


9. Fix the assignments as below
./hbase hbck -fixAssignments
Summary:
  table1-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table2-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table3-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table4-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table5-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table6-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table7-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table8-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table9-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:acl is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
0 inconsistencies detected.
Status: OK
2014-07-24 19:44:55,194 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2014-07-24 19:44:55,194 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2475d15f7b31b73
2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11  replyHeader:: 7,4295102377,0  request:: null response:: null
2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x2475d15f7b31b73
2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x2475d15f7b31b73 : Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server has closed socket
2014-07-24 19:44:55,204 INFO  [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
2014-07-24 19:44:55,204 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

10. Fix the assignments and meta as below
./hbase hbck -fixAssignments -fixMeta
Summary:
  table1-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table2-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table3-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table4-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table5-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table6-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table7-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table8-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  table9-0 is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:acl is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:namespace is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
0 inconsistencies detected.
Status: OK
2014-07-24 19:46:16,290 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2014-07-24 19:46:16,290 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3475d1605321be9
2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11  replyHeader:: 6,4295102397,0  request:: null response:: null
2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x3475d1605321be9
2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x3475d1605321be9 : Unable to read additional data from server sessionid 0x3475d1605321be9, likely server has closed socket
2014-07-24 19:46:16,300 INFO  [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
2014-07-24 19:46:16,300 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

hbase(main):006:0> count 'table4-0'
0 row(s) in 0.0200 seconds

=> 0
hbase(main):007:0> 
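
The hbck summaries in steps 8 to 10 can also be compared mechanically. The following is a minimal sketch, not part of HBase, that reads the summary text printed by hbck and lists the tables reported with zero regions. The function names (parse_hbck_summary, tables_without_regions) are made up for illustration, and only the line formats that appear in this report are handled.

```python
import re

# Sample text copied from the 'hbck -details' summary in step 8,
# shortened to three tables.
SAMPLE = """\
Summary:
  table4-0 is okay.
    Number of regions: 0
    Deployed on:
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
  hbase:acl is okay.
    Number of regions: 0
    Deployed on:
22 inconsistencies detected.
Status: INCONSISTENT
"""


def parse_hbck_summary(text):
    """Return ({table: region_count}, inconsistency_count, status)."""
    regions = {}
    current = None          # table whose block is being read
    inconsistencies = None
    status = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"^(\S+) is okay\.$", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"^Number of regions: (\d+)$", line)
        if m and current is not None:
            regions[current] = int(m.group(1))
            continue
        m = re.match(r"^(\d+) inconsistencies detected\.$", line)
        if m:
            inconsistencies = int(m.group(1))
            continue
        m = re.match(r"^Status: (\S+)$", line)
        if m:
            status = m.group(1)
    return regions, inconsistencies, status


def tables_without_regions(regions):
    """Tables that the summary lists with zero regions."""
    return sorted(t for t, n in regions.items() if n == 0)


regions, inconsistencies, status = parse_hbck_summary(SAMPLE)
print(tables_without_regions(regions), inconsistencies, status)
```

Run against the step 8 output, this makes it visible that tables are printed as "okay" while having no regions deployed and while the overall status is INCONSISTENT.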

Complete data loss occurred: WALs, oldWALs and /hbase/data/default/table4-0/ do not contain any data.
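
For reference, the difference between the step 2 and step 4 settings can be listed with a small script. This is only a sketch under assumptions: property_map and removed_properties are hypothetical helper names, and the key provider properties, which are identical in both steps, are left out for brevity. It shows that step 4 drops the WAL reader, writer and encryption properties; whether that is what kept the region from coming online is not established here.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the step 2 settings (HFile and WAL encryption).
STEP2 = """
<property><name>hfile.format.version</name><value>3</value></property>
<property><name>hbase.regionserver.hlog.reader.impl</name>
 <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value></property>
<property><name>hbase.regionserver.hlog.writer.impl</name>
 <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value></property>
<property><name>hbase.regionserver.wal.encryption</name><value>true</value></property>
"""

# Abbreviated copy of the step 4 settings (HFile encryption only).
STEP4 = """
<property><name>hfile.format.version</name><value>3</value></property>
"""


def property_map(xml_fragment):
    """Parse a run of <property> elements into {name: value}."""
    # The fragments have no root element, so wrap them in one.
    root = ET.fromstring("<configuration>" + xml_fragment + "</configuration>")
    return {
        p.findtext("name").strip(): p.findtext("value").strip()
        for p in root.iter("property")
    }


def removed_properties(before, after):
    """Names present in 'before' but missing from 'after'."""
    return sorted(set(property_map(before)) - set(property_map(after)))


for name in removed_properties(STEP2, STEP4):
    print(name)
```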



> HBase file encryption, consistences observed and data loss
> ----------------------------------------------------------
>
>                 Key: HBASE-11584
>                 URL: https://issues.apache.org/jira/browse/HBASE-11584
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck, HFile
>    Affects Versions: 0.98.3
>         Environment: SuSE 11 SP3
>            Reporter: shankarlingayya
>            Priority: Critical
>
> Procedure:
> 1. Start the Hbase services (HMaster & region Server)
> 2. Enable HFile encryption and WAL file encryption as below, and perform 'table4-0' put
operations (100 records added)
> <property>
>  <name>hbase.crypto.keyprovider</name>
>  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>  <name>hbase.crypto.keyprovider.parameters</name>
>  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>  <name>hbase.crypto.master.key.name</name>
>  <value>hdfs</value>
> </property>
> <property>
>  <name>hfile.format.version</name>
>  <value>3</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.reader.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.writer.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>  <name>hbase.regionserver.wal.encryption</name>
>  <value>true</value>
> </property>
>  
> 3. Machine went down, so all process went down
> 4. We disabled the WAL file encryption for performance reason, and keep encryption only
for Hfile, as below
> <property>
>  <name>hbase.crypto.keyprovider</name>
>  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>  <name>hbase.crypto.keyprovider.parameters</name>
>  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>  <name>hbase.crypto.master.key.name</name>
>  <value>hdfs</value>
> </property>
> <property>
>  <name>hfile.format.version</name>
>  <value>3</value>
> </property>
> 5. Start the Region Server and query the 'table4-0' data
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332.
is not online on XX-XX-XX-XX,60020,1406209023146
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>         at java.lang.Thread.run(Thread.java:662)
> 6. Not able to read the data, so we decided to revert back the configuration (as original)
> 7. Kill/Stop the Region Server, revert all the configurations as original, as below
> <property>
>  <name>hbase.crypto.keyprovider</name>
>  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
> </property>
> <property>
>  <name>hbase.crypto.keyprovider.parameters</name>
>  <value>jceks:///opt/shankar1/kdc_keytab/hbase.jks?password=Hadoop@234</value>
> </property>
> <property>
>  <name>hbase.crypto.master.key.name</name>
>  <value>hdfs</value>
> </property>
> <property>
>  <name>hfile.format.version</name>
>  <value>3</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.reader.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
> </property>
> <property>
>  <name>hbase.regionserver.hlog.writer.impl</name>
>  <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
> </property>
> <property>
>  <name>hbase.regionserver.wal.encryption</name>
>  <value>true</value>
> </property>
> 7. Start the Region Server, and perform the 'table4-0' query 
> hbase(main):003:0> count 'table4-0'
> ERROR: org.apache.hadoop.hbase.NotServingRegionException: Region table4-0,,1406207815456.fc10620a3dcc14e004ab034420f7d332.
is not online on XX-XX-XX-XX,60020,1406209023146
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:2685)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:4119)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3066)
>         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29497)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2084)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.consumerLoop(SimpleRpcScheduler.java:168)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler.access$000(SimpleRpcScheduler.java:39)
>         at org.apache.hadoop.hbase.ipc.SimpleRpcScheduler$1.run(SimpleRpcScheduler.java:111)
>         at java.lang.Thread.run(Thread.java:662)
> 8. Run the hbase hbck to repair, as below
> ./hbase hbck -details
> .........................
> Summary:
>   table1-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table2-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table3-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table4-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table5-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table6-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table7-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table8-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   table9-0 is okay.
>     Number of regions: 0
>     Deployed on:
>   hbase:meta is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:acl is okay.
>     Number of regions: 0
>     Deployed on:
>   hbase:namespace is okay.
>     Number of regions: 0
>     Deployed on:
> 22 inconsistencies detected.
> Status: INCONSISTENT
> 2014-07-24 19:13:05,532 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:13:05,533 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,533 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x1475d1611611bcf, packet:: clientPath:null serverPath:null finished:false header:: 6,-11  replyHeader:: 6,4295102074,0  request:: null response:: null
> 2014-07-24 19:13:05,546 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x1475d1611611bcf
> 2014-07-24 19:13:05,546 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x1475d1611611bcf : Unable to read additional data from server sessionid 0x1475d1611611bcf, likely server has closed socket
> 2014-07-24 19:13:05,546 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> 2014-07-24 19:13:05,546 INFO  [main] zookeeper.ZooKeeper: Session: 0x1475d1611611bcf closed
> shankar1@XX-XX-XX-XX:~/DataSight/hbase/bin>
> 9. Fix the assignments as below
> ./hbase hbck -fixAssignments
> Summary:
>   table1-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table2-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table3-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table4-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table5-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table6-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table7-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table8-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table9-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:meta is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:acl is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:namespace is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:44:55,194 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:44:55,194 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,194 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,203 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x2475d15f7b31b73, packet:: clientPath:null serverPath:null finished:false header:: 7,-11  replyHeader:: 7,4295102377,0  request:: null response:: null
> 2014-07-24 19:44:55,203 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x2475d15f7b31b73
> 2014-07-24 19:44:55,204 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x2475d15f7b31b73 : Unable to read additional data from server sessionid 0x2475d15f7b31b73, likely server has closed socket
> 2014-07-24 19:44:55,204 INFO  [main] zookeeper.ZooKeeper: Session: 0x2475d15f7b31b73 closed
> 2014-07-24 19:44:55,204 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> 10. Fix the assignments and hbase:meta as below, then repeat the 'table4-0' query
> ./hbase hbck -fixAssignments -fixMeta
> Summary:
>   table1-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table2-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table3-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table4-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table5-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table6-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table7-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table8-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   table9-0 is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:meta is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:acl is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
>   hbase:namespace is okay.
>     Number of regions: 1
>     Deployed on:  XX-XX-XX-XX,60020,1406209023146
> 0 inconsistencies detected.
> Status: OK
> 2014-07-24 19:46:16,290 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
> 2014-07-24 19:46:16,290 INFO  [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ZooKeeper: Closing session: 0x3475d1605321be9
> 2014-07-24 19:46:16,290 DEBUG [main] zookeeper.ClientCnxn: Closing client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: Reading reply sessionid:0x3475d1605321be9, packet:: clientPath:null serverPath:null finished:false header:: 6,-11  replyHeader:: 6,4295102397,0  request:: null response:: null
> 2014-07-24 19:46:16,300 DEBUG [main] zookeeper.ClientCnxn: Disconnecting client for session: 0x3475d1605321be9
> 2014-07-24 19:46:16,300 DEBUG [main-SendThread(XX-XX-XX-XX:2181)] zookeeper.ClientCnxn: An exception was thrown while closing send thread for session 0x3475d1605321be9 : Unable to read additional data from server sessionid 0x3475d1605321be9, likely server has closed socket
> 2014-07-24 19:46:16,300 INFO  [main] zookeeper.ZooKeeper: Session: 0x3475d1605321be9 closed
> 2014-07-24 19:46:16,300 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
> hbase(main):006:0> count 'table4-0'
> 0 row(s) in 0.0200 seconds
> => 0
> hbase(main):007:0> 
> Result: complete data loss. The WALs, oldWALs and /hbase/data/default/table4-0/ directories do not contain any data.
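
In the step 8 summary, hbck labels every user table "okay" while listing zero regions for it, and the same run ends with 22 inconsistencies. A minimal sketch, assuming the summary layout shown in this report, of flagging such tables from saved hbck output. The function name and the sample text are hypothetical and are not part of HBase or of the original report.

```python
import re


def tables_with_zero_regions(hbck_summary):
    """Return tables that the summary labels 'okay' while listing 0 regions.

    Assumes the layout seen in this report: a '<table> is okay.' line
    followed by a 'Number of regions: N' line. Leading '> ' quote markers
    from the mail archive are tolerated.
    """
    flagged = []
    current = None
    for raw in hbck_summary.splitlines():
        # Drop mail quote markers and indentation before matching.
        line = raw.lstrip("> ").strip()
        m = re.match(r"^(\S+) is okay\.$", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"^Number of regions:\s*(\d+)$", line)
        if m and current is not None:
            if int(m.group(1)) == 0:
                flagged.append(current)
            current = None
    return flagged


# Hypothetical excerpt shaped like the step 8 output.
SAMPLE = """\
Summary:
  table4-0 is okay.
    Number of regions: 0
    Deployed on:
  hbase:meta is okay.
    Number of regions: 1
    Deployed on:  XX-XX-XX-XX,60020,1406209023146
22 inconsistencies detected.
Status: INCONSISTENT
"""

if __name__ == "__main__":
    print(tables_with_zero_regions(SAMPLE))
```

Running the same check on the step 9 and step 10 summaries would return an empty list, which matches the "Status: OK" lines there, even though the later count on 'table4-0' returns 0 rows.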



--
This message was sent by Atlassian JIRA
(v6.2#6252)
