hbase-user mailing list archives

From "Magana-zook, Steven Alan" <maganazo...@llnl.gov>
Subject Is it possible for HTable.put(…) to not make it into the table and silently fail?
Date Fri, 22 Aug 2014 16:26:29 GMT

I have written a Java program that is supposed to update the rows in an HBase table that do
not yet have a value in a certain column (blob values of between 5 KB and 50 KB). The program
keeps track of how many puts it has sent to the table and how long it has been running, and
uses these two numbers to calculate an ingestion speed (records per second). After running the
program for multiple days, RowCounter reports far fewer records than the average reported speed
would lead me to expect. The essential parts of the code are shown below (error handling and
other code that is probably not important is omitted), along with the command I use to see how
many rows have been updated.
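For scale, the expected row count is simply the average rate multiplied by the running time. The small helper below is a standalone sketch, not part of the ingest program; the class and method names are made up for illustration. At the lowest speed the program prints (80 rows/sec), one day of running corresponds to 80 × 86,400 = 6,912,000 puts.

```java
class ExpectedRows {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    // Puts the ingest would have sent if it sustained rowsPerSecond for the given number of days.
    static long expectedRows(double rowsPerSecond, double days) {
        return Math.round(rowsPerSecond * SECONDS_PER_DAY * days);
    }

    public static void main(String[] args) {
        // 80 and 200 rows/sec: the range the program's speed printout reports.
        for (double rate : new double[] {80, 200}) {
            for (int days = 1; days <= 3; days++) {
                System.out.printf("%.0f rows/sec for %d day(s): %d rows%n",
                        rate, days, expectedRows(rate, days));
            }
        }
    }
}
```

If each put targets a distinct row, RowCounter's result after a multi-day run should land in this range. A much smaller count points either at lost puts or at the counting itself.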

Is it possible that HTable.put() does not actually store the record in the table but also
does not throw an exception?
Could the output of RowCounter be incorrect?
Am I doing something below that is obviously incorrect?
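One way to tell a silently dropped put apart from a RowCounter under-count is to read some of the written cells back and count confirmations separately from puts issued. In HBase that read would be a Get on the same row and column. The sketch below hides the table behind a hypothetical CellStore interface (String row keys, a single blob column) so it runs without a cluster; CellStore, InMemoryCellStore and VerifiedWriter are illustrative names, not HBase API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the target table; in HBase these would be
// HTable.put(Put) and HTable.get(Get) on the blob column.
interface CellStore {
    void put(String row, byte[] value);
    byte[] get(String row); // null if the row has no blob cell
}

// In-memory implementation used only to make the sketch runnable.
class InMemoryCellStore implements CellStore {
    private final Map<String, byte[]> cells = new HashMap<>();
    public void put(String row, byte[] value) { cells.put(row, value); }
    public byte[] get(String row) { return cells.get(row); }
}

// Counts puts issued and, separately, puts whose value reads back unchanged.
class VerifiedWriter {
    private final CellStore store;
    long putsIssued = 0;
    long putsConfirmed = 0;

    VerifiedWriter(CellStore store) { this.store = store; }

    void write(String row, byte[] value) {
        store.put(row, value);
        putsIssued++;
        byte[] readBack = store.get(row);
        if (readBack != null && Arrays.equals(readBack, value)) {
            putsConfirmed++;
        }
    }

    public static void main(String[] args) {
        VerifiedWriter writer = new VerifiedWriter(new InMemoryCellStore());
        writer.write("row-1", new byte[] {1, 2, 3});
        writer.write("row-2", new byte[] {4, 5});
        System.out.println("issued=" + writer.putsIssued + " confirmed=" + writer.putsConfirmed);
    }
}
```

Reading back every row roughly doubles the request load, so checking only a sample of rows may be enough to show whether the confirmed count keeps pace with the issued count.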

RowCounter command (it frequently reports OutOfOrderScannerNextException during execution):
hbase org.apache.hadoop.hbase.mapreduce.RowCounter mytable cf:BLOBDATACOLUMN
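As far as I understand it, passing cf:BLOBDATACOLUMN makes RowCounter scan only that column. It therefore counts distinct rows that have at least one cell there: several puts to the same row count once, and rows that never received a blob are not counted at all. The plain-Java sketch below applies that counting rule to a hypothetical in-memory table (row key mapped to "family:qualifier" mapped to value) so it can be compared with a put counter. The table layout and class name are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class ColumnRowCount {
    // Counts rows that hold a cell in the given "family:qualifier" column.
    // A zero-length value still counts here: the cell exists.
    static long countRowsWithColumn(Map<String, Map<String, byte[]>> table, String column) {
        long count = 0;
        for (Map<String, byte[]> cells : table.values()) {
            if (cells.containsKey(column)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, Map<String, byte[]>> table = new HashMap<>();
        table.computeIfAbsent("row-1", k -> new HashMap<>()).put("cf:BLOBDATACOLUMN", new byte[] {1});
        // In this map a second put to row-1 replaces the value; the row still counts once.
        table.computeIfAbsent("row-1", k -> new HashMap<>()).put("cf:BLOBDATACOLUMN", new byte[] {2});
        table.computeIfAbsent("row-2", k -> new HashMap<>()).put("cf:OTHER", new byte[] {3});
        table.computeIfAbsent("row-3", k -> new HashMap<>()).put("cf:BLOBDATACOLUMN", new byte[0]);
        System.out.println(countRowsWithColumn(table, "cf:BLOBDATACOLUMN"));
    }
}
```

Four puts went into this table, but only two rows carry the blob column, so a put counter and a row counter can legitimately disagree.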

Code that is essentially what I am doing in my program:
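import java.util.Date;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan(); // scan setup (selecting rows that still lack the blob column) omitted

HTable targetTable = new HTable(hbaseConfiguration, Bytes.toBytes(tblTarget));
ResultScanner resultScanner = targetTable.getScanner(scan); // assumed: scanning the same table

int batchSize = 10;
Date startTime = new Date();
numFilesSent = 0;

Result[] rows = resultScanner.next(batchSize);
while (rows != null && rows.length > 0) { // next(int) returns an empty array at the end of the scan
    for (Result row : rows) {
        byte[] rowKey = row.getRow();
        byte[] byteArrayBlobData = getFileContentsForRow(rowKey);

        Put put = new Put(rowKey);
        put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
        targetTable.put(put); // Auto-flush is on by default
        numFilesSent++;

        float elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0f;
        float speed = numFilesSent / elapsedSeconds;
        System.out.println("Speed(rows/sec): " + speed); // routinely says from 80 to 200+
    }
    rows = resultScanner.next(batchSize); // fetch the next batch only after the current one is done
}
resultScanner.close();
targetTable.close();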
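A related thing to check in the loop: in the HBase client, ResultScanner.next(int nbRows) returns between zero and nbRows results and, once the scan is exhausted, an empty array rather than null. A loop guarded only by rows != null therefore never exits on its own. The sketch below shows the batch-loop shape with that end condition. It uses a hypothetical BatchScanner interface (String rows instead of Result) so it runs without HBase; BatchScanner, ListScanner and BatchLoop are illustrative names. It also fetches the next batch only after every row of the current batch has been handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ResultScanner.next(int): returns between zero and
// nbRows items, and an empty array (not null) once the scan is exhausted.
interface BatchScanner {
    String[] next(int nbRows);
}

// Serves rows from a list, batch by batch.
class ListScanner implements BatchScanner {
    private final List<String> rows;
    private int pos = 0;

    ListScanner(List<String> rows) { this.rows = rows; }

    public String[] next(int nbRows) {
        int end = Math.min(pos + nbRows, rows.size());
        String[] batch = rows.subList(pos, end).toArray(new String[0]);
        pos = end;
        return batch;
    }
}

class BatchLoop {
    // Handles every row exactly once and returns how many rows were handled.
    static long processAll(BatchScanner scanner, int batchSize) {
        long processed = 0;
        String[] rows = scanner.next(batchSize);
        while (rows != null && rows.length > 0) { // an empty batch means the scan is done
            for (String row : rows) {
                processed++; // the real program would build and send a Put for this row here
            }
            rows = scanner.next(batchSize); // next batch only after the whole batch is done
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add("row-" + i);
        }
        System.out.println("processed " + processAll(new ListScanner(rows), 10) + " rows");
    }
}
```

With 25 rows and a batch size of 10, the loop sees batches of 10, 10 and 5, then an empty batch, and stops.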

