hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Is it possible for HTable.put(…) to not make it into the table and silently fail?
Date Fri, 22 Aug 2014 19:27:27 GMT
What does CellCounter return?
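For reference, CellCounter is launched the same way as RowCounter. A sketch
against your table (the output directory here is just an example; exact
arguments can vary by release):

  hbase org.apache.hadoop.hbase.mapreduce.CellCounter mytable /tmp/cellcount

It counts cells per column family/qualifier, so it should show whether the
BLOBDATACOLUMN cells are actually in the table even where the row count
looks off.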
St.Ack


On Fri, Aug 22, 2014 at 10:17 AM, Magana-zook, Steven Alan <maganazook1@llnl.gov> wrote:

> Hi Ted,
>
> For example, if the program reports an average speed of 88 records per
> second and I let it run for 24 hours, then I would expect the RowCounter
> program to report a number around 88 (rows/sec) * 24 (hours) * 60
> (min/hour) * 60 (sec/min) = 7,603,200 rows.
>
> In actuality, RowCounter returns:
>
> org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
>         ROWS=1356588
>
>
> The vast difference between ~7 million rows and ~1 million rows has me
> confused about what happened to the other rows that should have been in
> the table.
>
> Thanks for your reply,
> Steven
>
>
>
>
>
>
> On 8/22/14 9:53 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>
> >bq. the result from the RowCounter program is far fewer records than I
> >expected.
> >
> >Can you give more detailed information about the gap?
> >
> >Which HBase release are you running?
> >
> >Cheers
> >
> >
> >On Fri, Aug 22, 2014 at 9:26 AM, Magana-zook, Steven Alan <maganazook1@llnl.gov> wrote:
> >
> >> Hello,
> >>
> >> I have written a program in Java that is supposed to update rows in an
> >> HBase table that do not yet have a value in a certain column (blob values
> >> of between 5k and 50k). The program keeps track of how many puts have
> >> been added to the table along with how long the program has been running.
> >> These pieces of information are used to calculate a speed for data
> >> ingestion (records per second). After running the program for multiple
> >> days, and based on the average speed reported, the result from the
> >> RowCounter program is far fewer records than I expected. The essential
> >> parts of the code are shown below (error handling and other potentially
> >> unimportant code omitted), along with the command I use to see how many
> >> rows have been updated.
> >>
> >> Is it possible that the put method call on HTable does not actually put
> >> the record in the database while also not throwing an exception?
> >> Could the output of RowCounter be incorrect?
> >> Am I doing something below that is obviously incorrect?
> >>
> >> RowCounter command (it frequently reports OutOfOrderScannerNextException
> >> during execution): hbase org.apache.hadoop.hbase.mapreduce.RowCounter
> >> mytable cf:BLOBDATACOLUMN
> >>
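> >> Side note: I suspect the OutOfOrderScannerNextException is scan RPCs
> >> timing out on these large blob cells. Assuming RowCounter accepts the
> >> generic -D options (I have not verified this on our release), I could
> >> retry with a smaller scanner cache per RPC:
> >>
> >> hbase org.apache.hadoop.hbase.mapreduce.RowCounter
> >> -Dhbase.client.scanner.caching=100 mytable cf:BLOBDATACOLUMN
> >>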
> >> Code that is essentially what I am doing in my program:
> >> ...
> >> Scan scan = new Scan();
> >> scan.setCaching(200);
> >>
> >> HTable targetTable = new HTable(hbaseConfiguration,
> >>     Bytes.toBytes(tblTarget));
> >> // Keep a handle on the scanner; next() is called on it below.
> >> ResultScanner resultScanner = targetTable.getScanner(scan);
> >>
> >> int batchSize = 10;
> >> Date startTime = new Date();
> >> long numFilesSent = 0;
> >>
> >> Result[] rows = resultScanner.next(batchSize);
> >> // next(n) returns an empty array, not null, once the scan is exhausted
> >> while (rows != null && rows.length > 0) {
> >>     for (Result row : rows) {
> >>         byte[] rowKey = row.getRow();
> >>         byte[] byteArrayBlobData = getFileContentsForRow(rowKey);
> >>
> >>         Put put = new Put(rowKey);
> >>         put.add(COLUMN_FAMILY, BLOB_COLUMN, byteArrayBlobData);
> >>         targetTable.put(put); // Auto-flush is on by default
> >>         numFilesSent++;
> >>
> >>         float elapsedSeconds =
> >>             (new Date().getTime() - startTime.getTime()) / 1000.0f;
> >>         float speed = numFilesSent / elapsedSeconds;
> >>         // Routinely prints from 80 to 200+
> >>         System.out.println("Speed(rows/sec): " + speed);
> >>     }
> >>     rows = resultScanner.next(batchSize);
> >> }
> >> ...
> >>
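> >> For completeness, in case anything is being buffered client-side, I could
> >> make the flush explicit so a buffered failure cannot pass silently. This
> >> is only a sketch against the 0.9x HTable API, using the same names as the
> >> snippet above:
> >>
> >> targetTable.setAutoFlush(true); // be explicit rather than rely on default
> >> try {
> >>     // ... the scan-and-put loop from above ...
> >> } finally {
> >>     // flushCommits() rethrows buffered failures as an IOException
> >>     // (RetriesExhaustedWithDetailsException names the failed Puts);
> >>     // close() also flushes, but an explicit call keeps the intent clear.
> >>     targetTable.flushCommits();
> >>     resultScanner.close();
> >>     targetTable.close();
> >> }
> >>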
> >> Thanks,
> >> Steven
> >>
>
>
