hbase-issues mailing list archives

From "ChiaPing Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13492) The estimation of result size may be different between ClientScanner and RSRpcServices
Date Fri, 17 Apr 2015 11:29:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499665#comment-14499665 ]

ChiaPing Tsai commented on HBASE-13492:
---------------------------------------

I scanned the same data from both an HFile and a table, estimating the heap size of a cell with CellUtil.estimatedHeapSizeOf(cell).
The results are as follows:

187 bytes by using HFileScanner to scan HFile 
128 bytes by using ClientScanner to scan Table
(rowLength = 13, familyLength = 1, qualifierLength = 21, valueLength = 8, tagLength = 0)

So the more cells there are in a single row (a wide table), the smaller the result size limit,
and the larger the caching value, the more likely this problem is to occur.
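
To make the effect of the two estimates concrete, here is a rough worked example in Java. It is only a sketch, not HBase code: the class and method names are hypothetical, the 2 MB size limit is an assumed value chosen for illustration, and the size check is modelled per cell, ignoring row boundaries. Only the two per-cell sizes (187 and 128 bytes) come from the measurement above.

```java
// Hypothetical worked example; none of these names exist in HBase.
class ScanSizeMismatch {

    // Number of cells the server hands back before its own running
    // estimate reaches the limit (simplification: checked per cell).
    static long cellsReturnedByServer(long maxResultSize, long serverCellSize) {
        // Ceiling division: the server stops once the sum is >= the limit.
        return (maxResultSize + serverCellSize - 1) / serverCellSize;
    }

    // What the client believes is left of its size budget after it
    // re-estimates the same cells with its own (smaller) per-cell size.
    static long clientRemaining(long maxResultSize, long cells, long clientCellSize) {
        return maxResultSize - cells * clientCellSize;
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024; // assumed limit, for illustration only
        long serverCellSize = 187;             // measured via HFileScanner
        long clientCellSize = 128;             // measured via ClientScanner

        long cells = cellsReturnedByServer(maxResultSize, serverCellSize);
        long remaining = clientRemaining(maxResultSize, cells, clientCellSize);

        System.out.println("cells returned by server: " + cells);
        System.out.println("client remainingResultSize: " + remaining);
    }
}
```

Under these assumptions the server stops after 11215 cells, while the client still sees 661632 bytes of budget left, so the client's remainingResultSize stays positive even though the server hit its limit.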

> The estimation of result size may be different between ClientScanner and RSRpcServices
> --------------------------------------------------------------------------------------
>
>                 Key: HBASE-13492
>                 URL: https://issues.apache.org/jira/browse/HBASE-13492
>             Project: HBase
>          Issue Type: Bug
>            Reporter: ChiaPing Tsai
>
> The ClientScanner tries to find the next scanner if remainingResultSize and countdown are both greater than zero.
> remainingResultSize is calculated with CellUtil.estimatedHeapSizeOf(cell):
> {code:title=Bar.java|borderStyle=solid}
>     @Override
>     public Result next() throws IOException {
>         ....
>         do {
>           ...
>           if (values != null && values.length > 0) {
>             for (Result rs : values) {
>               cache.add(rs);
>               // We don't make Iterator here
>               for (Cell cell : rs.rawCells()) {
>                 remainingResultSize -= CellUtil.estimatedHeapSizeOf(cell);
>               }
>               countdown--;
>               this.lastResult = rs;
>             }
>           }
>         } while (remainingResultSize > 0 && countdown > 0 &&
>             possiblyNextScanner(countdown, values == null));
>     }
> {code}
> RSRpcServices also uses CellUtil.estimatedHeapSizeOf(cell) to calculate the result size:
> {code:title=Bar.java|borderStyle=solid}
>   @Override
>   public ScanResponse scan(final RpcController controller, final ScanRequest request)
>   throws ServiceException {
>   ...
>             if (!results.isEmpty()) {
>               for (Result r : results) {
>                 for (Cell cell : r.rawCells()) {
>                   currentScanResultSize += CellUtil.estimatedHeapSizeOf(cell);
>                   totalCellSize += CellUtil.estimatedSerializedSizeOf(cell);
>                 }
>               }
>             }
>   ...
>   }
> {code}
> If the data block is encoded, for example with FastDiff, the cells read from the HFile are implemented by
> ClonedSeekerState, whose heap size is larger than that of KeyValue.
> So RSRpcServices returns fewer results to the client than the caching value allows, because its result
> size reaches the limit first. ClientScanner then concludes that the current region has no more data, since its
> own remainingResultSize and countdown are both still greater than zero. In fact, remainingResultSize should be
> less than zero, and the current region still has more data to read.
> Should the result size calculated by RSRpcServices be returned to the client for checking
> remainingResultSize?
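
The question above could be sketched roughly as follows. This is a toy model, not a patch: RemainingSizeTracker and its methods are hypothetical, and the idea that the scan response carries a server-side size (serverReportedSize) is an assumption made only to illustrate the proposal. The loop condition mirrors the `remainingResultSize > 0 && countdown > 0` check quoted above.

```java
// Hypothetical model of the client-side size budget; not an HBase class.
class RemainingSizeTracker {
    private long remaining;

    RemainingSizeTracker(long maxResultSize) {
        this.remaining = maxResultSize;
    }

    // Behaviour described in the issue: the client re-estimates each
    // cell itself, which may yield a smaller size than the server saw.
    void consumeClientEstimate(long cells, long clientCellSize) {
        remaining -= cells * clientCellSize;
    }

    // Proposed alternative: subtract the size the server computed.
    // serverReportedSize is an assumed extra value sent with the results.
    void consumeServerReported(long serverReportedSize) {
        remaining -= serverReportedSize;
    }

    // Mirrors the quoted loop condition: if both values are still
    // positive, the client goes on to look for the next scanner.
    boolean wouldTryNextScanner(int countdown) {
        return remaining > 0 && countdown > 0;
    }

    long remaining() {
        return remaining;
    }

    public static void main(String[] args) {
        long limit = 2L * 1024 * 1024; // assumed limit, for illustration only
        long cells = 11215;            // cells the server returned in this model

        RemainingSizeTracker clientSide = new RemainingSizeTracker(limit);
        clientSide.consumeClientEstimate(cells, 128);

        RemainingSizeTracker serverSide = new RemainingSizeTracker(limit);
        serverSide.consumeServerReported(cells * 187);

        System.out.println("client estimate -> next scanner? " + clientSide.wouldTryNextScanner(10));
        System.out.println("server size     -> next scanner? " + serverSide.wouldTryNextScanner(10));
    }
}
```

With the client's own estimate the tracker still reports a positive budget and would move on to the next scanner, whereas with the server-reported size the budget is exhausted and the client would keep reading from the current region.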



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
