hbase-dev mailing list archives

From "shriram (Jira)" <j...@apache.org>
Subject [jira] [Created] (HBASE-26183) Size of the Result object while querying huge data from HBASE table
Date Mon, 09 Aug 2021 06:54:00 GMT
shriram created HBASE-26183:
-------------------------------

             Summary: Size of the Result object while querying huge data from HBASE table
                 Key: HBASE-26183
                 URL: https://issues.apache.org/jira/browse/HBASE-26183
             Project: HBase
          Issue Type: New Feature
          Components: scan
    Affects Versions: 1.1.13
            Reporter: shriram


 
I am trying to query an HBase table by row key. We have the following structure:
 * an index table, which holds the row keys of the actual table
 * the actual table, which contains JSON data in compressed format.

To query HBase, I first scan the index table with some filters, which returns the row keys
as byte arrays. Once we have the row keys, we build a list of Gets and pass it to get() on
the Table object. We then iterate over the returned results and prepare a list of the
compressed JSON objects. We do not know in advance how many objects there will be or how
large they are, so a large result set may cause an OOM. Is there an option to return an
Iterator, or to buffer the results, so that we can avoid the OOM?
 {{for (byte[] rowkey : indexTableOutput) {
    Get get = new Get(rowkey).addFamily(Bytes.toBytes(columnFamilty)).setMaxVersions(MAX_VERSIONS);
    listOfget.add(get);
}}}
The code above builds the list of Gets from the row keys retrieved from the index table.
 {{TableName tableName = TableName.valueOf("table1");
Table tableObj = conn.getTable(tableName);
Result[] results = tableObj.get(listOfget);}}
We have a few questions about the code above. Any help would be appreciated.
 * If there is a huge amount of data, will Result[] contain all of the results?
 * How can we return an iterator-like object and leave the processing to the consumer?
Keeping all the data in memory while processing it will result in an OOM.
 * Are there any other options for returning a limited amount of data, so that the consumer
can process it and then continue?

I found that a ResultScanner is returned for Scan objects, but I could not find an equivalent
option for a list of Gets. In our case we know the exact keys from the index table.
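
To make the request concrete, here is a minimal sketch of the kind of iterator we have in mind. It is not an existing HBase API. The class name BatchedFetchIterator, the batchSize parameter and the fetcher function are all hypothetical, and the sketch uses only the JDK so that it stands on its own. It splits the known keys into fixed-size batches and fetches the next batch only when the consumer has used up the current one, so at most one batch of results is held at a time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Lazily fetches values for a known list of keys, one batch at a time.
 * Hypothetical helper for illustration; not part of the HBase client API.
 */
final class BatchedFetchIterator<K, V> implements Iterator<V> {
    private final List<K> keys;
    private final int batchSize;
    // Assumed callback: fetches the values for one batch of keys.
    private final Function<List<K>, List<V>> fetcher;
    private int nextKeyIndex = 0;
    private Iterator<V> current = Collections.emptyIterator();

    BatchedFetchIterator(List<K> keys, int batchSize,
                         Function<List<K>, List<V>> fetcher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.keys = keys;
        this.batchSize = batchSize;
        this.fetcher = fetcher;
    }

    @Override
    public boolean hasNext() {
        // Fetch further batches until one yields a value or the keys run out.
        while (!current.hasNext() && nextKeyIndex < keys.size()) {
            int end = Math.min(nextKeyIndex + batchSize, keys.size());
            List<K> batch = new ArrayList<>(keys.subList(nextKeyIndex, end));
            nextKeyIndex = end;
            current = fetcher.apply(batch).iterator();
        }
        return current.hasNext();
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

In our case the keys would be the row keys from the index table, and the fetcher would presumably build one Get per key and call tableObj.get(listOfget) for that batch only, wrapping the checked IOException. The batch size would have to be tuned against the size of the compressed JSON values, which we do not know up front.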



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
