hbase-user mailing list archives

From Omkar Joshi <Omkar.Jo...@lntinfotech.com>
Subject HBase : get(...) vs scan and in-memory table
Date Wed, 11 Sep 2013 09:20:18 GMT
I'm executing MR over HBase.
The business logic in the reducer heavily accesses two tables, say T1(40k rows) and T2(90k
rows). Currently, I'm executing the following steps :
1. In the constructor of the reducer class, I do something like this:
HBaseCRUD hbaseCRUD = new HBaseCRUD();

HTableInterface t1= hbaseCRUD.getTable("T1",
                            "CF1", null, "C1", "C2");
HTableInterface t2= hbaseCRUD.getTable("T2",
                            "CF1", null, "C1", "C2");
2. In reduce(...), I run a Scan:
 String lowercase = ....;

/* Start : HBase code */
/*
 * TRY using get(...) on the table rather than a
 * Scan!
 */
Scan scan = new Scan();

/*scan will return a single row*/
ResultScanner resultScanner = t1.getScanner(scan);

for (Result result : resultScanner) {
    /*business logic*/
}
resultScanner.close(); /* release the scanner once the rows are consumed */
Though I'm not sure the above code is sensible in the first place, I have a question: would
a get(...) provide any performance benefit over the scan?
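/* Bytes is org.apache.hadoop.hbase.util.Bytes; it encodes as UTF-8 regardless of the platform charset */
Get get = new Get(Bytes.toBytes(lowercase));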
Result getResult = t1.get(get);
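To make the get-versus-scan comparison concrete, here is a minimal sketch that models a table as a map sorted by row key. It is a toy stand-in built on java.util.TreeMap, not the HBase client API; the class RowLookupModel and its methods are hypothetical names introduced only for illustration, and plain String keys stand in for byte[] row keys. It shows that a point lookup and a scan bounded to [rowKey, rowKey + "\0") return the same single row, whereas a scan with no start/stop row walks every row. It says nothing about RPC or server-side cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy stand-in for a table kept sorted by row key. This is NOT the HBase client API. */
final class RowLookupModel {
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    /** Point lookup, analogous to get(...): returns the value, or null if the row is absent. */
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    /** Range read over [startRow, stopRow), analogous to a bounded scan; a null stopRow means "to the end". */
    List<String> scan(String startRow, String stopRow) {
        NavigableMap<String, String> range = (stopRow == null)
                ? rows.tailMap(startRow, true)
                : rows.subMap(startRow, true, stopRow, false);
        return new ArrayList<>(range.values());
    }

    /** Single-row scan: the stop key is the smallest key strictly greater than rowKey. */
    List<String> scanSingleRow(String rowKey) {
        return scan(rowKey, rowKey + '\u0000');
    }

    public static void main(String[] args) {
        RowLookupModel t1 = new RowLookupModel();
        t1.put("apple", "r1");
        t1.put("banana", "r2");
        t1.put("cherry", "r3");

        System.out.println("get:             " + t1.get("banana"));
        System.out.println("single-row scan: " + t1.scanSingleRow("banana"));
        System.out.println("unbounded scan:  " + t1.scan("", null));
    }
}
```

In this model the only difference between the two single-row reads is how the range is expressed; whether the real client shows a measurable difference depends on factors the model leaves out.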
Since T1 and T2 will be (mostly) read-only, I think performance will improve if they are kept
in memory. As per the HBase documentation, I will have to re-create tables T1 and T2. Please
verify the correctness of my understanding:
public void createTables(String tableName, boolean readOnly,
            boolean blockCacheEnabled, boolean inMemory,
            String... columnFamilyNames) throws IOException {
        // TODO Auto-generated method stub

        HTableDescriptor tableDesc = new HTableDescriptor(tableName);
        /* not sure !!! */

        HColumnDescriptor columnFamily = null;

        if (!(columnFamilyNames == null || columnFamilyNames.length == 0)) {

            for (String columnFamilyName : columnFamilyNames) {

                columnFamily = new HColumnDescriptor(columnFamilyName);
                /*
                 * Start : Do these steps ensure that the column
                 * family(actually, the column data) is in-memory???
                 */

                /*
                 * End : Do these steps ensure that the column family(actually,
                 * the column data) is in-memory???
                 */
            }
        }
}


Once done :

 1.  How can I verify that the columns are in memory and are read from there rather than from disk?
 2.  Is the from-memory or from-disk read transparent to the client? In other words, do I
need to change the HTable access code in my reducer class? If yes, what are the changes?

Omkar Joshi
