hbase-issues mailing list archives

From "Bhupendra Kumar Jain (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-14520) Optimize the number of calls for tags creation in bulk load
Date Wed, 30 Sep 2015 07:24:04 GMT
Bhupendra Kumar Jain created HBASE-14520:
--------------------------------------------

             Summary: Optimize the number of calls for tags creation in bulk load
                 Key: HBASE-14520
                 URL: https://issues.apache.org/jira/browse/HBASE-14520
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.0.0
            Reporter: Bhupendra Kumar Jain
            Assignee: Bhupendra Kumar Jain


At present, the TTL and visibility expression are specified once per TSV line, i.e. their values,
and therefore the tags, are the same for all the columns present in that line. In the current code,
however, a new list of tags is created for each cell. Instead of creating new tags for each cell,
the tags can be created once for the line and reused by all the cells of that line.

Assume 1 million rows and 1000 columns. Currently, tag creation happens 1M * 1000 times
(10^9 in total). If the tags are reused, tag creation drops to 1M times (i.e. once per TSV line).

This applies to both the TsvImporterMapper and the TextSortReducer logic.
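
The proposed change can be illustrated with a minimal sketch. This is not the actual
TsvImporterMapper or TextSortReducer code: SketchTag, buildTags, emitCell and the sample TTL
and visibility values are hypothetical stand-ins, used only to count how often a tag list is
built under each strategy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TagReuseSketch {

    // Hypothetical stand-in for a cell tag; NOT the real HBase Tag class.
    static final class SketchTag {
        final String type;
        final String value;

        SketchTag(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Counts how many times a tag list is built, so the two strategies can be compared.
    static long tagListBuilds = 0;

    // Builds the tag list for one TSV line from its TTL and visibility expression.
    // The list is unmodifiable so that sharing it between cells is safe.
    static List<SketchTag> buildTags(long ttlMillis, String visibilityExpr) {
        tagListBuilds++;
        List<SketchTag> tags = new ArrayList<>();
        tags.add(new SketchTag("ttl", Long.toString(ttlMillis)));
        tags.add(new SketchTag("visibility", visibilityExpr));
        return Collections.unmodifiableList(tags);
    }

    // Hypothetical sink standing in for the code that creates a cell with its tags.
    static void emitCell(int line, int column, List<SketchTag> tags) {
        // Intentionally empty: only the number of tag list builds matters here.
    }

    // Current behaviour as described in the issue: tags are rebuilt for every cell.
    static long buildsPerCell(int lines, int columns) {
        tagListBuilds = 0;
        for (int line = 0; line < lines; line++) {
            for (int col = 0; col < columns; col++) {
                emitCell(line, col, buildTags(60_000L, "secret"));
            }
        }
        return tagListBuilds;
    }

    // Proposed behaviour: tags are built once per line and reused by all its cells.
    static long buildsPerLine(int lines, int columns) {
        tagListBuilds = 0;
        for (int line = 0; line < lines; line++) {
            List<SketchTag> tags = buildTags(60_000L, "secret");
            for (int col = 0; col < columns; col++) {
                emitCell(line, col, tags);
            }
        }
        return tagListBuilds;
    }

    public static void main(String[] args) {
        System.out.println(buildsPerCell(1000, 100)); // prints 100000
        System.out.println(buildsPerLine(1000, 100)); // prints 1000
    }
}
```

In this sketch the number of builds falls from lines * columns to lines, which matches the
1M * 1000 versus 1M estimate above. Sharing one list across cells assumes that nothing
downstream mutates it, which is why the sketch returns an unmodifiable list.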



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
