hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14520) Optimize the number of calls for tags creation in bulk load
Date Tue, 06 Oct 2015 13:44:26 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-14520:
---------------------------
      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)

Thanks for the patch, Bhupendra.

Thanks for the review, Anoop.

> Optimize the number of calls for tags creation in bulk load
> -----------------------------------------------------------
>
>                 Key: HBASE-14520
>                 URL: https://issues.apache.org/jira/browse/HBASE-14520
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0
>            Reporter: Bhupendra Kumar Jain
>            Assignee: Bhupendra Kumar Jain
>             Fix For: 2.0.0
>
>         Attachments: HBASE-14520.patch
>
>
> At present, the TTL and visibility expression are specified once per TSV line, i.e. their
> values, and therefore the tags, remain the same for all the columns in that line. However,
> the code creates a new list of tags for each cell. Instead, the tags can be created once per
> line and reused by all the other cells of that line.
> Assume 1 million rows and 1000 columns. Currently, tag creation happens 1M * 1000 times
> (10^9 in total). If the tags are reused, tag creation drops to 1M times (i.e. once per TSV line).

> This applies to both the TsvImporterMapper and the TextSortReducer logic.
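The reuse idea described in the issue can be illustrated with a small self-contained sketch. It is not the actual HBASE-14520 patch and does not use the real HBase Tag or Cell classes: TagReuseSketch, Tag, Cell, buildTags, processLine and countCreations are all hypothetical stand-ins (Java 16+ for records). It contrasts building a fresh tag list per cell with building it once per line, and counts how many tag lists each approach creates on a scaled-down input of 1,000 rows by 100 columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-line tag reuse; NOT the HBase classes or the committed patch.
public class TagReuseSketch {

    // Hypothetical stand-in for a tag: a kind label plus a value.
    public record Tag(String kind, String value) {}

    // Hypothetical stand-in for a cell carrying a list of tags.
    public record Cell(String row, String column, String value, List<Tag> tags) {}

    private long tagListsCreated = 0;

    // Builds the tags derived from a line's TTL and visibility expression.
    // List.of returns an immutable list, so sharing it across cells is safe.
    private List<Tag> buildTags(long ttl, String visibilityExpr) {
        tagListsCreated++;
        return List.of(new Tag("TTL", Long.toString(ttl)),
                       new Tag("VISIBILITY", visibilityExpr));
    }

    // Converts one parsed TSV line into cells.
    public List<Cell> processLine(String row, String[] columns, String[] values,
                                  long ttl, String visibilityExpr, boolean reuseTags) {
        List<Cell> cells = new ArrayList<>(columns.length);
        // Optimized path: TTL and visibility are constant for the line, so build once.
        List<Tag> lineTags = reuseTags ? buildTags(ttl, visibilityExpr) : null;
        for (int i = 0; i < columns.length; i++) {
            // Unoptimized path: a fresh tag list for every cell.
            List<Tag> tags = reuseTags ? lineTags : buildTags(ttl, visibilityExpr);
            cells.add(new Cell(row, columns[i], values[i], tags));
        }
        return cells;
    }

    public long tagListsCreated() {
        return tagListsCreated;
    }

    // Counts tag-list creations for `rows` lines of `cols` columns each.
    public static long countCreations(int rows, int cols, boolean reuseTags) {
        TagReuseSketch sketch = new TagReuseSketch();
        String[] columns = new String[cols];
        String[] values = new String[cols];
        for (int c = 0; c < cols; c++) {
            columns[c] = "cf:q" + c;
            values[c] = "v" + c;
        }
        for (int r = 0; r < rows; r++) {
            sketch.processLine("row" + r, columns, values, 3600L, "secret|public", reuseTags);
        }
        return sketch.tagListsCreated();
    }

    public static void main(String[] args) {
        // Scaled-down version of the 1M x 1000 example from the issue.
        System.out.println("per cell: " + countCreations(1_000, 100, false)); // per cell: 100000
        System.out.println("per line: " + countCreations(1_000, 100, true));  // per line: 1000
    }
}
```

The counts scale the same way as in the issue: rows * columns creations without reuse versus one per row with reuse. Sharing one list instance across cells is only safe because the list is never mutated afterwards, which is why the sketch uses the immutable List.of.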



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
