hbase-issues mailing list archives

From "Yi Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers
Date Wed, 21 Dec 2016 01:26:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765781#comment-15765781
] 

Yi Liang commented on HBASE-5401:
---------------------------------

I have used this command and ran into the same issue. For example, when I run
hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=m randomWrite n

with --nomapred, this creates n threads (clients), and each thread writes m/n rows into
HBase;
with the default MapReduce mode, this creates 10*n mappers, and each mapper puts m/(n*10)
rows into HBase.
   I think the constant {code}static int TASKS_PER_CLIENT = 10{code} is unnecessary:
   1. If users want more mappers, they can simply increase the number of clients. With the *10
multiplier in place, they can only get 10, 20, 30, ... mappers, which is not flexible.
   2. TASKS_PER_CLIENT = 10 is hardcoded and invisible to users. A user who wants just
5 mappers for a job will get 50 mappers with the current code.
   3. When <nclients> = 5, it means 5 threads with --nomapred but 50 mappers with MapReduce,
which is inconsistent. PS: I do not mean that a mapper is the same as a thread, but it would be
better to keep the two counts the same.
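
To make the arithmetic above concrete, here is a minimal sketch. Only the constant TASKS_PER_CLIENT = 10 comes from PerformanceEvaluation as quoted above; the class name PeTaskMath and its methods are hypothetical and are not part of HBase.

```java
// Hypothetical helper, not HBase code: it only reproduces the arithmetic
// described in the comment above.
class PeTaskMath {
    // Value quoted from PerformanceEvaluation in the comment above.
    static final int TASKS_PER_CLIENT = 10;

    // --nomapred: one thread per requested client.
    static int threadCount(int clients) {
        return clients;
    }

    // Default MapReduce mode: ten map tasks per requested client.
    static int mapperCount(int clients) {
        return clients * TASKS_PER_CLIENT;
    }

    // Rows written by each thread with --nomapred (m / n).
    static long rowsPerThread(long totalRows, int clients) {
        return totalRows / clients;
    }

    // Rows written by each mapper in MapReduce mode (m / (n * 10)).
    static long rowsPerMapper(long totalRows, int clients) {
        return totalRows / ((long) clients * TASKS_PER_CLIENT);
    }

    public static void main(String[] args) {
        long m = 1_000_000L; // --rows=m
        int n = 5;           // <nclients>
        System.out.println("threads=" + threadCount(n)
                + ", rows per thread=" + rowsPerThread(m, n));
        System.out.println("mappers=" + mapperCount(n)
                + ", rows per mapper=" + rowsPerMapper(m, n));
    }
}
```

With m = 1,000,000 and n = 5, this gives 5 threads of 200,000 rows each versus 50 mappers of 20,000 rows each, which is the mismatch described in point 3.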

What do you guys think?

> PerformanceEvaluation generates 10x the number of expected mappers
> ------------------------------------------------------------------
>
>                 Key: HBASE-5401
>                 URL: https://issues.apache.org/jira/browse/HBASE-5401
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0
>            Reporter: Oliver Meyn
>             Fix For: 2.0.0
>
>         Attachments: HBASE-5401-V1.patch
>
>
> With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation randomWrite
> 10' there are 100 mappers spawned, rather than the expected 10.  The culprit appears to be
> the outer loop in writeInputFile which sets up 10 splits for every "asked-for client".  I
> think the fix is just to remove that outer loop.
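
The effect of the outer loop described above can be illustrated with a simplified sketch. This is not the actual writeInputFile source: the class SplitSketch, its methods, and the split-line format are assumptions made for illustration only. It shows only how an outer loop over 10 tasks per client multiplies the number of generated splits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a stand-in for the split generation described in the
// issue, not the real PerformanceEvaluation.writeInputFile implementation.
class SplitSketch {
    static final int TASKS_PER_CLIENT = 10;

    // One entry per split; the outer loop yields 10 splits per client.
    static List<String> withOuterLoop(int clients, int totalRows) {
        List<String> splits = new ArrayList<>();
        int perSplit = totalRows / (clients * TASKS_PER_CLIENT);
        for (int i = 0; i < TASKS_PER_CLIENT; i++) {
            for (int j = 0; j < clients; j++) {
                int start = (i * clients + j) * perSplit;
                splits.add("startRow=" + start + ", rows=" + perSplit);
            }
        }
        return splits;
    }

    // Same generation with the outer loop removed: one split per client.
    static List<String> withoutOuterLoop(int clients, int totalRows) {
        List<String> splits = new ArrayList<>();
        int perSplit = totalRows / clients;
        for (int j = 0; j < clients; j++) {
            splits.add("startRow=" + (j * perSplit) + ", rows=" + perSplit);
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(withOuterLoop(10, 1000).size());    // 100 splits
        System.out.println(withoutOuterLoop(10, 1000).size()); // 10 splits
    }
}
```

If each split becomes one map task, 10 requested clients produce 100 mappers with the outer loop and 10 without it, matching the numbers in the report.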



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
