phoenix-dev mailing list archives

From "Ravi Kishore Valeti (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-2292) Improve performance of direct HBase API index build
Date Thu, 15 Oct 2015 21:12:05 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959625#comment-14959625 ]

Ravi Kishore Valeti commented on PHOENIX-2292:
----------------------------------------------

Setup: 8-node cluster
Data table: 1B-row wide table (20 columns)
Node Manager max memory configured: 10 GB
MapReduce max memory configured: 2 GB
Input splits: 128
Parallel mappers run: 39

The run completed in 12 hrs, with an average map time of 3.5 hrs.
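
As a rough sanity check, the figures above appear to be mutually consistent. The sketch below
is only one possible reading of the setup: it assumes the 2 GB limit applies per map task,
that one YARN container is held by the MapReduce ApplicationMaster, and that mapper slots
stay fully busy with every map task taking the average time. None of these assumptions is
stated in the numbers themselves.

```python
import math

# Figures taken from the setup above.
nodes = 8
node_manager_mem_gb = 10
map_task_mem_gb = 2      # assumption: the 2 GB limit is per map task
input_splits = 128
avg_map_hours = 3.5

# Containers that fit on the cluster if each map task takes 2 GB.
containers_per_node = node_manager_mem_gb // map_task_mem_gb
total_containers = nodes * containers_per_node

# Assumption: one container is occupied by the MapReduce ApplicationMaster.
parallel_mappers = total_containers - 1

# Number of "waves" needed to work through all input splits.
waves = math.ceil(input_splits / parallel_mappers)

# Idealized wall-clock estimate: slots always busy, every task takes the average.
est_hours = input_splits * avg_map_hours / parallel_mappers

print(parallel_mappers, waves, round(est_hours, 1))  # 39 4 11.5
```

Under these assumptions the estimate lands close to the observed 12 hrs, which suggests the
run time is dominated by map-side work rather than by scheduling gaps.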

However, because batching is based on the number of rows and not on size, the 39 parallel
mappers can flood the Region Servers with too many write requests. The Region Servers then
throw RegionTooBusyException, and the clients retry after 10 seconds (idle time!), with any
further failed retries waiting out a backoff time. In that case, job execution is delayed
far more than usual!
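
To illustrate the difference between batching on row count alone and also capping batch
size, here is a minimal sketch. It is not the Phoenix or HBase implementation: the function
name and the `max_rows` / `max_bytes` parameters are hypothetical, and mutations are
represented simply as byte strings.

```python
from typing import Iterable, Iterator, List


def batch_by_rows_and_bytes(
    mutations: Iterable[bytes], max_rows: int, max_bytes: int
) -> Iterator[List[bytes]]:
    """Yield batches that respect both a row limit and a byte limit.

    Hypothetical helper for illustration only. A single mutation larger
    than max_bytes is still emitted, in a batch of its own.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for mutation in mutations:
        size = len(mutation)
        # Flush before adding if either limit would be exceeded.
        if batch and (len(batch) >= max_rows or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(mutation)
        batch_bytes += size
    if batch:
        yield batch


# Narrow rows (100 bytes each): the row limit is what triggers a flush.
narrow_rows = [b"x" * 100] * 6
narrow_sizes = [len(b) for b in batch_by_rows_and_bytes(narrow_rows, 4, 5000)]

# Wide rows (2000 bytes each): the byte limit triggers a flush first.
wide_rows = [b"x" * 2000] * 6
wide_sizes = [len(b) for b in batch_by_rows_and_bytes(wide_rows, 4, 5000)]

print(narrow_sizes)  # [4, 2]
print(wide_sizes)    # [2, 2, 2]
```

With a row-only limit, the wide rows would go out in batches of 4 (8000 bytes each); the
byte cap keeps each write request bounded no matter how wide the rows are.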

> Improve performance of direct HBase API index build
> ---------------------------------------------------
>
>                 Key: PHOENIX-2292
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2292
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Ravi Kishore Valeti
>         Attachments: PHOENIX-2292.patch
>
>
> The direct HBase API index build _should_ be almost as fast as the native Phoenix index
> build, but we're seeing a big difference:
> |                   | 100M narrow table (min) | 1B narrow table (min) | 1B wide table (min) |
> | Non MR            | 10                      | 76                    | 511                 |
> | HFile MR          | 17                      | 161                   | 1,375               |
> | Direct HBase APIs | 24                      | 84                    | 1,450               |
> These results are for an 8-node cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
