phoenix-dev mailing list archives

From "Karan Mehta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4489) HBase Connection leak in Phoenix MR Jobs
Date Mon, 08 Jan 2018 04:32:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16315655#comment-16315655 ]

Karan Mehta commented on PHOENIX-4489:
--------------------------------------

[~vincentpoon] 
Technically, as we discussed, it shouldn't be a problem: the connection object goes out of
scope as soon as generateSplits() returns and should then be garbage collected. However, if
you check out PHOENIX-4503, the client there reads multiple Spark DataFrames inside a loop
(almost 50 times). Code like that runs fast and creates lots of HConnections and
ZKConnections in a short span of time. I suspect that even though GC will eventually clear
them, it may take some time before that happens (not until the JVM feels the need). This can
cause issues with the application, and I see many issues filed in this regard.
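
To make the concern concrete, here is a minimal sketch of the two behaviours. It does not use
the real HBase or Phoenix classes: FakeConnection, generateSplitsLeaky() and
generateSplitsClosing() are hypothetical stand-ins, and the counter only models "connections
that nobody has closed yet", not actual GC timing.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SplitConnectionSketch {
    // Number of stand-in connections that are open and not yet closed.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for an HBase connection; NOT a real HBase class.
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() { OPEN.incrementAndGet(); }
        @Override public void close() { OPEN.decrementAndGet(); }
    }

    // Leaky variant: the connection goes out of scope without close(),
    // so cleanup is left to whenever the JVM decides to collect it.
    static int generateSplitsLeaky() {
        FakeConnection conn = new FakeConnection();
        return 1; // pretend one split was computed
    }

    // Closing variant: try-with-resources releases the connection
    // deterministically before the method returns.
    static int generateSplitsClosing() {
        try (FakeConnection conn = new FakeConnection()) {
            return 1; // pretend one split was computed
        }
    }

    // Runs the chosen variant in a tight loop and reports how many
    // stand-in connections are still open afterwards.
    static int openAfter(int iterations, boolean close) {
        OPEN.set(0);
        for (int i = 0; i < iterations; i++) {
            if (close) {
                generateSplitsClosing();
            } else {
                generateSplitsLeaky();
            }
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("leaky:   " + openAfter(50, false)); // prints "leaky:   50"
        System.out.println("closing: " + openAfter(50, true));  // prints "closing: 0"
    }
}
```

In the leaky variant all 50 stand-in connections are still outstanding when the loop ends,
which is the situation I am worried about for a fast loop inside a single JVM.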

Also, since the connections are not instantiated via a factory, it is difficult to track how
many of them exist or to limit resource usage by plugging in a custom implementation. What do
you think?
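
As a rough idea of what going through a factory would allow, here is a sketch of a factory
that counts and caps open connections. Everything in it is hypothetical (Conn, LimitedFactory
and the limit of 2 are made up for illustration); it is not the Phoenix HConnectionFactory
interface, just the general shape of a custom implementation.

```java
import java.util.concurrent.Semaphore;

public class LimitedFactorySketch {
    // Hypothetical stand-in for a connection type; NOT a real HBase interface.
    interface Conn extends AutoCloseable {
        @Override void close();
    }

    // Hypothetical factory that tracks and caps the number of open connections.
    static final class LimitedFactory {
        private final int max;
        private final Semaphore permits;

        LimitedFactory(int max) {
            this.max = max;
            this.permits = new Semaphore(max);
        }

        // Hands out a connection, or fails fast once the cap is reached.
        Conn create() {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("connection limit reached: " + max);
            }
            return new Conn() {
                private boolean closed;

                // Idempotent close: the permit is returned only once.
                @Override public synchronized void close() {
                    if (!closed) {
                        closed = true;
                        permits.release();
                    }
                }
            };
        }

        // Number of connections created through this factory and not yet closed.
        int open() {
            return max - permits.availablePermits();
        }
    }

    public static void main(String[] args) {
        LimitedFactory factory = new LimitedFactory(2);
        Conn a = factory.create();
        Conn b = factory.create();
        System.out.println("open: " + factory.open()); // prints "open: 2"
        a.close();
        b.close();
        System.out.println("open: " + factory.open()); // prints "open: 0"
    }
}
```

Because every connection passes through create(), the count is always available and a leak
shows up as a failed create() instead of a slow pile-up of ZK connections.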

FYI, [~aertoria]

> HBase Connection leak in Phoenix MR Jobs
> ----------------------------------------
>
>                 Key: PHOENIX-4489
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4489
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Assignee: Karan Mehta
>         Attachments: PHOENIX-4489.001.patch
>
>
> Phoenix MR jobs use a custom class {{PhoenixInputFormat}} to determine the splits and
the parallelism of the work. The class directly opens an HBase connection, which is not
closed after use. Independently running MR jobs should not be affected; however,
jobs that run through Phoenix-Spark can leak connections if this is left unclosed (since
those jobs run as part of the same JVM). 
> Apart from this, the connection should be instantiated with {{HBaseFactoryProvider.getHConnectionFactory()}}
instead of the default one. This is useful if a separate client is trying to run jobs and
wants to provide a custom implementation of {{HConnection}}. 
> [~jmahonin] Any ideas?
> [~jamestaylor] [~vincentpoon] Any concerns around this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
