hive-issues mailing list archives

From "Eugene Koifman (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11683) Hive Streaming may overload the metastore
Date Wed, 16 Sep 2015 19:46:45 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-11683:
----------------------------------
    Component/s: Metastore

> Hive Streaming may overload the metastore
> -----------------------------------------
>
>                 Key: HIVE-11683
>                 URL: https://issues.apache.org/jira/browse/HIVE-11683
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Hive, Metastore, Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Roshan Naik
>
> HiveEndPoint represents a way to write to a specific partition transactionally.
> Each HiveEndPoint creates TransactionBatch(es) and commits transactions.
> Suppose you have 10 instances of the Storm Hive bolt using the Streaming API.
> Each instance will create HiveEndPoints on demand when it sees an event for a particular
> partition value.
> If events are uniformly distributed with respect to partition values and the table has
> 1000 partitions (for example, it is partitioned by CustomerId), each of the 10 bolt
> instances may create 1000 HiveEndPoints, for 10,000 HiveEndPoints in total and thus more
> than 10,000 (actually 10K * num_txn_per_batch) concurrent transactions.
> This creates a huge amount of Metastore traffic.
> HIVE-11672 is investigating how some sort of "shuffle" phase can be added to route events
> for a particular bucket to the same bolt instance.
> The same idea should be explored to route events based on partition value.
> cc [~alangates],[~sriharsha],[~rbains]
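
The arithmetic in the description, and the effect of routing by partition value, can be sketched as follows. This is only an illustration and does not use the Hive Streaming or Storm APIs: the class name, the method names, the "customer-N" partition values, and the hash-based routing rule are all assumptions made for the example. It counts how many (bolt, partition) endpoints could exist when every bolt may see every partition, compared with when each partition value is always sent to a single bolt.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from Hive or Storm.
public class PartitionRoutingSketch {

    // Assumed routing rule: hash the partition value onto one of numBolts instances.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    public static int route(String partitionValue, int numBolts) {
        return Math.floorMod(partitionValue.hashCode(), numBolts);
    }

    // Worst case without routing: every bolt instance sees every partition value,
    // so each one opens its own endpoint per partition.
    public static long endPointsWithoutRouting(int numBolts, int numPartitions) {
        return (long) numBolts * numPartitions;
    }

    // With routing: each partition value is owned by exactly one bolt instance,
    // so only one (bolt, partition) pair exists per partition value.
    public static long endPointsWithRouting(int numBolts, int numPartitions) {
        Set<String> endPoints = new HashSet<>();
        for (int p = 0; p < numPartitions; p++) {
            String partitionValue = "customer-" + p; // made-up partition values
            endPoints.add(route(partitionValue, numBolts) + "/" + partitionValue);
        }
        return endPoints.size();
    }

    // Transactions scale with the number of endpoints times the batch size
    // (num_txn_per_batch in the description).
    public static long transactions(long endPoints, int txnsPerBatch) {
        return endPoints * txnsPerBatch;
    }

    public static void main(String[] args) {
        System.out.println(endPointsWithoutRouting(10, 1000)); // 10000
        System.out.println(endPointsWithRouting(10, 1000));    // 1000
    }
}
```

Under these assumptions the number of endpoints drops from bolts * partitions to the number of partitions, and the transaction count drops by the same factor. How evenly the partitions spread across bolts depends on the hash and on the real distribution of partition values.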



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
