hive-user mailing list archives

From Michał Czerwiński <mic...@qubitproducts.com>
Subject Re: hcatalog takes minutes talking to mysql metadata
Date Wed, 28 Aug 2013 10:11:20 GMT
It is also worth mentioning that I have tried running the
0.4.0-cdh4.3.0-SNAPSHOT jars (from
https://repository.cloudera.com/content/groups/public/org/apache/hcatalog/hcatalog-core/)
and hit exactly the same issue. That could indicate that the problem is
related to the hive-metastore component itself and the way it interacts
with the metastore database. Thoughts?


On 27 August 2013 18:41, Michał Czerwiński <michal@qubitproducts.com> wrote:

> In Pig I am running a query like this:
>
> sdp1 = load 'db1.table1' using org.apache.hcatalog.pig.HCatLoader;
> sdp = FILTER sdp1 BY key1=='value1' AND key2=='value2';
> ll = LIMIT sdp 100;
> dump ll;
>
> and HCatalog then spends a few minutes querying MySQL for metadata.
> Meanwhile, after a few seconds, Pig fails with:
> org.apache.thrift.transport.TTransportException:
> java.net.SocketTimeoutException: Read timed out
>
> Number of partitions I have:
> hive -e 'use db1; show partitions table1' |wc -l
> Time taken: 1.467 seconds
> 37748
>
> When I run the same query in a different environment where I have only
> ~1000 partitions, everything works fine.
>
> The problem also does not occur on cdh3 with hcatalog-0.4.0.
>
> In HCatalog's logs I can see the following
> (note the timestamps; I ran the query at 17:10:45,216):
>
> 2013-08-27 17:10:46,275 INFO  DataNucleus.MetaData
> (Log4JLogger.java:info(77)) - Listener found initialisation for persistable
> class org.apache.hadoop.hive.metastore.model.MPartition
>
> 2013-08-27 17:14:23,661 DEBUG metastore.ObjectStore
> (ObjectStore.java:listMPartitionsByFilter(1832)) - Done retrieving all
> objects for listMPartitionsByFilter
>
> 2013-08-27 17:22:32,410 INFO  metastore.ObjectStore
> (ObjectStore.java:getPartitionsByFilter(1699)) - # parts after pruning =
> 37748
>
> After that, HCatalog carries on until:
> 2013-08-27 17:30:14,631 DEBUG DataNucleus.Transaction
> (Log4JLogger.java:debug(58)) - Transaction committed in 462221 ms
>
> Please note that I have DataNucleus logging set to DEBUG, which slows
> things down significantly. Without it, HCatalog still takes around
> 7 minutes to settle.
>
> Here are the DataNucleus settings from HCatalog's logs:
>
>  datanucleus.autoStartMechanismMode = checked
>  javax.jdo.option.Multithreaded = true
>  datanucleus.identifierFactory = datanucleus
>  datanucleus.transactionIsolation = read
>  datanucleus.validateTables = false
>  javax.jdo.option.ConnectionURL = jdbc:mysql://XXX
>  javax.jdo.option.DetachAllOnCommit = true
>  javax.jdo.option.NonTransactionalRead = true
>  datanucleus.validateConstraints = false
>  javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver
>  javax.jdo.option.ConnectionUserName = hive
>  datanucleus.validateColumns = false
>  datanucleus.cache.level2 = false
>  datanucleus.plugin.pluginRegistryBundleCheck = LOG
>  datanucleus.cache.level2.type = none
>  javax.jdo.PersistenceManagerFactoryClass = org.datanucleus.jdo.JDOPersistenceManagerFactory
>  datanucleus.autoCreateSchema = true
>  datanucleus.storeManagerType = rdbms
>  datanucleus.connectionPoolingType = DBCP
>
> This runs on CDH4 4.3.0
> hcatalog version: 0.5.0+9-1.cdh4.3.0.p0.12~precise-cdh4.3.0
>
> Ideas?
>
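
As a rough cross-check of the timings in the quoted logs, here is a minimal
Python sketch that works out how long each phase took. The timestamps and
the partition count are copied from the message above; the event labels, the
names EVENTS, PARTITIONS and phase_durations, and the assumption that the
query start time falls on the same date as the log lines are illustrative
only.

```python
from datetime import datetime

# Timestamps copied from the quoted log excerpts. The date of the
# "query submitted" entry is an assumption (same day as the log lines).
FMT = "%Y-%m-%d %H:%M:%S,%f"
EVENTS = [
    ("query submitted", "2013-08-27 17:10:45,216"),
    ("MPartition listener initialised", "2013-08-27 17:10:46,275"),
    ("listMPartitionsByFilter done", "2013-08-27 17:14:23,661"),
    ("'# parts after pruning' logged", "2013-08-27 17:22:32,410"),
    ("transaction committed", "2013-08-27 17:30:14,631"),
]
PARTITIONS = 37748  # from the `show partitions` count in the message


def phase_durations(events):
    """Return (label, seconds since the previous event) for each event after the first."""
    parsed = [(label, datetime.strptime(ts, FMT)) for label, ts in events]
    return [
        (label, (when - prev_when).total_seconds())
        for (_, prev_when), (label, when) in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    durations = phase_durations(EVENTS)
    for label, seconds in durations:
        print(f"{label}: {seconds:.3f} s")
    commit_seconds = durations[-1][1]
    print(f"commit phase per partition: {commit_seconds / PARTITIONS * 1000:.1f} ms")
```

The last interval works out to 462.221 s, which agrees with the 462221 ms
reported in the "Transaction committed" log line. These figures were taken
with DataNucleus at DEBUG, so they overstate the cost of a normal run.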
