hadoop-hive-dev mailing list archives

From Ning Zhang <nzh...@facebook.com>
Subject Re: Analyze table compute statistics errors and OOM
Date Tue, 05 Oct 2010 16:35:07 GMT

On Oct 5, 2010, at 4:38 AM, Terje Marthinussen wrote:

> Just tested analyze table with a trunk build (from yesterday, oct 4th).
> 
> I tried several variations of it (with or without partitions), but regardless
> of what I try, I get either:
> --
> analyze table normalized compute statistics;
> 
> FAILED: Error in semantic analysis: Table is partitioned and partition specification is needed
> --
> Fair enough if it is not supported, but specifying no partitions seems to be
> supported according to the docs at
> http://wiki.apache.org/hadoop/Hive/StatsDev ?
> 
Sorry, the design spec was out of date. I've updated the wiki to reflect the syntax change.
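For reference, a statement of the form below should work under the updated syntax, using your table's column list (the partition value is hypothetical):

  analyze table normalized partition(intdate='20101004', country, logtype, service, hostname, filedate, filedate_ext) compute statistics;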

> --
> analyze table normalized partition(intdate) compute statistics;
> FAILED: Error in semantic analysis: line 1:36 Dynamic partition cannot be the parent of a static partition intdate
> --
If you have multiple partition columns, their order is important since it reflects
the hierarchical DFS directory structure: you have to specify the parent partition
first and then the sub-partitions, and the partition spec has to map to *one* HDFS
directory. So partition (parent='val', subpart) is allowed, but not
partition (parent, subpart='val'), nor a spec that omits parent entirely.
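For instance, on a hypothetical table t partitioned by (dt, country), with dt the parent:

  -- allowed: static parent, dynamic sub-partition
  analyze table t partition(dt='2010-10-04', country) compute statistics;

  -- not allowed: dynamic parent above a static sub-partition
  analyze table t partition(dt, country='US') compute statistics;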

> ok, I may understand this (or maybe not :)). It may be good to add some
> notes about it on the wiki, though.
> 
This may be a bug when the partition spec includes non-partition columns. I'll verify and
file a JIRA for it. In general, the partition spec may only include partition columns, in
the order they appear in the CREATE TABLE DDL.
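As an illustration, given a hypothetical table

  create table t (c string) partitioned by (dt string, country string);

partition(dt='2010-10-04', country='US') follows the DDL order and is fine, whereas
partition(c='x') should be rejected since c is not a partition column (today it may
hit the bug above instead of giving a clean error).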

> Then the OOM:
> analyze table normalized partition(intdate,country,logtype,service,hostname,filedate,filedate_ext) compute statistics;
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>    at java.util.zip.InflaterInputStream.<init>(InflaterInputStream.java:71)
>    at java.util.zip.ZipFile$1.<init>(ZipFile.java:212)
>    at java.util.zip.ZipFile.getInputStream(ZipFile.java:212)
>    at java.util.zip.ZipFile.getInputStream(ZipFile.java:180)
>    at java.util.jar.JarFile.getManifestFromReference(JarFile.java:167)
>    at java.util.jar.JarFile.getManifest(JarFile.java:148)
>    at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:696)
>    at java.net.URLClassLoader.defineClass(URLClassLoader.java:228)
>    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:262)
>    at org.datanucleus.jdo.state.JDOStateManagerImpl.isLoaded(JDOStateManagerImpl.java:2020)
>    at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.jdoGetsortCols(MStorageDescriptor.java)
>    at org.apache.hadoop.hive.metastore.model.MStorageDescriptor.getSortCols(MStorageDescriptor.java:206)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:759)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:859)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToParts(ObjectStore.java:896)
>    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:886)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1333)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1330)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:234)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:1330)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_ps(HiveMetaStore.java:1760)
>    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:515)
>    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:1267)
>    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.setupStats(SemanticAnalyzer.java:5793)
>    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genTablePlan(SemanticAnalyzer.java:5603)
> 
> 
> the actual stack trace is different on each execution of analyze.
> 
> Another version:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>    at java.util.Arrays.copyOf(Arrays.java:2882)
>    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:597)
>    at java.lang.StringBuilder.append(StringBuilder.java:212)
>    at org.datanucleus.JDOClassLoaderResolver.newCacheKey(JDOClassLoaderResolver.java:382)
>    at org.datanucleus.JDOClassLoaderResolver.classForName(JDOClassLoaderResolver.java:173)
>    at org.datanucleus.JDOClassLoaderResolver.classForName(JDOClassLoaderResolver.java:412)
>    at org.datanucleus.store.mapped.mapping.EmbeddedMapping.getJavaType(EmbeddedMapping.java:574)
>    at org.datanucleus.store.mapped.mapping.EmbeddedMapping.getObject(EmbeddedMapping.java:455)
>    at org.datanucleus.store.mapped.scostore.ListStoreIterator.<init>(ListStoreIterator.java:94)
>    at org.datanucleus.store.rdbms.scostore.RDBMSListStoreIterator.<init>(RDBMSListStoreIterator.java:41)
>    at org.datanucleus.store.rdbms.scostore.RDBMSJoinListStore.listIterator(RDBMSJoinListStore.java:158)
>    at org.datanucleus.store.mapped.scostore.AbstractListStore.listIterator(AbstractListStore.java:84)
>    at org.datanucleus.store.mapped.scostore.AbstractListStore.iterator(AbstractListStore.java:74)
>    at org.datanucleus.store.types.sco.backed.List.loadFromStore(List.java:241)
>    at org.datanucleus.store.types.sco.backed.List.iterator(List.java:494)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToFieldSchemas(ObjectStore.java:706)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToStorageDescriptor(ObjectStore.java:759)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToPart(ObjectStore.java:859)
>    at org.apache.hadoop.hive.metastore.ObjectStore.convertToParts(ObjectStore.java:896)
>    at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:886)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1333)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler$21.run(HiveMetaStore.java:1330)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.executeWithRetry(HiveMetaStore.java:234)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions(HiveMetaStore.java:1330)
>    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_ps(HiveMetaStore.java:1760)
>    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.listPartitions(HiveMetaStoreClient.java:515)
>    at org.apache.hadoop.hive.ql.metadata.Hive.getPartitions(Hive.java:1267)
>    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.setupStats(SemanticAnalyzer.java:5793)
>    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genTablePlan(SemanticAnalyzer.java:5603)
>    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:5834)
>    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6432)
> 
> 
> It makes no difference whether I limit this to a single partition or try any
> other variation of the partition specification.
> 
> It is a SequenceFile-based table, with dynamic and static partitions as well
> as compression.
> 
> Best regards,
> Terje

