hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gabi Kazav (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4730) Join on more than 2^31 records on single reducer failed (wrong results)
Date Sat, 15 Jun 2013 13:27:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684193#comment-13684193
] 

Gabi Kazav commented on HIVE-4730:
----------------------------------

ignore the script error line : /hive/hive/bin/hive: line 72: [: /hive/hive/lib/hive-exec-0.10.0.jar:
binary operator expected
i had 2 jar files (old and new), fixed it, but i still have the exception:

hive> show tables;
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.hive.metastore.api.MetaException
javax.jdo.JDODataStoreException: An exception was thrown while adding/validating class(es)
: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
        at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
        at org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
        at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:730)
        at org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatementList(AbstractTable.java:681)
        at org.datanucleus.store.rdbms.table.AbstractTable.create(AbstractTable.java:402)
        at org.datanucleus.store.rdbms.table.AbstractTable.exists(AbstractTable.java:458)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:2689)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:2503)
        at org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2148)
        at org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:113)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:986)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:952)
        at org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:919)
        at org.datanucleus.store.mapped.MappedStoreManager.getDatastoreClass(MappedStoreManager.java:356)
        at org.datanucleus.store.rdbms.query.legacy.ExtentHelper.getExtent(ExtentHelper.java:48)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.getExtent(RDBMSStoreManager.java:1332)
        at org.datanucleus.ObjectManagerImpl.getExtent(ObjectManagerImpl.java:4149)
        at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compileCandidates(JDOQLQueryCompiler.java:411)
        at org.datanucleus.store.rdbms.query.legacy.QueryCompiler.executionCompile(QueryCompiler.java:312)
        at org.datanucleus.store.rdbms.query.legacy.JDOQLQueryCompiler.compile(JDOQLQueryCompiler.java:225)
        at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.compileInternal(JDOQLQuery.java:175)
        at org.datanucleus.store.query.Query.executeQuery(Query.java:1628)
        at org.datanucleus.store.rdbms.query.legacy.JDOQLQuery.executeQuery(JDOQLQuery.java:245)
        at org.datanucleus.store.query.Query.executeWithArray(Query.java:1499)
        at org.datanucleus.jdo.JDOQuery.execute(JDOQuery.java:243)
        at org.apache.hadoop.hive.metastore.ObjectStore.getTables(ObjectStore.java:781)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
        at $Proxy6.getTables(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_tables(HiveMetaStore.java:2327)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
        at $Proxy7.get_tables(Unknown Source)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTables(HiveMetaStoreClient.java:817)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:74)
        at $Proxy8.getTables(Unknown Source)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:1009)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:983)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:2215)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:334)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:138)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1336)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1122)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:935)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:259)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:412)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:755)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:613)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
Source)
        ... 74 more
Caused by: ERROR X0Y32: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.duplicateDescriptorException(Unknown
Source)
        at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addDescriptor(Unknown Source)
        at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addDescriptor(Unknown Source)
        at org.apache.derby.impl.sql.catalog.DataDictionaryImpl.addConstraintDescriptor(Unknown
Source)
        at org.apache.derby.impl.sql.execute.CreateConstraintConstantAction.executeConstantAction(Unknown
Source)
        at org.apache.derby.impl.sql.execute.AlterTableConstantAction.executeConstantAction(Unknown
Source)
        at org.apache.derby.impl.sql.execute.MiscResultSet.open(Unknown Source)
        at org.apache.derby.impl.sql.GenericPreparedStatement.execute(Unknown Source)
        ... 68 more

NestedThrowables:
java.sql.SQLException: Constraint 'COLUMNS_PK' already exists in Schema 'APP'.)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

                
> Join on more than 2^31 records on single reducer failed (wrong results)
> -----------------------------------------------------------------------
>
>                 Key: HIVE-4730
>                 URL: https://issues.apache.org/jira/browse/HIVE-4730
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.11.0
>            Reporter: Gabi Kazav
>            Assignee: Navis
>            Priority: Critical
>         Attachments: HIVE-4730.D11283.1.patch
>
>
> join on more than 2^31 rows leads to wrong results. for example:
> Create table small_table (p1 string) ROW FORMAT DELIMITED    LINES TERMINATED BY  '\n';
> Create table big_table (p1 string) ROW FORMAT DELIMITED    LINES TERMINATED BY  '\n';
> Loading 1 row to small_table (the value 1).
> Loading 2149580800 rows to big_table with the same value (1 on this case).
> create table output as select a.p1 from  big_table a join small_table b on (a.p1=b.p1);
> select count(*) from output ; will return only 1 row...
> the reducer syslog:
> ...
> 2013-06-13 17:20:59,254 INFO ExecReducer: ExecReducer: processing 2147000000 rows: used
memory = 32925960
> 2013-06-13 17:21:00,745 INFO ExecReducer: ExecReducer: processing 2148000000 rows: used
memory = 12815184
> 2013-06-13 17:21:02,205 INFO ExecReducer: ExecReducer: processing 2149000000 rows: used
memory = 26684552   <-- looks like wrong value..
> ...
> 2013-06-13 17:21:04,062 INFO ExecReducer: ExecReducer: processed 2149580801 rows: used
memory = 17715896
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 finished.
closing...
> 2013-06-13 17:21:04,062 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 forwarded
1 rows
> 2013-06-13 17:21:05,791 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 finished.
closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarded
1 rows
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 finished.
closing...
> 2013-06-13 17:21:05,792 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 6 forwarded
0 rows
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: TABLE_ID_1_ROWCOUNT:1
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 Close done
> 2013-06-13 17:21:05,946 INFO org.apache.hadoop.hive.ql.exec.JoinOperator: 4 Close done

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message