hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9333) Move parquet serialize implementation to DataWritableWriter to improve write speeds
Date Tue, 27 Jan 2015 20:44:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294145#comment-14294145 ]

Hive QA commented on HIVE-9333:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12694817/HIVE-9333.3.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 7396 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udaf_histogram_numeric
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2535/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2535/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2535/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12694817 - PreCommit-HIVE-TRUNK-Build

> Move parquet serialize implementation to DataWritableWriter to improve write speeds
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-9333
>                 URL: https://issues.apache.org/jira/browse/HIVE-9333
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-9333.2.patch, HIVE-9333.3.patch
>
>
> The serialize process in ParquetHiveSerDe converts a Hive object
> into a Writable object by looping through all of the Hive object's children
> and creating a new Writable object per child. These Writable objects
> are then passed to the Parquet writing function and parsed again
> in the DataWritableWriter class by looping through the ArrayWritable
> object. These two loops (ParquetHiveSerDe.serialize() and DataWritableWriter.write())
> may be reduced to a single loop in the DataWritableWriter.write() method in order to
> increase the write speed of Hive Parquet.
> To achieve this, we can wrap the Hive object and object inspector
> in the ParquetHiveSerDe.serialize() method into an object that implements the Writable interface,
> thus avoiding the loop that serialize() performs and leaving the parsing loop to the
> DataWritableWriter.write() method. We can see how ORC does this with the OrcSerde.OrcSerdeRow class.
> Each storage format organizes Writable objects differently, so I don't
> think it is necessary to create and keep the Writable objects in the serialize() method,
> as they won't be used until the writing process starts (DataWritableWriter.write()).
> This performance issue was found using microbenchmark tests from HIVE-8121.
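
The single-loop design described in the issue can be illustrated with a minimal, self-contained Java sketch. It deliberately does not use Hive's real classes: RowInspector, ValueSink, Cell and RowWrapper are hypothetical stand-ins for an ObjectInspector, the Parquet-side record consumer, a per-field Writable, and a wrapper in the spirit of OrcSerde.OrcSerdeRow. The sketch contrasts an eager serialize() that allocates one object per field before write() loops over them again, with a lazy serialize() that only wraps the row and its inspector, leaving write() to perform the only field walk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: RowInspector, ValueSink, Cell and RowWrapper are simplified
// stand-ins, NOT the real Hive ObjectInspector/Writable or Parquet consumer APIs.
public class WrapperSketch {

    /** Stand-in for an object inspector: reads field i out of an opaque row object. */
    interface RowInspector {
        int fieldCount();
        Object getField(Object row, int i);
    }

    /** Stand-in for the Parquet-side consumer that receives one value per field. */
    interface ValueSink {
        void addValue(int fieldIndex, Object value);
    }

    /** Stand-in for the per-field Writable that an eager serialize() allocates. */
    static final class Cell {
        final Object value;
        Cell(Object value) { this.value = value; }
    }

    /** Inspector for rows stored as fixed-width Lists. */
    static final class ListInspector implements RowInspector {
        private final int width;
        ListInspector(int width) { this.width = width; }
        public int fieldCount() { return width; }
        public Object getField(Object row, int i) { return ((List<?>) row).get(i); }
    }

    // Two-loop approach, loop 1: serialize() allocates one Cell per field.
    static Cell[] serializeEagerly(Object row, RowInspector oi) {
        Cell[] cells = new Cell[oi.fieldCount()];
        for (int i = 0; i < cells.length; i++) {
            cells[i] = new Cell(oi.getField(row, i));
        }
        return cells;
    }

    // Two-loop approach, loop 2: write() walks the intermediate Cell array again.
    static void writeCells(Cell[] cells, ValueSink sink) {
        for (int i = 0; i < cells.length; i++) {
            sink.addValue(i, cells[i].value);
        }
    }

    // One-loop approach: serialize() only pairs the row with its inspector.
    static final class RowWrapper {
        final Object row;
        final RowInspector inspector;
        RowWrapper(Object row, RowInspector inspector) {
            this.row = row;
            this.inspector = inspector;
        }
    }

    static RowWrapper serializeLazily(Object row, RowInspector oi) {
        return new RowWrapper(row, oi); // no per-field allocation here
    }

    // The single loop: fields are read straight from the row while writing.
    static void writeWrapped(RowWrapper w, ValueSink sink) {
        for (int i = 0; i < w.inspector.fieldCount(); i++) {
            sink.addValue(i, w.inspector.getField(w.row, i));
        }
    }

    // Helpers that record what each approach hands to the sink, as "index=value".
    static List<String> eagerOutput(List<?> row) {
        List<String> out = new ArrayList<>();
        writeCells(serializeEagerly(row, new ListInspector(row.size())),
                   (i, v) -> out.add(i + "=" + v));
        return out;
    }

    static List<String> lazyOutput(List<?> row) {
        List<String> out = new ArrayList<>();
        writeWrapped(serializeLazily(row, new ListInspector(row.size())),
                     (i, v) -> out.add(i + "=" + v));
        return out;
    }

    public static void main(String[] args) {
        List<Object> row = Arrays.asList(1, "a", 2.5);
        System.out.println(eagerOutput(row)); // [0=1, 1=a, 2=2.5]
        System.out.println(lazyOutput(row));  // [0=1, 1=a, 2=2.5]
    }
}
```

Both paths hand the sink the same values; the wrapper path simply skips the intermediate per-field allocations and the second loop, which is the saving the issue aims for. The sketch says nothing about actual Hive Parquet write performance.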



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
