hive-dev mailing list archives

From "Holman Lan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-13770) Improve Thrift result set streaming when serializing thrift ResultSets in tasks
Date Mon, 16 May 2016 22:21:12 GMT
Holman Lan created HIVE-13770:
---------------------------------

             Summary: Improve Thrift result set streaming when serializing thrift ResultSets in tasks
                 Key: HIVE-13770
                 URL: https://issues.apache.org/jira/browse/HIVE-13770
             Project: Hive
          Issue Type: Improvement
            Reporter: Holman Lan


When the Thrift result set is serialized in the final task, i.e. when the hive.server2.thrift.resultset.serialize.in.tasks
property is set to true, HS2 does not start sending the results until the entire result set
has been written to HDFS.

This is inefficient, and we should find a way for HS2 to start sending the results as soon
as a block of results becomes available. The advantage is twofold. One, the client
can start consuming the results much sooner. Two, we can start reclaiming the storage space
in HDFS used by a particular result set block as soon as that block has been successfully
sent to the client.
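
To make the proposal concrete, here is a minimal Java sketch of the idea. It is not
HS2 code and not a proposed patch: local temp files stand in for result set blocks in HDFS,
a returned list stands in for the client connection, and all names (StreamingResultSketch,
streamBlocks, END) are hypothetical. It only shows the intended ordering: a block is sent
as soon as it is written, and its storage is reclaimed right after the send.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: temp files stand in for HDFS result set blocks.
class StreamingResultSketch {
    // Sentinel (compared by identity) that marks the end of the result set.
    private static final Path END = Paths.get("");

    // Writes each block to its own file on a background thread while the
    // calling thread "sends" every block as soon as it is available and
    // then deletes it. Returns the blocks in the order they were sent.
    static List<String> streamBlocks(List<String> blocks) throws Exception {
        Path dir = Files.createTempDirectory("resultset");
        BlockingQueue<Path> ready = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Stand-in for the final task that serializes the result set.
        Future<Void> writer = pool.submit(() -> {
            try {
                for (int i = 0; i < blocks.size(); i++) {
                    Path block = dir.resolve("block-" + i);
                    Files.write(block, blocks.get(i).getBytes(StandardCharsets.UTF_8));
                    ready.put(block); // announce the block once it is fully written
                }
            } finally {
                ready.add(END); // always unblock the sender, even on failure
            }
            return null;
        });

        // Stand-in for HS2 sending results to the client.
        List<String> sent = new ArrayList<>();
        try {
            for (Path block = ready.take(); block != END; block = ready.take()) {
                sent.add(new String(Files.readAllBytes(block), StandardCharsets.UTF_8));
                Files.delete(block); // reclaim storage right after a successful send
            }
            writer.get();       // surface any failure from the writer
            Files.delete(dir);  // directory is empty once every block was sent
        } finally {
            pool.shutdown();
        }
        return sent;
    }
}
```

Calling StreamingResultSketch.streamBlocks with a list of block contents returns them in
write order, and at no point does the sender wait for the whole result set to be written.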

It's worth checking whether this is also the case when the Thrift result set is not serialized
in the final task.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
