hive-issues mailing list archives

From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15908) OperationLog's LogFile writer should have autoFlush turned on
Date Fri, 17 Feb 2017 20:52:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872503#comment-15872503 ]

Thejas M Nair commented on HIVE-15908:
--------------------------------------

Are you testing with the master branch?
HiveStatement.DEFAULT_FETCH_SIZE has been 1000 for a while, but I am not sure why that would
have an impact.
HIVE-14618 has changes to use shorter timeouts for the getOperationStatus long-polling calls,
which has an impact similar to what Hue is doing. That could be what you are hitting.
But it looks like it didn't change the beeline sleep timeouts for log fetches. We could have
a step function for that as well.
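
As a rough illustration of what a step function for the log-fetch sleep could look like, here is a minimal Java sketch. The class name, method name, thresholds, and intervals are all hypothetical; none of them come from beeline, HiveStatement, or HIVE-14618. The idea is only that the client polls often while a query is young and backs off in steps as it keeps running.

```java
public class LogFetchBackoff {

    // Hypothetical step function: maps time elapsed since the query started
    // to the sleep interval before the next log fetch.
    // The thresholds and intervals are illustrative assumptions, not Hive values.
    static long sleepMillisFor(long elapsedMillis) {
        if (elapsedMillis < 10_000) {
            return 100;   // first 10 seconds: poll frequently for responsive output
        }
        if (elapsedMillis < 60_000) {
            return 500;   // up to one minute: poll less often
        }
        return 1_000;     // long-running query: poll once per second
    }

    public static void main(String[] args) {
        long[] samples = {0, 5_000, 10_000, 45_000, 60_000, 600_000};
        for (long elapsed : samples) {
            System.out.println(elapsed + " ms elapsed -> sleep "
                    + sleepMillisFor(elapsed) + " ms");
        }
    }
}
```

A fetch loop would call something like sleepMillisFor(now - start) before each sleep, so short queries see their logs quickly without long queries being polled at the fastest rate forever.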



> OperationLog's LogFile writer should have autoFlush turned on
> -------------------------------------------------------------
>
>                 Key: HIVE-15908
>                 URL: https://issues.apache.org/jira/browse/HIVE-15908
>             Project: Hive
>          Issue Type: Improvement
>          Components: HiveServer2
>    Affects Versions: 0.13.0
>            Reporter: Harsh J
>            Priority: Minor
>         Attachments: HIVE-15908.000.patch
>
>
> The HS2 offers an API to fetch Operation Log results from the maintained OperationLog
> file. The reader used inside the OperationLog$LogFile class reads line-by-line on its input
> stream, for any lines available from the OS's file input perspective.
> The writer inside the same class uses PrintStream to write to the file in parallel. However,
> the PrintStream constructor used leaves PrintStream's {{autoFlush}} feature in an OFF state.
> This causes the BufferedWriter used by PrintStream to accumulate 8k worth of bytes in memory
> as the buffer before flushing the writes to disk, which slows the logs streamed
> back to the client. Ideally, every line should be flushed entirely as it is written, for a smoother
> experience.
> I suggest changing the line inside {{OperationLog$LogFile}} that appears as below:
> {code}
> out = new PrintStream(new FileOutputStream(file));
> {code}
> Into:
> {code}
> out = new PrintStream(new FileOutputStream(file), true);
> {code}
> This will cause it to use the described autoFlush feature of PrintStream and make for
> a better reader-log-results-streaming experience: https://docs.oracle.com/javase/7/docs/api/java/io/PrintStream.html#PrintStream(java.io.OutputStream,%20boolean)
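
The effect of the second constructor argument can be observed with a small standalone Java sketch. It is not the OperationLog code: the class and method names are hypothetical, a ByteArrayOutputStream stands in for the log file, and a BufferedOutputStream is placed under the PrintStream purely as an assumption of this sketch so that buffering is visible in a tiny test (the line quoted in the issue passes the FileOutputStream directly). The method reports how many bytes have reached the sink right after a single println, before any explicit flush or close.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class AutoFlushDemo {

    // Writes one line and returns the number of bytes that have already
    // reached the sink, i.e. what a concurrent reader could see at that moment.
    static int bytesVisibleAfterPrintln(boolean autoFlush) {
        // Stand-in for the log file (assumption for this sketch).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // BufferedOutputStream is added here only to make buffering observable.
        PrintStream out = new PrintStream(new BufferedOutputStream(sink), autoFlush);

        out.println("query progress line");
        int visible = sink.size(); // measured before any explicit flush or close

        out.close();
        return visible;
    }

    public static void main(String[] args) {
        System.out.println("autoFlush=false, bytes visible: "
                + bytesVisibleAfterPrintln(false));
        System.out.println("autoFlush=true,  bytes visible: "
                + bytesVisibleAfterPrintln(true));
    }
}
```

With autoFlush off, the line stays in the buffer until it fills or the stream is flushed or closed; with autoFlush on, println pushes it through to the sink immediately, which is the behaviour a line-by-line reader of the log needs.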



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
