hadoop-hdfs-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-4710) Turning off HDFS short-circuit checksums unexpectedly slows down Hive
Date Thu, 18 Apr 2013 08:49:16 GMT
Gopal V created HDFS-4710:
-----------------------------

             Summary: Turning off HDFS short-circuit checksums unexpectedly slows down Hive
                 Key: HDFS-4710
                 URL: https://issues.apache.org/jira/browse/HDFS-4710
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.0.4-alpha
         Environment: Centos (EC2) + short-circuit reads on
            Reporter: Gopal V
            Priority: Minor


When short-circuit reads are on, the HDFS client slows down when checksums are turned off.

With checksums on, the query takes 45.341 seconds; with checksums turned off, it takes 56.345
seconds. The latter is also slower than the times observed with short-circuit reads turned off.

The issue seems to be that, when checksums are turned off, FSDataInputStream.readByte() calls
are passed straight through to the disk fd, so every single-byte read turns into its own
read() system call.

Even though all the columns are integers, the data is read through DataInputStream, whose
readInt() fetches each integer one byte at a time:

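{code}
public final int readInt() throws IOException {
    // Four separate single-byte reads on the underlying stream
    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
}
{code}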

To confirm, an strace of the YARN container shows one read() system call per byte:

{code}
26690 read(154, "B", 1)                 = 1
26690 read(154, "\250", 1)              = 1
26690 read(154, ".", 1)                 = 1
26690 read(154, "\24", 1)               = 1
{code}
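
The access pattern can be reproduced without HDFS or Hive. The following is a minimal sketch, not code from either project: CountingInputStream, readIntByteWise and countCalls are hypothetical names, an in-memory stream stands in for the disk fd, and readIntByteWise copies the readInt() logic quoted above rather than calling the JDK method, whose implementation may differ between JDK versions. It counts how many calls reach the underlying stream with and without a BufferedInputStream in between.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadIntCallCount {
    /** Hypothetical wrapper: counts every read call that reaches the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        int calls = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            calls++;
            return in.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Same logic as the readInt() excerpt: four single-byte reads per integer. */
    static int readIntByteWise(InputStream in) throws IOException {
        int ch1 = in.read();
        int ch2 = in.read();
        int ch3 = in.read();
        int ch4 = in.read();
        if ((ch1 | ch2 | ch3 | ch4) < 0) throw new EOFException();
        return (ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4;
    }

    /** Reads `ints` integers and returns the number of calls that hit the counted stream. */
    static int countCalls(int ints, boolean buffered) {
        // In-memory stand-in for the file descriptor (assumption for the demo).
        CountingInputStream raw =
            new CountingInputStream(new ByteArrayInputStream(new byte[ints * 4]));
        try (InputStream src = buffered ? new BufferedInputStream(raw, 64 * 1024) : raw) {
            for (int i = 0; i < ints; i++) {
                readIntByteWise(src);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + countCalls(1024, false));
        System.out.println("buffered calls:   " + countCalls(1024, true));
    }
}
```

For 1024 integers, the unbuffered run reaches the counted stream 4096 times, while the buffered run reaches it once, because the 64 KB buffer is filled by a single bulk read. Whether adding a buffer is the right fix inside the HDFS client is a separate question that this sketch does not answer.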

To emulate this without the entirety of the Hive code, I have written a simpler test app:

https://github.com/t3rmin4t0r/shortcircuit-reader

The jar reads a file in buffers of the size given by -bs <n>. Running it with 1-byte buffers
gives similar results to the Hive test run.
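
As a rough illustration of what reading a file in fixed-size buffers means, here is a sketch against a local file using plain java.nio. It is not the code from the repository above and does not use the HDFS client; BufferSizeReader, readAll and tempFile are hypothetical names, and the buffer sizes in main are arbitrary.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class BufferSizeReader {
    /** Reads the whole file with a bs-byte buffer; returns {successful read calls, bytes read}. */
    static long[] readAll(Path file, int bs) {
        byte[] buf = new byte[bs];
        long calls = 0;
        long bytes = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                calls++;
                bytes += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {calls, bytes};
    }

    /** Hypothetical helper: creates a zero-filled temporary file of the given size. */
    static Path tempFile(int size) {
        try {
            Path p = Files.createTempFile("bs-demo", ".bin");
            Files.write(p, new byte[size]);
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Use the file given on the command line, or a 1 MiB temporary file.
        Path p = args.length > 0 ? Paths.get(args[0]) : tempFile(1 << 20);
        for (int bs : new int[] {1, 4096, 65536}) {
            long start = System.nanoTime();
            long[] r = readAll(p, bs);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println("bs=" + bs + " calls=" + r[0] + " bytes=" + r[1] + " ms=" + ms);
        }
    }
}
```

With a 1-byte buffer, the number of read calls equals the file size, which is the same pattern as in the strace output. Timings from a local file are only indicative and say nothing about the HDFS short-circuit path itself.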


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
