hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2434) MapFile.get on HDFS in TRUNK is WAY!!! slower than 0.15.x
Date Sat, 15 Dec 2007 20:12:43 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

stack updated HADOOP-2434:
--------------------------

    Attachment: TestMapFile.java

Another version of the attachment.  Just does 100k instead of 1M 'cos 1M takes too long (outputs
state every 1000 reads folks don't think the program is dead).  Here is output from my cluster
using first branch-0.15:

{code}
[stack@aa0-000-12 branch-0.15]$ nohup ./bin/hadoop org.apache.hadoop.io.TestMapFile > testmapfile.timings
< /dev/null &
[1] 19284
[stack@aa0-000-12 branch-0.15]$ tail -f testmapfile.timings 
.07/12/15 20:02:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
07/12/15 20:02:17 INFO io.TestMapFile: Writing 100000 rows to testMapFileRandomAccess
07/12/15 20:02:38 INFO io.TestMapFile: Writing 100000 records took 20975ms
07/12/15 20:02:38 INFO io.TestMapFile: Reading 100000 random rows
07/12/15 20:02:40 INFO io.TestMapFile: Read 1000
07/12/15 20:02:41 INFO io.TestMapFile: Read 2000
07/12/15 20:02:43 INFO io.TestMapFile: Read 3000
07/12/15 20:02:44 INFO io.TestMapFile: Read 4000
07/12/15 20:02:45 INFO io.TestMapFile: Read 5000
07/12/15 20:02:47 INFO io.TestMapFile: Read 6000
07/12/15 20:02:48 INFO io.TestMapFile: Read 7000
07/12/15 20:02:49 INFO io.TestMapFile: Read 8000
07/12/15 20:02:50 INFO io.TestMapFile: Read 9000
07/12/15 20:02:52 INFO io.TestMapFile: Read 10000
...
{code}

Now using TRUNK:
{code}
[stack@aa0-000-12 hadoop-trunk]$ nohup ./bin/hadoop org.apache.hadoop.io.TestMapFile >
testmapfile.timings < /dev/null &
[stack@aa0-000-12 hadoop-trunk]$ tail -f testmapfile.timings
.07/12/15 19:44:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
07/12/15 19:44:27 INFO io.TestMapFile: Writing 100000 rows to testMapFileRandomAccess
07/12/15 19:44:48 INFO io.TestMapFile: Writing 100000 records took 20326ms
07/12/15 19:44:48 INFO io.TestMapFile: Reading 100000 random rows
07/12/15 19:49:17 INFO io.TestMapFile: Read 1000
07/12/15 19:53:40 INFO io.TestMapFile: Read 2000
07/12/15 19:57:47 INFO io.TestMapFile: Read 3000
...
{code}

TRUNK is taking about 13 minutes to read 3k.   Branch 0.15 does same random reads in ~5 seconds.
 Thats > 2 orders of magnitude slower.

I retried the johano position cache patch to be sure and that doesn' t fix the problem.

Pauls' suggestion of our reading a byte a time would seem to jibe with the timings seen.

> MapFile.get on HDFS in TRUNK is WAY!!! slower than 0.15.x
> ---------------------------------------------------------
>
>                 Key: HADOOP-2434
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2434
>             Project: Hadoop
>          Issue Type: Bug
>          Components: io
>            Reporter: stack
>         Attachments: TestMapFile.java, TestMapFile.java
>
>
> Stall happens down in SequenceFile in the first call to getPos inside readRecordLength.
 I tried the johano patch from HADOOP-2172 that restores the positional cache but that didn't
seem to be the issue here.
> Here is data to support my assertion.
> I wrote a little program to make a MapFile of 1M records.  I then did 1M random reads
from same file.  Below are timings from a 0.15.0 and TRUNK as of this afternoon run.
> 0.15.x branch:
> {code}
> [stack@aa0-000-12 branch-0.15]$ ./bin/hadoop org.apache.hadoop.io.TestMapFile
> .07/12/15 01:29:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for
your platform... using builtin-java classes where applicable
> 07/12/15 01:29:02 INFO io.TestMapFile: Writing 1000000 rows to testMapFileRandomAccess
> 07/12/15 01:32:04 INFO io.TestMapFile: Writing 1000000 records took 182009ms
> 07/12/15 01:32:04 INFO io.TestMapFile: Reading 1000000 random rows
> 07/12/15 01:48:02 INFO io.TestMapFile: Reading 1000000 random records took 958243ms
> Time: 1,140.652
> OK (1 test)
> {code}
> For the below test using TRUNK r604352, I amended the test so it output a log message
every 100k reads:
> {code}
> [stack@aa0-000-12 hadoop-trunk]$ ./bin/hadoop org.apache.hadoop.io.TestMapFile
> .07/12/15 01:56:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for
your platform... using builtin-java classes where applicable
> 07/12/15 01:56:34 INFO io.TestMapFile: Writing 1000000 rows to testMapFileRandomAccess
> 07/12/15 01:59:38 INFO io.TestMapFile: Writing 1000000 records took 183986ms
> 07/12/15 01:59:38 INFO io.TestMapFile: Reading 1000000 random rows
> .........
> {code}
> After 20 minutes it still hadn't printed out the 'read 100k messages' (I had to leave
-- will fill in final figures later)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message