hadoop-hive-dev mailing list archives

From "Michael Klatt (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files
Date Tue, 27 Apr 2010 16:17:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12861444#action_12861444 ]

Michael Klatt commented on HIVE-1326:
-------------------------------------


The parentFile.delete() call is there because the File.createTempFile method actually
creates a file. The code, as it currently stands, creates a temporary directory to hold
the RowContainer file, and I made the smallest change possible to preserve this
behavior.

Looking at the code, it appears that the createTempFile mechanism is used several lines
down to create the actual temporary file (within the new temporary directory). I'm not
sure why a temporary directory is created first, but I'll submit a new patch that doesn't
try to create a temporary directory at all.
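
For illustration, here's a minimal standalone sketch of the pattern in question (the
names loosely follow the RowContainer code, but this is not the actual Hive source).
File.createTempFile always creates a real, empty file, so getting a temporary directory
out of it requires the delete-then-mkdir dance:

    import java.io.File;
    import java.io.IOException;

    public class TempDirSketch {
        public static void main(String[] args) throws IOException {
            // File.createTempFile creates an actual empty file on disk, so code
            // that wants a temporary *directory* must delete that file and then
            // re-create the same path as a directory -- hence parentFile.delete().
            File parentFile = File.createTempFile("hive-rowcontainer", "");
            if (!parentFile.delete() || !parentFile.mkdir()) {
                throw new IOException("Could not create temp directory " + parentFile);
            }

            // Several lines down, createTempFile is used again to create the
            // real spill file inside the new directory.
            File tmpFile = File.createTempFile("RowContainer", ".tmp", parentFile);
            System.out.println("Created " + tmpFile);
        }
    }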


> RowContainer uses hard-coded '/tmp/' path for temporary files
> -------------------------------------------------------------
>
>                 Key: HIVE-1326
>                 URL: https://issues.apache.org/jira/browse/HIVE-1326
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, but that
doesn't seem relevant.
>            Reporter: Michael Klatt
>         Attachments: rowcontainer.patch
>
>
> In our production hadoop environment, the "/tmp/" is actually pretty small, and we encountered
a problem when a query used the RowContainer class and filled up the /tmp/ partition.  I tracked
down the cause to the RowContainer class putting temporary files in the '/tmp/' path instead
of using the configured Hadoop temporary path.  I've attached a patch to fix this.
> Here's the traceback:
> 2010-04-25 12:05:05,120 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer:
RowContainer created temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
> 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 10000000 rows: used
memory = 385520312
> 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 11000000 rows: used
memory = 341780472
> 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 12000000 rows: used
memory = 301446768
> 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 13000000 rows: used
memory = 399208768
> 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 14000000 rows: used
memory = 364507216
> 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 15000000 rows: used
memory = 332907280
> 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 16000000 rows: used
memory = 298774096
> 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 17000000 rows: used
memory = 396505408
> 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 18000000 rows: used
memory = 362477288
> 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 19000000 rows: used
memory = 327229744
> 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 20000000 rows: used
memory = 296051904
> 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: java.io.IOException:
No space left on device
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
> 	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
> 	at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
> 	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
> 	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
> 	at org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
> 	at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
> 	at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
> 	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:158)
> Caused by: java.io.IOException: No space left on device
> 	at java.io.FileOutputStream.writeBytes(Native Method)
> 	at java.io.FileOutputStream.write(FileOutputStream.java:260)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
> 	... 22 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

