hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3914) checksumOk implementation in DFSClient can break applications
Date Wed, 01 Oct 2008 13:59:46 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3914:
--------------------------------

    Fix Version/s:     (was: 0.18.2)
                       (was: 0.19.0)

> checksumOk implementation in DFSClient can break applications
> -------------------------------------------------------------
>
>                 Key: HADOOP-3914
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3914
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Christian Kunz
>         Attachments: patch.HADOOP-3914
>
>
> One of our non-map-reduce applications (written in C and using libhdfs to access dfs)
> stopped working after the switch from 0.16 to 0.17.
> The problem was finally traced to failures in checksumOk.
> I assume the purpose of checksumOk is for a DFSClient to indicate to the sending
> Datanode that the checksum of the received block is okay. This must be useful in the
> replication pipeline.
> checksumOk is implemented so that any IOException is caught and ignored, probably
> because it is not essential for the client that the message is sent successfully.
> But it proved fatal for our application, because the application links in a 3rd-party
> library that seems to catch socket exceptions before libhdfs does.
> Why was there an exception? In our case, the application reads a file into the local
> buffer of the DFSInputStream, which is large enough to hold all of the data. The
> application reads to the end, and checksumOk is sent successfully at that time. The
> application then does some other work and comes back to re-read the file (still open).
> That is when it calls checksumOk again and crashes.
> This can easily be avoided by adding a boolean that makes sure checksumOk is called
> exactly once, when EOS is encountered. Redundant calls to checksumOk do not seem to
> make sense anyhow.
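
The boolean guard proposed in the description could look roughly like the sketch below. This is an illustration only, not the attached patch and not the actual DFSClient code: the class GuardedBlockReader, the StatusSender interface, the field sentChecksumOk, and the rewind() method are all hypothetical names chosen for this example. It assumes the data is fully buffered, as in the reported scenario.

```java
import java.io.IOException;

/** Hypothetical stand-in for a stream that reads a fully buffered block. */
class GuardedBlockReader {

    /** Hypothetical hook for the status message sent to the datanode. */
    interface StatusSender {
        void sendChecksumOk() throws IOException;
    }

    private final byte[] data;
    private final StatusSender sender;
    private int pos = 0;
    // The proposed guard: true once the checksum-ok message has been attempted.
    private boolean sentChecksumOk = false;

    GuardedBlockReader(byte[] data, StatusSender sender) {
        this.data = data;
        this.sender = sender;
    }

    /** Returns the next byte, or -1 at end of stream (EOS). */
    int read() {
        if (pos < data.length) {
            return data[pos++] & 0xff;
        }
        if (!sentChecksumOk) {
            // Set the flag before sending so a failed attempt is not retried.
            sentChecksumOk = true;
            try {
                sender.sendChecksumOk();
            } catch (IOException e) {
                // Best effort, as in the behaviour described in the report:
                // the client does not depend on this message succeeding.
            }
        }
        return -1;
    }

    /** Rewinds to re-read the buffered data; the guard is deliberately not reset. */
    void rewind() {
        pos = 0;
    }
}
```

With this guard, reading to EOS, rewinding, and reading to EOS again triggers only one attempt to send the message, so the second pass never touches the socket.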

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

