hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-872) DFSClient 0.20.1 is incompatible with HDFS 0.20.2
Date Wed, 13 Jan 2010 20:07:55 GMT

     [ https://issues.apache.org/jira/browse/HDFS-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-872:
-----------------------------

    Attachment: hdfs-872.txt

Attached is a patch against the current branch-0.20 which resolves the protocol incompatibility introduced by the HDFS-101/HDFS-793 pair. Since this is tricky code, I'll try to summarize the patch in detail:

In HDFS-793, PipelineAck's wire format gained a new element: the number of status replies to follow. This is the central incompatibility. So, in this patch, I removed that field and reset the version number back to the original 14 from the old branch-0.20. To know how many status replies to read, PipelineAck now takes the downstream pipeline depth as a constructor parameter. This is used only for reading, and otherwise is -1 (it is an error to call readFields if it has not been set).

Since the number of replies in a pipeline ack is no longer dynamic, I removed the getNumOfReplies
call as well.
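
To make the shape of this concrete, here is a minimal sketch of the ack class under this scheme. This is illustrative only, not the actual patch: names and visibility are simplified from the real PipelineAck.

{code}
import java.io.*;

// Minimal sketch, not the actual patch. The point is the constructor
// parameter that replaces the on-wire reply count.
class PipelineAck {
  static final long HEART_BEAT_SEQNO = -1L;

  long seqno;
  short[] replies;
  // Downstream pipeline depth, i.e. how many status replies to read.
  // Only needed on the read side; -1 means "not set".
  final int numOfDownstreamReplies;

  PipelineAck() { this(-1); }                 // write side
  PipelineAck(int numOfDownstreamReplies) {   // read side
    this.numOfDownstreamReplies = numOfDownstreamReplies;
  }

  long getSeqno() { return seqno; }

  void write(DataOutput out) throws IOException {
    out.writeLong(seqno);
    // Crucially, no reply count is written -- that extra field is what
    // broke 0.20.1 clients; the count now comes from the constructor.
    for (short r : replies) {
      out.writeShort(r);
    }
  }
}
{code}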

When reading an ack, I check for the HEARTBEAT message, and in that case don't read any replies. Otherwise I expect a reply from each downstream datanode. *For review:* should readFields handle the case of a sequence number equal to -2? As best I can tell, the current code never sends such a sequence number, and if it ever does, that is an error. It may make sense to check for it and throw an IOException in the case of a negative seqno that is not HEARTBEAT_SEQNO. Opinions appreciated.
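
Continuing the sketch above, the read path looks roughly like this; the handling of other negative seqnos is deliberately left as the open question:

{code}
// Sketch of the read side, continuing the class above.
void readFields(DataInput in) throws IOException {
  if (numOfDownstreamReplies < 0) {
    throw new IOException("readFields called without the pipeline depth set");
  }
  seqno = in.readLong();
  if (seqno == HEART_BEAT_SEQNO) {
    replies = new short[0];   // heartbeats carry no status replies
    return;
  }
  // Open question from above: a negative seqno reaching this point
  // (e.g. -2) arguably deserves an IOException -- see below.
  replies = new short[numOfDownstreamReplies]; // one status per downstream DN
  for (int i = 0; i < replies.length; i++) {
    replies[i] = in.readShort();
  }
}
{code}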

In DFSClient I added a DEBUG-level printout of the contents of the pipeline. This was useful while testing, to make sure I killed each node in the pipeline in the intended order.
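
That printout is nothing more than a guarded loop over the pipeline array, along these lines (a sketch; the exact message format is unimportant):

{code}
// Sketch of the debug aid in DFSClient: log every datanode in the write
// pipeline so a test can kill them in a known order.
if (LOG.isDebugEnabled()) {
  for (int i = 0; i < nodes.length; i++) {
    LOG.debug("pipeline node " + i + ": " + nodes[i].getName());
  }
}
{code}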

In BlockReceiver, I added back the "continue" during HEARTBEAT processing. I believe this was an omission in the earlier patch: as best I can tell, without the continue, it currently sends a spurious "seqno=-2" ack after each heartbeat. With the continue statement, it circles around the loop correctly to wait for the next ack.
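
The restored control flow is just a short-circuit in the responder loop, roughly like this (a sketch of the shape, not the literal patch; variable names approximate the real BlockReceiver fields):

{code}
// Sketched PacketResponder loop: on a heartbeat, relay it upstream and
// go back to waiting, rather than falling through and emitting a
// spurious seqno=-2 ack.
while (running) {
  PipelineAck ack = new PipelineAck(numTargets);  // downstream depth
  ack.readFields(mirrorIn);
  if (ack.getSeqno() == PipelineAck.HEART_BEAT_SEQNO) {
    ack.write(replyOut);   // pass the heartbeat along to the upstream node
    replyOut.flush();
    continue;              // the restored "continue": wait for the next ack
  }
  // ... normal processing of a real packet ack ...
}
{code}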

*For review*: I put a TODO for the case where BlockReceiver receives a seqno of -2. I currently believe that any negative sequence number that is not HEART_BEAT is an error and should throw an IOException (e.g. our reads got misaligned).
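
If reviewers agree, the check is only a couple of lines wherever the seqno is first read (hypothetical, pending that review):

{code}
// Hypothetical guard, pending review: any negative seqno that is not the
// heartbeat sentinel means the reader has lost framing on the ack stream.
if (seqno < 0 && seqno != PipelineAck.HEART_BEAT_SEQNO) {
  throw new IOException("Unexpected negative seqno " + seqno
      + ": ack stream may be misaligned");
}
{code}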

When constructing the ack message for a failed mirror, since every ack must have the same number of replies, I send SUCCESS followed by N errors, where N is the number of downstream targets. The client's behavior is to eject the first ERROR node, so the presence of ERROR statuses further downstream is unimportant - in truth they are semantically UNKNOWN, but no such status code exists. *For review*: HDFS-793 reversed the order of the loop at DFSClient.java:2431 to locate the _last_ DN with ERROR status. I had to reverse this back to the original loop order for this patch, since the replies look like SUCCESS, ERROR, ERROR when DN 2 dies.
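
Concretely, the two sides fit together like this (a sketch: the OP_STATUS_* values mirror DataTransferProtocol's, everything else is simplified):

{code}
final short OP_STATUS_SUCCESS = 0;  // as in DataTransferProtocol
final short OP_STATUS_ERROR   = 1;

// Datanode side, sketched: a node whose mirror has died still sends a
// reply for every pipeline slot -- SUCCESS for itself, then ERROR for
// each of the numTargets downstream nodes (really "unknown", but no
// such status code exists).
short[] replies = new short[1 + numTargets];
replies[0] = OP_STATUS_SUCCESS;
for (int i = 1; i < replies.length; i++) {
  replies[i] = OP_STATUS_ERROR;
}

// Client side, sketched: scan forward so the FIRST non-success reply is
// the one ejected; scanning backward would pick the last padded ERROR.
for (int i = 0; i < replies.length; i++) {
  if (replies[i] != OP_STATUS_SUCCESS) {
    errorIndex = i;   // this node gets dropped from the pipeline
    throw new IOException("Bad response " + replies[i] + " from datanode " + i);
  }
}
{code}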

In terms of testing, I performed the following:

- Started a 3-node distributed cluster using patched servers.
- With an unpatched client, began uploading a file. Killed each node in the pipeline (first, second, last) and ensured that the correct datanode was ejected.
- With a patched client and patched servers, ran the same test.
- With a patched client and unpatched servers, ensured that file uploads work properly. I did not test killing the unpatched server nodes here - I can do so if necessary, but I was using a shared cluster for this test.

In all cases, the file upload lasted more than 30 seconds, so heartbeats were tested.

> DFSClient 0.20.1 is incompatible with HDFS 0.20.2
> -------------------------------------------------
>
>                 Key: HDFS-872
>                 URL: https://issues.apache.org/jira/browse/HDFS-872
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.20.2
>            Reporter: Bassam Tabbara
>             Fix For: 0.20.2
>
>         Attachments: hdfs-872.txt
>
>
> After upgrading to the latest HDFS 0.20.2 (r896310 from /branches/branch-0.20), old DFS clients (0.20.1) no longer seem to work. HBase uses the 0.20.1 hadoop core jars, and the HBase master will no longer start up. Here is the exception from the HBase master log:
> {code}
> 2010-01-06 09:59:46,762 WARN org.apache.hadoop.hdfs.DFSClient: DFS Read: java.io.IOException: Could not obtain block: blk_3380512596555557728_1002 file=/hbase/hbase.version
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1788)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
>         at java.io.DataInputStream.readUTF(DataInputStream.java:572)
>         at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:189)
>         at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:208)
>         at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:208)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1241)
>         at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1282)
> 2010-01-06 09:59:46,763 FATAL org.apache.hadoop.hbase.master.HMaster: Not starting HMaster because:
> java.io.IOException: Could not obtain block: blk_3380512596555557728_1002 file=/hbase/hbase.version
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1788)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1616)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1743)
>         at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
>         at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
>         at java.io.DataInputStream.readUTF(DataInputStream.java:572)
>         at org.apache.hadoop.hbase.util.FSUtils.getVersion(FSUtils.java:189)
>         at org.apache.hadoop.hbase.util.FSUtils.checkVersion(FSUtils.java:208)
>         at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:208)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>         at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1241)
>         at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1282)
> {code}
> If I replace the hadoop jars in the hbase/lib directory with the 0.20.2 version, it works well, which is what led me to open this bug here and not in the HBASE project.


