hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Connect to HDFS running on a different Hadoop-Version
Date Wed, 25 Jan 2012 12:12:28 GMT
Hello Romeo,


On Wed, Jan 25, 2012 at 4:07 PM, Romeo Kienzler <romeo@ormium.de> wrote:
> Dear List,
> we're trying to use a central HDFS storage in order to be accessed from
> various other Hadoop-Distributions.

The HDFS you've set up, what 'distribution' is that from? You will have
to use that particular version's jar across all client applications
you use, else you'll run into RPC version incompatibilities.
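
To see which Hadoop version a client is really running, a small check
like the one below can help. It is only a sketch: the class name
ClientHadoopVersion is made up for illustration, and it assumes the
client's Hadoop jar exposes org.apache.hadoop.util.VersionInfo with a
static getVersion() method. It is called reflectively so that the code
still compiles and runs when no Hadoop jar is on the classpath.

```java
import java.lang.reflect.Method;

// Hypothetical helper (not part of Hadoop): reports the Hadoop version
// visible on the client's classpath.
public class ClientHadoopVersion {

    // Returns the version string reported by the Hadoop jar on the
    // classpath, or a marker string if no such jar can be loaded.
    static String clientVersion() {
        try {
            // Assumption: the client jar ships this class and method.
            Class<?> info = Class.forName("org.apache.hadoop.util.VersionInfo");
            Method getVersion = info.getMethod("getVersion");
            return String.valueOf(getVersion.invoke(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return "(no Hadoop jar on classpath)";
        }
    }

    public static void main(String[] args) {
        System.out.println("Client-side Hadoop version: " + clientVersion());
    }
}
```

Run it with the same classpath your client application uses and compare
the result with what `hadoop version` prints on a cluster node. If the
two differ, the client is not using the cluster's jar.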

> Do you think this is possible? We're having trouble, but not related to
> different RPC-Versions.

It should be possible _most of the time_ by replacing jars at the
client end to use the one that runs your cluster, but there may be
minor API incompatibilities between certain versions that can get in
the way. Purely depends on your client application and its
implementation. If it sticks to using the publicly supported APIs, you
are mostly fine.

> When trying to access a Cloudera CDH3 Update 2 (cdh3u2) HDFS from
> BigInsights 1.3 we're getting this error:

BigInsights runs off IBM's own patched Hadoop sources if I am right,
and things can get a bit tricky there. See the following points:

> Bad connection to FS. Command aborted. Exception: Call to
> localhost.localdomain/ failed on local exception:
> java.io.EOFException
> java.io.IOException: Call to localhost.localdomain/ failed on
> local exception: java.io.EOFException

This is surely an RPC issue. The call tries to read off a field, but
gets no response, EOFs and dies. We have more descriptive error
messages with the 0.23 version onwards, but the problem here is that
your IBM client jar is not the same as your cluster's jar. The mixture
won't work.

> com.ibm.biginsights.hadoop.patch.PatchedDistributedFileSystem.initialize(PatchedDistributedFileSystem.java:19)

^^ This is what I am speaking of. Your client (BigInsights? Have not
used it really…) is using an IBM jar with their supplied
'PatchedDistributedFileSystem', and that is probably incompatible with
the cluster's HDFS RPC protocols. I do not know enough about IBM's
custom stuff to know for sure it would work if you replace it with
your cluster's jar.

> But we've already replaced the client hadoop-common.jar's with the Cloudera
> ones.

Apparently not. Your stack trace shows that com.ibm.* classes are still
being pulled. My guess is that BigInsights would not work with
anything non-IBM, but I have not used it to know for sure.
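
One way to confirm which jar each class is actually loaded from is to
print its code source on the client. The following is a rough sketch,
not something taken from BigInsights or CDH: the class name WhichJar is
hypothetical, and the default class names are just the ones relevant to
this thread (the IBM one is copied from your stack trace).

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: shows where a class is loaded from.
public class WhichJar {

    // Returns the jar or directory a class was loaded from, or a marker
    // string when the class is missing or has no recorded code source.
    static String locate(String className) {
        try {
            // 'false' avoids running the class's static initializers.
            Class<?> c = Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap/unknown)"
                               : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class names may be passed as arguments; otherwise use defaults.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.hdfs.DistributedFileSystem",
            "com.ibm.biginsights.hadoop.patch.PatchedDistributedFileSystem"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the exact classpath the failing client uses. If the Hadoop
classes resolve to an IBM jar, or the com.ibm.* class is found at all,
the replacement of the client jars was not complete.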

If they have a user community, you can ask there if there is a working
way to have BigInsights run against Apache/CDH/etc. distributions.
For CDH specific questions, you may ask at
https://groups.google.com/a/cloudera.org/group/cdh-user/topics instead
of the Apache lists here.

Harsh J
Customer Ops. Engineer, Cloudera
