hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: copying between hadoop instances
Date Tue, 08 Feb 2011 22:07:40 GMT

Sorry, I responded from my phone earlier and it bounced...

I used distcp over hftp, running on the CDH3 cluster, to pull the data from the earlier cluster (default options).
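
For reference, a minimal sketch of that kind of pull, run from the destination (CDH3) side; the hostnames and ports below are placeholders, not from this thread:

  # On the new cluster: read the old cluster over HTTP (hftp) and write into local HDFS.
  hadoop distcp hftp://old-namenode:50070/user hdfs://new-namenode:8020/user

The hftp source keeps the copy read-only against the old cluster, which is why it has to run on the destination side.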

HTH



> From: Korb_Michael@bah.com
> To: common-user@hadoop.apache.org
> Date: Tue, 8 Feb 2011 15:10:43 -0500
> Subject: RE: copying between hadoop instances
> 
> Michael,
> 
> Which worked for you? DistCp on the destination cluster or fs -cp? What options/protocols did you use?
> 
> Thanks,
> Mike
> ________________________________________
> From: Michael Segel [michael_segel@hotmail.com]
> Sent: Tuesday, February 08, 2011 2:57 PM
> To: common-user@hadoop.apache.org
> Subject: RE: copying between hadoop instances
> 
> hadoop fsck /
> 
> And yes, you should run it on the destination cluster.
> 
> I've done this and it works for me....
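
A minimal sketch of how fsck and setrep are usually combined against under-replicated blocks; the path and replication factor here are assumptions, not values from this thread:

  # Report which files carry under-replicated blocks.
  hadoop fsck / -files -blocks
  # If the configured replication factor exceeds what the cluster can actually host,
  # lower the target so the namenode stops flagging the blocks.
  hadoop fs -setrep -w 2 /path/with/underreplicated/files

If the cluster can host the configured number of replicas, waiting for the namenode to re-replicate is enough and no setrep is needed.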
> 
> 
> > From: Korb_Michael@bah.com
> > To: common-user@hadoop.apache.org
> > Date: Tue, 8 Feb 2011 14:01:52 -0500
> > Subject: RE: copying between hadoop instances
> >
> > Same results. I think I'll have more luck with fs -cp. I think the error is caused by the fact that my source DFS has 29 under-replicated blocks. How can I get rid of these?
> >
> > Thanks,
> > Mike
> > ________________________________________
> > From: Vladimir Klimontovich [klimontovich@gmail.com]
> > Sent: Tuesday, February 08, 2011 1:49 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: copying between hadoop instances
> >
> > Maybe the API is backward compatible. Try running the same command on a different node (if you ran it on mc00001, try mc00000).
> >
> > On Tue, Feb 8, 2011 at 8:50 PM, Korb, Michael [USA]
> > <Korb_Michael@bah.com> wrote:
> > > I was unable to get the stacktrace. Is there a workaround for the incompatible APIs? I'm using hftp instead of hdfs because the DistCp guide (http://hadoop.apache.org/common/docs/r0.20.2/distcp.html) says, "For copying between two different versions of Hadoop, one will usually use HftpFileSystem. This is a read-only FileSystem, so DistCp must be run on the destination cluster (more specifically, on TaskTrackers that can write to the destination cluster). Each source is specified as hftp://<dfs.http.address>/<path> (the default dfs.http.address is <namenode>:50070)."
> > >
> > > Mike
> > > ________________________________________
> > > From: Vladimir Klimontovich [klimontovich@gmail.com]
> > > Sent: Tuesday, February 08, 2011 12:48 PM
> > > To: common-user@hadoop.apache.org
> > > Subject: Re: copying between hadoop instances
> > >
> > > Yes, the APIs of the old and new versions are incompatible.
> > >
> > > Did you manage to get the stacktrace from
> > > http://mc00000.mcloud.bah.com:50075/streamFile?filename=/user/cluster/annotated/2009/07/05/_logs/history/mc00002_1291306280950_job_201012021111_0518_cluster_com.bah.mapred.CombineFilesDriver%253A+netflow-smallfi&ugi=hdfs
> > >
> > > And, by the way, why are you using hftp:// for the source instead of hdfs://?
> > >
> > >
> > >
> > > On Tue, Feb 8, 2011 at 8:45 PM, Korb, Michael [USA]
> > > <Korb_Michael@bah.com> wrote:
> > >> That address is to a file on the destination fs, but it didn't get copied from the source. That is where fs -cp fails every time. Here's what happens when I try distcp:
> > >>
> > >> sudo -u hdfs ./hadoop distcp -update hftp://mc00001:50070/ hdfs://mc00000:55310/
> > >>
> > >> 11/02/08 12:38:50 INFO tools.DistCp: srcPaths=[hftp://mc00001:50070/]
> > >> 11/02/08 12:38:50 INFO tools.DistCp: destPath=hdfs://mc00000:55310/
> > >> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.JobConf.getCredentials()Lorg/apache/hadoop/security/Credentials;
> > >>        at org.apache.hadoop.tools.DistCp.checkSrcPath(DistCp.java:632)
> > >>        at org.apache.hadoop.tools.DistCp.copy(DistCp.java:656)
> > >>        at org.apache.hadoop.tools.DistCp.run(DistCp.java:881)
> > >>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> > >>        at org.apache.hadoop.tools.DistCp.main(DistCp.java:908)
> > >>
> > >> I asked about this before and got a response from Ted Dunning. He said, "This is due to the security API not being available. You are crossing from a cluster with security to one without and that is causing confusion. Presumably your client assumes that it is available and your hadoop library doesn't provide it. Check your class path very carefully looking for version assumptions and confusions."
> > >>
> > >> I don't know where to begin checking my class path for these things... but perhaps if I could get distcp working it wouldn't run into the same problems as fs -cp.
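
One hedged way to start that classpath audit; the /usr/lib/hadoop path is an assumption based on a typical CDH3 layout, and the classpath subcommand may not exist on older releases:

  # Dump the jars the client actually resolves, if the subcommand is available.
  hadoop classpath | tr ':' '\n' | grep -i hadoop
  # Otherwise inspect the install directly and look for more than one core jar version.
  ls /usr/lib/hadoop/hadoop-core*.jar /usr/lib/hadoop/lib/*.jar | grep -i hadoop

A second hadoop-core jar, or a client script pointing at the wrong HADOOP_HOME, is the kind of version confusion the quoted advice is about.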
> > >>
> > >> Thanks,
> > >> Mike
> > >> ________________________________________
> > >> From: Vladimir Klimontovich [klimontovich@gmail.com]
> > >> Sent: Tuesday, February 08, 2011 12:24 PM
> > >> To: common-user@hadoop.apache.org
> > >> Subject: Re: copying between hadoop instances
> > >>
> > >> Try going to http://mc00000.mcloud.bah.com:50075/streamFile?filename=/user/cluster/annotated/2009/07/05/_logs/history/mc00002_1291306280950_job_201012021111_0518_cluster_com.bah.mapred.CombineFilesDriver%253A+netflow-smallfi&ugi=hdfs
> > >> and check whether the browser shows you the stacktrace. It could give a lot of information.
> > >>
> > >> And what goes wrong with distcp (any stacktraces or error messages)?
> > >>
> > >> On Tue, Feb 8, 2011 at 8:06 PM, Korb, Michael [USA]
> > >> <Korb_Michael@bah.com> wrote:
> > >>> I have two Hadoop instances running on one cluster of machines for the purpose of upgrading. I'm trying to copy all the files from the old instance to the new one but have been having trouble with both distcp and fs -cp.
> > >>>
> > >>> Most recently, I've been trying "sudo -u hdfs ./hadoop fs -cp hftp://mc00001:50070/* hdfs://mc00000:55310/", where mc00001 is the namenode of the old Hadoop instance and mc00000 is the namenode of the new one.
> > >>>
> > >>> I've had some success with this command (some files have actually been copied), but part of the way through the copy, I get this error:
> > >>>
> > >>> cp: Server returned HTTP response code: 500 for URL: http://mc00000.mcloud.bah.com:50075/streamFile?filename=/user/cluster/annotated/2009/07/05/_logs/history/mc00002_1291306280950_job_201012021111_0518_cluster_com.bah.mapred.CombineFilesDriver%253A+netflow-smallfi&ugi=hdfs
> > >>>
> > >>> Is it possible that there could be permissions issues? It also doesn't seem quite right to be copying * since there are directories, but I don't think there's a way to call fs -cp recursively. Could this be causing problems?
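
A hedged sketch of checking both points, reusing the hostnames from the command above; the /user paths are only illustrative:

  # Spot-check ownership and permissions on the source as seen over hftp.
  hadoop fs -ls hftp://mc00001:50070/user
  # distcp walks the source tree recursively, so a single source root can replace the '*' glob.
  hadoop distcp hftp://mc00001:50070/user hdfs://mc00000:55310/user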
> > >>>
> > >>> Thanks,
> > >>> Mike
> > >>
> > >>
> > >>
> > >> --
> > >> Vladimir Klimontovich
> > >> Cell: +7-926-890-2349, skype: klimontovich
> > >>
> > >
> > >
> > >
> > > --
> > > Vladimir Klimontovich
> > > Cell: +7-926-890-2349, skype: klimontovich
> > >
> >
> >
> >
> > --
> > Vladimir Klimontovich
> > Cell: +7-926-890-2349, skype: klimontovich