hadoop-user mailing list archives

From Jian Fang <jian.fang.subscr...@gmail.com>
Subject Re: hftp in Hadoop 0.20.2
Date Wed, 15 Aug 2012 17:24:28 GMT
Thanks Harsh for clarifying this.

I have another question related to hftp. I tried to run distcp to copy big
files between two clusters, the source cluster is 0.20.2 and the
destination cluster is CDH3U4. For example,

$HADOOP_HOME/bin/hadoop distcp -m 6
hftp://pnjhadoopnn01:8023/profile/input/ProductInfo.txt
hdfs://dnjsrcha07:59310/profile/input

One problem I face now is that the number of mappers for the distcp job is
always 1, no matter whether I add the "-m" option or not. This is not good
since the file size is pretty big. Is there any way to solve this problem?
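For context, distcp in this era splits its work at whole-file granularity: "-m" caps the number of map tasks, but each map copies complete files, so a single source file can occupy at most one mapper no matter what "-m" says. A toy sketch of that behavior (illustrative only, not the actual DistCp code):

```python
# Illustrative sketch: legacy distcp assigns whole files to mappers, so
# the number of mappers that actually do work is bounded by the number
# of source files, not by the -m option.
def assign_files_to_mappers(file_sizes, num_mappers):
    """Greedily assign whole files to mappers, balancing total bytes."""
    # One bucket per requested mapper: [total_bytes, [file sizes]]
    buckets = [[0, []] for _ in range(num_mappers)]
    for size in sorted(file_sizes, reverse=True):
        bucket = min(buckets, key=lambda b: b[0])  # least-loaded mapper
        bucket[0] += size
        bucket[1].append(size)
    # Mappers that received no files do no work.
    return [b[1] for b in buckets if b[1]]

# A single 10 GB file: only one mapper ever gets work, even with -m 6.
print(len(assign_files_to_mappers([10 * 2**30], 6)))     # 1
# Six 2 GB files: all six requested mappers get one file each.
print(len(assign_files_to_mappers([2 * 2**30] * 6, 6)))  # 6
```

So the workaround is on the data side: point distcp at a directory containing many files, or pre-split the big file, so that "-m" has more than one unit of work to parallelize.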

Also, I saw some posts on the web mentioning that files uploaded from a
cluster node to HDFS may be stored on the local node. I am not sure whether
distcp does the same thing. If this is true, I cannot use distcp, but
instead need to download files from the source cluster and then upload them
to the destination cluster from a client machine. However, this is really
inefficient. Could you clarify this for me?
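On the locality question: HDFS's default placement does put the first replica on the writer's own node when the writer runs on a datanode. A much-simplified sketch of that first-replica choice (rack awareness and the second/third replicas are omitted):

```python
# Much-simplified sketch of HDFS's default first-replica placement: a
# writer running on a datanode gets its first replica locally;
# otherwise a random datanode is chosen. (The real policy also handles
# racks and the remaining replicas.)
import random

def first_replica(writer_host, datanodes):
    if writer_host in datanodes:
        return writer_host  # writer-local placement
    return random.choice(datanodes)

datanodes = ["dn01", "dn02", "dn03"]
print(first_replica("dn02", datanodes))     # dn02: writer-local
print(first_replica("client01", datanodes)) # some datanode, chosen at random
```

Assuming the distcp job runs on the destination cluster (as it must when reading an hftp source), its maps, and hence the writers, run on many different destination nodes, so the copied blocks end up spread across the cluster rather than piled onto one machine; the balancer can even things out afterwards.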

Thanks again,

Jian



On Sat, Aug 11, 2012 at 10:06 PM, Harsh J <harsh@cloudera.com> wrote:

> Jian,
>
> Do not rely on dfs.info.port, it is a deprecated property and does not
> exist anymore in 2.x releases. Rely instead on the fuller
> dfs.http.address in 1.x and dfs.namenode.http.address in 2.x.
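One way to see which of these properties a given cluster actually sets is to parse hdfs-site.xml directly. A small sketch (the sample config below is made up from the values in this thread):

```python
# Parse an hdfs-site.xml-style config and report every property whose
# name mentions an address or a port, to spot deprecated or conflicting
# entries such as dfs.info.port set alongside dfs.http.address.
import xml.etree.ElementTree as ET

SAMPLE = """
<configuration>
  <property><name>dfs.http.address</name><value>pnjhadoopnn01:50070</value></property>
  <property><name>dfs.info.port</name><value>8023</value></property>
</configuration>
"""

def address_props(xml_text):
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
        if "address" in prop.findtext("name") or "port" in prop.findtext("name")
    }

props = address_props(SAMPLE)
print(props)  # both entries show up; the two port numbers disagree, as in this thread
```

In a real deployment you would read the file from your config directory (e.g. $HADOOP_CONF_DIR/hdfs-site.xml) instead of the inline sample.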
>
> On Sat, Aug 11, 2012 at 3:45 AM, Jian Fang
> <jian.fang.subscribe@gmail.com> wrote:
> > Thanks Joey for the clarification. I will ask our hadoop admin to change
> > that.
> > But it would be great if this could be mentioned in the distcp document.
> >
> > Thanks,
> >
> > Jian
> >
> >
> > On Fri, Aug 10, 2012 at 6:06 PM, Joey Echeverria <joey@cloudera.com>
> > wrote:
> >>
> >> Yes, the dfs.info.port controls the HTTP port of the NN, including for
> >> HFTP.
> >>
> >> You should make sure that your settings for dfs.http.address and
> >> dfs.info.port are in sync. So change one of those to match the port
> >> number of the other.
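Concretely, the in-sync version of the two settings might look like this (port 50070 is chosen here to match the rest of the thread; any port works as long as both values agree):

```xml
<!-- hdfs-site.xml: HTTP address of the NN; port must match dfs.info.port -->
<property>
    <name>dfs.http.address</name>
    <value>pnjhadoopnn01:50070</value>
</property>

<!-- core-site.xml: deprecated, but if set it must agree with the above -->
<property>
    <name>dfs.info.port</name>
    <value>50070</value>
</property>
```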
> >>
> >> -Joey
> >>
> >> On Fri, Aug 10, 2012 at 5:41 PM, Jian Fang
> >> <jian.fang.subscribe@gmail.com> wrote:
> >> > Hi Joey,
> >> >
> >> > I run the following command and got the jetty port as 8023.
> >> >
> >> >  $ grep "Jetty bound to port"
> >> > hadoop-hadoop-namenode-pnjhadoopnn01.barnesandnoble.com.log*
> >> >
> >> >
> >> > hadoop-hadoop-namenode-pnjhadoopnn01.barnesandnoble.com.log.2012-04-07:2012-04-07
> >> > 20:56:16,334 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port
> >> > 8023
> >> >
> >> > Does this mean hftp is actually bound to port 8023?
> >> >
> >> > I am a bit confused. In hdfs-site.xml, we have the property defined as
> >> > follows.
> >> >
> >> >
> >> > <property>
> >> >     <name>dfs.http.address</name>
> >> >    <value>pnjhadoopnn01:50070</value>
> >> > </property>
> >> >
> >> > and in core-site.xml, we have the following settings.
> >> >
> >> >   <property>
> >> >     <name>fs.default.name</name>
> >> >     <value>pnjhadoopnn01:8020</value>
> >> >     <final>true</final>
> >> >   </property>
> >> >
> >> >   <property>
> >> >     <name>dfs.secondary.info.port</name>
> >> >     <value>8022</value>
> >> >   </property>
> >> >   <property>
> >> >     <name>dfs.info.port</name>
> >> >     <value>8023</value>
> >> >   </property>
> >> >   <property>
> >> >     <name>mapred.job.tracker.info.port</name>
> >> >     <value>8024</value>
> >> >   </property>
> >> >   <property>
> >> >     <name>tasktracker.http.port</name>
> >> >     <value>8025</value>
> >> >   </property>
> >> >   <property>
> >> >     <name>mapred.job.tracker.info.port</name>
> >> >     <value>8024</value>
> >> >   </property>
> >> >
> >> > Does this mean hadoop honors dfs.info.port over dfs.http.address?
> >> >
> >> > Thanks,
> >> >
> >> > Jian
> >> >
> >> > On Fri, Aug 10, 2012 at 5:08 PM, Joey Echeverria <joey@cloudera.com>
> >> > wrote:
> >> >>
> >> >> Can you post your NN logs? It looks like the NN is not actually
> >> >> started or is listening on another port for HTTP.
> >> >>
> >> >> -Joey
> >> >>
> >> >> On Fri, Aug 10, 2012 at 2:38 PM, Jian Fang
> >> >> <jian.fang.subscribe@gmail.com> wrote:
> >> >> > Already did that. Connection was rejected.
> >> >> >
> >> >> >
> >> >> > On Fri, Aug 10, 2012 at 2:24 PM, Joey Echeverria <joey@cloudera.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> Try:
> >> >> >>
> >> >> >> $ telnet pnjhadoopnn01 50070
> >> >> >>
> >> >> >> -Joey
> >> >> >>
> >> >> >> On Fri, Aug 10, 2012 at 1:10 PM, Jian Fang
> >> >> >> <jian.fang.subscribe@gmail.com> wrote:
> >> >> >> > Here is the property in hdfs-site.xml
> >> >> >> >
> >> >> >> >    <property>
> >> >> >> >       <name>dfs.http.address</name>
> >> >> >> >       <value>pnjhadoopnn01:50070</value>
> >> >> >> >    </property>
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> >
> >> >> >> > Jian
> >> >> >> >
> >> >> >> >
> >> >> >> > On Fri, Aug 10, 2012 at 11:46 AM, Harsh J <harsh@cloudera.com>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> Yes the test was to figure out if there really was a listener on
> >> >> >> >> 50070. Can you check the hdfs-site.xml on the NN machine for what
> >> >> >> >> its dfs.http.address may really be using for its port?
> >> >> >> >>
> >> >> >> >> On Fri, Aug 10, 2012 at 7:48 PM, Jian Fang
> >> >> >> >> <jian.fang.subscribe@gmail.com> wrote:
> >> >> >> >> > Hi Harsh,
> >> >> >> >> >
> >> >> >> >> > Seems the -p requires the root privilege, which I don't have.
> >> >> >> >> > I ran "netstat -a | grep 50070", but did not get back anything.
> >> >> >> >> > As I said, telnet did not work either.
> >> >> >> >> >
> >> >> >> >> > [hadoop@pnjhadoopnn01 ~]$ telnet pnjhadoopnn01 50070
> >> >> >> >> > Trying xx.xx.xx.xx...
> >> >> >> >> > telnet: connect to address xx.xx.xx.xx: Connection refused
> >> >> >> >> > telnet: Unable to connect to remote host: Connection refused
> >> >> >> >> >
> >> >> >> >> > [hadoop@pnjhadoopnn01 ~]$ telnet localhost 50070
> >> >> >> >> > Trying 127.0.0.1...
> >> >> >> >> > telnet: connect to address 127.0.0.1: Connection refused
> >> >> >> >> > telnet: Unable to connect to remote host: Connection refused
> >> >> >> >> >
> >> >> >> >> > Thanks,
> >> >> >> >> >
> >> >> >> >> > Jian
> >> >> >> >> >
> >> >> >> >> > On Fri, Aug 10, 2012 at 1:50 AM, Harsh J <harsh@cloudera.com>
> >> >> >> >> > wrote:
> >> >> >> >> >>
> >> >> >> >> >> Jian,
> >> >> >> >> >>
> >> >> >> >> >> From your NN, can you get us the output of "netstat -anp |
> >> >> >> >> >> grep 50070"?
> >> >> >> >> >>
> >> >> >> >> >> On Fri, Aug 10, 2012 at 9:29 AM, Jian Fang
> >> >> >> >> >> <jian.fang.subscribe@gmail.com> wrote:
> >> >> >> >> >> > Thanks Harsh. But there is no firewall there, the two
> >> >> >> >> >> > clusters are on the same network. I cannot telnet to the
> >> >> >> >> >> > port even on the same machine.
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >> > On Thu, Aug 9, 2012 at 6:00 PM, Harsh J <harsh@cloudera.com>
> >> >> >> >> >> > wrote:
> >> >> >> >> >> >>
> >> >> >> >> >> >> Hi Jian,
> >> >> >> >> >> >>
> >> >> >> >> >> >> HFTP is always-on by default. Can you check and make sure
> >> >> >> >> >> >> that the firewall isn't the cause of the connection refused
> >> >> >> >> >> >> on port 50070 on the NN and ports 50075 on the DNs here?
> >> >> >> >> >> >>
> >> >> >> >> >> >> On Fri, Aug 10, 2012 at 1:47 AM, Jian Fang
> >> >> >> >> >> >> <jian.fang.subscribe@gmail.com> wrote:
> >> >> >> >> >> >> > Hi,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > We have a hadoop cluster of version 0.20.2 in production.
> >> >> >> >> >> >> > Now we have another new Hadoop cluster using cloudera's
> >> >> >> >> >> >> > CDH3U4. We would like to run distcp to copy files between
> >> >> >> >> >> >> > the two clusters. Since the hadoop versions are different,
> >> >> >> >> >> >> > we have to use the hftp protocol to copy files based on
> >> >> >> >> >> >> > the hadoop document here:
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > http://hadoop.apache.org/common/docs/r0.20.2/distcp.html#cpver
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > The problem is that I cannot access files via hftp from
> >> >> >> >> >> >> > the current production 0.20.2 cluster even though I can
> >> >> >> >> >> >> > see the following setting from the job tracker UI.
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > dfs.http.address pnjhadoopnn01:50070
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > I tried to telnet to this port, but got a "connection
> >> >> >> >> >> >> > refused" error. Seems the hftp service is not actually
> >> >> >> >> >> >> > running. Could someone tell me how to enable the hftp
> >> >> >> >> >> >> > service in the 0.20.2 hadoop cluster so that I can run
> >> >> >> >> >> >> > distcp?
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > Thanks in advance,
> >> >> >> >> >> >> >
> >> >> >> >> >> >> > John
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >>
> >> >> >> >> >> >> --
> >> >> >> >> >> >> Harsh J
> >> >> >> >> >> >
> >> >> >> >> >> >
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >>
> >> >> >> >> >> --
> >> >> >> >> >> Harsh J
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> Harsh J
> >> >> >> >
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Joey Echeverria
> >> >> >> Principal Solutions Architect
> >> >> >> Cloudera, Inc.
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Joey Echeverria
> >> >> Principal Solutions Architect
> >> >> Cloudera, Inc.
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Joey Echeverria
> >> Principal Solutions Architect
> >> Cloudera, Inc.
> >
> >
>
>
>
> --
> Harsh J
>
