hadoop-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Harsh J <ha...@cloudera.com>
Subject Re: pauses during startup (maybe network related?)
Date Fri, 24 May 2013 05:29:14 GMT
You are spot on about the DNS lookup slowing things down. I've faced
the same issue (before I had a local network DNS set up for the WiFi
network I use).

> but I'm still more just miffed at how it's knowing I'm a 192 address when I told it to
use localhost.

There's a few configs you need to additionally change to make a
perfect localhost setup. Otherwise, there are defaults in Apache
Hadoop that bind to 0.0.0.0 and report the current system hostname
(which changes if you get onto a network), causing what you're seeing.

On Fri, May 24, 2013 at 7:42 AM, Ted <r6squeegee@gmail.com> wrote:
> thanks, I'm almost 100% sure it's network related now.
>
> What I tested was unpluggin my network :), the entire system starts in
> just a few seconds.
>
> I decided to search on "reverse dns" in google and I see other people
> have complained about very slow reverse dns lookups (some related to
> hadoop / hbase too).
>
> I'm not sure why this is happenning yet though. I thought 127.0.0.1 or
> localhost would have just resolved instantly - but it appears it's
> some how finding my real IP instead, i.e. 192.168.1.5 seems to show up
> in the log entries even though all my configurations say
> localhost/127.0.0.1 and my /etc/hosts file has and entry for
> localhost/127.0.0.1
>
> I think if I make a /etc/hosts entry for 192.168.1.5 everything will
> be quick, that's what I'm going to test later. The only problem is I'm
> on an dynamic IP... I've considered just making entries for all
> reasonable permutations like 192.168.1.1 through 192.168.1.20... but
> I'm still more just miffed at how it's knowing I'm a 192 address when
> I told it to use localhost.
>
> On 5/24/13, Chris Nauroth <cnauroth@hortonworks.com> wrote:
>> Hi Ted,
>>
>> 2013-05-23 19:28:19,937 INFO
>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>> occuring more than 10 times
>> ...
>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>> handler 28 on 9000: starting
>>
>> There are a couple of relevant activities that happen during namenode
>> startup in between these 2 log statements.  It loads the current fsimage
>> (persistent copy of file system metadata), merges in the edits log
>> (transaction log containing all file system metadata changes since the last
>> checkpoint), and then saves back a new fsimage file after that merge.
>>  Current versions of the Hadoop codebase will print some information to
>> logs about the volume of activity during this checkpointing process, so I
>> recommend looking for that in your logs to see if this explains it.
>>  Depending on whether or not your have a large number of transactions
>> queued since your last checkpoint, this whole process can cause namenode
>> startup to take several minutes.
>>
>> If this becomes a regular problem, then you can run SecondaryNameNode or
>> BackupNode to perform periodic checkpoints in addition to the checkpoint
>> that occurs on namenode restart.  This is probably overkill for a dev
>> environment on your laptop though.
>>
>> Hope this helps,
>>
>> Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>
>> On Thu, May 23, 2013 at 2:49 AM, Ted <r6squeegee@gmail.com> wrote:
>>
>>> Hi I'm running hadoop on my local laptop for development and
>>> everything "works" but there's some annoying pauses during the startup
>>> which causes the entire hadoop startup process to take up to 4 minutes
>>> and I'm wondering what it is and if I can do anything about it.
>>>
>>> I'm running everything on 1 machines, on fedora linux, hadoop-1.1.2,
>>> oracle jkd1.7.0_17, the machine is a dual core i5, and I have 8gb of
>>> ram and an SSD so it shouldn't be slow.
>>>
>>> When the system pauses, there is no cpu usage, no disk usage and no
>>> network usage (although I suspect it's waiting for the network to
>>> resolve or return something).
>>>
>>> Here's some snippets from the namenode logs during startup where you
>>> can see it just pauses for around 30 seconds or more with out errors
>>> or anything :
>>>
>>> ...
>>> 2013-05-23 19:26:37,660 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from
>>> hadoop-metrics2.properties
>>> 2013-05-23 19:26:37,676 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
>>> MetricsSystem,sub=Stats registered.
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
>>> period at 10 second(s).
>>> 2013-05-23 19:27:54,144 INFO
>>> org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics
>>> system started
>>> ...
>>> 2013-05-23 19:27:54,341 WARN
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: The
>>> dfs.support.append option is in your configuration, however append is
>>> not supported. This configuration option is no longer required to
>>> enable sync.
>>> 2013-05-23 19:27:54,341 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>>> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
>>> accessTokenLifetime=0 min(s)
>>> 2013-05-23 19:28:19,918 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>>> FSNamesystemStateMBean and NameNodeMXBean
>>> 2013-05-23 19:28:19,937 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: Caching file names
>>> occuring more than 10 times
>>> ...
>>> 2013-05-23 19:28:26,801 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 28 on 9000: starting
>>> 2013-05-23 19:28:26,833 INFO org.apache.hadoop.ipc.Server: IPC Server
>>> handler 31 on 9000: starting
>>> 2013-05-23 19:30:10,644 INFO org.apache.hadoop.hdfs.StateChange:
>>> BLOCK* NameSystem.registerDatanode: node registration from
>>> 127.0.0.1:50010 storage DS-651015167-192.168.1.5-50010-1369140176513
>>> 2013-05-23 19:30:10,650 INFO org.apache.hadoop.net.NetworkTopology:
>>> Adding a new node: /default-rack/127.0.0.1:50010
>>>
>>>
>>> I already start the system with : export
>>> HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
>>> I only allocate : export HADOOP_HEAPSIZE=512 (but it's an empty hadoop
>>> system, maybe just 1 or 2 test files less than 100k, and there's no
>>> CPU usage so it doesn't look like it's GC thrashing)
>>>
>>> I should mention again, there's no errors and the system runs fine and
>>> relatively speedy once started (considering it's on my laptop).
>>>
>>> Does anyone know what's causing these pauses? (and how I can get rid of
>>> them)
>>> Thanks.
>>> --
>>> Ted.
>>>
>>
>
>
> --
> Ted.



-- 
Harsh J

Mime
View raw message