hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: could only be replicated to 0 nodes, instead of 1
Date Sat, 16 Jul 2011 11:13:09 GMT
The actual check is whether at least 5 blocks' worth of space remains
available on the DN.
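
To be concrete: with the default 64MB block size, that means a DN must
report at least 5 * 64MB = 320MB free before the NN will pick it as a
write target -- so a 101MB tmpfs /tmp can never qualify, no matter how
small the file you upload is.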

On Sat, Jul 16, 2011 at 1:52 PM, Thomas Anderson
<t.dt.aanderson@gmail.com> wrote:
> Harsh,
>
> Thanks, you are right. The problem stems from the tmp directory not
> being large enough. After changing the tmp dir to another location, the
> problem goes away.
>
> But I recall the default block size in HDFS is 64MB, so shouldn't it at
> least allow one file, whose actual size on local disk is smaller than
> 1KB, to be uploaded?
>
> Thanks again for the advice.
>
> On Fri, Jul 15, 2011 at 7:49 PM, Harsh J <harsh@cloudera.com> wrote:
>> Thomas,
>>
>> Your problem might simply be that the virtual-node DNs use /tmp, which
>> is mounted as tmpfs -- and that is causing the free space reported to
>> the NN (master) to show up as 0.
>>
>> tmpfs                 101M   44K  101M   1% /tmp
>>
>> Your trouble, then, is that the NN can't choose a suitable DN to write
>> to, because it determines that none has at least a block's worth of
>> space (64MB by default) available for writes.
>>
>> You can resolve as:
>>
>> 1. Stop DFS completely.
>>
>> 2. Create a directory under root somewhere (I use Cloudera's distro,
>> and its default configured location for data files comes along as
>> /var/lib/hadoop-0.20/cache/, if you need an idea for a location) and
>> set it as your hadoop.tmp.dir in core-site.xml on all the nodes.
>>
>> 3. Reformat your NameNode (hadoop namenode -format, say Y) and restart
>> DFS. Things _should_ be OK now.
>>
>> Config example (core-site.xml):
>>
>>  <property>
>>   <name>hadoop.tmp.dir</name>
>>   <value>/var/lib/hadoop-0.20/cache</value>
>>  </property>
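>>
>> Once it's back up, something like the following should confirm the DNs
>> now report real capacity (dfsadmin -report is the standard CLI for
>> this; the put just repeats your earlier test):
>>
>>   hadoop dfsadmin -report
>>   hadoop fs -put /tmp/testfile test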
>>
>> Let us know if this still doesn't get your dev cluster up and running
>> for action :)
>>
>> On Fri, Jul 15, 2011 at 4:40 PM, Thomas Anderson
>> <t.dt.aanderson@gmail.com> wrote:
>>> When partitioning, I remember only / and swap were specified for all
>>> nodes during creation. So I thought /tmp was also mounted under /,
>>> which should be around 9G in size. The total hard disk size specified
>>> is 10G.
>>>
>>> The df -kh shows
>>>
>>> server01:
>>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>>> tmpfs                 101M  132K  101M   1% /tmp
>>> udev                  247M     0  247M   0% /dev
>>> tmpfs                 101M     0  101M   0% /var/run/shm
>>> tmpfs                  51M  176K   51M   1% /var/run
>>>
>>> server02:
>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>>> tmpfs                 101M   44K  101M   1% /tmp
>>> udev                  247M     0  247M   0% /dev
>>> tmpfs                 101M     0  101M   0% /var/run/shm
>>> tmpfs                  51M  176K   51M   1% /var/run
>>>
>>> server03:
>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>>> tmpfs                 101M   44K  101M   1% /tmp
>>> udev                  247M     0  247M   0% /dev
>>> tmpfs                 101M     0  101M   0% /var/run/shm
>>> tmpfs                  51M  176K   51M   1% /var/run
>>>
>>> server04:
>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>>> tmpfs                 101M   44K  101M   1% /tmp
>>> udev                  247M     0  247M   0% /dev
>>> tmpfs                 101M     0  101M   0% /var/run/shm
>>> tmpfs                  51M  176K   51M   1% /var/run
>>>
>>> server05:
>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>> tmpfs                 5.0M  4.0K  5.0M   1% /lib/init/rw
>>> tmpfs                 5.0M     0  5.0M   0% /var/run/lock
>>> tmpfs                 101M   44K  101M   1% /tmp
>>> udev                  247M     0  247M   0% /dev
>>> tmpfs                 101M     0  101M   0% /var/run/shm
>>> tmpfs                  51M  176K   51M   1% /var/run
>>>
>>> In addition, the output of du -sk /tmp/hadoop-user/dfs on the DNs is
>>>
>>> server02:
>>> 8       /tmp/hadoop-user/dfs/
>>>
>>> server03:
>>> 8       /tmp/hadoop-user/dfs/
>>>
>>> server04:
>>> 8       /tmp/hadoop-user/dfs/
>>>
>>> server05:
>>> 8       /tmp/hadoop-user/dfs/
>>>
>>> On Fri, Jul 15, 2011 at 7:01 PM, Harsh J <harsh@cloudera.com> wrote:
>>>> (P.S. I asked that because, if you look at your NN's live nodes table,
>>>> the reported space is all 0.)
>>>>
>>>> What's the output of:
>>>>
>>>> du -sk /tmp/hadoop-user/dfs on all your DNs?
>>>>
>>>> On Fri, Jul 15, 2011 at 4:01 PM, Harsh J <harsh@cloudera.com> wrote:
>>>>> Thomas,
>>>>>
>>>>> Is your /tmp mount point also under / or is it mounted separately? Your
>>>>> dfs.data.dir is /tmp/hadoop-user/dfs/data on all DNs; if /tmp is
>>>>> separately mounted, what's the available space on it?
>>>>>
>>>>> (Bad idea in production to leave these at their /tmp defaults, e.g.
>>>>> dfs.name.dir and dfs.data.dir -- reconfigure and restart as necessary;
>>>>> see the example below.)
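>>>>>
>>>>> For example, hdfs-site.xml on every node could point both at a
>>>>> persistent disk instead of /tmp (the paths here are only placeholders,
>>>>> not defaults -- use whatever location suits your boxes):
>>>>>
>>>>>  <property>
>>>>>    <name>dfs.name.dir</name>
>>>>>    <value>/var/lib/hadoop-0.20/cache/dfs/name</value>
>>>>>  </property>
>>>>>  <property>
>>>>>    <name>dfs.data.dir</name>
>>>>>    <value>/var/lib/hadoop-0.20/cache/dfs/data</value>
>>>>>  </property>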
>>>>>
>>>>> On Fri, Jul 15, 2011 at 3:47 PM, Thomas Anderson
>>>>> <t.dt.aanderson@gmail.com> wrote:
>>>>>> 1.) The disk usage (with df -kh) on namenode (server01)
>>>>>>
>>>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>>>> /dev/sda1             9.4G  2.3G  6.7G  25% /
>>>>>>
>>>>>> and datanodes (server02 ~ server05)
>>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>>> /dev/sda1             9.4G  2.2G  6.8G  25% /
>>>>>>
>>>>>> 2.) How can I make sure whether the datanode is busy? The environment
>>>>>> is only for testing, so no other user processes are running at that
>>>>>> moment. Also it is a fresh installation, so only the packages hadoop
>>>>>> requires are installed, such as hadoop and the jdk.
>>>>>>
>>>>>> 3.) fs.block.size is not set in hdfs-site.xml on either the datanodes
>>>>>> or the namenode, because this setup is only for testing. I thought it
>>>>>> would use the default value, which should be 512?
>>>>>>
>>>>>> 4.) What might be a good way to quickly check whether the network is
>>>>>> unstable? I checked the health page, e.g. server01:50070/dfshealth.jsp,
>>>>>> where the live nodes are up and Last Contact varies each time I reload
>>>>>> the page.
>>>>>>
>>>>>> Node      Last Contact  Admin State  Configured Capacity (GB)  Used (GB)  Non DFS Used (GB)  Remaining (GB)  Used (%)  Remaining (%)  Blocks
>>>>>> server02  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>>> server03  0             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>>> server04  1             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>>> server05  2             In Service   0.1                       0          0                  0.1             0.01      99.96          0
>>>>>>
>>>>>> 5.) Only the command `hadoop fs -put /tmp/testfile test` is issued, as
>>>>>> it is just to test whether the installation is working. So the file,
>>>>>> e.g. testfile, is removed first (hadoop fs -rm test/testfile), then
>>>>>> uploaded again with the hadoop put command.
>>>>>>
>>>>>> The logs are listed as below:
>>>>>>
>>>>>> namenode:
>>>>>> server01: http://pastebin.com/TLpDmmPx
>>>>>>
>>>>>> datanodes:
>>>>>> server02: http://pastebin.com/pdE5XKfi
>>>>>> server03: http://pastebin.com/4aV7ECCV
>>>>>> server04: http://pastebin.com/tF7HiRZj
>>>>>> server05: http://pastebin.com/5qwSPrvU
>>>>>>
>>>>>> Please let me know if more information needs to be provided.
>>>>>>
>>>>>> I really appreciate your suggestion.
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 15, 2011 at 4:54 PM, Brahma Reddy <brahmareddyb@huawei.com> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> This exception (could only be replicated to 0 nodes, instead of 1)
>>>>>>> means no Data Node is available to the Name Node.
>>>>>>>
>>>>>>> These are the cases in which a Data Node may not be available to the
>>>>>>> Name Node:
>>>>>>>
>>>>>>> 1) The Data Node disk is full.
>>>>>>>
>>>>>>> 2) The Data Node is busy with its block report and block scanning.
>>>>>>>
>>>>>>> 3) The block size is a negative value (dfs.block.size in hdfs-site.xml;
>>>>>>> see the snippet below).
>>>>>>>
>>>>>>> 4) The primary Data Node goes down while a write is in progress (any
>>>>>>> network fluctuations between the Name Node and Data Node machines).
>>>>>>>
>>>>>>> 5) Whenever we append a partial chunk and call sync, then for
>>>>>>> subsequent partial-chunk appends the client should keep the previous
>>>>>>> data in its buffer.
>>>>>>>
>>>>>>> For example, after appending "a" I call sync; when I then try to
>>>>>>> append again, the buffer should hold "ab".
>>>>>>>
>>>>>>> On the server side, when the chunk is not a multiple of 512 bytes, it
>>>>>>> will compare the CRC of the data present in the block file against
>>>>>>> the CRC present in the meta file. But while constructing the CRC for
>>>>>>> the data in the block, it always compares only up to the initial
>>>>>>> offset.
>>>>>>>
>>>>>>> For more analysis, please check the Data Node logs.
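>>>>>>>
>>>>>>> To rule out case 3, dfs.block.size can be set explicitly in
>>>>>>> hdfs-site.xml -- for illustration, the usual 64MB default spelled out
>>>>>>> in bytes:
>>>>>>>
>>>>>>>  <property>
>>>>>>>    <name>dfs.block.size</name>
>>>>>>>    <value>67108864</value>
>>>>>>>  </property>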
>>>>>>>
>>>>>>> Warm Regards
>>>>>>>
>>>>>>> Brahma Reddy
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Thomas Anderson [mailto:t.dt.aanderson@gmail.com]
>>>>>>> Sent: Friday, July 15, 2011 9:09 AM
>>>>>>> To: hdfs-user@hadoop.apache.org
>>>>>>> Subject: could only be replicated to 0 nodes, instead of 1
>>>>>>>
>>>>>>> I have a fresh hadoop 0.20.2 installation on virtualbox 4.0.8 with jdk
>>>>>>> 1.6.0_26. The problem is that when trying to put a file into hdfs, it
>>>>>>> throws the error `org.apache.hadoop.ipc.RemoteException:
>>>>>>> java.io.IOException: File /path/to/file could only be replicated to 0
>>>>>>> nodes, instead of 1'; however, there is no problem creating a folder,
>>>>>>> as the ls command prints the result
>>>>>>>
>>>>>>> Found 1 items
>>>>>>> drwxr-xr-x   - user supergroup          0 2011-07-15 11:09 /user/user/test
>>>>>>>
>>>>>>> I also tried flushing the firewall (removing all iptables restrictions),
>>>>>>> but the error message is still thrown when uploading a file from the
>>>>>>> local fs (hadoop fs -put /tmp/x test).
>>>>>>>
>>>>>>> The name node log shows
>>>>>>>
>>>>>>> 2011-07-15 10:42:43,491 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>>> aaa.bbb.ccc.ddd.22:50010 storage DS-929017105-aaa.bbb.ccc.22-50010-1310697763488
>>>>>>> 2011-07-15 10:42:43,495 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.22:50010
>>>>>>> 2011-07-15 10:42:44,169 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>>> aaa.bbb.ccc.35:50010 storage DS-884574392-aaa.bbb.ccc.35-50010-1310697764164
>>>>>>> 2011-07-15 10:42:44,170 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.35:50010
>>>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>>> aaa.bbb.ccc.ddd.11:50010 storage DS-1537583073-aaa.bbb.ccc.11-50010-1310697764488
>>>>>>> 2011-07-15 10:42:44,507 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.11:50010
>>>>>>> 2011-07-15 10:42:45,796 INFO org.apache.hadoop.hdfs.StateChange:
>>>>>>> BLOCK* NameSystem.registerDatanode: node registration from
>>>>>>> 140.127.220.25:50010 storage DS-1500589162-aaa.bbb.ccc.25-50010-1310697765386
>>>>>>> 2011-07-15 10:42:45,797 INFO org.apache.hadoop.net.NetworkTopology:
>>>>>>> Adding a new node: /default-rack/aaa.bbb.ccc.25:50010
>>>>>>>
>>>>>>> And all datanodes have similar message as below:
>>>>>>>
>>>>>>> 2011-07-15 10:42:46,562 INFO
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: using
>>>>>>> BLOCKREPORT_INTERVAL of 3600000msec Initial delay: 0msec
>>>>>>> 2011-07-15 10:42:47,163 INFO
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>>>> blocks got processed in 3 msecs
>>>>>>> 2011-07-15 10:42:47,187 INFO
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic
>>>>>>> block scanner.
>>>>>>> 2011-07-15 11:19:42,931 INFO
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0
>>>>>>> blocks got processed in 1 msecs
>>>>>>>
>>>>>>> Command `hadoop fsck /`  displays
>>>>>>>
>>>>>>> Status: HEALTHY
>>>>>>>  Total size:    0 B
>>>>>>>  Total dirs:    3
>>>>>>>  Total files:   0 (Files currently being written: 1)
>>>>>>>  Total blocks (validated):      0
>>>>>>>  Minimally replicated blocks:   0
>>>>>>>  Over-replicated blocks:        0
>>>>>>>  Under-replicated blocks:       0
>>>>>>>  Mis-replicated blocks:         0
>>>>>>>  Default replication factor:    3
>>>>>>>  Average block replication:     0.0
>>>>>>>  Corrupt blocks:                0
>>>>>>>  Missing replicas:              0
>>>>>>>  Number of data-nodes:          4
>>>>>>>
>>>>>>> The setting in conf include:
>>>>>>>
>>>>>>> - Master node:
>>>>>>> core-site.xml
>>>>>>>  <property>
>>>>>>>    <name>fs.default.name</name>
>>>>>>>    <value>hdfs://lab01:9000/</value>
>>>>>>>  </property>
>>>>>>>
>>>>>>> hdfs-site.xml
>>>>>>>  <property>
>>>>>>>    <name>dfs.replication</name>
>>>>>>>    <value>3</value>
>>>>>>>  </property>
>>>>>>>
>>>>>>> -Slave nodes:
>>>>>>> core-site.xml
>>>>>>>  <property>
>>>>>>>    <name>fs.default.name</name>
>>>>>>>    <value>hdfs://lab01:9000/</value>
>>>>>>>  </property>
>>>>>>>
>>>>>>> hdfs-site.xml
>>>>>>>  <property>
>>>>>>>    <name>dfs.replication</name>
>>>>>>>    <value>3</value>
>>>>>>>  </property>
>>>>>>>
>>>>>>> Am I missing any configuration? Or is there any other place I can check?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>>
>



-- 
Harsh J
