hadoop-common-user mailing list archives

From Eli Finkelshteyn <iefin...@gmail.com>
Subject Re: HDFS Files Seem to be Stored in the Wrong Location?
Date Mon, 06 Feb 2012 16:38:42 GMT
Ah, crud. Typo on my part. Don't know how I didn't notice that. Thanks!

On 2/6/12 11:30 AM, Harsh J wrote:
> You need dfs.data.dir pointed at the bigger disk for data. That is the
> property the datanodes use.
>
> The one you've overridden, dfs.name.dir, is for the namenode's metadata.
> dfs.data.dir is therefore still at its default, which writes to /tmp on
> your root disk (a bad thing, since /tmp can get wiped after a reboot).
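>
> A minimal sketch of what that could look like in hdfs-site.xml, using the
> Hadoop 1.x-era property name from this thread. The directory
> /mnt/hadoop-data-blocks is a hypothetical path chosen for illustration;
> any directory on the large disk that is separate from the dfs.name.dir
> directory should serve the same purpose.
>
> ```xml
> <property>
>   <!-- where the datanode stores HDFS block data -->
>   <!-- if unset, defaults to ${hadoop.tmp.dir}/dfs/data, i.e. under /tmp -->
>   <name>dfs.data.dir</name>
>   <!-- hypothetical path on the large /mnt disk -->
>   <value>/mnt/hadoop-data-blocks</value>
> </property>
> ```
>
> The datanode has to be restarted to pick up the change, and blocks
> already written under /tmp are not moved automatically.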
>
> On Mon, Feb 6, 2012 at 9:51 PM, Eli Finkelshteyn <iefinkel@gmail.com> wrote:
>> Hi,
>> I have a pseudo-distributed Hadoop cluster set up, and I'm hoping to put
>> about 100 gigs of files on it to play around with. I got a unix box at
>> work that no one else is using for this, and running df -h, I get:
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda1             7.9G  2.4G  5.2G  31% /
>> none                  3.8G     0  3.8G   0% /dev/shm
>> /dev/sdb              414G  210M  393G   1% /mnt
>>
>> Alright, so /mnt looks quite big and seems like a good place to store my
>> hdfs files. I go ahead and create a folder named hadoop-data there and set
>> the following in hdfs-site.xml:
>>
>> <property>
>> <!-- where hadoop stores its files (datanodes only) -->
>> <name>dfs.name.dir</name>
>> <value>/mnt/hadoop-data</value>
>> </property>
>>
>> After a bit of troubleshooting, I restart the cluster and try to put a
>> couple of test files onto HDFS. Doing an ls of hadoop-data, I see:
>>
>> $ ls
>> current  image  in_use.lock  previous.checkpoint
>>
>> OK, things look good. Time to try uploading some real data. Now, here's
>> where the problem arises. If I add a 10 MB dummy file to hadoop-data through
>> regular unix and run df -h, I see that the used space of /mnt goes up by
>> exactly 10 MB. But when I start running a big dump of data through:
>>
>> hadoop fs -put ~/hadoop_playground/data2/data2/ /data/
>>
>> df -h shows the data going to completely the wrong location! Note that
>> below, only the usage of /dev/sda1 has increased. /mnt has not moved.
>>
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/sda1             7.9G  3.4G  4.2G  45% /
>> none                  3.8G     0  3.8G   0% /dev/shm
>> /dev/sdb              414G  210M  393G   1% /mnt
>>
>> So, what gives? Anyone have any clue how my files seemingly end up in the
>> hadoop-data folder but take up space elsewhere? I could see this being a
>> Unix issue, but I figured I'd ask here just in case it's not, since I'm
>> pretty stumped.
>>
>> Cheers,
>> Eli
>
>

