hadoop-common-dev mailing list archives

From "Alfonso Olias Sanz" <alfonso.olias.s...@gmail.com>
Subject Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
Date Thu, 20 Mar 2008 23:09:28 GMT
Hi Lohit,

I am using 0.16.0.  The test scenario was: 37 GB of data in files of
several sizes between 15 MB and 120 MB; I uploaded around 1000 files.
When the bin/hadoop copy command exited, I ran the Java application
which retrieves that info.  The way I check all the files is:
1. open the local directory
2. apply a filter for zip files
3. call list(), which returns all the zip files in the directory.
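In code, the check is basically this (a sketch using plain java.io; the class and method names are mine):

```java
import java.io.File;
import java.io.FilenameFilter;

public class ListZips {
    // Steps 1-3 above: open the local directory, filter for .zip,
    // and return the matching file names.
    static String[] listZipFiles(File dir) {
        FilenameFilter zipFilter = new FilenameFilter() {
            public boolean accept(File d, String name) {
                return name.toLowerCase().endsWith(".zip");
            }
        };
        return dir.list(zipFilter);
    }

    public static void main(String[] args) {
        for (String name : listZipFiles(new File(args[0]))) {
            System.out.println(name);
        }
    }
}
```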

So when I start iterating through the list, it works fine until I
reach about 1/3 of the file names. Then it starts returning empty
matrices, and then again returns the hostnames for the last 1/4 of the
elements.  I cannot tell you the exact numbers right now; I could
check this.

I have 5 nodes running for my experiment, each node with 20 GB for
HDFS. While copying the files I used the web app for monitoring the
HDFS nodes. The node from which I am copying the files is the most
used one (% of HD), although the files are spread across all the
nodes. This node is very loaded.

It is when the copy command finishes that I run my Java application
and get the empty String[][].

I checked the web app again after several minutes (5-10 min) and the
cluster was almost balanced, so data had been moved from this node to
the others in the cluster.  When all the nodes had similar percentages
of use (space), I ran the Java app again and it seemed to work.

I am not SURE of this because I couldn't check the output. I will run
the test again next Monday.

But it seems that while the cluster is rebalancing, for the files that
are being reallocated the   String[][] fileCacheHints =
fs.getFileCacheHints(...) call cannot return a value.  Am I right??

I have 2 more questions. What do the start and end parameters mean?
From byte 0 to byte 100, for instance? The javadoc does not say a word
about them.
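If I had to guess (the names and semantics below are my assumption, not from the javadoc): the two numbers are a byte offset and a length, and the hints cover whichever blocks overlap that byte range. The block indices touched would then be:

```java
public class BlockRange {
    // Assumption (not from the javadoc): the parameters are a byte
    // offset and a length, and the hints cover the blocks that
    // overlap the range [start, start + len).
    static long firstBlock(long start, long blockSize) {
        return start / blockSize;
    }

    static long lastBlock(long start, long len, long blockSize) {
        return (start + len - 1) / blockSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L << 20; // 64 MB, the HDFS default block size
        // Bytes 0..99 fall entirely within block 0:
        System.out.println(firstBlock(0, blockSize) + ".." + lastBlock(0, 100, blockSize));
    }
}
```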

And why does it return a matrix??? I am using replication level 2. For
all the files that returned a value, the matrix just contained one array:
 fileCacheHints[0][] == {hostNameA, hostNameB}
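My reading of the shape (again an assumption): the first index is the block and the second is the replica host, so a one-block file with replication 2 gives a single row of two hostnames. A tiny helper to print it:

```java
public class HintsPrinter {
    // Assumed shape: hints[block][replica] -> hostname.
    static String formatHints(String[][] hints) {
        StringBuilder sb = new StringBuilder();
        for (int block = 0; block < hints.length; block++) {
            sb.append("block ").append(block).append(": ");
            for (int r = 0; r < hints[block].length; r++) {
                if (r > 0) sb.append(", ");
                sb.append(hints[block][r]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] hints = { { "hostNameA", "hostNameB" } };
        System.out.print(formatHints(hints)); // prints: block 0: hostNameA, hostNameB
    }
}
```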


On 20/03/2008, lohit <lohit_bv@yahoo.com> wrote:
> I tried to get location of a file which is 100 bytes and also first 100 bytes of huge
file. Both returned me set of hosts.
>  This is against trunk.
>    FileSystem fs = FileSystem.get(conf);
>     String[][] fileCacheHints = fs.getFileCacheHints(new Path("/user/lohit/test.txt"),
0, 100L);
>     for (String[] tmp : fileCacheHints) {
>       System.out.println("");
>       for(String tmp1 : tmp)
>         System.out.print(tmp1);
>     }
>  ----- Original Message ----
>  From: lohit <lohit_bv@yahoo.com>
>  To: core-dev@hadoop.apache.org
>  Sent: Thursday, March 20, 2008 11:14:49 AM
>  Subject: Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing
>  Hi Alfonso,
>  which version of hadoop are you using. Yesterday a change was checked into trunk which
changes getFileCacheHints.
>  Thanks,
>  Lohit
>  ----- Original Message ----
>  From: Alfonso Olias Sanz <alfonso.olias.sanz@gmail.com>
>  To: core-user@hadoop.apache.org; core-dev@hadoop.apache.org
>  Sent: Wednesday, March 19, 2008 10:51:08 AM
>  Subject: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
>  HI there,
>  I am trying to get the hostnames where  a file is contained
>       dfs.getFileCacheHints(inFile, 0, 100);
>  But for a reason I cannot guess, for some files that are in the HDFS the
>  returned String[][] is empty.
>  if I list the file using bin/hadoop -ls path | grep fileName   The file appears.
>  Also I am able to get the FileStatus dfs.getFileStatus(inFile);
>  What I am trying to do is, for a list of files, get the hostnames where
>  the files are physically stored.
>  Thanks
>  alfonso
