hadoop-common-dev mailing list archives

From lohit <lohit...@yahoo.com>
Subject Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
Date Fri, 21 Mar 2008 00:29:42 GMT

>So when I start iterating through the list, it works fine till I reach
>1/3 of the file names. Then it starts returning empty matrices. Then
>again returns the hostnames for the last 1/4 of all the elements.  I
>cannot tell you exactly the numbers right now. I could check this on

By any chance do you have zero byte files?

As soon as the file is closed, the block location information should be updated, and any calls
to getFileCacheHints() will give you back those locations.

>But it seems that while the cluster is rebalancing, the files are
>being reallocated. The String[][] fileCacheHints =
>fs.getFileCacheHints(...) method cannot return a value. Am I right??

When you describe this scenario, are you explicitly invoking the rebalancer? Ideally, if you
are using hadoop dfs -copyFromLocal or -put, or if a map reduce job writes a file onto HDFS
and terminates with success, later invocations of getFileCacheHints on these non-zero-byte
files should not return an empty matrix.

>I have 2 more questions. What do the start, end parameters mean?
>From byte 0 to byte 100, for instance? The javadoc does
>not say a word about them.

start is the byte offset within the file, and the second parameter is the length. In essence
you are providing a range within the file and asking for the locations of the blocks that
correspond to it. I agree that the javadoc should be more descriptive. We could fix this.
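Purely as an illustration of how a (start, length) range maps to blocks (this is plain Java arithmetic, not Hadoop API code, and the 64 MB block size is just an assumed example):

```java
public class BlockRange {
    // Given a byte range [start, start + length) and a block size,
    // return the indices of the blocks that the range touches.
    // Conceptually this is why getFileCacheHints() returns a matrix:
    // one row per touched block.
    static long[] touchedBlocks(long start, long length, long blockSize) {
        long first = start / blockSize;               // block holding the first byte
        long last = (start + length - 1) / blockSize; // block holding the last byte
        long[] blocks = new long[(int) (last - first + 1)];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = first + i;
        }
        return blocks;
    }

    public static void main(String[] args) {
        long mb64 = 64L * 1024 * 1024;
        // Bytes 0..100 of a file with 64 MB blocks touch only block 0,
        // so the returned matrix would have a single row.
        System.out.println(touchedBlocks(0, 100, mb64).length);        // 1
        // A range straddling the first block boundary touches blocks 0 and 1.
        System.out.println(touchedBlocks(mb64 - 10, 20, mb64).length); // 2
    }
}
```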

>And why does it return a matrix??? I am using replication level 2. For
>all the files that returned a value, the matrix just contained an array
>fileCacheHints[0][] == {hostNameA, hostNameB}

A file can have multiple blocks. In the matrix, each row corresponds to one block of the file,
and the columns within each row list all the hosts that hold a replica of that block. (The
number of columns depends on the number of replicas you have; for a replication factor of 3,
you would have 3 columns.)
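To make the shape concrete, here is a hand-built example matrix (the hostnames are made up, not output from a real cluster): a file with two blocks at replication factor 2 would come back as two rows of two hostnames each.

```java
public class CacheHintsShape {
    public static void main(String[] args) {
        // Hypothetical result of getFileCacheHints() for a two-block file
        // with replication factor 2: one row per block, one column per replica.
        String[][] fileCacheHints = {
            { "hostNameA", "hostNameB" },  // replicas of block 0
            { "hostNameC", "hostNameA" },  // replicas of block 1
        };
        for (int block = 0; block < fileCacheHints.length; block++) {
            System.out.print("block " + block + ":");
            for (String host : fileCacheHints[block]) {
                System.out.print(" " + host);
            }
            System.out.println();
        }
    }
}
```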

Let us know when the file is created, when it is closed, and when your Java app calls getFileCacheHints().


On 20/03/2008, lohit <lohit_bv@yahoo.com> wrote:
> I tried to get the location of a file which is 100 bytes, and also the first 100 bytes of a huge file. Both returned me a set of hosts.
>  This is against trunk.
>    FileSystem fs = FileSystem.get(conf);
>    String[][] fileCacheHints = fs.getFileCacheHints(new Path("/user/lohit/test.txt"), 0, 100L);
>    for (String[] tmp : fileCacheHints) {
>      System.out.println("");
>      for (String tmp1 : tmp)
>        System.out.print(tmp1);
>    }
>  ----- Original Message ----
>  From: lohit <lohit_bv@yahoo.com>
>  To: core-dev@hadoop.apache.org
>  Sent: Thursday, March 20, 2008 11:14:49 AM
>  Subject: Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
>  Hi Alfonso,
>  Which version of hadoop are you using? Yesterday a change was checked into trunk which changes getFileCacheHints.
>  Thanks,
>  Lohit
>  ----- Original Message ----
>  From: Alfonso Olias Sanz <alfonso.olias.sanz@gmail.com>
>  To: core-user@hadoop.apache.org; core-dev@hadoop.apache.org
>  Sent: Wednesday, March 19, 2008 10:51:08 AM
>  Subject: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
>  Hi there,
>  I am trying to get the hostnames where a file is contained:
>       dfs.getFileCacheHints(inFile, 0, 100);
>  But for a reason I cannot guess, for some files that are in HDFS the
>  returned String[][] is empty.
>  If I list the file using bin/hadoop -ls path | grep fileName, the file appears.
>  Also, I am able to get the FileStatus with dfs.getFileStatus(inFile);
>  What I am trying to do is, for a list of files, get the hostnames where
>  the files are physically stored.
>  Thanks
>  alfonso
