hadoop-common-dev mailing list archives

From "Alfonso Olias Sanz" <alfonso.olias.s...@gmail.com>
Subject Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing file
Date Mon, 24 Mar 2008 11:34:27 GMT
On 21/03/2008, lohit <lohit_bv@yahoo.com> wrote:
>
>  >So when I start iterating through the list, it works fine till I reach
>  >1/3 of the file names. Then it starts returning empty matrices. Then
>  >again returns the hostnames for the last 1/4 of all the elements.  I
>  >cannot tell you exactly the numbers right now. I could check this on
>  >Monday.
>
>
> By any chance do you have zero byte files?

No, all the files contain data.

>
>  As soon as the files are closed, the block location information should be updated, and
>  any calls to getFileCacheHints() would give you back those locations.
>
>
>  >But it seems that while the cluster is being rebalanced, the files are
>  >being reallocated, and the String[][] fileCacheHints =
>  >fs.getFileCacheHints(...) call cannot return a value.  Am I right?
>
>
> When you describe this scenario, are you explicitly invoking the rebalancer? Ideally,
> if you are using hadoop dfs -copyFromLocal or -put, or if a map reduce job is writing a file
> onto HDFS and it terminates with success, later invocations of getFileCacheHints on these
> non-zero files should not return you an empty matrix.
>

Yes, I have called the balancer explicitly because I want the data
balanced before I run the application.

I am running the same test again. I called the balancer explicitly.
This is the actual output of the log file
  tail -f /home/aolias/software/Hadoop/hadoop-0.16.0/bin/../logs/hadoop-aolias-balancer-gaiawl03.net4.lan.out
Time Stamp                 Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
Mar 24, 2008 12:11:47 PM            0                 0 KB            24.33 MB          787.42 MB
Mar 24, 2008 12:15:55 PM            1            461.45 MB             2.66 GB          787.42 MB
Mar 24, 2008 12:23:14 PM            2            761.36 MB             3.53 GB          787.42 MB


  I suppose that while data is being balanced, there is no output for
those blocks/files.  I will run the test twice: once before the
balancer finishes, and a second time after it finishes balancing the
cluster.

Is there any way to query the running balancer from Java in order
to make the application wait until the system is balanced?
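In the meantime, one client-side workaround would be to poll until location hints come back non-empty. This is only a sketch, not a confirmed Balancer API; the Supplier below stands in for a call such as fs.getFileCacheHints(path, 0, length), and the class and method names are hypothetical:

```java
import java.util.function.Supplier;

// Sketch: repeatedly fetch location hints for a file and return as soon
// as a non-empty matrix comes back. Gives up after maxAttempts tries.
public class WaitForHints {
    static String[][] waitForHints(Supplier<String[][]> fetch,
                                   int maxAttempts, long sleepMillis)
            throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            String[][] hints = fetch.get();
            if (hints != null && hints.length > 0) {
                return hints;  // locations are available
            }
            Thread.sleep(sleepMillis);  // back off before retrying
        }
        return new String[0][];  // still empty after all attempts
    }
}
```

In a real application the Supplier would wrap the FileSystem call, so the loop waits out the window where the hints are unavailable.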

>
>  >I have 2 more questions. What do the start and end parameters mean?
>  >From byte 0 to byte 100, for instance? The javadoc does not say a
>  >word about them.
>
>
> start is the start offset within the file, and the second parameter is the length. In essence
> you are providing a range within the file and trying to find out the locations of the blocks
> corresponding to it. I agree that the javadoc should be more descriptive. We could fix this.
>

Ok thanks! :)
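For illustration, the (start, length) to block mapping described above could be sketched like this. The block size is an assumption here (64 MB was the default), and blocksFor is a hypothetical helper, not a Hadoop API; getFileCacheHints does this mapping for you:

```java
// Illustrative only: which block indices does a byte range [start, start+length)
// touch, given a fixed block size?
public class BlockRange {
    // Assumed default HDFS block size of 64 MB.
    static final long BLOCK_SIZE = 64L * 1024 * 1024;

    // Returns { index of first block touched, index of last block touched }.
    static long[] blocksFor(long start, long length) {
        long first = start / BLOCK_SIZE;
        long last = (start + length - 1) / BLOCK_SIZE;
        return new long[] { first, last };
    }
}
```

So a range of 0, 100L touches only block 0, which is why the sample call below returns one row per block in that range.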
>
>  >And why does it return a matrix? I am using replication level 2. For
>  >all the files that returned a value, the matrix just contained an array
>  >fileCacheHints[0][] == {hostNameA, hostNameB}
>
>
> A file can have multiple blocks. In the matrix, each row corresponds to one block of a
> file, and the columns within each row list all the hosts which host that block. (This depends on
> the number of replicas you have; for a replication factor of 3, you would have 3 columns.)
>
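That row/column layout can be illustrated with mock data (the hostnames below are made up, not real cluster output, and describe is just a hypothetical helper):

```java
// Sketch of reading the String[][] returned by getFileCacheHints:
// rows are blocks of the file, columns are the hosts holding a replica.
public class HintsDemo {
    static String describe(String[][] hints) {
        StringBuilder sb = new StringBuilder();
        for (int block = 0; block < hints.length; block++) {
            sb.append("block ").append(block).append(": ");
            sb.append(String.join(", ", hints[block]));  // one host per replica
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

With replication level 2 and a single-block file, the matrix has one row with two columns, exactly the {hostNameA, hostNameB} shape described above.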
>  Let us know when the file is created, closed and when your java app calls getFileCacheHints().
>
>
>  Thanks
>  Alfonso
>
>  On 20/03/2008, lohit <lohit_bv@yahoo.com> wrote:
>  > I tried to get the location of a file which is 100 bytes, and also the first 100 bytes of a
>  > huge file. Both returned me a set of hosts.
>  >  This is against trunk.
>  >
>  >    FileSystem fs = FileSystem.get(conf);
>  >
>  >     String[][] fileCacheHints = fs.getFileCacheHints(new Path("/user/lohit/test.txt"), 0, 100L);
>  >     for (String[] tmp : fileCacheHints) {
>  >       System.out.println();
>  >       for (String tmp1 : tmp)
>  >         System.out.print(tmp1 + " ");
>  >     }
>  >
>  >
>  >  ----- Original Message ----
>  >  From: lohit <lohit_bv@yahoo.com>
>  >  To: core-dev@hadoop.apache.org
>  >  Sent: Thursday, March 20, 2008 11:14:49 AM
>  >  Subject: Re: [core] dfs.getFileCacheHints () returns an empty matrix for an existing
file
>  >
>  >  Hi Alfonso,
>  >
>  >  Which version of Hadoop are you using? Yesterday a change was checked into trunk
>  >  which changes getFileCacheHints.
>  >
>  >  Thanks,
>  >  Lohit
>  >
>  >  ----- Original Message ----
>  >  From: Alfonso Olias Sanz <alfonso.olias.sanz@gmail.com>
>  >  To: core-user@hadoop.apache.org; core-dev@hadoop.apache.org
>  >  Sent: Wednesday, March 19, 2008 10:51:08 AM
>  >  Subject: [core] dfs.getFileCacheHints () returns an empty matrix for an existing
file
>  >
>  >  Hi there,
>  >
>  >  I am trying to get the hostnames where a file is stored:
>  >       dfs.getFileCacheHints(inFile, 0, 100);
>  >
>  >  But for a reason I cannot guess, for some files that are in HDFS the
>  >  returned String[][] is empty.
>  >
>  >  If I list the file using bin/hadoop -ls path | grep fileName, the file appears.
>  >
>  >  Also I am able to get the FileStatus dfs.getFileStatus(inFile);
>  >
>  >
>  >  What I am trying to do is, for a list of files, get the hostnames where
>  >  the files are physically stored.
>  >
>  >  Thanks
>  >  alfonso
>  >
>
