hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@gmail.com>
Subject Re: Question on opening file info from namenode in DFSClient
Date Sat, 08 Nov 2008 13:14:18 GMT
Hi Taeho,

Thanks for your explanation. If your application opens a DFS file and
does not close it, then the DFSClient will automatically keep the block
locations cached. So, you could achieve your desired goal by
developing a cache layer (above HDFS) that does not close the HDFS
file even if the user has closed it. This cache layer needs to manage
this cache pool of HDFS file handles.
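
A minimal sketch of such a cache layer, under stated assumptions:
HandlePool, Opener, acquire and closeAll are hypothetical names, not part
of Hadoop. In a real deployment the opener would presumably wrap something
like FileSystem.open(Path), which returns an FSDataInputStream; here it is
left generic so the sketch runs without Hadoop on the classpath. The pool
keeps each handle open after the caller is done with it and closes only the
least recently used handle once a capacity limit is exceeded.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical pool that keeps file handles open so their cached state survives. */
public class HandlePool<H extends Closeable> {

    /** Hypothetical callback that opens a handle for a path (e.g. an HDFS open call). */
    public interface Opener<H> {
        H open(String path) throws IOException;
    }

    private final int capacity;
    private final Opener<H> opener;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, H> handles =
        new LinkedHashMap<>(16, 0.75f, true);

    public HandlePool(int capacity, Opener<H> opener) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.opener = opener;
    }

    /** Returns the cached handle for the path, opening it only on a miss. */
    public synchronized H acquire(String path) throws IOException {
        H handle = handles.get(path);
        if (handle == null) {
            handle = opener.open(path);
            handles.put(path, handle);
            evictIfNeeded();
        }
        return handle;
    }

    /** Closes least recently used handles until the pool fits its capacity. */
    private void evictIfNeeded() throws IOException {
        Iterator<Map.Entry<String, H>> it = handles.entrySet().iterator();
        while (handles.size() > capacity && it.hasNext()) {
            H eldest = it.next().getValue();
            it.remove();
            eldest.close();
        }
    }

    public synchronized int size() {
        return handles.size();
    }

    /** Closes every pooled handle, e.g. at application shutdown. */
    public synchronized void closeAll() throws IOException {
        for (H handle : handles.values()) {
            handle.close();
        }
        handles.clear();
    }
}
```

Callers would use acquire(path) instead of opening the file themselves and
would never close the returned handle. A handle shared by several readers
shares one stream position, so concurrent readers would likely need
positional reads, and this sketch does not protect a handle that is evicted
while still in use.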

does this help?
thanks,
dhruba




On Fri, Nov 7, 2008 at 12:53 AM, Taeho Kang <tkang1@gmail.com> wrote:
> Hi, thanks for your reply Dhruba,
>
> One of my co-workers is writing a BigTable-like application that could be
> used for online, near-real-time, services. So since the application could be
> hooked into online services, there would be times when a large number of users
> (e.g. 1000 users) request access to a few files in a very short time.
>
> Of course, in a batch process job, this is a rare case, but for online
> services, it's quite a common case.
> I think HBase developers would have run into similar issues as well.
>
> Is this enough explanation?
>
> Thanks in advance,
>
> Taeho
>
>
>
> On Tue, Nov 4, 2008 at 3:12 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:
>
>> In the current code, details about block locations of a file are
>> cached on the client when the file is opened. This cache remains with
>> the client until the file is closed. If the same file is re-opened by
>> the same DFSClient, it re-contacts the namenode and refetches the
>> block locations. This works ok for most map-reduce apps because it is
>> rare that the same DFSClient re-opens the same file.
>>
>> Can you please explain your use-case?
>>
>> thanks,
>> dhruba
>>
>>
>> On Sun, Nov 2, 2008 at 10:57 PM, Taeho Kang <tkang1@gmail.com> wrote:
>> > Dear Hadoop Users and Developers,
>> >
>> > I was wondering if there's a plan to add a "file info cache" to DFSClient?
>> >
>> > It could eliminate the network cost of contacting the Namenode, and I
>> > think it would greatly improve the DFSClient's performance.
>> > The code I was looking at is this:
>> >
>> > -----------------------
>> > DFSClient.java
>> >
>> >    /**
>> >     * Grab the open-file info from namenode
>> >     */
>> >    synchronized void openInfo() throws IOException {
>> >      /* Maybe, we could add a file info cache here! */
>> >      LocatedBlocks newInfo = callGetBlockLocations(src, 0, prefetchSize);
>> >      if (newInfo == null) {
>> >        throw new IOException("Cannot open filename " + src);
>> >      }
>> >      if (locatedBlocks != null) {
>> >        Iterator<LocatedBlock> oldIter =
>> >            locatedBlocks.getLocatedBlocks().iterator();
>> >        Iterator<LocatedBlock> newIter =
>> >            newInfo.getLocatedBlocks().iterator();
>> >        while (oldIter.hasNext() && newIter.hasNext()) {
>> >          if (!oldIter.next().getBlock().equals(newIter.next().getBlock())) {
>> >            throw new IOException("Blocklist for " + src + " has changed!");
>> >          }
>> >        }
>> >      }
>> >      this.locatedBlocks = newInfo;
>> >      this.currentNode = null;
>> >    }
>> > -----------------------
>> >
>> > Does anybody have an opinion on this matter?
>> >
>> > Thank you in advance,
>> >
>> > Taeho
>> >
>>
>
