hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: fine granularity operation on HDFS
Date Thu, 28 Jan 2010 13:56:49 GMT
Thanks, Amogh.

For the second part of my question, I actually meant loading blocks separately from
HDFS; I don't know whether that is realistic. In any case, since my goal is to process
different divisions of a file separately, working at the split level is fine. But even
if I can get the splits from the InputFormat, how do I "add only a few splits you need
to the mapper and discard the others"? (PathFilters only work on files, not on blocks,
I think.)
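For what it's worth, the file-level filtering I mean is something like this minimal
sketch (the "part-" naming rule is just a made-up example):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class PartFileFilter implements PathFilter {
  // Made-up rule: accept only files whose names start with "part-".
  public boolean accept(Path path) {
    return path.getName().startsWith("part-");
  }
}

// Registered on the job with, e.g.:
//   FileInputFormat.setInputPathFilter(job, PartFileFilter.class);

This selects or rejects whole files before any splits are computed, which is why it
cannot reach down to individual blocks.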

Thanks.
-Gang


----- Original Message -----
From: Amogh Vasekar <amogh@yahoo-inc.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: 2010/1/27 (Wed) 1:40:26 PM
Subject: Re: fine granularity operation on HDFS

Hi,
>>now that I can get the splits of a file in hadoop, is it possible to name some splits
(not all) as the input to mapper?
I'm assuming that when you say "splits of a file in hadoop" you mean the splits
generated by the InputFormat, not the blocks stored in HDFS.
The [File]InputFormat you use gives you access to the splits, their locations, etc. You
can use this to add only a few splits you need to the mapper and discard the others
(something you can do on whole files using PathFilters).
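A minimal sketch of that idea, assuming the org.apache.hadoop.mapreduce API (the
selection rule below is only a placeholder; substitute whatever criterion you need):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SelectedSplitsInputFormat extends TextInputFormat {
  @Override
  public List<InputSplit> getSplits(JobContext job) throws IOException {
    List<InputSplit> all = super.getSplits(job);  // the splits computed as usual
    List<InputSplit> kept = new ArrayList<InputSplit>();
    for (InputSplit split : all) {
      FileSplit fs = (FileSplit) split;
      // Placeholder rule: keep only splits starting in the first 128 MB of a file.
      if (fs.getStart() < 128L * 1024 * 1024) {
        kept.add(fs);
      }
    }
    return kept;  // only these splits are handed to mappers; the rest are discarded
  }
}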

>>Or can I manually read some of these splits (not the whole file) using the HDFS API?
You mean you would list these splits in a file beforehand, so that each mapper can read
one line (i.e., one split)?
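If that is the plan, a rough sketch of the mapper side (the "path<TAB>offset<TAB>length"
record format is hypothetical; the control file would be fed in with something like
NLineInputFormat so each mapper gets one line):

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SplitRangeMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // One record of the hypothetical control file: path, offset, length.
    String[] parts = value.toString().split("\t");
    Path file = new Path(parts[0]);
    long offset = Long.parseLong(parts[1]);
    int length = Integer.parseInt(parts[2]);

    FileSystem fs = file.getFileSystem(context.getConfiguration());
    byte[] buf = new byte[length];
    FSDataInputStream in = fs.open(file);
    try {
      in.readFully(offset, buf, 0, length);  // read only this byte range
    } finally {
      in.close();
    }
    // ... process buf ...
  }
}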

Amogh


