hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: How to perform FILE IO with Hadoop DFS
Date Mon, 05 May 2008 20:17:54 GMT

Keep in mind that many applications can do without real append if they don't
have massive reliability requirements.  Just accumulate data on the side and
burp it into HDFS periodically.  Then on some longer time scale accumulate
your data burps into a full sized data belch.  The cost is surprisingly low
and the effect is very similar to appends (with more complex writing and
reading as the compensatory pain).

HBase does something similar.
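
For illustration, a minimal sketch of that side-accumulation pattern, using
the standard org.apache.hadoop.fs.FileSystem API. The class name, the local
buffer file and the target HDFS directory below are invented for the example;
treat it as a sketch, not a drop-in implementation:

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PeriodicFlusher {

  // Copy whatever has accumulated in a local buffer file into HDFS as a
  // brand-new file. Each flush produces a separate small file (a "burp");
  // a later pass can concatenate them into one large file (the "belch").
  public static void flush(File localBuffer, String hdfsDir) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Use a unique name per flush (e.g. a timestamp), since an HDFS file
    // cannot be reopened for writing once it has been closed.
    Path dst = new Path(hdfsDir, "chunk-" + System.currentTimeMillis());
    fs.copyFromLocalFile(new Path(localBuffer.getAbsolutePath()), dst);
  }
}

Something like flush(buffer, "/user/ted/incoming") called every few minutes
from the collecting application gives the effect described above.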


On 5/5/08 1:03 PM, "vikas" <pvssvikas@gmail.com> wrote:

> Thank you very much for the right link. It really helped. Like many others,
> I too am waiting for "Append to files in HDFS".
> 
> Is there anything I can do to raise its priority? Does the Hadoop developer
> community track a request counter for a particular feature so that its
> priority can be raised? If that is the case, I would like to add my vote
> to this :)
> 
> I've registered on the mailing list, and that gives me the privilege of
> creating a JIRA issue and watching one. Can you tell me how I can get into
> the developer community, so that, time permitting, I too can contribute
> through discussion or code?
> 
> Best regards,
> -Vikas
> 
> 
> On Mon, May 5, 2008 at 9:43 PM, Arun C Murthy <arunc@yahoo-inc.com> wrote:
> 
>> 
>> On May 4, 2008, at 6:27 PM, vikas wrote:
>> 
>>> Hi All,
>>> 
>>> I was looking for how multiple inputs can be written to the same output,
>>> at different intervals of time (i.e. I want to re-open the same file to
>>> append data to it).
>>> 
>>> This link did not contain anything related to my question:
>>> http://issues.apache.org/jira/browse/HADOOP-3149. Maybe you meant to
>>> suggest another link. The above link describes how an input can be
>>> written to multiple output files.
>>> 
>>> 
>> My apologies, the correct link is
>> http://issues.apache.org/jira/browse/HADOOP-1700
>> - a copy/paste error.
>> 
>>> Is anyone working on developing the usability of DFS? It would be really
>>> effective if DFS operations were allowed directly, on top of which we
>>> could then use the map-reduce functionality.
>>> 
>>> Please correct me if I'm misunderstanding the programming model. As of
>>> now it looks as if I need to write a separate application to collect the
>>> input and then store it in Hadoop so that it can be processed on a
>>> multi-node cluster.
>>> 
>>> 
>> Yes, you will need to 'load' data onto HDFS and then run Map-Reduce
>> programs on it.
>> 
>> However, the input to your Map-Reduce program can be a 'directory', thus
>> you can load data into the same directory periodically as separate files and
>> then when you have all the data, process them.
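
A minimal sketch of the directory-as-input idea, assuming a Hadoop release
where org.apache.hadoop.mapred.FileInputFormat.addInputPath is available. The
job name and the /user/vikas/... paths are placeholders, and no mapper or
reducer is set, so the identity defaults simply pass the accumulated records
through:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class DirectoryInputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(DirectoryInputJob.class);
    conf.setJobName("process-accumulated-files");

    // The input path is a directory: every file loaded into it
    // (periodically, as separate files) becomes part of the job's input.
    FileInputFormat.addInputPath(conf, new Path("/user/vikas/incoming"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/vikas/output"));

    JobClient.runJob(conf);
  }
}
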
>> 
>>> I feel it would be better if one could directly store data to DFS which
>>> can then be processed. Updating the same file would give me an
>>> opportunity to avoid multiple small files and the redundant task of
>>> merging them into a different file.
>>> 
>>> 
>> This is a relevant problem and we are currently developing the notion of
>> 'archives' to get over multiple small files (which place a fair bit of  load
>> on the NameNode).
>> http://issues.apache.org/jira/browse/HADOOP-3307 (I'm pretty sure the link
>> _is_ right this time around... *smile*)
>> 
>> Arun
>> 
>> 
>> 
>>> Thank you very much for your time,
>>> 
>>> -Vikas
>>> 
>>> On Mon, May 5, 2008 at 12:43 AM, Arun C Murthy <arunc@yahoo-inc.com>
>>> wrote:
>>> 
>>>> Vikas,
>>>> 
>>>> On May 4, 2008, at 7:51 AM, vikas wrote:
>>>> 
>>>>> Hi All,
>>>>> 
>>>>> Can anyone please help me with the technique of writing to the same
>>>>> file in Hadoop's DFS?
>>>>> 
>>>>> I want to perform insertion, deletion and update on a file in my
>>>>> DFS.
>>>>> 
>>>>> 
>>>> HDFS doesn't support file-updates: once written (i.e. after the file
>>>> is 'closed') the file is immutable.
>>>> 
>>>> Appends to a file are coming soon:
>>>> http://issues.apache.org/jira/browse/HADOOP-3149.
>>>> 
>>>> Arun
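
To make the write-once model concrete, a small sketch with the FileSystem
API; the path is a placeholder. Once close() has returned there is no way to
reopen the file for writing, so additional records go into new files:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Create, write, close: after close() the file is immutable.
    FSDataOutputStream out = fs.create(new Path("/user/vikas/data/part-00000"));
    out.writeBytes("record 1\n");
    out.writeBytes("record 2\n");
    out.close();

    // To "update" the data set, write further files (part-00001, ...)
    // alongside this one and treat the containing directory as the input.
  }
}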
>>>> 
>>>> 
>>>> 
>> 

