hadoop-common-user mailing list archives

From Wang Zhong <wangzhong....@gmail.com>
Subject Re: How to write large string to file in HDFS
Date Wed, 29 Apr 2009 07:48:50 GMT
You can try using FSDataOutputStream in the reduce phase. FileSystem.create()
opens a new file in HDFS and returns an FSDataOutputStream:

====
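FileSystem fs = FileSystem.get(conf);     // conf: the job's Configuration
FSDataOutputStream os = fs.create(path);  // create() returns FSDataOutputStream, not a plain OutputStream
os.writeChars(str);                       // writes each char as two bytes (UTF-16BE)
os.close();                               // flush and finish the file in HDFS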
====

Call writeChars once for each value as you iterate over the values,
instead of accumulating them all in a StringBuffer. Make the key part
of the file name so that each file identifies its group of URIs.
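A minimal sketch of the per-value approach, written in plain Java so it runs without a cluster. The class and method names (UriGroupWriter, writeGroup) are hypothetical. It assumes the reduce values are available as an Iterator of Strings (with Text values you would call toString() on each) and that the OutputStream passed in is the one returned by fs.create(), with the reduce key as part of the Path. Unlike writeChars, it writes UTF-8 text, one URI per line.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

public class UriGroupWriter {

    /**
     * Streams one group's URIs to out, one per line, instead of first
     * concatenating them into a StringBuffer. Returns the number written.
     * In a reducer, out would hypothetically be
     * fs.create(new Path(outDir, key.toString())), i.e. one file per group key.
     */
    public static int writeGroup(OutputStream out, Iterator<String> uris) throws IOException {
        // Explicit UTF-8 so the file does not depend on the platform default charset.
        Writer w = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        int count = 0;
        while (uris.hasNext()) {
            w.write(uris.next());  // each URI is encoded into the buffer, then no longer referenced
            w.write('\n');
            count++;
        }
        w.flush();  // push remaining bytes; the caller still owns and closes out
        return count;
    }
}
```

Because each URI is written as it arrives, memory use is bounded by the writer's buffer rather than by the size of the whole group.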


On Wed, Apr 29, 2009 at 2:56 PM, nguyenhuynh.mr
<nguyenhuynh.mr@gmail.com> wrote:
> Wang Zhong wrote:
>
>> Where did you get the large string? Can't you generate the string one
>> line at a time and append it to a local file, then upload it to HDFS
>> when finished?
>>
>> On Wed, Apr 29, 2009 at 10:47 AM, nguyenhuynh.mr
>> <nguyenhuynh.mr@gmail.com> wrote:
>>
>>> Hi all!
>>>
>>> I have a large String and I want to write it to a file in HDFS.
>>> (The string has more than 100,000 lines.)
>>>
>>> Currently, I use the copyBytes method of org.apache.hadoop.io.IOUtils.
>>> But copyBytes requires an InputStream for the content, so I have
>>> to convert the String to an InputStream, something like:
>>>
>>>    InputStream in = new ByteArrayInputStream(sb.toString().getBytes());
>>>
>>> where "sb" is a StringBuffer.
>>>
>>> It does not work with the line above. :(
>>>
>>> This is the error:
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>    at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
>>>    at java.lang.StringCoding.encode(StringCoding.java:272)
>>>    at java.lang.String.getBytes(String.java:947)
>>>    at asnet.haris.mapred.jobs.Test.main(Test.java:32)
>>>
>>> Please suggest a good solution!
>>>
>>> Thanks,
>>>
>>> Best regards,
>>>
>>> Nguyen
>>
> Thanks for your answer!
>
> I have a Map/Reduce job that partitions URIs from HBase into groups.
> In the map phase, I get the group name of each URI and collect the
> output <groupname, uri>.
> In the reduce phase, I build a String containing the URIs of the
> partition and save it into HDFS.
> Each group is written to its own file.
>
> Thanks,
>
> Best regards,
> NguyenHuynh.
>
>
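If the whole text already sits in a StringBuffer, as in the original question, peak memory can be reduced by never calling sb.toString().getBytes(), which makes two extra full-size copies (a String and a byte[]); the stack trace shows the failure inside that getBytes() call. The sketch below, with a hypothetical class name and an assumed 8K-char chunk size, copies slices of the buffer into a small reusable char[] and lets an OutputStreamWriter encode each slice to UTF-8. The StringBuffer itself must still fit in the heap, so streaming values as they arrive remains the better fix.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ChunkedCharWriter {

    // Assumed chunk size: number of chars encoded per write call.
    static final int CHUNK = 8192;

    /**
     * Writes the contents of sb to out as UTF-8 without materializing
     * sb.toString() or a full byte[] copy. Does not close out.
     */
    public static void writeInChunks(StringBuffer sb, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buf = new char[CHUNK];
        int len = sb.length();
        int start = 0;
        while (start < len) {
            int end = Math.min(start + CHUNK, len);
            // Do not split a surrogate pair across two writes.
            if (end < len && Character.isHighSurrogate(sb.charAt(end - 1))) {
                end--;
            }
            sb.getChars(start, end, buf, 0);  // copy one slice into the reusable buffer
            w.write(buf, 0, end - start);
            start = end;
        }
        w.flush();  // push encoded bytes to out; the caller closes out (e.g. an FSDataOutputStream)
    }
}
```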



-- 
Wang Zhong
