hadoop-hdfs-user mailing list archives

From Daniel Haviv <danielru...@gmail.com>
Subject Re: Fast way to read thousands of double value in hadoop jobs
Date Fri, 19 Aug 2016 06:52:32 GMT
That was the idea :)
Thanks for the update

On Friday, 19 August 2016, Madhav Sharan <msharan@usc.edu> wrote:

> Thanks for your suggestion, Daniel. I was already using a SequenceFile, but my
> format was poor: I was storing the file contents as Text in my SeqFile.
>
> So all my map jobs did a repeated conversion from Text to double. I resolved
> this by correcting the SequenceFile format. Now I store serialised Java objects
> in the SeqFile, and my map jobs are faster.
>
> --
> Madhav Sharan
>
>
> On Wed, Aug 17, 2016 at 11:07 PM, Daniel Haviv <danielrulez@gmail.com> wrote:
>
>> Store them in a SequenceFile.
>>
>>
>> On Thursday, 18 August 2016, Madhav Sharan <msharan@usc.edu> wrote:
>>
>>> Hi, can someone please recommend a fast way in Hadoop to store and
>>> retrieve a matrix of double values?
>>>
>>> As of now we store the values in text files and then read them in Java using
>>> an HDFS input stream and a Scanner *[0]*. These files are actually vectors
>>> representing a video file. Each vector is 883 x 200, and for one map job we
>>> read 4 such vectors, so *the job has to convert 706,400 values to double*.
>>>
>>> With this approach it takes ~1.5 seconds to convert all these values. I
>>> could use an external cache server to avoid repeated conversion, but I am
>>> looking for a better solution.
>>>
>>> [0] - https://github.com/USCDataScience/hadoop-pot/blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596
>>>
>>>
>>> --
>>> Madhav Sharan
>>>
>>>
>
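
The question above converts each vector from text with a Scanner. A minimal sketch of that style of parsing follows, assuming whitespace-separated values stored row by row (the actual file layout read in PoT.java is not shown in this thread). The class TextMatrixParse and its parse method are hypothetical names. Every nextDouble() call tokenizes a decimal string and converts it, and a map job does this 883 x 200 x 4 = 706,400 times.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical helper illustrating text-based parsing; not the code from PoT.java.
public class TextMatrixParse {

    // Parses rows*cols whitespace-separated decimal values into a row-major matrix.
    public static double[][] parse(String text, int rows, int cols) {
        double[][] m = new double[rows][cols];
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM default locale.
        try (Scanner sc = new Scanner(text).useLocale(Locale.ROOT)) {
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    // Each call tokenizes one decimal string and converts it to a double.
                    m[r][c] = sc.nextDouble();
                }
            }
        }
        return m;
    }

    public static void main(String[] args) {
        int rows = 883, cols = 200, vectors = 4;
        // 883 * 200 = 176,600 values per vector; 4 vectors = 706,400 values per map job.
        System.out.println((long) rows * cols * vectors);

        double[][] m = parse("1.5 2.0\n-3.25 4.0", 2, 2);
        System.out.println(m[1][0]);
    }
}
```

Locale.ROOT is set explicitly because a default Scanner follows the JVM's format locale, so under a locale such as de_DE it would expect ',' as the decimal separator.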

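Madhav's fix was to store already-converted values in the SequenceFile instead of Text. One common way to do that in Hadoop is a custom value class implementing org.apache.hadoop.io.Writable, whose write(DataOutput) and readFields(DataInput) methods emit and read binary fields. The hypothetical DoubleMatrix class below mirrors those two signatures using only java.io so that it compiles without Hadoop on the classpath. Its layout (rows, cols, then row-major doubles) is an assumption, not necessarily the format used in the thread, which only says "serialised Java object".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical value type. In a Hadoop job it would also declare
// "implements org.apache.hadoop.io.Writable" (omitted so this compiles without Hadoop).
public class DoubleMatrix {
    private int rows;
    private int cols;
    private double[] data = new double[0]; // row-major: element (r, c) is data[r * cols + c]

    // Hadoop creates Writable instances reflectively, which needs a no-arg constructor.
    public DoubleMatrix() { }

    public DoubleMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    public int rows() { return rows; }
    public int cols() { return cols; }
    public double get(int r, int c) { return data[r * cols + c]; }
    public void set(int r, int c, double v) { data[r * cols + c] = v; }

    // Same signature as Writable.write: an 8-byte header, then each double as 8 raw bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(rows);
        out.writeInt(cols);
        for (double v : data) {
            out.writeDouble(v);
        }
    }

    // Same signature as Writable.readFields: no string-to-double parsing happens here.
    public void readFields(DataInput in) throws IOException {
        rows = in.readInt();
        cols = in.readInt();
        data = new double[rows * cols];
        for (int i = 0; i < data.length; i++) {
            data[i] = in.readDouble();
        }
    }

    public static void main(String[] args) throws IOException {
        DoubleMatrix m = new DoubleMatrix(883, 200);
        m.set(882, 199, -3.25);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        m.write(out);
        out.flush();
        // 8 header bytes + 176,600 doubles * 8 bytes = 1,412,808 bytes
        System.out.println(bytes.size());

        DoubleMatrix copy = new DoubleMatrix();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.get(882, 199));
    }
}
```

Because readDouble() reads 8 raw bytes per value, loading a matrix costs a fixed-size copy rather than 176,600 decimal conversions per vector. The trade-off is that the stored data is no longer human-readable.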