hadoop-common-user mailing list archives

From Daniel Haviv <danielru...@gmail.com>
Subject Re: Fast way to read thousands of double value in hadoop jobs
Date Thu, 18 Aug 2016 06:07:13 GMT
Store them within a SequenceFile.
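To illustrate why a binary container like SequenceFile helps: the cost in the text-file approach is parsing each number with Scanner, whereas a binary encoding writes each double as 8 fixed bytes and reads it back with no parsing at all. The sketch below uses only plain java.io (not the Hadoop SequenceFile API itself, which would need the hadoop-common dependency); the class name and the 883 x 200 matrix size are taken from the question, everything else is illustrative:

```java
import java.io.*;

public class DoubleVectorIO {

    // Serialize a matrix of doubles as raw big-endian bytes (8 bytes per value).
    // Inside a SequenceFile you would store these bytes as the record value.
    static byte[] writeBinary(double[] values) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(values.length * 8);
        DataOutputStream out = new DataOutputStream(bos);
        for (double v : values) {
            out.writeDouble(v);
        }
        out.flush();
        return bos.toByteArray();
    }

    // Deserialize: no text parsing, just fixed-width reads.
    static double[] readBinary(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        double[] values = new double[data.length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = in.readDouble();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // One 883 x 200 vector from the question, filled with dummy values.
        double[] matrix = new double[883 * 200];
        for (int i = 0; i < matrix.length; i++) {
            matrix[i] = i * 0.5;
        }

        byte[] bytes = writeBinary(matrix);
        double[] roundTrip = readBinary(bytes);

        System.out.println(bytes.length);                      // 883 * 200 * 8 bytes
        System.out.println(roundTrip[123] == matrix[123]);     // lossless round trip
    }
}
```

With the actual Hadoop API you would wrap the same bytes in a BytesWritable (or implement a custom Writable) and read/write records via SequenceFile.Reader and SequenceFile.Writer, which also gives you splittability and optional compression for free.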

On Thursday, 18 August 2016, Madhav Sharan <msharan@usc.edu> wrote:

> Hi, can someone please recommend a fast way in Hadoop to store and
> retrieve a matrix of double values?
>
> As of now we store the values in text files and then read them in Java using an
> HDFS input stream and Scanner. *[0]* These files are actually vectors
> representing a video file. Each vector is 883 x 200, and for one map job we
> read 4 such vectors, so *the job has to convert 706,400 values to double*.
>
> With this approach it takes ~1.5 seconds to convert all these values. I
> could use an external cache server to avoid the repeated conversion, but I am
> looking for a better solution.
>
> [0] - https://github.com/USCDataScience/hadoop-pot/
> blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596
>
> --
> Madhav Sharan
>
>
