hadoop-hdfs-user mailing list archives

From Madhav Sharan <msha...@usc.edu>
Subject Fast way to read thousands of double value in hadoop jobs
Date Thu, 18 Aug 2016 01:32:51 GMT
Hi, can someone please recommend a fast way in Hadoop to store and
retrieve a matrix of double values?

As of now we store the values in text files and then read them in Java
using an HDFS input stream and a Scanner [0]. These files are actually
vectors representing a video file. Each vector is 883 x 200, and for one
map job we read 4 such vectors, so *the job has to convert 706,400
values to double*.
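Simplified, the reading loop looks something like the sketch below (a
minimal illustration, not the exact code from [0]; the class and method
names here are made up):

import java.io.IOException;
import java.util.Scanner;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TextVectorReader {

    // Reads a rows x cols matrix of whitespace-separated doubles
    // from a text file on HDFS.
    public static double[][] read(FileSystem fs, Path file, int rows, int cols)
            throws IOException {
        double[][] matrix = new double[rows][cols];
        try (Scanner scanner = new Scanner(fs.open(file))) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    // Each nextDouble() parses a text token; this
                    // parsing is where most of the time goes.
                    matrix[i][j] = scanner.nextDouble();
                }
            }
        }
        return matrix;
    }
}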

With this approach it takes ~1.5 seconds to convert all these values. I
could use an external cache server to avoid repeated conversion, but I
am looking for a better solution.
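
For example, would storing each vector as raw big-endian doubles and
reading it back with DataInputStream be noticeably faster? A minimal
sketch of that idea, assuming a plain row-major layout with no header
(just an idea, not something we have benchmarked):

import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BinaryVectorReader {

    // Reads a rows x cols matrix stored as consecutive 8-byte
    // big-endian doubles, row by row.
    public static double[][] read(FileSystem fs, Path file, int rows, int cols)
            throws IOException {
        double[][] matrix = new double[rows][cols];
        try (DataInputStream in =
                new DataInputStream(new BufferedInputStream(fs.open(file)))) {
            for (int i = 0; i < rows; i++) {
                for (int j = 0; j < cols; j++) {
                    // readDouble() copies 8 bytes directly, with no
                    // string parsing per value.
                    matrix[i][j] = in.readDouble();
                }
            }
        }
        return matrix;
    }
}

The idea is that readDouble() reads each value as 8 raw bytes instead
of parsing a decimal string, which should remove the per-value
conversion cost.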

[0] -
https://github.com/USCDataScience/hadoop-pot/blob/master/src/main/java/org/pooledtimeseries/PoT.java#L596


--
Madhav Sharan
