hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: multi-dimensional array storage
Date Wed, 28 Mar 2012 06:20:29 GMT
Hey, besides HBase you can use SequenceFiles, which store key/value pairs.
Normally you would use some kind of <VectorWritable, NullWritable> pairs;
VectorWritable is available in Mahout, for example, which has a good math
package for sparse and dense vectors.

If you don't want vector classes, you can use ArrayWritable for dense
data and MapWritable for sparse data.
It also depends on what you're doing with your data, so if you have more
information about the algorithm, we can give you a better suggestion ;)
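
As a rough illustration of the dense versus sparse trade-off, here is a minimal sketch in plain Java. It deliberately does not use the Hadoop classes: it only mimics the general idea of writing every element in order (similar in spirit to ArrayWritable) versus writing only the non-zero (index, value) pairs (similar in spirit to MapWritable). The class name ArrayStorageSketch, the helper flatIndex, and the byte layout are all assumptions made for this example and are not the on-disk format of any Hadoop, Mahout, or Hama class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, for illustration only (not a Hadoop or Hama API).
class ArrayStorageSketch {

    // Map a 2-D coordinate to a 1-D position using row-major order.
    public static int flatIndex(int row, int col, int cols) {
        return row * cols + col;
    }

    // Dense layout: element count, then every value (zeros included).
    public static byte[] writeDense(double[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static double[] readDense(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sparse layout: logical length, number of stored entries,
    // then one (flat index, value) pair per non-zero element.
    public static byte[] writeSparse(int length, Map<Integer, Double> nonZeros) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(length);
            out.writeInt(nonZeros.size());
            for (Map.Entry<Integer, Double> e : nonZeros.entrySet()) {
                out.writeInt(e.getKey());
                out.writeDouble(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Map<Integer, Double> readSparse(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            in.readInt(); // logical length, not needed to rebuild the map
            int entries = in.readInt();
            Map<Integer, Double> nonZeros = new TreeMap<>();
            for (int i = 0; i < entries; i++) {
                int index = in.readInt();
                nonZeros.put(index, in.readDouble());
            }
            return nonZeros;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A 2x2 matrix [[1.0, 0.0], [0.0, 2.5]] flattened in row-major order.
        double[] dense = {1.0, 0.0, 0.0, 2.5};
        Map<Integer, Double> sparse = new TreeMap<>();
        sparse.put(flatIndex(0, 0, 2), 1.0);
        sparse.put(flatIndex(1, 1, 2), 2.5);

        System.out.println("dense bytes:  " + writeDense(dense).length);
        System.out.println("sparse bytes: " + writeSparse(dense.length, sparse).length);
    }
}
```

In this layout the sparse form pays for an extra index per stored value, so it is only smaller when most elements are zero; which one fits better depends on the data and on how the algorithm accesses it.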

On 28 March 2012 at 00:51, Edward J. Yoon <edwardyoon@apache.org> wrote:

> Hi,
> I believe that HBase is the best way to store multi-dimensional
> arrays. HBase provides storage efficiency as the number of dimensions
> grows, ordering capability, and also allows you to record and access
> data corrections and updates directly via the HBase client library.
> Another option is to use SequenceFile and MapFile. Once the data is
> loaded into the program initially, your math operations can run directly
> in memory and be synchronized using the standard BSP APIs.
> Thanks.
> On Wed, Mar 28, 2012 at 12:46 AM, Noah Watkins <jayhawk@cs.ucsc.edu>
> wrote:
> > Hi Hama list,
> >
> > I'm interested in using Hama to process large multi-dimensional arrays
> > (sparse and dense). What is the best way to store and represent this type
> > of data for processing in Hama?
> >
> > Thanks,
> > Noah
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

Thomas Jungblut
Berlin <thomas.jungblut@gmail.com>
