mahout-dev mailing list archives

From Dmitriy Lyubimov
Subject By default, do we want to compress DRM outputs in any way?
Date Sat, 03 Sep 2011 07:41:29 GMT
Per above.

I noticed that I do ask for compression of results and intermediate data
(more of a programming reflex, really, than any motivated decision).

But for data such as vectors, assuming sparse vectors are used where
appropriate, compression is not going to win much.
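
A quick way to sanity-check this is to gzip a sparse encoding and compare
sizes. The sketch below is only an illustration: the class name, the
(int index, double value) layout and the random values are assumptions made
for the example, not Mahout's actual VectorWritable format, and real data
with repeated or low-precision values may compress better than random
doubles do.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.TreeSet;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; not part of Mahout or Hadoop.
final class SparseVectorGzipDemo {

    // Encodes a random sparse vector as sorted (int index, double value)
    // pairs, i.e. 12 bytes per non-zero element. Assumed layout, for illustration.
    static byte[] encodeSparse(int cardinality, int nonZeros, long seed) {
        if (nonZeros > cardinality) {
            throw new IllegalArgumentException("nonZeros must not exceed cardinality");
        }
        Random rnd = new Random(seed);
        TreeSet<Integer> indices = new TreeSet<>();
        while (indices.size() < nonZeros) {
            indices.add(rnd.nextInt(cardinality));
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int index : indices) {
                out.writeInt(index);
                out.writeDouble(rnd.nextDouble());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Compresses a byte array with the JDK's gzip implementation.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = encodeSparse(1_000_000, 10_000, 42L);
        byte[] packed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes, ratio=%.2f%n",
                raw.length, packed.length, (double) packed.length / raw.length);
    }
}
```

Running main prints the raw and gzipped sizes so the ratio can be judged
directly; swapping in values from a real DRM would give a more
representative number.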

On the other hand, if native libraries are enabled, the default GZIP codec
does not cost much compared to the computations either.

A third option: maybe we shouldn't set any defaults at all and leave
compression to -D options. I see that as somewhat of a problem, though,
since Hadoop tends to encapsulate those properties in static methods of
classes such as FileOutputFormat, which may imply that the property names
are not meant to be part of any user contract and are just implementation
details of a concrete file format.

I am leaning towards enforcing no compression by default.
