hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: How to do Secondary Sort on a String and a float?
Date Sun, 26 Dec 2010 08:37:28 GMT
Hi,

You can use WritableComparator for "Writable" serializations. Docs
here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparator.html

The issue lies with how you're encoding your pair of <String, Float>.
If you know the size of each field (or have a length prefix or a
marker byte between them), you can pick out just the bytes of the part
you need (String or Float) and call the compareBytes function on them.
In compare(b1, s1, l1, b2, s2, l2), "s1 & s2" are the start offsets
and "l1 & l2" are the lengths to read from those offsets in the byte[]
arrays passed in for the two "Writable" objects.
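
To make the offsets and lengths concrete, here is a rough, stand-alone
sketch with no Hadoop dependency. The layout is only an assumption for
illustration: the String written with DataOutput.writeUTF (2-byte
length prefix, then the bytes) followed by the Float written with
writeFloat (4 bytes, big-endian). The class name PairRawCompare and
its helpers are made up; compareBytes here is a local stand-in with
the same argument order as WritableComparator.compareBytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of Hadoop.
final class PairRawCompare {

    // Assumed layout: writeUTF(string) followed by writeFloat(value).
    static byte[] encode(String s, float f) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);   // 2-byte unsigned length, then the bytes
            out.writeFloat(f); // 4 bytes, big-endian
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Local stand-in: lexicographic comparison of unsigned bytes.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int end1 = s1 + l1;
        int end2 = s2 + l2;
        for (int i = s1, j = s2; i < end1 && j < end2; i++, j++) {
            int a = b1[i] & 0xff;
            int b = b2[j] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    // Reads the 2-byte big-endian length prefix written by writeUTF.
    static int readUnsignedShort(byte[] b, int off) {
        return ((b[off] & 0xff) << 8) | (b[off + 1] & 0xff);
    }

    // Reads the 4-byte big-endian float written by writeFloat.
    static float readFloat(byte[] b, int off) {
        int bits = ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
                 | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
        return Float.intBitsToFloat(bits);
    }

    // Orders by the String part first, then by the Float part,
    // without building any String or Float objects.
    static int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        int n1 = readUnsignedShort(b1, s1);
        int n2 = readUnsignedShort(b2, s2);
        int c = compareBytes(b1, s1 + 2, n1, b2, s2 + 2, n2);
        if (c != 0) {
            return c;
        }
        return Float.compare(readFloat(b1, s1 + 2 + n1),
                             readFloat(b2, s2 + 2 + n2));
    }

    public static void main(String[] args) {
        byte[] a = encode("apple", 2.5f);
        byte[] b = encode("apple", 10.0f);
        byte[] c = encode("banana", 0.1f);
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
        System.out.println(compare(b, 0, b.length, c, 0, c.length) < 0);
    }
}
```

In a real job the same logic would sit inside the compare(byte[],
int, int, byte[], int, int) override of a WritableComparator subclass.
Note that the float is decoded before comparing rather than passed to
compareBytes: raw IEEE 754 bytes do not sort numerically once negative
values are involved. Byte order also only matches String order for
plain ASCII here; other text may need more care.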

You can also, perhaps, de-serialize the whole byte stream (via your
Writable.readFields()) and then compare object-wise -- but this would
be slower, since byte-to-byte comparisons avoid creating objects for
every comparison; that is why RawComparator exists.
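
For contrast, a rough stand-alone sketch of the de-serializing route,
again with no Hadoop dependency. The Pair class, its field names and
the writeUTF/writeFloat layout are assumptions made up for this
example; readFields only mirrors the shape of Writable.readFields().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of Hadoop.
final class PairObjectCompare {

    // Made-up pair type standing in for a custom Writable.
    static final class Pair implements Comparable<Pair> {
        String name;
        float score;

        // Same shape as Writable.readFields(DataInput).
        void readFields(DataInput in) throws IOException {
            name = in.readUTF();
            score = in.readFloat();
        }

        @Override
        public int compareTo(Pair other) {
            int c = name.compareTo(other.name);
            return c != 0 ? c : Float.compare(score, other.score);
        }
    }

    // Assumed layout: writeUTF(name) followed by writeFloat(score).
    static byte[] encode(String name, float score) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(name);
            out.writeFloat(score);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds both objects from their byte ranges, then compares them.
    static int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        try {
            Pair p1 = new Pair();
            Pair p2 = new Pair();
            p1.readFields(new DataInputStream(
                    new ByteArrayInputStream(b1, s1, l1)));
            p2.readFields(new DataInputStream(
                    new ByteArrayInputStream(b2, s2, l2)));
            return p1.compareTo(p2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] a = encode("apple", -1.0f);
        byte[] b = encode("apple", 3.0f);
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```

Every call allocates two Pair objects, two Strings and the stream
wrappers, and a sort makes a great many such calls; that is the cost
the raw byte comparison avoids.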

Avro has a neat serialization; I prefer using it over plain Writables.
Working with a "Schema" is much easier.

-- 
Harsh J
www.harshj.com
