crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly
Date Fri, 24 Jan 2014 08:49:40 GMT


Gabriel Reid commented on CRUNCH-329:

I've been thinking about this one some more, and I guess I've come back to liking your idea
of having a way of registering serialization codes explicitly (but not doing it implicitly).
The TupleWritables aren't reusable on their own (without a PType) as it is, so I think I was
worrying too much about being able to read a previously-created file of TupleWritables.

To summarize the idea I've got in my head (which was actually your idea I think):
* core primitive types are serialized as themselves (as it's currently implemented in the
* you can explicitly register a code for a custom type, but you're not required to, and it
doesn't happen implicitly. Registration could require a Configuration object, so that the
custom-registered types are stored in the Configuration instead of in a static field
in Writables.
* types that don't have a registered code are just serialized as BytesWritable implicitly
* trying to configure a TupleWritableComparator on a tuple field that is implicitly serialized
as BytesWritable fails fast with a message about the use of Writables.setCode()
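
A rough sketch of how the registration and fail-fast check could fit together. This is only
an illustration: the class name WritableCodes, the key prefix, and the method signatures
(including this setCode) are made up here, and a plain Map stands in for the Hadoop
Configuration so the snippet compiles on its own. Built-in codes for core types are left out.

```java
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical sketch; a plain Map stands in for Hadoop's Configuration. */
final class WritableCodes {
  // Hypothetical key prefix under which explicitly registered codes are stored.
  private static final String PREFIX = "crunch.writable.code.";

  private WritableCodes() {}

  /** Explicitly registers a code for a custom type; nothing is registered implicitly. */
  static void setCode(Map<String, String> conf, int code, Class<?> type) {
    String key = PREFIX + code;
    String existing = conf.get(key);
    if (existing != null && !existing.equals(type.getName())) {
      throw new IllegalArgumentException(
          "Code " + code + " is already registered for " + existing);
    }
    conf.put(key, type.getName());
  }

  /** Returns the registered code for a type, or empty if none was registered. */
  static OptionalInt codeFor(Map<String, String> conf, Class<?> type) {
    for (Map.Entry<String, String> e : conf.entrySet()) {
      if (e.getKey().startsWith(PREFIX) && e.getValue().equals(type.getName())) {
        return OptionalInt.of(Integer.parseInt(e.getKey().substring(PREFIX.length())));
      }
    }
    return OptionalInt.empty();
  }

  /** Fails fast when a comparator is requested for a field that has no registered code. */
  static void checkSortable(Map<String, String> conf, Class<?> type) {
    if (!codeFor(conf, type).isPresent()) {
      throw new IllegalStateException(
          type.getName() + " has no registered code and would be serialized as raw bytes; "
              + "register one with setCode() before sorting on this field");
    }
  }
}
```

With this shape, a type that was never passed to setCode() still serializes (as opaque
bytes), and the error only shows up when someone tries to sort on it.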

Sound good?

> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>                 Key: CRUNCH-329
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>         Attachments: fix-ss-writables.patch
> Secondary sorts aren't currently working correctly for Writable types after we hacked
> the TupleWritable impl to make all of the fields BytesWritables (e.g., secondary IntWritable
> values will no longer be sorted correctly, even though everything is still grouped correctly).
> The least-bad way that I came up with to fix this is to use integer codes for each possible
> WritableComparable type in a pipeline that we can use to decode what Writable type each tuple
> field corresponds to. This allows us to keep the various fields sortable while still doing
> a reasonable job of minimizing the serialization required to pass the type information along.
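
To illustrate why ordering can break while grouping still works, here is a small standalone
sketch. It assumes the fields end up being compared as raw bytes (unsigned, lexicographic)
and that an int is encoded as 4 big-endian bytes; the class and method names are made up
for the example. Equal values still produce equal bytes, so grouping is unaffected, but
negative ints sort after positive ones.

```java
import java.nio.ByteBuffer;

/** Hypothetical demo of numeric order versus raw-byte order for int values. */
final class ByteOrderSketch {
  private ByteOrderSketch() {}

  /** Encodes an int as 4 big-endian bytes (ByteBuffer's default byte order). */
  static byte[] encode(int value) {
    return ByteBuffer.allocate(4).putInt(value).array();
  }

  /** Compares two byte arrays lexicographically, treating each byte as unsigned. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    // Numerically -1 sorts before 1, but its encoding starts with 0xFF,
    // so a raw-byte comparison places it after 1.
    System.out.println("numeric:  " + Integer.compare(-1, 1));
    System.out.println("bytewise: " + compareBytes(encode(-1), encode(1)));
  }
}
```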
