hadoop-common-dev mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3665) WritableComparator newKey() fails for NullWritable
Date Tue, 01 Jul 2008 07:05:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-3665:
----------------------------------

    Attachment: 3665-0.patch

bq. The whole point is that I would like to understand how a Reduce job can output a file without
any key values in it. The NullWritable seemed to be an ideal candidate for this but unfortunately
I ran into exceptions when trying it. So I made a quick and dirty fix which is not meant to
be production ready (obviously NullWritable should not be special-cased in any way!).

I'm sorry, I hadn't understood this. If you only want to output null keys from your reduce,
then the RecordWriter used by your OutputFormat can encode or ignore null keys (e.g. TextOutputFormat).
SequenceFiles, as you discovered, explicitly disallow zero-length keys, so you'll need to
pick a different binary file format to store output records. Glancing at the code, this constraint
is inconsistently enforced, and not for any particular reason that I can discern. Adapting
SequenceFile to handle zero-length keys might be as simple as allowing zero-length keys from
the Writers, since the Reader looks like it could handle it.
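To illustrate the Writer-side check described above, here is a minimal sketch using a toy length-prefixed record format. It is not SequenceFile's actual on-disk layout, and `RecordSketch`, `append`, `readKeyLengths` and `roundTrip` are hypothetical names. The writer rejects only negative key lengths, and the reader needs no special case for a zero-length key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy length-prefixed record format (hypothetical; NOT the SequenceFile layout).
final class RecordSketch {
    // Appends one record: key length, value length, key bytes, value bytes.
    static void append(DataOutputStream out, byte[] key, int keyLen, byte[] value)
            throws IOException {
        if (keyLen < 0) { // a stricter "keyLen <= 0" check would also reject empty keys
            throw new IOException("negative key length not allowed: " + keyLen);
        }
        out.writeInt(keyLen);
        out.writeInt(value.length);
        out.write(key, 0, keyLen);
        out.write(value);
    }

    // Reads every record and returns the key lengths in the order they were written.
    static List<Integer> readKeyLengths(byte[] data) throws IOException {
        List<Integer> lengths = new ArrayList<>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        while (in.available() > 0) {
            int keyLen = in.readInt();
            int valLen = in.readInt();
            in.readFully(new byte[keyLen]); // reads nothing when keyLen == 0
            in.readFully(new byte[valLen]);
            lengths.add(keyLen);
        }
        return lengths;
    }

    // Writes one empty-key record and one ordinary record, then reads them back.
    static List<Integer> roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            append(out, new byte[0], 0, "value-a".getBytes(StandardCharsets.UTF_8));
            append(out, "k".getBytes(StandardCharsets.UTF_8), 1,
                    "value-b".getBytes(StandardCharsets.UTF_8));
            out.flush();
            return readKeyLengths(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `RecordSketch.roundTrip()` returns the key lengths of the two records, the first of which is zero. The read path is identical for both records.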

bq. On the other hand there seemed to be some questions which need to be asked and possibly
addressed. One of them is that ReflectionUtils is able to call any constructor after setAccessible
is set to true but is this what we really want for singleton keys? And do we really need singleton
keys at all? (I believe the answer is positive).

There's already a fair amount of object reuse. We need an object to deserialize into per the
Writable contract, so a registration system like the one in WritableComparator would be necessary
in ReflectionUtils to make singletons work (i.e. a map of classes to instances checked before
the map of classes to constructors). Other than NullWritable, the singleton use cases I
can think of are just badly designed, though good ones likely exist.
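The registration idea above (a map of classes to instances checked before the map of classes to constructors) might look roughly like the following sketch. This is not Hadoop's ReflectionUtils; `InstanceFactory`, `registerSingleton`, `NullKey` and `PlainKey` are hypothetical names. One assumption worth calling out: the singleton registers itself in a static initializer, so the factory forces class initialization before the lookup. Without that step the reflective path would run first and build a second instance.

```java
import java.lang.reflect.Constructor;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not Hadoop's ReflectionUtils.
final class InstanceFactory {
    private static final Map<Class<?>, Object> SINGLETONS = new ConcurrentHashMap<>();
    private static final Map<Class<?>, Constructor<?>> CONSTRUCTORS = new ConcurrentHashMap<>();

    // A singleton type registers its one instance, e.g. from a static initializer.
    static <T> void registerSingleton(Class<T> type, T instance) {
        SINGLETONS.put(type, instance);
    }

    static <T> T newInstance(Class<T> type) {
        forceInit(type); // run static initializers so any registration has happened
        Object shared = SINGLETONS.get(type);
        if (shared != null) {
            return type.cast(shared); // registered singleton: no constructor call
        }
        try {
            Constructor<?> ctor = CONSTRUCTORS.get(type);
            if (ctor == null) {
                ctor = type.getDeclaredConstructor();
                ctor.setAccessible(true); // permits private no-argument constructors
                CONSTRUCTORS.put(type, ctor);
            }
            return type.cast(ctor.newInstance());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("cannot instantiate " + type.getName(), e);
        }
    }

    private static void forceInit(Class<?> type) {
        try {
            Class.forName(type.getName(), true, type.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("cannot initialize " + type.getName(), e);
        }
    }
}

// Stand-in for a singleton key such as NullWritable (hypothetical class).
final class NullKey {
    private static final NullKey THIS = new NullKey();
    static { InstanceFactory.registerSingleton(NullKey.class, THIS); }
    private NullKey() {}
    static NullKey get() { return THIS; }
}

// An ordinary key with a private no-argument constructor (hypothetical class).
final class PlainKey {
    private PlainKey() {}
}
```

With this arrangement `InstanceFactory.newInstance(NullKey.class)` hands back the registered instance, while `InstanceFactory.newInstance(PlainKey.class)` builds a fresh object on every call.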

bq. How about size (length) of key value? Is it allowed to be zero?

It depends on where in the framework you're looking. The OutputFormat defines how to encode/handle
null/NullWritable keys from the reduce (or the map if you're running without reduces). In
0.17, intermediate data is stored in SequenceFiles, so zero-length keys can't be emitted from
the map. In 0.18, zero-length keys are supported, but their semantics are kind of odd. In
most cases, emitting NullWritable keys from the map is not a scalable design.
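One way to read the scalability remark, offered here as an assumption rather than something stated above: under hash partitioning, records with equal keys all land in the same partition, so a map that emits one constant key sends its entire output to a single reduce. The sketch below uses the common hash-and-modulo formula; `PartitionSketch`, `partitionFor` and `partitionsUsed` are hypothetical names.

```java
// Hypothetical sketch of hash partitioning; the names are illustrative only.
final class PartitionSketch {
    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Counts how many distinct partitions receive at least one of the given keys.
    static int partitionsUsed(Object[] keys, int numPartitions) {
        boolean[] used = new boolean[numPartitions];
        int count = 0;
        for (Object key : keys) {
            int p = partitionFor(key, numPartitions);
            if (!used[p]) {
                used[p] = true;
                count++;
            }
        }
        return count;
    }
}
```

Passing an array that repeats one key yields a count of one no matter how many partitions exist, whereas keys with varied hash codes spread across them.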

bq. And why does WritableComparator call the newInstance method when this causes issues with any
class having a non-public constructor?

Most WritableComparable types use RawComparator, which provides much better performance while
rendering this consideration irrelevant. Unfortunately, WritableComparator creates new instances
of its internal keys whether it requires them or not! This is easily remedied. This patch
does the following:

* No longer creates instances of the WritableComparable in WritableComparator when a class
has registered a WritableComparator (neither does it create a buffer). This makes super.compare(byte[],
off1, len1, byte[], off2, len2) illegal, but I doubt this is a problem. Though one could imagine
a situation where a raw comparator attempts an efficient comparison but uses the slow comparator
when the result is ambiguous, such a comparator is easily adapted.
* Lets WritableComparators be configurable, so WritableComparable objects not defining RawComparators
are still configured before being compared.
* Defines a raw comparator for NullWritable.
* Changes checks in SequenceFile Writer classes to check only for key lengths less than zero;
this doesn't require any changes to the Reader, which already supports zero-length keys, so
the SequenceFile version doesn't need to be adjusted, either.
* Adds a test case for reading/writing NullWritable keys.
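As a rough illustration of the first and third items in the list above, the sketch below registers a raw comparator for a zero-length singleton key and looks it up without ever calling the key's constructor. These are stand-ins, not the classes in the patch: `RawCompare`, `EmptyKey`, `EmptyKeyComparator` and `ComparatorRegistry` are hypothetical names, and the deserializing fallback for unregistered types is deliberately left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a byte-level comparator interface.
interface RawCompare {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Stand-in for a singleton key with a private constructor (hypothetical class).
final class EmptyKey {
    static int constructed = 0; // counts constructor calls, for demonstration only
    private static final EmptyKey THIS = new EmptyKey();
    private EmptyKey() { constructed++; }
    static EmptyKey get() { return THIS; }
}

// Every EmptyKey serializes to zero bytes, so any two serialized keys are equal.
final class EmptyKeyComparator implements RawCompare {
    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        if (l1 != 0 || l2 != 0) {
            throw new IllegalArgumentException("EmptyKey must serialize to zero bytes");
        }
        return 0;
    }
}

final class ComparatorRegistry {
    private static final Map<Class<?>, RawCompare> REGISTERED = new ConcurrentHashMap<>();

    static void define(Class<?> keyClass, RawCompare comparator) {
        REGISTERED.put(keyClass, comparator);
    }

    // Returns the registered comparator without creating key instances or buffers.
    static RawCompare get(Class<?> keyClass) {
        RawCompare registered = REGISTERED.get(keyClass);
        if (registered != null) {
            return registered;
        }
        // Only a deserialize-then-compare fallback would need key instances.
        throw new UnsupportedOperationException(
                "sketch omits the deserializing fallback for " + keyClass.getName());
    }
}
```

After `ComparatorRegistry.define(EmptyKey.class, new EmptyKeyComparator())`, a lookup and comparison of two zero-length byte ranges never touches the private constructor. The only `EmptyKey` ever constructed is the singleton itself.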

> WritableComparator newKey() fails for NullWritable
> --------------------------------------------------
>
>                 Key: HADOOP-3665
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3665
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 0.16.0, 0.16.1, 0.16.2, 0.16.3, 0.16.4, 0.17.0
>         Environment: n/a
>            Reporter: Lukas Vlcek
>            Priority: Minor
>             Fix For: 0.19.0
>
>         Attachments: 3665-0.patch, HADOOP-3665.path
>
>
> It is not possible to use NullWritable as a key in order to suppress key value in output.
> Syndrome exception:
> Caused by: java.lang.IllegalAccessException: Class org.apache.hadoop.io.WritableComparator can not access a member of class org.apache.hadoop.io.NullWritable with modifiers "private"
> The problem is that NullWritable is a singleton and does not provide a public no-argument constructor. The following code in WritableComparator causes the exception: return (WritableComparable)keyClass.newInstance();
> The proposed simple solution is to use ReflectionUtils instead (it requires modification as well).
> This issue is probably related to HADOOP-2922

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

