hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Douglas <chri...@yahoo-inc.com>
Subject Re: WritableComparators and thread safety
Date Wed, 03 Sep 2008 23:46:31 GMT
WritableComparator subclasses that rely on deserialization for each  
compare are already not threadsafe, but if you don't use the static  
define, you'll get a new instance for each call to get, but *only for  
the ordering defined by the WritableComparable type.* This is the  
intent, I think: that the comparators registered with the cache in  
WritableComparator are low-level comparators matching the semantics of  
the WritableComparable type.

Other comparators with state are odd beasts; if it is required, then a  
ThreadLocal instance of whatever you need is probably easier to reason  
about. If you just want to change the ordering for the sort, the  
registry doesn't enter into it. As a side-node (I think you're clear  
on this), JobConf will return the same instance if it comes from the  
registry. It only creates new instances for user-defined comparators.

For the (vast, vast) majority of RawComparators, there shouldn't be  
any state, so sharing an instance is a modest, but clear win. Is there  
something special about your use case that makes a new instance  
necessary? Is there a particular reason why a ThreadLocal instance of  
its members wouldn't be more appropriate? Can you define your  
serialized format in a way that would make defining the raw compare()  
simpler? -C

On Sep 3, 2008, at 6:53 AM, Igor Maximchuk wrote:

> Hello,
> We have experienced thread safety issues when subclassing  
> WritableComparator and subsequently calling WritableComparator.define.
> It seems that same comparator instance has its methods called from  
> different threads. This makes the following code failing from time  
> to time due to race condition
> public class PComparator extends WritableComparator {
>  ....
>  private DataInputBuffer buf = new  DataInputBuffer();
>   public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2,  
> int l2) {
>        ... [do something that modifies buf]
>   }
> }
> ...
> static {
>       WritableComparator.define(PKey.class, new PComparator());
> }
> So, no member variables can be modified in compare without the  
> explicit synchronization.
> On the other hand, when comparator class is specified in  JobConf,  
> each thread recieives it's own instance and no race condition occurs.
> I think this case should be mentioned in documentation
> I can also suggest to implement another version  
> WritableComparator.define(Class c, Class comparatorClass) which  
> registers the comparaator class, not comparator class instance and  
> make WritableComparator.get to instantiate comparators registered in  
> such a way and declare old defile method deprecated

View raw message