lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-460) hashCode improvements
Date Sun, 30 Oct 2005 15:04:55 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-460?page=comments#action_12356325 ] 

Yonik Seeley commented on LUCENE-460:
-------------------------------------

A couple of guidelines off the top of my head...
- Hash codes should strive to be unique across the Query hierarchy, not just unique within
  one specific subclass.  For example, TermQuery(t) and SpanTermQuery(t) will generate the
  exact same hash code.
- Mix bits between different components that have any hashCode parts in common.
  For example, RangeQuery will produce the same hashCode whenever lowerTerm==upperTerm.
  Also, field[x TO y] will produce the same hashCode for *any* field, since the field name
  parts of the terms will always cancel each other out.  This will also cause the hashCode
  of field{x TO x} to equal that of field:x.
  The hashCode of FilteredQuery will also cause many collisions because the bits aren't
  mixed between the query and the filter.
  Remember that every query has a boost component... never just xor two query hashCodes
  together.
- Make things position dependent.
  Currently, field[x TO y] will produce the same hashCode as field[y TO x]... not particularly
  important for RangeQuery, but you get the idea.
- Don't be afraid of using "+" instead of "^".  They both take a single CPU cycle, but "+"
  is not quite so easily (accidentally) reversed.
- Flipping more than a single bit when hashing a boolean might be a good idea - it will make
  collisions less likely.
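
To make the guidelines above concrete, here is a rough sketch rather than a proposed patch. RangeSketch is a hypothetical stand-in, not Lucene's RangeQuery, and the xor-based termHash, the seed constant, and the two boolean constants are all assumptions picked for illustration. naiveHash() shows how xor lets the field name cancel and ignores term order; mixedHash() uses a per-class seed, multiply-and-add mixing, and a multi-bit boolean.

```java
// Hypothetical stand-in for a range query; this is not Lucene's RangeQuery.
final class RangeSketch {
    final String field, lower, upper;
    final boolean inclusive;
    final float boost;

    RangeSketch(String field, String lower, String upper, boolean inclusive, float boost) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
        this.inclusive = inclusive;
        this.boost = boost;
    }

    // Assumed term hash for illustration: field and text xor'ed together.
    private static int termHash(String field, String text) {
        return field.hashCode() ^ text.hashCode();
    }

    // Everything xor'ed together: the field hash appears twice and cancels,
    // and swapping lower/upper changes nothing.
    int naiveHash() {
        return Float.floatToIntBits(boost)
                ^ termHash(field, lower)
                ^ termHash(field, upper)
                ^ (inclusive ? 1 : 0);
    }

    // Position-dependent mixing with "+" and a multiplier between components.
    int mixedHash() {
        int h = 0x52616e67;                 // arbitrary per-class seed (assumption)
        h = 31 * h + field.hashCode();      // the field is mixed in only once
        h = 31 * h + lower.hashCode();      // lower and upper land in different positions
        h = 31 * h + upper.hashCode();
        h = 31 * h + (inclusive ? 0x5a5a5a5a : 0x25a5a5a5); // flip many bits for the boolean
        h = 31 * h + Float.floatToIntBits(boost);
        return h;
    }
}
```

With this sketch, a[x TO y], b[x TO y] and a[y TO x] all share one naiveHash() value, while their mixedHash() values differ. A different seed in each Query subclass would keep otherwise identical components from colliding across the hierarchy.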

http://www.concentric.net/~Ttwang/tech/inthash.htm is an interesting link on integer hash
codes (which is in effect what we are doing when we combine multiple hash codes).  Especially
interesting is the section "Parallel Operations".
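
As one illustration of treating hash combination as integer hashing, the sketch below runs each intermediate value through an integer bit mixer. The mixer is the 32-bit finalizer from MurmurHash3, swapped in here as a well-known example; it is not taken from the linked page. MixSketch and combine are hypothetical names.

```java
// Hypothetical helper; the mixer is MurmurHash3's 32-bit finalizer (fmix32),
// used here only as an example of an integer hash.
final class MixSketch {
    // Spreads every input bit across the output using xor-shifts and
    // multiplications by odd constants (each step is invertible).
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Combines two component hash codes so that order matters:
    // the first is mixed before the second is added, then the sum is mixed again.
    static int combine(int first, int second) {
        return mix(mix(first) + second);
    }
}
```

Because every step of mix is invertible, two inputs that differ always produce different outputs, so combine(a, b) changes whenever either component changes.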

> hashCode improvements
> ---------------------
>
>          Key: LUCENE-460
>          URL: http://issues.apache.org/jira/browse/LUCENE-460
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: CVS Nightly - Specify date in submission
>     Reporter: Yonik Seeley
>     Priority: Minor
>      Fix For: CVS Nightly - Specify date in submission

>
> It would be nice for all Query classes to implement hashCode and equals to enable them
> to be used as keys when caching.
