lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Smith" <PSm...@tenfold.com>
Subject Re: Re[2]: multi word synonym (was Hungarian notation analyzer and phrase queries)
Date Fri, 29 Apr 2005 16:47:08 GMT
Indexing every multi-word synonym as a single token would introduce
spaces into the tokens. In that case searching for (java) would not
match "i love jsp and tomcat". I think that searching for (java*) would
match.

Rewriting the query is also problematic. If you search for (java
server), you don't have a rule to rewrite it to help you find jsp.

Paul

>>> sven.duzont@keljob.com 04/27/05 05:51AM >>>
Hello,

What about the solution to index every multi-word synonym as a single
token ?
Example :
Phrase to index : "i love jsp and tomcat"
Synonyms        : "jsp" = "java server pages" = "javaserver pages"
Tokens          : i love jsp               and tomcat
                         java server pages
                         javaserver pages
Position        : 0 1    2                 3   4

This solution will have the advantage to solve the phrase query
problems.

One will have also to rewrite queries before parsing it with the
QueryParser. for instance the query (tomcat jsp) will be rewrited as
(tomcat (jsp OR "java server pages" OR "javaserver pages"))

Any thoughts ?
Thanks in advance

---
 Sven





mercredi 13 avril 2005, 19:36:44, vous avez écrit:


CH> : Another approach would be to index this as:
CH> :
CH> : token:       use   power      query for advanced searches
CH> :                     powerquery
CH> : position:    0     1          2     3   4        5
CH> :
CH> : Then use phrase queries with slop=1, to permit a one-token gap
when
CH> : someone searches for "use powerquery for advanced searches".

CH> right, but in your example "1" is a magic number that works because
we are
CH> only dealing with "multi-word synonyms" of 2 words.  in general,
this
CH> approach requires that you pick some some N such that you are
garunteed no
CH> synonym contains more then N-1 words, and set the token positions
to...

CH>  token:       use   powerquery         for  advanced  searches
CH>                     power      query
CH>  position:    0     N          N+1     2N   3N        4N





CH> -Hoss


CH>
---------------------------------------------------------------------
CH> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
CH> For additional commands, e-mail: java-user-help@lucene.apache.org 



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org 
For additional commands, e-mail: java-user-help@lucene.apache.org 


http://www.tenfold.com

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the TenFold Postmaster (postmaster@tenfold.com).
**********************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message