lucene-java-user mailing list archives

From Paul Libbrecht <p...@activemath.org>
Subject Re: Re[2]: multi word synonym (was Hungarian notation analyzer and phrase queries)
Date Fri, 29 Apr 2005 19:58:54 GMT
I knew there was a catch...

I do think, however, that the point is a delicate one which would deserve
consideration: multi-word synonyms are quite common!

paul


On 29 Apr 2005, at 18:47, Paul Smith wrote:

> Indexing every multi-word synonym as a single token would introduce
> spaces into the tokens. In that case searching for (java) would not
> match "i love jsp and tomcat". I think that searching for (java*) would
> match.
>
> Rewriting the query is also problematic. If you search for (java
> server), you don't have a rule to rewrite it to help you find jsp.
>
> Paul
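
A minimal sketch of the objection above, assuming a toy model in which the index is just a list of term strings. It is not Lucene code; the class name SingleTokenSynonymDemo and its methods are made up for illustration. It shows why a whole-term query for java misses a token that contains spaces, while a prefix query such as java* still hits it.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of term matching; this does not use the Lucene API.
final class SingleTokenSynonymDemo {
    // Terms assumed to be indexed for "i love jsp and tomcat" when each
    // multi-word synonym is stored as ONE token, spaces included.
    static final List<String> INDEXED_TERMS = Arrays.asList(
            "i", "love", "jsp", "java server pages", "javaserver pages",
            "and", "tomcat");

    // A term query matches only when the whole token equals the query term.
    static boolean matchesTerm(String term) {
        return INDEXED_TERMS.contains(term);
    }

    // A prefix query such as java* matches any token starting with the prefix.
    static boolean matchesPrefix(String prefix) {
        for (String token : INDEXED_TERMS) {
            if (token.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesTerm("java"));   // false
        System.out.println(matchesPrefix("java")); // true
        System.out.println(matchesTerm("jsp"));    // true
    }
}
```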
>
>>>> sven.duzont@keljob.com 04/27/05 05:51AM >>>
> Hello,
>
> What about indexing every multi-word synonym as a single
> token?
> Example :
> Phrase to index : "i love jsp and tomcat"
> Synonyms        : "jsp" = "java server pages" = "javaserver pages"
> Tokens          : i love jsp               and tomcat
>                          java server pages
>                          javaserver pages
> Position        : 0 1    2                 3   4
>
> This solution would have the advantage of solving the phrase query
> problems.
>
> One would also have to rewrite queries before parsing them with the
> QueryParser. For instance, the query (tomcat jsp) would be rewritten as
> (tomcat (jsp OR "java server pages" OR "javaserver pages"))
>
> Any thoughts ?
> Thanks in advance
>
> ---
>  Sven
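
The query rewriting proposed above could be sketched roughly as follows. This is plain string manipulation done before the query would be handed to a parser. It does not use QueryParser itself, and the class SynonymQueryRewriter with its methods addGroup and rewrite is hypothetical. It assumes the query is a whitespace-separated list of single-word terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-parse rewriter; names are illustrative, not a Lucene API.
final class SynonymQueryRewriter {
    // Maps every known form to the full list of equivalent forms.
    private final Map<String, List<String>> synonyms = new LinkedHashMap<>();

    void addGroup(String... forms) {
        List<String> group = Arrays.asList(forms);
        for (String form : forms) {
            synonyms.put(form, group);
        }
    }

    // Replaces each term that has synonyms with an OR group.
    // Multi-word forms are quoted so they read as a single unit.
    String rewrite(String query) {
        List<String> parts = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            List<String> group = synonyms.get(term);
            if (group == null) {
                parts.add(term); // no rule for this term, keep it unchanged
                continue;
            }
            List<String> alternatives = new ArrayList<>();
            for (String form : group) {
                alternatives.add(form.contains(" ") ? "\"" + form + "\"" : form);
            }
            parts.add("(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        SynonymQueryRewriter rewriter = new SynonymQueryRewriter();
        rewriter.addGroup("jsp", "java server pages", "javaserver pages");
        // prints: tomcat (jsp OR "java server pages" OR "javaserver pages")
        System.out.println(rewriter.rewrite("tomcat jsp"));
        // prints: java server
        System.out.println(rewriter.rewrite("java server"));
    }
}
```

The second call comes back unchanged because the lookup is keyed on whole terms: neither java nor server has a rule of its own, so a partial form such as (java server) is never expanded to jsp.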
>
>
>
>
>
> Wednesday, 13 April 2005, 19:36:44, you wrote:
>
>
> CH> : Another approach would be to index this as:
> CH> :
> CH> : token:       use   power      query for advanced searches
> CH> :                     powerquery
> CH> : position:    0     1          2     3   4        5
> CH> :
> CH> : Then use phrase queries with slop=1, to permit a one-token gap when
> CH> : someone searches for "use powerquery for advanced searches".
>
> CH> right, but in your example "1" is a magic number that works because we are
> CH> only dealing with "multi-word synonyms" of 2 words.  in general, this
> CH> approach requires that you pick some N such that you are guaranteed no
> CH> synonym contains more than N-1 words, and set the token positions to...
>
> CH>  token:       use   powerquery         for  advanced  searches
> CH>                     power      query
> CH>  position:    0     N          N+1     2N   3N        4N
>
>
>
>
>
> CH> -Hoss
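
A rough sketch of the position scheme in the table above, under the assumption that the i-th original token goes to position i*N and the k-th word of its multi-word expansion goes to i*N + k. The class SpacedPositions and its Token type are made up for illustration and do not use the Lucene API. The sketch only computes positions and says nothing about which slop a phrase query would then need.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical position assigner; names are illustrative, not a Lucene API.
final class SpacedPositions {
    static final class Token {
        final String text;
        final int position;

        Token(String text, int position) {
            this.text = text;
            this.position = position;
        }

        @Override
        public String toString() {
            return text + "@" + position;
        }
    }

    // Places the i-th original token at i * n, and the k-th word of its
    // expansion (if any) at i * n + k.
    static List<Token> assign(List<String> tokens,
                              Map<String, List<String>> expansions, int n) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            int base = i * n;
            out.add(new Token(token, base));
            List<String> words = expansions.get(token);
            if (words == null) {
                continue;
            }
            // The scheme requires that no synonym has more than n - 1 words.
            if (words.size() > n - 1) {
                throw new IllegalArgumentException(
                        "expansion of '" + token + "' has more than " + (n - 1) + " words");
            }
            for (int k = 0; k < words.size(); k++) {
                out.add(new Token(words.get(k), base + k));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("use", "powerquery", "for", "advanced", "searches");
        Map<String, List<String>> expansions =
                Collections.singletonMap("powerquery", Arrays.asList("power", "query"));
        // With N = 3 this prints:
        // [use@0, powerquery@3, power@3, query@4, for@6, advanced@9, searches@12]
        System.out.println(assign(tokens, expansions, 3));
    }
}
```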
>
>
> CH> ---------------------------------------------------------------------
> CH> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> CH> For additional commands, e-mail: java-user-help@lucene.apache.org
