lucene-java-user mailing list archives

From Matthijs Bierman <matth...@impressie.nl>
Subject Re: get original term for synonym
Date Thu, 15 Nov 2007 14:13:03 GMT
Hi Mark,

I have solved it in another way now. I've created my own implementation 
of StandardAnalyzer (which I've called AdvancedAnalyzer). This analyzer 
keeps the word "zone-indeling" together as a single token, so users can 
simply search for this term and it will be highlighted exactly as is. I 
presume hyphenated compounds like this occur only in Dutch.
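
To illustrate the idea of keeping a hyphenated compound as one token, here is a minimal sketch in plain Java with no Lucene dependency. It is not the AdvancedAnalyzer itself; the class name CompoundTokenizerSketch, the Token type and the regular expression are all assumptions made for the example. Each token records its character offsets, which is what a highlighter would need to mark the whole compound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the analyzer described in this message.
public class CompoundTokenizerSketch {

    /** A token plus its character offsets in the source text (end is exclusive). */
    public static final class Token {
        public final String text;
        public final int start;
        public final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + " [" + start + "," + end + ")";
        }
    }

    // Runs of letters/digits, optionally joined by single hyphens,
    // so "zone-indeling" is matched as one unit rather than two.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:-[\\p{L}\\p{N}]+)*");

    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            // Lowercase the term but keep the offsets of the original text.
            tokens.add(new Token(m.group().toLowerCase(Locale.ROOT), m.start(), m.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("De zone-indeling van de stad")) {
            System.out.println(t);
        }
    }
}
```

With this tokenisation a query for "zone-indeling" matches one token whose offsets cover the whole word, while a query for "zone" alone no longer matches it.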

The problem was that my highlighter works on a PDF document (through 
PDFBox). It would highlight 1, 2 and "indeling", which resulted in 
unwanted behaviour when the word was split into "zone" and "indeling".
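
For contrast, here is a rough sketch of the index-time decomposition described in my earlier message quoted below, again in plain Java rather than Lucene code. The names CompoundDecompositionSketch, Token and decompose are hypothetical. The sketch assumes one possible way of using offsets: each part is emitted with the offsets of the whole compound and a position increment of 0, so a match on "zone" can be mapped back to the full span of "zone-indeling". This is an assumption for illustration, not what my analyzer actually did.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and behaviour are assumptions.
public class CompoundDecompositionSketch {

    /** A token with character offsets (end exclusive) and a position increment. */
    public static final class Token {
        public final String text;
        public final int start;
        public final int end;
        public final int positionIncrement;

        Token(String text, int start, int end, int positionIncrement) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.positionIncrement = positionIncrement;
        }
    }

    /**
     * Emits the compound itself, followed by its hyphen-separated parts.
     * Every part reuses the offsets of the whole compound and is stacked
     * at the same position (increment 0), like a one-way synonym.
     */
    public static List<Token> decompose(String compound, int startOffset) {
        List<Token> out = new ArrayList<>();
        int endOffset = startOffset + compound.length();
        out.add(new Token(compound, startOffset, endOffset, 1));
        if (compound.indexOf('-') >= 0) {
            for (String part : compound.split("-")) {
                if (!part.isEmpty()) {
                    out.add(new Token(part, startOffset, endOffset, 0));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : decompose("zone-indeling", 3)) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ") +" + t.positionIncrement);
        }
    }
}
```

The offsets only point from a part to the span of the compound in the text; they do not give a lookup from the term "zone" to the term "zone-indeling" at query time, which is the information I said was not available.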

Thanks for your help though.

Cheers,
Matthijs

mark harwood wrote:
> It would be useful to have more details about the query input and the expected highlights you want.
>
> So given your 'zone-indeling' example document and the index-time tokenisation you described, which of the following queries would you expect to match and what would you want highlighted in each case?
> 1) zone
> 2) zone-indeling
> 3) "zone indeling"
> 4) zone-somethingElse
>
>
> My assumption here is that you are using the standard Lucene Query parser and that query 3 will therefore be a phrase query.
>
> Cheers
> Mark
>
>
> ----- Original Message ----
> From: Matthijs Bierman <matthijs@impressie.nl>
> To: java-user@lucene.apache.org
> Sent: Wednesday, 14 November, 2007 11:51:07 AM
> Subject: Re: get original term for synonym
>
> Hi Mark,
>
> Your solution would be correct if the synonym were a true 2-way
> synonym. Unfortunately this is not the case. My analyzer takes care of
> decomposition of specific Dutch words (where a "-" is used to create
> compound words). For example: 'zone-indeling' would create synonyms for
> 'zone'-> 'zone-indeling' and 'indeling'->'zone-indeling'.
> When analyzing 'zone' it will therefore not point back to
> 'zone-indeling' (this information is simply not available). Putting all
> the results from the indexing process into a file or Lucene document
> (thus creating a 'lookup' index) would probably make the lookup process
> rather slow, or make application startup take too long (due to HashMap
> generation).
>
> Maybe you can do something with offsets?
>
> Thanks,
> Matthijs
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org