lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <jan.kure...@nokia.com>
Subject RE: custom attributs in tokens
Date Wed, 24 Nov 2010 08:12:51 GMT
Of course:

We are trying to search in documents that contain text in several languages. We are also investigating
other approaches*, so this is not about finding other variants.
the goal is to only match tokens from 1 or more given languages and not to match the token
if it is by accident the same in another language.

For the payloads my plan is to add the correct language to each and every token during indexing
(I'm not sure how to solve this best, but I'm sure this can be solved at least with lucene
directly).
On search side my current idea is to wrap around a TermPosition and skip all docs, where the
current payload has not one of the requested languages.
I probably need to use my own Query/Weight for this?
Another approach would be to just overwrite the Similarity, but this will only influence scoring
and depending on the underlying query not completely skip the token - I have to test the difference
for the final score between this approaches.

This one blog made me curious if there is already something similar, that skips TermPositions
based on given attributes? I could imagine something similar to the current Tokenattribute
concept during index time, but also available during search and controlled by a similarity...

Jan

-----Original Message-----
From: ext Simon Willnauer [mailto:simon.willnauer@googlemail.com] 
Sent: Dienstag, 23. November 2010 17:50
To: java-user@lucene.apache.org
Subject: Re: custom attributs in tokens

On Tue, Nov 23, 2010 at 4:50 PM,  <jan.kurella@nokia.com> wrote:
> Yes, payloads I will use. But they perform at score time and not at search time. I just
wanted to know if there is anything like that.
>
So what is the difference? Maybe you can elaborate a little what are
you trying to do?

simon
> "not even on trunk" does this mean there is a discussion about this ongoing somewhere?
I'm just curious.
>
> Jan
>
> -----Original Message-----
> From: ext Simon Willnauer [mailto:simon.willnauer@googlemail.com]
> Sent: Dienstag, 23. November 2010 16:44
> To: java-user@lucene.apache.org
> Subject: Re: custom attributs in tokens
>
> Attribute Serialization is not implemented yet, not even in trunk. You
> can use payloads instead.
>
> Simon
>
> On Tue, Nov 23, 2010 at 2:43 PM,  <jan.kurella@nokia.com> wrote:
>> Hi,
>>
>> I found a blog post from 2008 where it says, there will be additional custom attributes
for tokens in the future, that will be searchable.
>> What is the status of these?
>>
>> Jan
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Mime
View raw message