lucene-java-user mailing list archives

From "Melanie Langlois" <Melanie.Langl...@tradingscreen.com>
Subject RE: indexing rss feeds in multiple languages
Date Thu, 22 Mar 2007 07:36:38 GMT
Thanks, that sounds like the best option to me. Does anybody use PerFieldAnalyzerWrapper?
I'm curious whether using different analyzers has any impact on performance.


Mélanie 

-----Original Message-----
From: Doron Cohen [mailto:DORONC@il.ibm.com] 
Sent: Thursday, March 22, 2007 3:56 PM
To: java-user@lucene.apache.org
Subject: Re: indexing rss feeds in multiple languages

If language is known also at search time, PerFieldAnalyzerWrapper seems a
nice third option: single document per feed, with a separate field for each
language, additional field(s) for the common data;  using
PerFieldAnalyzerWrapper at both indexing and search;  using FieldSelector
at search to retrieve only the relevant field(s) for matched documents.
(never done this myself though.)
- Doron
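
To make the per-field idea above concrete, here is a minimal sketch in plain Java. It does not use Lucene: `SimpleAnalyzer`, `PerFieldDispatcher`, `stopwordAnalyzer`, and the field names `content_en`, `content_fr`, and `source` are all hypothetical stand-ins, invented for illustration. It only models the general technique of choosing an analyzer by field name with a default for the common fields; the real PerFieldAnalyzerWrapper should be checked against the Lucene Javadoc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class PerFieldDispatchSketch {

    /** Hypothetical stand-in for an analyzer: turns field text into tokens. */
    public interface SimpleAnalyzer {
        List<String> tokens(String text);
    }

    /** Chooses an analyzer by field name, falling back to a default. */
    public static final class PerFieldDispatcher {
        private final SimpleAnalyzer defaultAnalyzer;
        private final Map<String, SimpleAnalyzer> byField = new HashMap<>();

        public PerFieldDispatcher(SimpleAnalyzer defaultAnalyzer) {
            this.defaultAnalyzer = defaultAnalyzer;
        }

        public void addAnalyzer(String field, SimpleAnalyzer analyzer) {
            byField.put(field, analyzer);
        }

        public List<String> analyze(String field, String text) {
            // One map lookup per call decides which analyzer runs.
            return byField.getOrDefault(field, defaultAnalyzer).tokens(text);
        }
    }

    /** Toy analyzer: lowercases, splits on non-alphanumerics, drops stopwords. */
    public static SimpleAnalyzer stopwordAnalyzer(String... stopwords) {
        final Set<String> stops = new HashSet<>(Arrays.asList(stopwords));
        return text -> {
            List<String> out = new ArrayList<>();
            for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!tok.isEmpty() && !stops.contains(tok)) {
                    out.add(tok);
                }
            }
            return out;
        };
    }

    /** One field per language, plus a default for common fields such as "source". */
    public static PerFieldDispatcher buildDispatcher() {
        PerFieldDispatcher d = new PerFieldDispatcher(stopwordAnalyzer());
        d.addAnalyzer("content_en", stopwordAnalyzer("the", "of"));
        d.addAnalyzer("content_fr", stopwordAnalyzer("le", "la", "des"));
        return d;
    }

    public static void main(String[] args) {
        PerFieldDispatcher d = buildDispatcher();
        System.out.println(d.analyze("content_en", "The price of the shares"));
        System.out.println(d.analyze("content_fr", "Le prix des actions"));
        System.out.println(d.analyze("source", "Reuters Feed"));
    }
}
```

In this sketch the dispatcher itself adds only a lookup by field name, so the analysis cost is whatever each per-language analyzer does. The same dispatcher would have to be used at both indexing and search time so that query terms are tokenized the same way as the indexed text.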

"Melanie Langlois" <Melanie.Langlois@tradingscreen.com> wrote on 21/03/2007
23:03:03:

> Hi,
>
>
>
> I saw that there are many posts on the mailing list about indexing in
> multiple languages, so I will try not to post a duplicate question. In
> my case, I want to index RSS feeds, where one feed contains several
> items in different languages, plus some data common to all the items
> (date, source, ...). After reading the different posts, I think I will
> create a document per item, index them all in the same index, each
> time with a language-specific analyzer, and store a lang field for
> language-specific searches. But I'm wondering how I should handle the
> common fields. It seems I have two options:
>
> 1: store the common data in each item. What happens if duplicate
> information is entered? Is it duplicated in the index?
>
>
>
> 2: create a separate document for the common data. In this case I
> will need to link this data to all the underlying items by storing some
> ids. The issue is that I would need to search the index twice if the
> search is done only by date, because I would also need to retrieve the
> items' contents.
>
>
>
> Thanks in advance for your help.
>
>
>
> Mélanie
>
>
>
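
As a rough illustration of option 1 in the quoted message (common data copied into every item document), the following plain-Java sketch models a tiny inverted index. It is not Lucene code: `TinyIndex`, `item`, and the field names `lang`, `source`, and `date` are hypothetical, and each field value is treated as a single untokenized term. The point it tries to show is that, in an inverted index of this kind, a repeated value becomes one term with several postings, while the stored copy of the value is kept once per document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class SharedTermSketch {

    /** Toy inverted index: field -> term -> ids of documents containing it. */
    public static final class TinyIndex {
        private final Map<String, Map<String, SortedSet<Integer>>> postings = new TreeMap<>();
        // Stored values are kept per document, so duplicated data is repeated here.
        private final List<Map<String, String>> stored = new ArrayList<>();

        public int addDocument(Map<String, String> fields) {
            int docId = stored.size();
            stored.add(new HashMap<>(fields));
            for (Map.Entry<String, String> e : fields.entrySet()) {
                postings.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                        .computeIfAbsent(e.getValue(), k -> new TreeSet<>())
                        .add(docId);
            }
            return docId;
        }

        /** Number of distinct terms recorded for a field. */
        public int distinctTerms(String field) {
            Map<String, SortedSet<Integer>> terms = postings.get(field);
            return terms == null ? 0 : terms.size();
        }

        /** Ids of the documents whose field holds exactly this term. */
        public SortedSet<Integer> docsFor(String field, String term) {
            Map<String, SortedSet<Integer>> terms = postings.get(field);
            if (terms == null || !terms.containsKey(term)) {
                return new TreeSet<>();
            }
            return terms.get(term);
        }
    }

    /** Builds one item document carrying its own copy of the common data. */
    public static Map<String, String> item(String lang, String source, String date) {
        Map<String, String> doc = new HashMap<>();
        doc.put("lang", lang);
        doc.put("source", source);
        doc.put("date", date);
        return doc;
    }

    /** Three items from the same feed, each in a different language. */
    public static TinyIndex buildExample() {
        TinyIndex index = new TinyIndex();
        index.addDocument(item("en", "reuters", "2007-03-21"));
        index.addDocument(item("fr", "reuters", "2007-03-21"));
        index.addDocument(item("de", "reuters", "2007-03-21"));
        return index;
    }

    public static void main(String[] args) {
        TinyIndex index = buildExample();
        System.out.println(index.distinctTerms("source"));
        System.out.println(index.docsFor("date", "2007-03-21"));
    }
}
```

With this layout a search by date alone reaches the item documents in a single lookup, which is the second search that option 2 would require.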


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



