lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: mixing analyzer
Date Mon, 01 Oct 2007 22:04:57 GMT
The whole question of multilingual indexing has been discusses
at length, you might find some ideas if you search the archive...

Erick

On 10/1/07, Dino Korah <dckorah@gmail.com> wrote:
>
> Thanks Erick.
>
> The PerFieldAnalyzerWrapper could fit in but in the current world of
> multilingual anywhere, (even in programming languages.. %$£%#@), almost
> any
> field in an email (addresses, subject, body, attachment filenames, ... )
> document could be multilingual.
>
> I will have a go anyway.
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: 01 October 2007 14:35
> To: java-user@lucene.apache.org
> Subject: Re: mixing analyzer
>
> Sure, but there's a time/space tradeoff. Isn't there always <G>....
>
> PerFieldAnalyzerWrapper is your friend. It would require that your
> index be built on a per-language basis. Say indexing
> text from French documents in a field "french_text", Chinese
> documents in a field chinese_text. You'd construct your query
> something like "chinese_text:blah OR french_text:blah" and
> your PerFieldAnalyzer would just handle things for you.
>
> But this may not fit your problem space ideally....
>
> Best
> Erick
>
>
> On 10/1/07, Dino Korah <dckorah@gmail.com> wrote:
> >
> > Hi,
> >
> >
> >
> > I am working on a lucene email indexing system which potentially can get
> > documents in various languages. Currently I am using StandardAnalyzer,
> > which
> > works for English but not for many of the other languages. One of the
> > requirements for the search interface is that they have to search
> without
> > selecting languages.
> >
> >
> >
> > Is it possible to search across multiple indexes that has been created
> > with
> > different analyzers; ie, when some one search for say "earth" lucene
> will
> > search for "earth" over CJK Analyzed IndexCJK and StandardAnalyzer
> > analyzed
> > IndexSA in one search() call? If not is there a way of combining the
> > result
> > from multiple search() call.
> >
> >
> >
> > Thanks in advance.
> >
> >
> >
> > d i n o    k o r a h
> > Tel: +44 795 66 65 283
> > --------------------------------
> > 51°21'52"N  0°5' 14.16"
> >
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message