lucenenet-user mailing list archives

From Gustavo Poll <gkp...@gmail.com>
Subject Re: [Lucene.Net] How to index/search a file name
Date Tue, 06 Sep 2011 19:06:24 GMT
Thanks again... OK, it is not.

StandardAnalyzer:

[name.surname@gmail.com] [123.456] [3,5] [at&t] [güsıöç] [güsiöç] [aß?de?]
[??????] [ssß]

UnaccentedWordAnalyzer:

[name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc]
[gusioc] [aß?de?] [??????] [ssss]


StandardAnalyzer would be perfect for my application if it were
accent-insensitive. Can anyone please tell me the easiest way to code such
an analyzer (an accent-insensitive StandardAnalyzer)?

I hear it is not a good idea to write a class that inherits from
StandardAnalyzer, because StandardAnalyzer should be a final class. Is that
correct?
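
For illustration, here is a minimal sketch of the accent folding being asked about, written in plain .NET rather than against the Lucene.Net API. The class name AccentFolding and the method Fold are hypothetical; wiring such a step into an analyzer chain is not shown and would depend on the Lucene.Net version in use. It assumes that decomposing each character and dropping the combining marks is an acceptable definition of "accent insensitive".

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class AccentFolding
{
    // Removes combining accent marks by decomposing the text (Unicode
    // form D), dropping non-spacing marks, and recomposing (form C).
    public static string Fold(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var folded = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Accents appear as separate non-spacing marks after decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                folded.Append(c);
            }
        }

        return folded.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Characters without a canonical decomposition, such as ß or the dotless ı, pass through this sketch unchanged, so it does not reproduce the [ssss] token that UnaccentedWordAnalyzer printed above.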

I would appreciate any help.
Gustavo Poll




2011/9/6 Digy <digydigy@gmail.com>

> A function is worth a thousand words :)
>
>        void Test()
>        {
>            // Compare how each analyzer tokenizes the same input.
>            Analyzer[] analyzers = new Analyzer[] {
>                new StandardAnalyzer(),
>                new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer() };
>
>            string input = "Name.Surname@gmail.com 123.456 3,5 AT&T "
>                + "ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß";
>
>            foreach (Analyzer analyzer in analyzers)
>            {
>                TokenStream ts = analyzer.TokenStream("", new StringReader(input));
>
>                // Print every token in brackets, one output line per analyzer.
>                Lucene.Net.Analysis.Token t = ts.Next();
>                while (t != null)
>                {
>                    Console.Write("[" + t.TermText() + "] ");
>                    t = ts.Next();
>                }
>                Console.WriteLine(); Console.WriteLine();
>            }
>        }
>
> DIGY
>
>
>
>
>
> -----Original Message-----
> From: Gustavo Poll [mailto:gkpoll@gmail.com]
> Sent: Tuesday, September 06, 2011 9:00 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: [Lucene.Net] How to index/search a file name
>
> Thanks DIGY, I am interested in that too. Let me see if I understood:
>
> Is UnaccentedWordAnalyzer like StandardAnalyzer, but accent-insensitive?
>
> Thanks!
> Gustavo Poll
>
> 2011/9/6 digy digy <digydigy@gmail.com>
>
> > That may help
> >
> > UnaccentedWordAnalyzer @
> > https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs
> >
> > DIGY
> >
> > On Tue, Sep 6, 2011 at 12:31 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I have a problem that has bothered me many times. I need to index
> > > file names and make them searchable by partial file name.
> > >
> > > Example: 2009&2010Q2_ABCD_Report.xls (the file name)
> > >
> > > When I run queries:
> > >
> > > filename:ABCD    no match returned
> > >
> > > filename:2010Q2_ABCD     match
> > >
> > > filename:Report*    match
> > >
> > > I'm using StandardAnalyzer and Lucene.Net version 2.9.3. The filename
> > > field is currently set to tokenized/indexed/stored.
> > >
> > > What I want is for Lucene.Net to match when the user types any part of
> > > the file name (a string like 2009 or 2010Q2 or ABCD or Report or xls or
> > > Report.xls).
> > >
> > > Please help with this or kindly point me to a way to solve it.
> > >
> > > Floyd
> > >
> >
>
>
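
The queries in the original question suggest that 2010Q2_ABCD was indexed as a single token, which would explain why filename:ABCD returned nothing. One possible way to make every part searchable is to break the file name into alphanumeric runs before indexing. The sketch below shows only that splitting step in plain .NET; the class name FileNameParts and the method Split are hypothetical, and how the parts are fed into a Lucene.Net field is left out.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Lucene.Net.
public static class FileNameParts
{
    // Splits a file name into lowercase runs of letters and digits.
    // Any other character ('&', '_', '.', spaces, ...) is a separator.
    public static List<string> Split(string fileName)
    {
        var parts = new List<string>();
        var current = new StringBuilder();

        foreach (char c in fileName)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(char.ToLowerInvariant(c));
            }
            else if (current.Length > 0)
            {
                // A separator ends the current run.
                parts.Add(current.ToString());
                current.Length = 0;
            }
        }

        // Flush the last run if the name does not end with a separator.
        if (current.Length > 0)
        {
            parts.Add(current.ToString());
        }

        return parts;
    }
}
```

For 2009&2010Q2_ABCD_Report.xls this yields the parts 2009, 2010q2, abcd, report and xls. The query text would need the same lowercasing for a part such as ABCD to match.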
