lucenenet-user mailing list archives

From "Digy" <digyd...@gmail.com>
Subject RE: [Lucene.Net] How to index/search a file name
Date Tue, 06 Sep 2011 20:04:09 GMT
This can be a starting point (just play a little with the tokenizers and filters):

 

    public class ModifiedStandardAnalyzer : Analyzer
    {
        public override TokenStream TokenStream(System.String fieldName, System.IO.TextReader reader)
        {
            // Same tokenizer and StandardFilter that StandardAnalyzer uses (no StopFilter here).
            StandardTokenizer tokenStream = new StandardTokenizer(reader, true);
            TokenStream result = new StandardFilter(tokenStream);
            // Lowercase first, then fold accented characters to their ASCII equivalents.
            result = new LowerCaseFilter(result);
            result = new ASCIIFoldingFilter(result);
            return result;
        }
    }
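
A minimal sketch of how the analyzer above could be exercised, reusing the token-dumping loop from the Test() method quoted further down in this thread. It assumes Lucene.Net 2.9.x (where TokenStream.Next() and Token.TermText() are still available, though deprecated) and that ModifiedStandardAnalyzer is compiled into the same project. The AnalyzerDemo class and its Dump helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;

public static class AnalyzerDemo   // hypothetical name, not part of Lucene.Net
{
    // Runs the text through the analyzer and collects the emitted terms.
    public static List<string> Dump(Analyzer analyzer, string text)
    {
        List<string> terms = new List<string>();
        TokenStream ts = analyzer.TokenStream("", new StringReader(text));

        // Old-style iteration, as in the quoted Test() method (Lucene.Net 2.9.x API).
        Lucene.Net.Analysis.Token t = ts.Next();
        while (t != null)
        {
            terms.Add(t.TermText());
            t = ts.Next();
        }
        return terms;
    }

    public static void Main()
    {
        // Assumption: UTF-8 console output, so non-ASCII terms are not replaced by '?'.
        Console.OutputEncoding = System.Text.Encoding.UTF8;

        string input = "Name.Surname@gmail.com 123.456 3,5 AT&T ğüşıöç SSß";
        foreach (string term in Dump(new ModifiedStandardAnalyzer(), input))
        {
            Console.Write("[" + term + "] ");
        }
        Console.WriteLine();
    }
}
```

If non-Latin terms still show up as '?', that is probably the console code page rather than the analyzer, so writing the terms to a UTF-8 file may be a more reliable way to compare outputs.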

 

DIGY

 

-----Original Message-----
From: Gustavo Poll [mailto:gkpoll@gmail.com] 
Sent: Tuesday, September 06, 2011 10:06 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: [Lucene.Net] How to index/search a file name

 

Thanks again... OK, so it is not:

 

StandardAnalyzer:

[name.surname@gmail.com] [123.456] [3,5] [at&t] [güsıöç] [güsiöç] [aß?de?] [??????] [ssß]

UnaccentedWordAnalyzer:

[name] [surname] [gmail] [com] [123] [456] [3] [5] [at] [t] [gusioc] [gusioc] [aß?de?] [??????] [ssss]

 

 

StandardAnalyzer would be perfect for my application if it were accent
insensitive. Can anyone please tell me the easiest way to code such an
analyzer (an accent-insensitive StandardAnalyzer)?

I hear it is not a good idea to write a class that inherits from
StandardAnalyzer, because StandardAnalyzer should be a final class.
Is that correct?

I would appreciate any help.

Gustavo Poll

 

 

 

 

2011/9/6 Digy <digydigy@gmail.com>

 

> A function is worth a thousand words :)
>
>        void Test()
>        {
>            Analyzer[] analyzers = new Analyzer[] {
>                new StandardAnalyzer(),
>                new Lucene.Net.Analysis.Ext.UnaccentedWordAnalyzer()
>            };
>            string input = "Name.Surname@gmail.com 123.456 3,5 AT&T ğüşıöç%ĞÜŞİÖÇ$ΑΒΓΔΕΖ#АБВГДЕ SSß";
>
>            foreach (Analyzer analyzer in analyzers)
>            {
>                TokenStream ts = analyzer.TokenStream("", new StringReader(input));
>                Lucene.Net.Analysis.Token t = ts.Next();
>                while (t != null)
>                {
>                    Console.Write("[" + t.TermText() + "] ");
>                    t = ts.Next();
>                }
>                Console.WriteLine(); Console.WriteLine();
>            }
>        }
>
> DIGY

> 

> 

> 

> 

> 

> -----Original Message-----
> From: Gustavo Poll [mailto:gkpoll@gmail.com]
> Sent: Tuesday, September 06, 2011 9:00 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: [Lucene.Net] How to index/search a file name
>
> Thanks DIGY, I am interested in that too. Let me see if I understood:
>
> UnaccentedWordAnalyzer is like StandardAnalyzer, but accent insensitive?
>
> Thanks!
> Gustavo Poll
>
> 2011/9/6 digy digy <digydigy@gmail.com>

> 

> 

> 

> > That may help:
> >
> > UnaccentedWordAnalyzer @
> > https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/src/contrib/Core/Analysis/Ext/Analysis.Ext.cs
> >
> > DIGY
> >
> > On Tue, Sep 6, 2011 at 12:31 PM, Floyd Wu <floyd.wu@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I have a question that has annoyed me many times. I need to index
> > > file names, and they need to be searchable by partial file name.
> > >
> > > Example: 2009&2010Q2_ABCD_Report.xls (the file name)
> > >
> > > When I run these queries:
> > >
> > > filename:ABCD    no match returned
> > > filename:2010Q2_ABCD     match
> > > filename:Report*    match
> > >
> > > I'm using StandardAnalyzer, and the Lucene.Net version is 2.9.3.
> > > Currently the filename field is set to tokenized/indexed/stored.
> > >
> > > What I want is for Lucene.Net to match when the user types any part
> > > of the file name (a string like 2009 or 2010Q2 or ABCD or Report or
> > > xls or Report.xls).
> > >
> > > Please help with this, or kindly point me to a way to solve it.
> > >
> > > Floyd

> 

> > >

> 

> >

> 

> 

> 


> 

> 

 
