lucene-java-user mailing list archives

From Tomoko Uchida <tomoko.uchida.1...@gmail.com>
Subject Re: Search in lines, so need to index lines?
Date Wed, 01 Aug 2018 11:35:08 GMT
Ira,

I do not understand your requirements, but essentially Lucene is not meant for
regex searching.
There are tools for fast regular expression search if you are not satisfied
with the Java standard library, for example:
https://github.com/google/re2j

And yes, the grep command would be the best tool for you.
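
For reference, a minimal sketch of a grep-like line search with plain
java.util.regex. It assumes the files are UTF-8 text and small enough to read
into memory; the class and method names (LineGrep, matchLines, searchTree) are
made up for illustration and are not part of Lucene or re2j.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
class LineGrep {

    // Returns "name:lineNumber: line" for each line that contains a match.
    static List<String> matchLines(String name, List<String> lines, Pattern pattern) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // find() matches anywhere in the line, so no leading/trailing .* is needed.
            if (pattern.matcher(line).find()) {
                hits.add(name + ":" + (i + 1) + ": " + line);
            }
        }
        return hits;
    }

    // Walks a directory tree and scans every regular file (assumed UTF-8 text).
    static List<String> searchTree(Path root, String regex) throws IOException {
        Pattern pattern = Pattern.compile(regex);
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            hits.addAll(matchLines(file.toString(), lines, pattern));
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java LineGrep <directory> <regex>");
            return;
        }
        for (String hit : searchTree(Paths.get(args[0]), args[1])) {
            System.out.println(hit);
        }
    }
}
```

Because each line is tested as a whole string, a pattern such as "nice day"
matches across word boundaries within a line, which a per-word token index
cannot do.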

Tomoko

On Wed, Aug 1, 2018 at 20:01, Gordin, Ira <ira.gordin@sap.com> wrote:

> Hi Tomoko,
>
> I need to search in many files and we use Lucene for this purpose.
>
> Thanks,
> Ira
>
> -----Original Message-----
> From: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
> Sent: Wednesday, August 1, 2018 1:49 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search in lines, so need to index lines?
>
> Hi Ira,
>
> > I am trying to implement regex search in files
>
> Why are you using Lucene for regular expression search?
> Can't you implement this simply with the java.util.regex package?
>
> Regards,
> Tomoko
>
> On Wed, Aug 1, 2018 at 0:18, Gordin, Ira <ira.gordin@sap.com> wrote:
>
> > Hi Uwe,
> >
> > I am trying to implement regex search in files, the same as in editors
> > such as Notepad++.
> >
> > Thanks,
> > Ira
> >
> > -----Original Message-----
> > From: Uwe Schindler <uwe@thetaphi.de>
> > Sent: Tuesday, July 31, 2018 6:12 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: Search in lines, so need to index lines?
> >
> > Hi,
> >
> > you need to create your own tokenizer that splits tokens on \n or \r.
> > Instead of using WhitespaceTokenizer, you can use:
> >
> > Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
> >     ch -> ch == '\r' || ch == '\n');
> >
> > But I would first think of how to implement the whole thing correctly.
> > Using a regular expression as "default" query is slow and does not look
> > correct. What are you trying to do?
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> > > -----Original Message-----
> > > From: Gordin, Ira <ira.gordin@sap.com>
> > > Sent: Tuesday, July 31, 2018 4:08 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Search in lines, so need to index lines?
> > >
> > > Hi all,
> > >
> > > > I understand that Lucene finds query matches within tokens. For example,
> > > > if I use WhitespaceTokenizer and search with the regular expression
> > > > /.*nice day.*/, I will never find anything. Am I correct?
> > > > In my project I need to find matches inside lines, not inside words,
> > > > so I am considering tokenizing by lines. How should I implement this idea?
> > > > I would really appreciate any other ideas or implementations.
> > >
> > > Thanks in advance,
> > > Ira
> > >
>
> --
> Tomoko Uchida
>


-- 
Tomoko Uchida
