lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Search in lines, so need to index lines?
Date Wed, 01 Aug 2018 11:15:55 GMT
http://man7.org/linux/man-pages/man1/grep.1.html
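
For illustration only, here is a minimal sketch of the same grep-style idea in Java, using the java.util.regex package mentioned further down the thread. The class name LineGrep and its grep method are made up for this example and are not part of Lucene or the JDK. It assumes the files are UTF-8 and small enough to read fully into memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: a tiny grep-like scanner, not a Lucene API.
final class LineGrep {

    /** Returns "path:lineNumber:line" for every line in which the pattern finds a match. */
    public static List<String> grep(Pattern pattern, List<Path> files) throws IOException {
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // Assumption: files are UTF-8 and fit in memory.
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            for (int i = 0; i < lines.size(); i++) {
                // find() matches anywhere in the line, as grep does.
                if (pattern.matcher(lines.get(i)).find()) {
                    hits.add(file + ":" + (i + 1) + ":" + lines.get(i));
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LineGrep <regex> <file>...");
            return;
        }
        Pattern pattern = Pattern.compile(args[0]);
        List<Path> files = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        grep(pattern, files).forEach(System.out::println);
    }
}
```

This scans every file on every search, so it only makes sense when the data set is small or searches are rare.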

On Wed, Aug 1, 2018 at 7:01 AM, Gordin, Ira <ira.gordin@sap.com> wrote:
> Hi Tomoko,
>
> I need to search in many files and we use Lucene for this purpose.
>
> Thanks,
> Ira
>
> -----Original Message-----
> From: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
> Sent: Wednesday, August 1, 2018 1:49 PM
> To: java-user@lucene.apache.org
> Subject: Re: Search in lines, so need to index lines?
>
> Hi Ira,
>
>> I am trying to implement regex search in file
>
> Why are you using Lucene for regular expression search?
> Couldn't you implement this simply with the java.util.regex package?
>
> Regards,
> Tomoko
>
> On Wed, Aug 1, 2018 at 0:18, Gordin, Ira <ira.gordin@sap.com> wrote:
>
>> Hi Uwe,
>>
>> I am trying to implement regex search in file the same as in editors, in
>> Notepad++ for example.
>>
>> Thanks,
>> Ira
>>
>> -----Original Message-----
>> From: Uwe Schindler <uwe@thetaphi.de>
>> Sent: Tuesday, July 31, 2018 6:12 PM
>> To: java-user@lucene.apache.org
>> Subject: RE: Search in lines, so need to index lines?
>>
>> Hi,
>>
>> you need to create your own tokenizer that splits tokens on \n or \r.
>> Instead of using WhitespaceTokenizer, you can use:
>>
>> Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(
>>     ch -> ch == '\r' || ch == '\n');
>>
>> But I would first think of how to implement the whole thing correctly.
>> Using a regular expression as "default" query is slow and does not look
>> correct. What are you trying to do?
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> Achterdiek 19, D-28357 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Gordin, Ira <ira.gordin@sap.com>
>> > Sent: Tuesday, July 31, 2018 4:08 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Search in lines, so need to index lines?
>> >
>> > Hi all,
>> >
>> > I understand that Lucene finds query matches within tokens. For example,
>> > if I use WhitespaceTokenizer and search with the regular expression
>> > /.*nice day.*/, I'll always find nothing. Am I correct?
>> > In my project I need to find matches inside lines, not inside words, so I
>> > am considering tokenizing by lines. How should I implement this idea?
>> > I'd really appreciate any other ideas or implementations.
>> >
>> > Thanks in advance,
>> > Ira
>> >
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
> --
> Tomoko Uchida
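
To illustrate the point discussed in the quoted thread (a regular expression query is matched against each indexed token as a whole, so the choice of tokenizer decides what can match), here is a small plain-Java simulation. It does not use Lucene at all: the class TokenRegexDemo and its method are hypothetical, String.split stands in for the tokenizer, and java.util.regex stands in for Lucene's own regexp syntax, which differs in its details.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical simulation of per-token regex matching; not a Lucene API.
final class TokenRegexDemo {

    /** Splits text on separatorRegex and returns the tokens that the query matches in full. */
    public static List<String> matchingTokens(String text, String separatorRegex, Pattern query) {
        List<String> hits = new ArrayList<>();
        for (String token : text.split(separatorRegex)) {
            // matches() requires the whole token to match, like a term-level regexp query.
            if (!token.isEmpty() && query.matcher(token).matches()) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "have a nice day\nsee you tomorrow";
        Pattern query = Pattern.compile(".*nice day.*");

        // Whitespace tokens: no single token contains "nice day", so nothing matches.
        System.out.println(matchingTokens(text, "\\s+", query));      // prints []

        // Line tokens: the first line is one token and matches as a whole.
        System.out.println(matchingTokens(text, "[\\r\\n]+", query)); // prints [have a nice day]
    }
}
```

With line-sized tokens the query does match, but every such query still has to be tested against a very large set of distinct terms, which is the performance concern raised in the thread.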


