lucene-java-user mailing list archives

From "Gordin, Ira" <ira.gor...@sap.com>
Subject RE: Search in lines, so need to index lines?
Date Tue, 31 Jul 2018 15:18:20 GMT
Hi Uwe,

I am trying to implement regex search within a file, the way text editors do it, Notepad++ for example.
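
For comparison, editor-style regex search can be sketched in plain Java without any index: scan the file content line by line and report every match with its position. This is only an illustration under assumptions; the class name LineRegexSearch, the method find, and the "line:column:match" output format are hypothetical and not part of Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LineRegexSearch {

    // Returns one "line:column:match" entry per regex hit (line and column are 1-based).
    // Hypothetical helper for illustration; it scans the text and uses no index.
    static List<String> find(String text, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        // Accept Windows (\r\n), Unix (\n) and old Mac (\r) line endings.
        String[] lines = text.split("\\r?\\n|\\r", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher matcher = pattern.matcher(lines[i]);
            // find() locates matches anywhere inside the line, like an editor search.
            while (matcher.find()) {
                hits.add((i + 1) + ":" + (matcher.start() + 1) + ":" + matcher.group());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Have a nice day!\r\nNothing here\nAnother nice day";
        for (String hit : find(text, "nice\\s+day")) {
            System.out.println(hit);
        }
    }
}
```

Because each line is matched separately, a pattern can span several words but never crosses a line break, which is the behaviour I am after.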

Thanks,
Ira

-----Original Message-----
From: Uwe Schindler <uwe@thetaphi.de> 
Sent: Tuesday, July 31, 2018 6:12 PM
To: java-user@lucene.apache.org
Subject: RE: Search in lines, so need to index lines?

Hi,

you need to create your own tokenizer that splits tokens on \n or \r. Instead of using WhitespaceTokenizer,
you can use:

Tokenizer tok = CharTokenizer.fromSeparatorCharPredicate(ch -> ch == '\r' || ch == '\n');
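
To show why the separator matters, here is a minimal plain-Java sketch that imitates separator-based tokenization and then applies the pattern from the original question to each token as a whole. It does not use Lucene; the class LineTokenDemo and its methods split and countMatches are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.regex.Pattern;

public class LineTokenDemo {

    // Splits text into non-empty tokens wherever isSeparator is true.
    // Illustration only: this imitates a separator-based tokenizer, it is not Lucene code.
    static List<String> split(String text, IntPredicate isSeparator) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isSeparator.test(cp)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.appendCodePoint(cp);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Counts the tokens that the pattern matches in full (one token at a time).
    static long countMatches(List<String> tokens, Pattern pattern) {
        return tokens.stream().filter(t -> pattern.matcher(t).matches()).count();
    }

    public static void main(String[] args) {
        String text = "Have a nice day!\r\nSee you tomorrow.\n";
        Pattern pattern = Pattern.compile(".*nice day.*");
        List<String> words = split(text, Character::isWhitespace);
        List<String> lines = split(text, ch -> ch == '\r' || ch == '\n');
        System.out.println("whitespace tokens matching: " + countMatches(words, pattern)); // prints 0
        System.out.println("line tokens matching: " + countMatches(lines, pattern));       // prints 1
    }
}
```

With whitespace splitting no single token contains "nice day", so nothing matches; with line splitting the first line matches as one token.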

But I would first think of how to implement the whole thing correctly. Using a regular expression
as "default" query is slow and does not look correct. What are you trying to do?

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Gordin, Ira <ira.gordin@sap.com>
> Sent: Tuesday, July 31, 2018 4:08 PM
> To: java-user@lucene.apache.org
> Subject: Search in lines, so need to index lines?
> 
> Hi all,
> 
> I understand that Lucene matches queries against individual tokens. For example, if I
> use WhitespaceTokenizer and search with the regular expression /.*nice day.*/,
> I will never find anything. Am I correct?
> In my project I need to find matches inside lines, not inside words, so I
> am considering tokenizing by line. How should I implement this idea?
> I would really appreciate any other ideas or implementations.
> 
> Thanks in advance,
> Ira
> 



