lucene-java-user mailing list archives

From Ilya Zavorin <>
Subject RE: Efficient string lookup using Lucene
Date Sun, 26 Aug 2012 01:10:21 GMT
Does Lucene support this type of structure, or do I need to somehow implement it outside Lucene?

By the way, I need this to run on an Android phone, so memory size might be an issue...


Ilya Zavorin

-----Original Message-----
From: Dawid Weiss [] 
Sent: Friday, August 24, 2012 4:50 PM
Subject: Re: Efficient string lookup using Lucene

What you need is a suffix tree or a suffix array. Both data structures let you search for the
existence/occurrence of any input pattern in time that depends on the pattern length rather than
on a scan of the whole text (a plain suffix array adds a logarithmic factor in the text size).
Depending on how much text you have on the input, it may either be a simple task -- see here:

or a complicated task if your input size is larger (larger than memory). Do a Google search for
suffix trees/suffix arrays, though; these are the data structures to use here.
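
To illustrate the suffix array idea for the simple, in-memory case, here is a minimal Java
sketch. It does not come from the thread or from Lucene: the class SuffixArraySketch and its
methods are hypothetical names, the construction is the naive "sort all suffixes" approach
(slow for large inputs), and the boxed Integer[] is used only for brevity. Matching is done on
UTF-16 code units, so it is exact and language-agnostic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; not part of Lucene or any library.
final class SuffixArraySketch {
    private final String text;
    private final Integer[] sa; // suffix start offsets, sorted by suffix content

    SuffixArraySketch(String text) {
        this.text = text;
        this.sa = new Integer[text.length()];
        for (int i = 0; i < sa.length; i++) sa[i] = i;
        // Naive construction: sort offsets by comparing the suffixes directly.
        Arrays.sort(sa, (a, b) -> compareSuffixes(a, b));
    }

    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int d = text.charAt(a) - text.charAt(b);
            if (d != 0) return d;
            a++;
            b++;
        }
        return b - a; // the suffix that ran out first is shorter and sorts first
    }

    // Compares the first pattern.length() chars of the suffix at `start` with the pattern.
    private int compareToPattern(int start, String pattern) {
        for (int k = 0; k < pattern.length(); k++) {
            if (start + k >= text.length()) return -1; // suffix is a proper prefix of pattern
            int d = text.charAt(start + k) - pattern.charAt(k);
            if (d != 0) return d;
        }
        return 0; // the suffix starts with the pattern
    }

    // Returns the sorted start offsets of every exact occurrence of the pattern.
    List<Integer> occurrences(String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) return hits;
        // Binary search for the first suffix that is not smaller than the pattern.
        int lo = 0, hi = sa.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareToPattern(sa[mid], pattern) < 0) lo = mid + 1; else hi = mid;
        }
        // All matching suffixes are adjacent in the sorted order.
        for (int i = lo; i < sa.length && compareToPattern(sa[i], pattern) == 0; i++) {
            hits.add(sa[i]);
        }
        Collections.sort(hits);
        return hits;
    }

    boolean contains(String pattern) {
        return !occurrences(pattern).isEmpty();
    }

    public static void main(String[] args) {
        String docs = "Fox is running fast\n!%#^&$run!$!%@&$#\nrun,run";
        System.out.println(new SuffixArraySketch(docs).occurrences("run"));
    }
}
```

Each lookup costs a binary search over the sorted suffixes plus one step per hit, instead of a
scan over the whole text. On a memory-limited device the offsets would more likely be kept in a
primitive int[] (about four bytes per character of text), and a faster construction algorithm
would be needed for large inputs.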


On Fri, Aug 24, 2012 at 9:48 PM, Ilya Zavorin <> wrote:
> Hi Everyone,
> I have the following task. I have a set of documents in multiple languages. I don't know what these languages are. Any given doc may contain text in several languages mixed up. So to me these are just a bunch of Unicode text files.
> What I need is to implement an efficient EXACT string lookup. That is, I need to be able to find ANY Unicode string exactly as it appears. I do not care about language-specific modifications of the string. That is, if I search for a string "run", I do not need to find "ran" but I do want to find it in all of these strings below:
> Fox is running fast
> !%#^&$run!$!%@&$#
> run,run
> Is there a way of using StandardAnalyzer or any other analyzer and the corresponding query parser to find these? Again, my queries might be more or less random Unicode sequences and I need to find all their occurrences in the text.
> Essentially, what I am trying to do is implement substring matching more efficiently than using Java's standard substring matching methods.
> Thanks!
> Ilya Zavorin
