lucene-java-user mailing list archives

From Noopur Julka <noopur.ju...@gmail.com>
Subject Re: Efficient string lookup using Lucene
Date Sun, 26 Aug 2012 06:39:37 GMT
I haven't yet found an answer to my original question, which was
how to search for Japanese characters.

Regards,
Noopur Julka



On Sun, Aug 26, 2012 at 9:17 AM, Devon H. O'Dell <devon.odell@gmail.com> wrote:

> It seems worth mentioning, in partial response to this thread's topics,
> that (almost) regardless of index strategy, Lucene performance hinges on
> the number of matched documents per query, not the total number of docs in
> the index. There are other mitigating factors (disk type, RAM size, etc.),
> but worst-case performance can generally be modeled in terms of matched
> documents as opposed to index size.
>
> Apologies for any spelling / grammatical errors; this is sent from my
> phone.
>
> --dho
>  On Aug 25, 2012 11:02 PM, "Noopur Julka" <noopur.julka@gmail.com> wrote:
>
> > The index being very large can be ruled out, as Luke returned few results
> > and the app is capable of returning approximately 200 results.
> >
> > Regards,
> > Noopur Julka
> >
> >
> >
> > On Sun, Aug 26, 2012 at 6:40 AM, Ilya Zavorin <izavorin@caci.com> wrote:
> >
> > > Does Lucene support this type of structure, or do I need to somehow
> > > implement it outside Lucene?
> > >
> > > By the way, I need this to run on an Android phone, so size of memory
> > > might be an issue...
> > >
> > > Thanks,
> > >
> > >
> > > Ilya Zavorin
> > >
> > >
> > > -----Original Message-----
> > > From: Dawid Weiss [mailto:dawid.weiss@gmail.com]
> > > Sent: Friday, August 24, 2012 4:50 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Efficient string lookup using Lucene
> > >
> > > What you need is a suffix tree or a suffix array. Both data structures
> > > let you check for the existence/occurrence of any input pattern in time
> > > that depends mainly on the pattern length rather than on the amount of
> > > text (a suffix array lookup adds a logarithmic factor for the binary
> > > search). Depending on how much text you have on the input, it may be
> > > either a simple task -- see here:
> > >
> > > http://labs.carrotsearch.com/jsuffixarrays.html
> > >
> > > or a complicated task if your input is larger than memory. Search
> > > Google for suffix trees/suffix arrays though; it's the data structure
> > > to use here.
> > >
> > > Dawid
> > >
> > > On Fri, Aug 24, 2012 at 9:48 PM, Ilya Zavorin <izavorin@caci.com> wrote:
> > > > Hi Everyone,
> > > >
> > > > I have the following task. I have a set of documents in multiple
> > > > languages. I don't know what these languages are. Any given doc may
> > > > contain text in several languages mixed up. So to me these are just a
> > > > bunch of Unicode text files.
> > > >
> > > > What I need is to implement an efficient EXACT string lookup. That is,
> > > > I need to be able to find ANY Unicode string exactly as it appears. I
> > > > do not care about language-specific modifications of the string. That
> > > > is, if I search for a string "run", I do not need to find "ran" but I
> > > > do want to find it in all of these strings below:
> > > >
> > > > Fox is running fast
> > > > !%#^&$run!$!%@&$#
> > > > run,run
> > > >
> > > > Is there a way of using StandardAnalyzer or any other analyzer and the
> > > > corresponding query parser to find these? Again, my queries might be
> > > > more or less random Unicode sequences and I need to find all their
> > > > occurrences in the text.
> > > >
> > > > Essentially, what I am trying to do is implement substring matching
> > > > more efficiently than using Java's standard substring matching methods.
> > > >
> > > > Thanks!
> > > >
> > > > Ilya Zavorin
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
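
For anyone following the suffix array suggestion quoted above, here is a rough sketch of the idea in plain Java. It is not the jsuffixarrays library and not a Lucene API; the class and method names (SuffixArraySketch, occurrences, contains) are made up for illustration. It assumes the whole text fits in memory, builds the array naively by sorting suffix offsets, and compares UTF-16 code units directly, so no analyzer or language-specific processing is involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Naive in-memory suffix array over UTF-16 code units (illustrative only). */
final class SuffixArraySketch {
    private final String text;
    private final Integer[] suffixes; // suffix start offsets, sorted by suffix

    SuffixArraySketch(String text) {
        this.text = text;
        this.suffixes = new Integer[text.length()];
        for (int i = 0; i < suffixes.length; i++) {
            suffixes[i] = i;
        }
        // Naive construction: simple, but slow and memory-hungry for large inputs.
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    /** Orders two suffixes by code unit; a suffix that runs out first sorts first. */
    private int compareSuffixes(int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            int d = text.charAt(a) - text.charAt(b);
            if (d != 0) return d;
            a++;
            b++;
        }
        return (n - a) - (n - b);
    }

    /** Negative if the suffix sorts before the pattern, 0 if it starts with it. */
    private int compareToPattern(int start, String pattern) {
        for (int k = 0; k < pattern.length(); k++) {
            if (start + k >= text.length()) return -1; // suffix ended too early
            int d = text.charAt(start + k) - pattern.charAt(k);
            if (d != 0) return d;
        }
        return 0;
    }

    /** Returns every offset at which the pattern occurs, in ascending order. */
    List<Integer> occurrences(String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) return hits;
        // Binary search for the first suffix that is not smaller than the pattern.
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareToPattern(suffixes[mid], pattern) < 0) lo = mid + 1;
            else hi = mid;
        }
        // All suffixes that start with the pattern are adjacent in sorted order.
        for (int i = lo; i < suffixes.length
                && compareToPattern(suffixes[i], pattern) == 0; i++) {
            hits.add(suffixes[i]);
        }
        Collections.sort(hits);
        return hits;
    }

    boolean contains(String pattern) {
        return !occurrences(pattern).isEmpty();
    }

    public static void main(String[] args) {
        String[] docs = {"Fox is running fast", "!%#^&$run!$!%@&$#", "run,run"};
        for (String doc : docs) {
            System.out.println(doc + " -> " + new SuffixArraySketch(doc).occurrences("run"));
        }
    }
}
```

Each lookup does a binary search over the sorted suffixes and then walks the adjacent matches, so its cost grows with the pattern length and the number of hits rather than with a full scan of the text. The boxed Integer[] and the naive sort are chosen for readability only; on a memory-constrained device such as an Android phone an int[]-based construction would likely be needed.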
