lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Searching documents on big index by using ParallelMultiSearcher is slow...
Date Wed, 04 Oct 2006 11:32:37 GMT
OK, you're now officially beyond my competence, so I'll have to wait for
people who actually know <G>....

Although if I read your stats right, you're getting approximately 1 sec
response time over 10M documents on a 10G index. That's not bad at all. What
kind of response time do you need?

On 10/3/06, Scott <m.scott.tiger@gmail.com> wrote:
>
> Hi,
>
> > Well, the first question is always "are you opening/closing your
> > IndexSearchers for each request on your remote machines?". This is
> always a
> > no-no. This is also a question for your single-searcher version.
>
> Yes I know, each search slave (RMI server) have single instance
>   of IndexSearcher and it's open once when RMI server starts.
>
> > What is your performance if you only go to one server? I'd start by
> finding
>
> A performance on one server with FULL index (not divided by 10)
>   is about 2500 ms.
> On one server with splitted index (divided by 10) is about 50 ms.
>
> And on ParallelMultiSearcher with 10 of remote searchable,
>   each RemoteSearchable returns in about 50 - 100 ms,
>   and ParallelMultiSearcher returns also 50 - 100 ms, because of
>   threading.
> but Hits Searcher.search(Query, Sort) responds in about 500 - 1000 ms.
>
> I think that Searcher.search with Sort reads all of SortFields from
>   IndexReader and it's bottleneck.
>
> Are there results of high performance distributed Lucene with
> ParallelMultiSearcher?
> Or need hadoop?
>
> Erick Erickson wrote:
> > Well, the first question is always "are you opening/closing your
> > IndexSearchers for each request on your remote machines?". This is
> always a
> > no-no. This is also a question for your single-searcher version.
> >
> > What is your performance if you only go to one server? I'd start by
> finding
> > out what happens when you forget all the ParallelMultiSearcher stuff,
> all
> > the RMI stuff etc, and just see what your performance is on one of your
> > index parts locally. Once that is answered, extend to RMI, then the
> > Parallel...., at each step seeing if your performance degrades
> > unacceptably.
> > That'll at least give you a clue what part of the process is the biggest
> > problem.
> >
> > And without knowing a LOT more about your searches, and your index, it's
> > kind of hard to come up with solutions <G>....
> >
> > Best
> > Erick
> >
> > On 10/3/06, Scott <m.scott.tiger@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> I have a question about ParallelMultiSearcher performance.
> >>
> >> I want to search documents on about 10 gigabytes of index.
> >> (The index has 10,000,000 documents.)
> >>
> >> I get very slow performance using IndexSearcher with ONE index
> normally.
> >> Then I tried to use ParallelMultiSearcher with 10 servers of remote
> >> searchable.
> >>
> >> Index:
> >> Each search slaves have 1/10 of index.
> >> (ONE index divided to 10 servers.)
> >>
> >> Search slave:
> >> Each search slaves start remote searchable RMI server,
> >> and wait connecting from search master.
> >>
> >> Search master:
> >> The search master use Naming.lookup() to get remote searchable.
> >> Get 10 remote searchables from each search slaves and build
> >> ParallelMultiSearcher.
> >> Then search.
> >>
> >> Any solution?
> >>
> >> --
> >> Scott
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
> --
> Scott
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message