lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Serving remote lucene client - RMI vs HTTP
Date Mon, 16 Jul 2007 20:40:49 GMT
Another question would be what queries take the longest. Are your
response times pretty constant on a per-query basis or are there
outliers that could perhaps point to a different solution?

Finally, what is the size of your index? The total number of documents
is certainly useful, but so is the finished index size. I can put a million
document in a 100M index or a 100G index, and size does matter <G>...

Best
Erick

On 7/16/07, Ard Schrijvers <a.schrijvers@hippo.nl> wrote:
>
> Hello,
>
> > Hi EVeryone,
> >
> > Thank you all for your replies.
> >
> > And reply to your questions Grant:
> > We have more than 3 Million document in our index.
> > We get more than 150,000 searches (queries) per day. We
> > expect this no to go
> > up.
>
> Just curious, but suppose those 150.000 searches are done during 8 hours a
> day, that means around 6 searches a second...that is not too much is it?
>
> Perhaps a too trivial question, but do you only re-open the index reader
> when there is actually something changed in the index (or are you crawling
> all day? perhaps you should only crawl during the night)? Reopening the
> index over and over again while not needed is inefficient. Also, there are
> some caching mechanisms around
>
> Furthermore, if you use sorting in your query, you might gain much speed
> by indexing these fields differently (like sorting on some large field is
> very expensive)
>
> Then of course, do you use range searches?
>
> Also, did you check setting maxmemory of your jvm higher?
>
> One more suggestion: try to open your index with "luke" and see wether
> "luke" is faster in doing the searches. If so, you certainly can find some
> simple ways to increase the performance,
>
> Ard
>
> > This count only includes actual user search and doesn't
> > include robot
> > searches and other internal searches (for sending emails to user etc).
> > And our final queries sent to lucene are quite complicated.
> > This is because
> > we need to confirm to a lot of criteria (some are set by
> > users and some are
> > internal logics). I don't think we can simplify our queries.
> >
> > And importantly we are looking to scale to a larger no. of
> > traffic. Our
> > current performance in terms of speed is ok.
> >
> > We are not sure about implementing using Solr because we
> > crawl only specific
> > type of sites and our crawling mechanism has proven to be
> > quite stable. We
> > are only looking to scale to larger no. of users and doing a new
> > implementation using Solr might be an overkill as time is a
> > huge factor.
> >
> > Best Regards,
> > Kumar
> >
> >
> > Grant Ingersoll-6 wrote:
> > >
> > > Hi Kumar,
> > >
> > > I am curious about where the bulk of time was spent in
> > Lucene.  I am
> > > not doubting your numbers or analysis, just want to know if
> > there was
> > > anything that could be improved in Lucene and make sure you are
> > > solidly in need of going to multiple machines as it adds an extra
> > > level of complexity.  Can you provide info about number of
> > documents,
> > > # of updates, # of queries etc. ?  What kind of analysis, etc. are
> > > you doing?  That being said, putting the web app on one
> > machine and
> > > the search application on the other makes sense much in the
> > same way
> > > it makes sense in most cases to do this for a database.
> > >
> > > As for POST/GET vs. RMI, have a look at Solr, esp. its replication
> > > capabilities for load balancing search servers.  I think it
> > answers
> > > the question in favor of the POST/GET approach.  In fact,
> > you may be
> > > able to drop in Solr to your situation w/o too much work.
> > >
> > > Cheers,
> > > Grant
> > >
> > >
> > > On Jul 15, 2007, at 10:10 PM, kumarlimbu wrote:
> > >
> > >>
> > >> Hi Everyone,
> > >>
> > >> We are using lucene,nutch and spring framework to create a
> > specialized
> > >> search engine. Due to growing traffic we are trying to scale. By
> > >> doing some
> > >> tests we found out that the bottle neck was lucene search.
> > We used
> > >> some
> > >> heavy traffic simulation and logged the time taken by each
> > portion
> > >> of the
> > >> server response and found out that the bulk of the time
> > was spent in
> > >> searching from lucene index.
> > >>
> > >> In order to accomodate higher traffic we are planning on
> > splitting our
> > >> application in 2 portions:
> > >> 1. Web application (on 1 machine)
> > >> 2. Search application (one more than 1 machine)
> > >>
> > >> Each one of the application will reside on a (possibly) separate
> > >> machines.
> > >> We are looking forward to scaling by adding more than 1 machine
> > >> dedicated to
> > >> searching as lucene search seems to be the bottleneck.
> > >>
> > >> Web application will provide the front-end to the user.
> > All static
> > >> pages,
> > >> images and the style information will reside on this machine. It
> > >> will also
> > >> serve dynamic pages but all the searching will take place on the
> > >> search
> > >> application. Web application will send search parameters
> > to the search
> > >> application and after searching, it will send back the results to
> > >> the web
> > >> application which will format it and display it to the user.
> > >>
> > >> What we are unable to decide is whether to use RMI (
> > >>
> http://www.soft-amis.com/index.html?return=http://www.soft-amis.com/
> >> cluster4spring/index.html
> >> cluster4spring  )or simple HTTP POST/GET request with response in XML
> >> format. We did some research and found out that RMI (with clustering
> >> support) might be more suitable for our needs. Unfortunately our
> >> team is not
> >> familiar with RMI and so we don't know if there will be any issues
> >> with it
> >> during implementation.
> >> Advantage of using simple GET/POST is we have more control which
> >> searcher
> >> app to use and when. An important criteria for us is to disable
> >> searching
> >> from the searcher app who's index is being updated. This is very
> >> important
> >> for us. Is there anyway in RMI to inform the web (client)
> >> application that a
> >> particular server is unavailable (due to index being updated).
> >>
> >> We would also like to know if anybody has implemented lucene
> >> searching in a
> >> similar fashion. Thanks for your help.
> >>
> >> Kumar Limbu
> >>
> >> --
> >> View this message in context: http://www.nabble.com/Serving-remote-
> >> lucene-client---RMI-vs-HTTP-tf4084167.html#a11608209
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> > --------------------------
> > Grant Ingersoll
> > Center for Natural Language Processing
> > http://www.cnlp.org/tech/lucene.asp
> >
> > Read the Lucene Java FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Serving-remote-lucene-client---RMI-vs-HTTP-tf4084167.html#a11609007
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message