lucene-solr-user mailing list archives

From: "Upayavira" <...@odoko.co.uk>
Subject: Re: Solr n00b question: writing a custom QueryComponent
Date: Tue, 08 Feb 2011 23:12:23 GMT
Your observation regarding optimisation is an interesting one; it does
at least make sense that reducing the size of a segment will speed up
optimisation and reduce the disk space needed.

In one situation with multiple shards, we had two 'rows' of servers for
redundancy. We could take one row offline while it optimised and let
the other serve searches in the meantime. By offsetting the
optimisation of the two rows by 12 hours, we could optimise daily
without losing up-to-date content or slowing searches during an
optimisation.

As to splitting indexes, it isn't an easy task to do properly, and
there's nothing in Solr to do it. However, there is a very clever class
in Lucene contrib that you can use to split a Lucene index [1], and you
can safely use it to split a Solr index so long as the index isn't in
use while you're doing it.

Upayavira
[1] for example:
http://lucene.apache.org/java/3_0_2/api/contrib-misc/org/apache/lucene/index/MultiPassIndexSplitter.html
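MultiPassIndexSplitter can be run from the command line. As a sketch (the jar names and paths below are assumptions; adjust them to your Lucene version and index location), splitting an index into two parts might look like:

```shell
# MultiPassIndexSplitter ships in Lucene's contrib-misc jar.
# Make sure Solr is NOT writing to the index while this runs.
java -cp lucene-core-3.0.2.jar:lucene-misc-3.0.2.jar \
  org.apache.lucene.index.MultiPassIndexSplitter \
  -out /path/to/split-output \
  -num 2 \
  /path/to/solr/data/index
```

Each resulting part is a complete, self-contained index that can be copied into a shard's data directory.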

On Tue, 08 Feb 2011 06:24 -0800, "Ishwar" <ishwarsridharan@yahoo.com>
wrote:
> Thanks for the detailed reply Upayavira.
> 
> 
> To answer your question, our index is growing much faster than expected
> and our performance is grinding to a halt. Currently, it has over 150
> million records.
> We're planning to split the index into multiple shards very soon and move
> the index creation to hadoop.
> 
> Our current situation is that we need to run optimize once every couple
> of days to keep it in shape. Given the size (index + stored), it takes a
> long time to complete, during which time we can't add new documents to
> the index. And because of the size of the stored fields, we need double
> the storage size of the current index to optimize. Since we're on EC2,
> this requires frequent increases in storage capacity.
> 
> Even after sharding the index, the time it takes to optimize the index
> is going to be significant. That's the reason why we decided to store
> these fields in MySQL.
> If there's some easier solution that I've overlooked, please point it
> out.
> 
> On a related note, is there a way to 'automagically' split the existing
> index into multiple shards?
> 
> --
> Thanks,
> Ishwar
> 
> 
> Just another resurrected Neozoic Archosaur comics.
> http://www.flickr.com/photos/mojosaurus/sets/72157600257724083/
> 
> 
> From: Upayavira <uv@odoko.co.uk>
> To: solr-user@lucene.apache.org
> Cc: 
> Sent: Tuesday, February 8, 2011 7:17 PM
> Subject: Re: Solr n00b question: writing a custom QueryComponent
> 
> The conventional way to do it would be to index your title and content
> fields in Solr, along with the ID to identify the document.
> 
> You could do a search against solr, and just return an ID field, then
> your 'client code' would match that up with the title/content data from
> your database. And yes, SolrJ would be the obvious route to take here,
> for your client application.
> 
> Yes, it does mean another component that needs to be maintained, but by
> using Solr's external interface you will be protected from changes to
> internals that could break your custom components, and you will likely
> be more able to take advantage of other Solr features that are also
> available via the standard interfaces.
> 
> My next question is: are you going to be using the data you're storing
> in mysql for something other than just enhancing search results? If not,
> it may still make sense to store the data in Solr. It would mean you
> just have one index to manage, rather than an index and a database -
> after all, the words *have* to take up disk space somewhere :-). If you
> end up with so many documents indexed that performance grinds to a halt
> (over 10 million?) you can split your index across multiple shards.
> 
> Upayavira
> 
> Once you get search results back from Solr, you would do a query against
> your database to return the additional fields.
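The flow described above — fetch only ids from Solr, then join against the database — can be sketched in plain Java. The SolrJ calls are shown as comments, and the table and column names (docs, id, title, content) are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the Solr-then-MySQL join. The SolrJ and JDBC calls are
// indicated in comments; the table/column names are hypothetical.
public class SearchJoinSketch {

    // Build a parameterized IN clause for the ids Solr returned,
    // so the lookup is a single round trip to MySQL.
    static String buildLookupSql(int idCount) {
        StringBuilder sql = new StringBuilder(
            "SELECT id, title, content FROM docs WHERE id IN (");
        for (int i = 0; i < idCount; i++) {
            sql.append(i == 0 ? "?" : ",?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // In real code these ids would come from SolrJ, e.g.:
        //   SolrQuery q = new SolrQuery("content:foo");
        //   q.setFields("id");              // only fetch the id field
        //   SolrDocumentList hits = server.query(q).getResults();
        List<String> ids = Arrays.asList("12", "57", "103");

        String sql = buildLookupSql(ids.size());
        System.out.println(sql);
        // Then bind each id with PreparedStatement.setString(...),
        // execute, and merge title/content back into the results.
    }
}
```

Keeping the client code outside Solr this way means the join logic survives Solr upgrades untouched.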
> 
> On Tue, 08 Feb 2011 03:38 -0800, "Ishwar" <ishwarsridharan@yahoo.com>
> wrote:
> > Hi Upayavira,
> > 
> > Apologies for the lack of clarity in the mail. The feeds have the
> > following fields:
> > id, url, title, content, refererurl, createdDate, author, etc. We need
> > search functionality on title and content. 
> > As mentioned earlier, storing title and content in solr takes up a lot of
> > space. So, we index title and content in solr, and we wish to store title
> > and content in MySQL which has the fields - id, title, content.
> > 
> > I'm also looking at a solr client- solrj to query MySQL based on what
> > solr returns. But that means another component which needs to be
> > maintained. I was wondering if it's a good idea to implement the
> > functionality in solr itself.
> > 
> >  
> > --
> > Thanks,
> > Ishwar
> > 
> > 
> > 
> > 
> > From: Upayavira <uv@odoko.co.uk>
> > To: solr-user@lucene.apache.org
> > Cc: 
> > Sent: Tuesday, February 8, 2011 4:36 PM
> > Subject: Re: Solr n00b question: writing a custom QueryComponent
> > 
> > I'm still not quite clear what you are attempting to achieve, and more
> > so why you need to extend Solr rather than just wrap it.
> > 
> > You have data with title, description and content fields. You make no
> > mention of an ID field.
> > 
> > Surely, if you want to store some in mysql and some in Solr, you could
> > make your Solr client code enhance the data it gets back after querying
> > Solr with data extracted from Mysql. What is the issue here?
> > 
> > Upayavira
> > 
> > 
> > On Mon, 07 Feb 2011 23:17 -0800, "Ishwar" <ishwarsridharan@yahoo.com>
> > wrote:
> > > Hi all,
> > > 
> > > Been a solr user for a while now, and now I need to add some
> > > functionality to solr for which I'm trying to write a custom
> > > QueryComponent. Couldn't get much help from websearch. So, turning to
> > > solr-user for help.
> > > 
> > > I'm implementing search functionality for (micro)blog aggregation. We
> > > use solr 1.4.1. In the current solr config, the title and content fields
> > > are both indexed and stored in solr. Storing takes up a lot of space,
> > > even with compression. I'd like to store the title and description fields
> > > in MySQL instead of solr, and retrieve these fields for results from
> > > MySQL with an id lookup.
> > > 
> > > Using the DataImportHandler won't work because we store just the title
> > > and content fields in MySQL. The rest of the fields are in solr itself.
> > > 
> > > I wrote a custom component by extending QueryComponent, and overriding
> > > only the finishStage(ResponseBuilder) function where I try to retrieve
> > > the necessary records from MySQL. This is how the new QueryComponent is
> > > specified in solrconfig.xml
> > > 
> > > <searchComponent name="query"
> > >     class="org.apache.solr.handler.component.TestSolr" />
> > > 
> > > 
> > > I see that the component is getting loaded from the solr debug output
> > > <lst name="prepare">
> > > <double name="time">1.0</double>
> > > <lst name="org.apache.solr.handler.component.TestSolr">
> > > <double name="time">0.0</double>
> > > </lst>
> > > ...
> > > 
> > > But the strange thing is that the finishStage() function is not being
> > > called before returning results. What am I missing?
> > > 
> > > Secondly, fields like ResponseBuilder._responseDocs are visible only
> > > within the package org.apache.solr.handler.component. How do I access
> > > the results from my own package?
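A likely explanation for finishStage() never firing: in Solr 1.4 the stage methods (distributedProcess(), handleResponses(), finishStage()) are only invoked for distributed requests, i.e. when the shards parameter is present; on a single core only prepare() and process() run. A minimal sketch for a single-core setup, assuming the class name from the solrconfig.xml snippet above (the MySQL lookup itself is left as a comment):

```java
import java.io.IOException;

import org.apache.solr.handler.component.QueryComponent;
import org.apache.solr.handler.component.ResponseBuilder;

// Sketch only: finishStage() belongs to the distributed-search
// lifecycle, so on a single (non-sharded) core it is never called.
// Overriding process() runs on every request instead.
public class TestSolr extends QueryComponent {

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        super.process(rb);   // run the normal query first
        // rb.getResults().docList now holds the matching documents;
        // look up title/content in MySQL here and attach the extra
        // data to the response, e.g. rb.rsp.add("external", ...).
    }
}
```

This also sidesteps the package-visibility problem: the public accessors on ResponseBuilder (such as getResults()) are usable from any package, unlike the package-private _responseDocs field.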
> > > 
> > > If you folks can give me links to a wiki or some sample custom
> > > QueryComponent, that'll be great.
> > > 
> > > --
> > > Thanks in advance.
> > > Ishwar.
> > > 
> > > 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

