lucene-solr-user mailing list archives

From "Lohrenz, Steven" <Steven.Lohr...@hmhpub.com>
Subject RE: Searching Across Multiple Cores
Date Thu, 14 Oct 2010 17:52:35 GMT
Ken, 

OK, I understand how distributed search works, but I don't understand how to build my
query so that the merged results from the two shards contain only the documents that
exist in both result sets.

In essence, I'm doing a join across the two shards on the resourceId. 

So Core0 has:
resourceId (unique key)
title 
tag1
tag2 
tag3

And Core1 has:
resourceId + folder + userId + grade (concatenated - this is the uniqueId)
resourceId
folder
userId
grade

For example, I would want to find all the content with userId = 893489 and tag1 = 'contentTagX'.


My idea is to search Core1 for all the items with userId = 893489. This would return
that user's results, each carrying a resourceId. Then I would need to search Core0
for documents where tag1 = 'contentTagX' and resourceId is one of those returned in the
result set from Core1.
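
The two-step lookup described above could be sketched on the client side roughly
as follows, in Python. This is only an illustration under stated assumptions, not
a built-in Solr feature: the core names "core0" and "core1", the helper names,
the rows value, and the fetch callable (which stands in for whatever HTTP client
sends the parameters to a core's /select handler and returns the matching
documents) are all hypothetical. Only the field names and the standard q, fq, fl,
rows and wt parameters come from the thread or from Solr itself.

```python
def favourites_query(user_id):
    """Params for Core1: every favourite row belonging to one user."""
    return {"q": f"userId:{user_id}", "fl": "resourceId", "rows": 1000, "wt": "json"}


def resources_query(tag, resource_ids):
    """Params for Core0: tag match, restricted to the given resourceIds."""
    # De-duplicate, because one resource can sit in several of a user's folders.
    # Assumption: ids contain no characters that need query-syntax escaping.
    ids = " OR ".join(sorted({str(rid) for rid in resource_ids}))
    return {"q": f'tag1:"{tag}"', "fq": f"resourceId:({ids})", "wt": "json"}


def join_search(fetch, user_id, tag):
    """Emulate the join: ask Core1 for the user's ids, then filter Core0 by them.

    fetch(core, params) is a hypothetical hook that returns a list of doc dicts.
    """
    favourites = fetch("core1", favourites_query(user_id))
    ids = [doc["resourceId"] for doc in favourites]
    if not ids:
        return []  # the user has no favourites, so nothing can match
    return fetch("core0", resources_query(tag, ids))
```

For a user with a very large number of favourites, the OR list may run into
Solr's boolean clause limit, so the ids would probably need to be sent in batches.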

I can probably do this in a search handler (say, on a Core3 that has a mashup of the 2 schemas
but just redirects to the other shards), but is there an easier way to do it?

Or am I missing something?

Thanks for your help,
Steve


-----Original Message-----
From: Ken Stanley [mailto:dohpaz@gmail.com] 
Sent: 14 October 2010 18:19
To: solr-user@lucene.apache.org
Subject: Re: Searching Across Multiple Cores

Steve,

Using shards is actually quite simple; it's just a matter of setting up your
shards (via multiple cores, or multiple instances of SOLR) and then passing
the shards parameter in the query string. The shards parameter is a
comma-separated list of the servers/cores you wish to use together.

So, let's try this using a fictitious example. You have two cores: one
called main for your main data set and its metadata, and one called favorites
for your user favorites metadata. You set up each schema accordingly, and
you've indexed your data. When you want to query both sets of data, you would
build your query appropriately and then use the following URL (the host is
assumed to be localhost for simplicity):

http://localhost/solr/main/select?q=id:[*+TO+*]&shards=localhost/solr/main,localhost/solr/favorites&rows=100&start=0
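
A URL of this shape could also be assembled programmatically. The Python sketch
below is only an illustration: the function name and its arguments are
hypothetical, and it assumes every core lives under /solr/ on the same host, as
in the example above. Note that the shards parameter merges the result sets of
the listed cores; it does not restrict the results to documents present in both.

```python
from urllib.parse import urlencode


def sharded_select_url(host, entry_core, cores, query, rows=100, start=0):
    """Build a /select URL on entry_core that fans out to every core in cores."""
    # The shards value is a comma-separated list of host/path pairs with no
    # scheme, matching the hand-written URL above.
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = {"q": query, "shards": shards, "rows": rows, "start": start}
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"http://{host}/solr/{entry_core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(sharded_select_url("localhost", "main", ["main", "favorites"], "id:[* TO *]"))
```

The encoded form differs from the hand-written URL only in that characters such
as ':' and '/' are percent-encoded; it decodes to the same parameters.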

I am personally investigating using this technique to tie together two cores
that utilize different schemas; one schema will contain news articles,
blogs, and similar types of data, while another schema will contain
company-specific information, such as addresses, etc. If you're still having
trouble after trying this, let me know and I'd be more than happy to share
any findings that I come across.

I hope that this helps to clear things up for you. :)

- Ken

It looked like something resembling white marble, which was
probably what it was: something resembling white marble.
                -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy"


On Thu, Oct 14, 2010 at 4:25 AM, Lohrenz, Steven
<Steven.Lohrenz@hmhpub.com> wrote:

> Ken,
>
> I have been through that page many times. I could use Distributed search
> for what? The first scenario or the second?
>
> The question is: can I merge a set of results from the two cores/shards and
> only return results that exist in both (determined by the resourceId, which
> exists on both)?
>
> Cheers,
> Steve
>
> -----Original Message-----
> From: Ken Stanley [mailto:dohpaz@gmail.com]
> Sent: 13 October 2010 20:08
> To: solr-user@lucene.apache.org
> Subject: Re: Searching Across Multiple Cores
>
> On Wed, Oct 13, 2010 at 2:11 PM, Lohrenz, Steven
> <Steven.Lohrenz@hmhpub.com> wrote:
>
> > Hi,
> >
> > I am trying to figure out how I can accomplish the following:
> >
> > I have a fairly static and large set of resources I need to have indexed
> > and searchable. Solr seems to be a perfect fit for that. In addition I need
> > to have the ability for my users to add resources from the main data set to
> > a 'Favourites' folder (which can include a few more tags added by them). The
> > Favourites needs to be searchable in the same manner as the main data set,
> > across all the same fields.
> >
> > My first thought was to have two separate schemas
> > - the first  for the main data set and its metadata
> > - the second for the Favourites folder with all of the metadata from the
> > main set copied over and then adding the additional fields.
> >
> > Then I thought that would probably waste quite a bit of space (the number
> > of users is much larger than the number of main resources).
> >
> > So then I thought I could have the main data set with its metadata. Then
> > there would be a second one for the Favourites folder with the unique id from
> > the first and the additional fields it needs (userId, grade, folder, tag).
> > In addition, I would create another schema/core with all the fields from the
> > other two and have a request handler defined on it that searches across the
> > other 2 cores and returns the results through this core.
> >
> > Searches would be run against this third core, and each search would be
> > expected to return results for a single user only. For example, a user searches
> > their Favourites folder for all the items with Foo. The result is only those
> > items the user has added to their Favourites with Foo somewhere in their
> > main data set metadata.
> >
> > Could this be made to work? What would the consequences be? Any alternative
> > suggestions?
> >
> > Thanks,
> > Steve
> >
> >
> Steve,
>
> From your description, it really sounds like you could reap the benefits of
> using Distributed Search in SOLR:
>
> http://wiki.apache.org/solr/DistributedSearch
>
> I hope that this helps.
>
> - Ken
>