lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: Search across related/correlated multivalue fields in Solr
Date Wed, 27 Apr 2011 17:49:58 GMT
There is no great way.

One approach would be to 'de-normalize' at index time, to actually have 
a field that looks like this:

institution_year: 2010.OHIO_ST  ;   2007.YALE

Then, with some code on the client side, you could more easily facet and
search how you want. It still doesn't (I don't think) make range queries
easy (or even possible?). And it can get unmanageable if you have more
than two dimensions.
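
To make the de-normalization idea concrete, here is a minimal Python
sketch of building the paired values at index time and matching on them.
The function names and the exact "YEAR.INSTITUTION" token format are
assumptions for illustration only, and the matching is done in plain
Python rather than through any Solr API. In Solr itself the tokens would
presumably go into a multi-valued field that is matched as whole values.

```python
def composite_values(educations):
    """Build one 'YEAR.INSTITUTION' token per education row (hypothetical format)."""
    return [f"{e['degree_year']}.{e['institution']}" for e in educations]


def matches(tokens, year, institution):
    """True only if the year and institution occur together in a single row."""
    return f"{year}.{institution}" in tokens


# John Doe's two education rows, taken from the original question.
john = [
    {"degree_code": "MD", "degree_year": 2008, "institution": "OHIO_ST"},
    {"degree_code": "PHD", "degree_year": 2010, "institution": "YALE"},
]

john_tokens = composite_values(john)
```

With these tokens, a search for OHIO_ST in 2010 no longer hits John Doe,
because no single token pairs that year with that institution.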

Another solution like you say is trying to do multiple queries on 
multiple document sets, but that gets tricky too.
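
As a rough sketch of what the multiple-query approach involves, the
Python below uses in-memory lists as stand-ins for two document sets
(one per education row, one per student). The data layout and the
function name are assumptions for illustration; a real setup would issue
two Solr queries instead of the list comprehensions used here.

```python
# Stand-in for an "education" document set: one document per education row.
education_docs = [
    {"id": 1, "degree_code": "MD", "degree_year": 2008, "institution": "OHIO_ST", "student_id": 100},
    {"id": 2, "degree_code": "PHD", "degree_year": 2010, "institution": "YALE", "student_id": 100},
    {"id": 3, "degree_code": "MS", "degree_year": 2007, "institution": "OHIO_ST", "student_id": 200},
    {"id": 4, "degree_code": "MD", "degree_year": 2010, "institution": "YALE", "student_id": 300},
]

# Stand-in for a "student" document set, keyed by student id.
student_docs = {
    100: {"fname": "John", "lname": "Doe"},
    200: {"fname": "Rasheed", "lname": "Jones"},
    300: {"fname": "Mary", "lname": "Hampton"},
}


def students_matching(institution, year):
    """Two-step lookup: find matching education rows, then fetch their students."""
    # Step 1: both conditions must hold on the same education document.
    ids = {
        e["student_id"]
        for e in education_docs
        if e["institution"] == institution and e["degree_year"] == year
    }
    # Step 2: resolve the collected ids against the student documents.
    return [student_docs[i] for i in sorted(ids)]
```

The tricky part is step 2: the client has to carry the id list between
queries, and paging, sorting and faceting on the final student results
all have to be handled outside of a single Solr request.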

There is also a "join" feature patch that is not in any released Solr
yet. It just got committed to master, and there is no way to know for
sure when or if it will end up in a released version. I hesitate to link
to the JIRA because there's some ugly politics in the comments, but I
think it _might_ be useful in this use case. I haven't completely
thought it through, but it is useful in many of these sorts of
multi-valued, multi-dimensional "join" type cases:
https://issues.apache.org/jira/browse/SOLR-2272

But in general, yes, this is something that's hard to do in Solr/Lucene.

Jonathan

On 4/27/2011 1:30 PM, ronotica wrote:
> The nature of my project is such that search is needed and specifically
> search across related entities. We want to perform several queries involving
> a correlation between two or more properties of a given entity in a
> collection.
>
> To put things in context, here is a snippet of the domain:
>
> Student { firstname, lastname }
> Education { degreeCode, degreeYear, institution }
>
> The database tables look like so:
>
> STUDENT
> ----------
> STUDENT_ID   FNAME     LNAME
> 100          John      Doe
> 200          Rasheed   Jones
> 300          Mary      Hampton
>
> EDUCATION
> -------------
> EDUCATION_ID   DEGREE_CODE   DEGREE_YR   INSTITUTION   STUDENT_ID
> 1              MD            2008        OHIO_ST       100
> 2              PHD           2010        YALE          100
> 3              MS            2007        OHIO_ST       200
> 4              MD            2010        YALE          300
>
> A student can have many educations. Currently, our documents look like this
> in solr:
>
> DOC_ID   STUDENT_ID   FNAME     LNAME     DEGREE_CODE   DEGREE_YR   INSTITUTION
> 100      100          John      Doe       MD PHD        2008 2010   OHIO_ST YALE
> 101      200          Rasheed   Jones     MS            2007        OHIO_ST
> 102      300          Mary      Hampton   MD            2010        YALE
>
> Searching for all students who graduated from OHIO_ST in 2010 currently
> gives a hit (John Doe) when it shouldn't.
>
> What is the best way to overcome this issue in Solr? This only happens
> when I search across correlated fields, mainly because the data has been
> denormalized and Lucene has no notion of relationships between the
> various fields.
>
> One way that has come to mind is to have separate documents for "education"
> and perform multiple searches to get at an answer. Besides this, is there
> any other way? Does Solr provide any elegant solution for this?
>
> Any help will be greatly appreciated.
>
> Thanks.
>
> PS: We have about 15 of these kinds of relationships, all relating to the
> student, and would like to search on each of them.
>
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Search-across-related-correlated-multivalue-fields-in-Solr-tp2871176p2871176.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
