lucene-solr-user mailing list archives

From "Robert Young" <bubble...@gmail.com>
Subject Re: Best practice for storing relational data in Solr
Date Fri, 04 Jan 2008 11:59:46 GMT
Short answer: It depends.
Long answer: It depends on what you want to be able to search on.
If you need to search by recruiter name then obviously you'll need to
index it. If you don't, you only really need to index the most relevant
db identifier, then work out the relations from that in MySQL (it's
what it's good at, after all).
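
As a rough illustration of the identifier-only approach, here is a minimal
Python sketch. Everything named in it is an assumption made for the example:
the `recruiter` table and its columns, the `recruiter_id` field on each search
hit, and the helper `attach_recruiter_names` are hypothetical. An in-memory
SQLite database stands in for MySQL, and a hard-coded list stands in for the
documents a Solr query would return.

```python
import sqlite3


def attach_recruiter_names(conn, docs):
    """Return copies of docs with 'recruiter_name' resolved from 'recruiter_id'."""
    ids = sorted({d["recruiter_id"] for d in docs})
    if not ids:
        return []
    # One query for all distinct ids in the result page, not one per hit.
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM recruiter WHERE id IN ({placeholders})", ids
    ).fetchall()
    names = dict(rows)  # maps id -> name
    return [{**d, "recruiter_name": names.get(d["recruiter_id"])} for d in docs]


# Stand-in for the relational database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recruiter (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO recruiter VALUES (143, 'My Rcruiter')")

# Stand-in for search hits: only the database identifier was indexed.
solr_docs = [{"applicant": "John Smith", "recruiter_id": 143}]

before = attach_recruiter_names(conn, solr_docs)

# Fixing the typo touches one database row; nothing is reindexed.
conn.execute("UPDATE recruiter SET name = 'My Recruiter' WHERE id = 143")
after = attach_recruiter_names(conn, solr_docs)
```

The trade-off is one extra database round trip per page of results, in
exchange for never reindexing applicants when a recruiter is renamed. It only
works if nobody needs to search by recruiter name.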

Cheers
Rob

On Jan 4, 2008 11:39 AM, steve.lillywhite
<steve.lillywhite@bsignificant.com> wrote:
> Hi all,
>
>
>
> This is a (possibly very naive) newbie question regarding Solr best practice...
>
>
>
> I run a website that displays/stores data on job applicants, together with information
> on where they came from (e.g. which recruiter), which office they are applying to, etc. This
> data is stored in a MySQL database. I currently have a basic search facility, but I plan
> to introduce Solr to improve this, by also storing applicant data in a Solr schema.
>
>
>
> My problem is that *related* applicant data can also be updated in the web GUI (e.g.
> if there was a typo, a recruiter could be changed from "My Rcruiter" to "My Recruiter"), and
> I don't know how best to reflect this in the Solr schema.
>
> Example:
>
> We may have 20000 applicants that came from recruiter "My Recruiter". If the name of
> this recruiter is altered in the GUI then I would have to reindex all 20000 of those applicants
> in the Solr schema, which seems like overkill. The alternative would be not to store
> the recruiter name in the Solr schema, and instead store only its MySQL database identifier.
> Then I would need to parse any search results from Solr to put in the recruiter name before
> displaying the data in the GUI.
>
>
>
> So I guess I'm asking which of these is the better approach:
>
>
>
> 1. Use Solr to store the text value of related applicant data that exists in a
> relational MySQL database. Whenever that data is updated in the database, reindex all dependent
> entries in the Solr schema. The advantage of this approach, I guess, is that search results can
> be returned from Solr and displayed as is (if XSLT is used). E.g. a search result for "John
> Smith" of recruiter "My Recruiter" could be returned in the required HTML format from Solr,
> and displayed in the web GUI without any reformatting or further processing.
>
> 2. Use Solr to store database IDs of related applicant data that exists in a relational
> MySQL database. When that data is updated in the database there is no need to reindex Solr.
> However, search results from Solr will need to be parsed before they can be output in the
> web GUI. E.g. if Solr returns "John Smith" of recruiter with database ID 143, then 143 will
> need to be mapped back to "My Recruiter" by my application before it can be displayed.
>
>
>
> Can anyone offer any guidance here?
>
>
>
> Regards
>
>
>
> Steve
