lucene-solr-user mailing list archives

From Gora Mohanty <g...@mimirtech.com>
Subject Re: configuring schema to match database
Date Fri, 11 Jan 2013 16:23:35 GMT
On 11 January 2013 21:13, Niklas Langvig <niklas.langvig@globesoft.com> wrote:
> It sounds good not to use more than one core, for sure I do not want to over complicate
> this.
[...]

Yes. Not only are multiple cores unnecessarily complicated here,
but with a single core your searches will also be simpler and faster.

> Both table courses and languages has it's own primary key courseseqno and languagesseqno

There is no need to index these.

> Both also have a foreign key "userid" that references the users table with column userid
> The relationship from users to courses and languages are one-to-many.

> but I guess I'm thinking wrong because my idead whould be to have a "block" of fields
> connected with one id
>
> <field name="coursename" type="string" indexed="true" />
> <field name="startdate" type="date" indexed="true" />
> <field name="enddate" type="" indexed="true" />
>
> These three are connected with a
> <field name="courseseqno" type="int" indexed="true" />
> But also have a
> <field name="userid" type="int" indexed="true" />
> To connect to a specific user?
[...]

You are still thinking of Solr as an RDBMS, which it is not. In your
case, it is easiest to flatten out the data. This increases the size
of the index, but that should not really be a concern. As your courses
and languages tables are connected only to the users table, the schema
that I described earlier should suffice. To extend my earlier example,
given:
* userA with courses c1, c2, c3, and languages l1, l2
* userB with c2, c3, and l2
you should flatten it so that you get the following Solr documents:
<userA> <c1 name> <c1 startdate>...<l1> <l1 writing skill>...
<userA> <c1 name> <c1 startdate>...<l2> <l2 writing skill>...
<userA> <c2 name> <c2 startdate>...<l1> <l1 writing skill>...
...
<userB> <c2 name> <c2 startdate>...<l2> <l2 writing skill>...
<userB> <c3 name> <c3 startdate>...<l2> <l2 writing skill>...
i.e., a total of 3 courses x 2 languages = 6 documents for
userA, and 2 courses x 1 language = 2 documents for userB.
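
The flattening above is just a per-user cross product of courses and
languages. A minimal Python sketch of that idea follows, using the
sample users from the example. The dictionary layout and the
"language" field name are assumptions made for illustration only; they
are not part of the schema discussed in this thread.

```python
from itertools import product

# Hypothetical sample data mirroring the example above.
users = {
    "userA": {"courses": ["c1", "c2", "c3"], "languages": ["l1", "l2"]},
    "userB": {"courses": ["c2", "c3"], "languages": ["l2"]},
}


def flatten(users):
    """Return one flat document per (user, course, language) combination."""
    docs = []
    for userid, data in users.items():
        # product() pairs every course with every language for this user.
        for course, language in product(data["courses"], data["languages"]):
            docs.append(
                {
                    "userid": userid,
                    "coursename": course,
                    "language": language,  # assumed field name
                }
            )
    return docs


docs = flatten(users)
print(len(docs))
```

Note that with a plain cross product, a user who has courses but no
languages (or the reverse) yields no documents at all, so such users
would need separate handling if they must remain searchable.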

In order to get this form of flattened data into Solr, I would
suggest using the DataImportHandler with nested entities.
Please see the earlier link to DIH. Also, a Google search
for Solr dataimporthandler nested entities turns up many
examples, including:
http://solr.pl/en/2010/10/11/data-import-handler-%E2%80%93-how-to-import-data-from-sql-databases-part-1/
Please give it a try, and post here with your attempts if
you run into any issues.

Regards,
Gora
