lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: advice on creating a solr index when data source is from many unrelated db tables
Date Sun, 01 Aug 2010 15:17:20 GMT
Yes, there might be 50 fields. This is not a problem for SOLR/Lucene.

HOWEVER: SOLR is NOT a database and shouldn't be used like one. Your
emails hint that you're thinking like a DB person, not a search person. I
guarantee that if you just index the database information into 9 different
document types (using your example), the next question you'll have is
a variant on "how do I do a join on different documents in a single
search?".
And the answer is "you don't".

Instead, go against all your training in the DB world and DE-normalize your
data. Your goal here is for each "document" to contain all the information
you need to satisfy your requirements in one SOLR query.
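To make "de-normalize" concrete, here is a minimal sketch in plain Python, with made-up field and table names (the real schema is whatever yours is), of flattening one Users row plus its related child-table rows into a single self-contained search document:

```python
def denormalize_user(user, sales, notes):
    """Flatten one Users row plus its related UserSales and UserNotes rows
    into a single search document. Child rows become multi-valued fields,
    so one Solr hit carries everything needed to answer the query."""
    return {
        "id": f"user-{user['id']}",
        "type": "user",                                  # discriminator for filtering by doc type
        "name": user["name"],
        "sale_amounts": [s["amount"] for s in sales],    # from UserSales
        "note_texts": [n["text"] for n in notes],        # from UserNotes
    }

# Hypothetical rows as they might come back from the database:
user = {"id": 7, "name": "Ada"}
sales = [{"amount": 120}, {"amount": 45}]
notes = [{"text": "prefers email"}]

doc = denormalize_user(user, sales, notes)
```

A document like this can then be sent to Solr as-is; the "type" field is the same discriminator discussed later in the thread, so a query can filter down to one document type when needed.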

I know, I know... the first reaction any experienced DB admin has to that
suggestion is a horrified "Do WHAT?!?!?!". But remember that SOLR is
a search application. Database developers have put a huge amount of effort
into being able to do joins. And they do it very well. SOLR doesn't
make any attempt to "do the join thing better", and it shouldn't. It searches
massive amounts of text. And it does it very well. Disk space is cheap.

So step back and ask yourself what you're trying to do with SOLR. It may be
that you need the best of both worlds, in which case you index your searchable
data along with your DB primary keys and use SOLR to search your text, then
use your DB to do the other stuff. It may be that you need to flatten your
data and only have one document type and just use SOLR. It may be many things.
But I almost guarantee that the thing it isn't is to just index the tables
as they are <G>....
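One way to sketch that "best of both worlds" pattern, with an in-memory SQLite table standing in for the real database and a hard-coded list standing in for the primary keys a Solr text search would return:

```python
import sqlite3

# Stand-in for the relational side; a real schema would be whatever you have.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Solr basics", "..."), (2, "Denormalizing", "..."), (3, "Joins", "...")],
)

# Step 1: Solr searches the text and hands back matching primary keys.
# (Hard-coded here; a real call would run a query against your text fields.)
hit_ids = [2, 3]

# Step 2: the database does "the other stuff" -- fetch the full rows by key.
placeholders = ",".join("?" * len(hit_ids))
rows = db.execute(
    f"SELECT id, title FROM articles WHERE id IN ({placeholders})",
    hit_ids,
).fetchall()
```

The point of the split: Solr only stores the searchable text plus the primary key, and the database stays the system of record for everything else.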

Best
Erick


On Fri, Jul 30, 2010 at 12:58 PM, S Ahmed <sahmed1020@gmail.com> wrote:

> So I have tables like this:
>
> Users
> UserSales
> UserHistory
> UserAddresses
> UserNotes
> ClientAddress
> CalenderEvent
> Articles
> Blogs
>
> Just seems odd to me, jamming all these tables into a single index.  But I
> guess the idea of using a 'type' field to qualify exactly what I am
> searching is a good idea, in case I need to filter for only 'articles' or
> blogs or contacts etc.
>
> But there might be 50 fields if I do this, no?
>
>
>
> On Fri, Jul 30, 2010 at 4:01 AM, Chantal Ackermann <
> chantal.ackermann@btelligent.de> wrote:
>
> > Hi Ahmed,
> >
> > fields that are empty do not impact the index. It's different from a
> > database.
> > I have text fields for different languages, and per document only one of
> > the languages is set (the text fields for the other languages are
> > empty/not set). It all works very well and fast.
> >
> > I wonder more about what you describe as "unrelated data" - why would
> > you want to put unrelated data into a single index? If you want to
> > search on all the data and return mixed results, there surely must be
> > some kind of relation between the documents?
> >
> > Chantal
> >
> > On Thu, 2010-07-29 at 21:33 +0200, S Ahmed wrote:
> > > I understand (and it's straightforward) when you want to create an index
> > > for something simple like Products.
> > >
> > > But how do you go about creating a Solr index when you have data coming
> > > from 10-15 database tables, and the tables have unrelated data?
> > >
> > > The issue is then you would have many 'columns' in your index, and they
> > > will be NULL for much of the data since you are trying to shove 15 db
> > > tables into a single Solr/Lucene index.
> > >
> > >
> > > This must be a common problem, what are the potential solutions?
> >
