lucene-solr-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Solr as a database, how about joins? (was: acts-as-solr (Ruby on Rails))
Date Mon, 13 Nov 2006 17:08:03 GMT
On 11/13/06, Bertrand Delacretaz <bdelacretaz@apache.org> wrote:
> I'm also envisioning using Solr to replace a database in some web apps

Yes, querying only the search collection rather than both the search
collection and the database can make a lot of sense: the webapp is
less complicated, and only the search collection needs to be made
highly available (HA).

> - but how would you handle (or rather simulate) joins in such a case?

The usual approach is to denormalize the data.  The downside is a
slightly bigger search collection.

> Say you have a Book which references an Author in a separate Solr
> <document> - how do you suggest inserting the Author's data into each
> Book like an SQL join would do?

Is it possible to make the collection book centric and put the
author's data into each book during indexing?
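
A minimal sketch of what book-centric denormalization could look like
at indexing time. The records, field names (author_name,
author_country) and the denormalize helper are hypothetical, chosen
only for illustration; posting the resulting documents to Solr is left
out.

```python
# Hypothetical source records; in practice these would come from the database.
authors = {
    "a1": {"author_name": "Winston Churchill", "author_country": "UK"},
}
books = [
    {"id": "b1", "title": "The Second World War", "author_id": "a1"},
    {"id": "b2", "title": "My Early Life", "author_id": "a1"},
]


def denormalize(book, authors):
    """Return a flat, book-centric document with the author's fields copied in."""
    doc = dict(book)  # copy so the source record is not modified
    doc.update(authors.get(book["author_id"], {}))
    return doc


# One flat document per book; the author's data is repeated in each of them.
docs = [denormalize(b, authors) for b in books]
```

The author's fields are duplicated in every book document, which is
where the slightly bigger collection comes from.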

> Is it efficient to do a new Lucene query for each Book found, to get
> the Author? I can imagine doing that in a loop, and Solr's caches
> would probably help. But how does that feel from Lucene's point of
> view?

It's doable.  The only advantage is decreased index size, but you give
up some query power and speed.
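
A rough sketch of the lookup-per-book approach, assuming an in-memory
dict stands in for the index and functools.lru_cache stands in for
whatever caching the search server provides. None of these names are
Solr APIs; they are placeholders for illustration.

```python
from functools import lru_cache

# Hypothetical in-memory stand-ins for two kinds of documents in one index.
AUTHOR_DOCS = {"a1": {"author_name": "Winston Churchill"}}
BOOK_DOCS = [
    {"id": "b1", "title": "My Early Life", "author_id": "a1"},
    {"id": "b2", "title": "The World Crisis", "author_id": "a1"},
]

lookups = 0  # counts how often the "index" is actually queried for an author


@lru_cache(maxsize=1024)
def find_author(author_id):
    """Stand-in for one extra query per book; the cache absorbs repeats."""
    global lookups
    lookups += 1
    return AUTHOR_DOCS.get(author_id)


def books_with_authors(book_docs):
    """Attach author fields to each book found by the main query."""
    results = []
    for book in book_docs:
        author = find_author(book["author_id"]) or {}
        results.append({**book, **author})
    return results


joined = books_with_authors(BOOK_DOCS)
```

Note that the author fields only show up after the books have been
found, so the main query cannot filter or sort on them.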

> This wouldn't be a full join, as there's probably no way to do a
> single query like
>
>   select * from Book,Author
>   where Book.author_id = Author.author_id
>   and Author.name like '%chill%'

DB-type joins would probably take a *lot* of work.

Another downside is that joins would get in the way of federated or
distributed search in the future.  Joins go across documents and are
thus not easily distributed.

> Being able to do this would be cool, but at this point I'm only
> thinking of retrieving related info linked via IDs.

Trying to think of a URL friendly syntax for this that would work for
including fields from more than one other "table"... something like:
addFields=artist_name where artist_id:song_artist
addFields=album_name,album_date where album_id:song_album

I'm still not sure if it's a good idea or not though... you give up
powerful queries like
+song_title:foo +album_date:[1970 TO 1980] -artist_name:bob
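
To illustrate why the denormalized form keeps such queries simple,
here is a small sketch that evaluates the same three clauses against
flat song documents in plain Python. The sample documents are made up,
album_date is assumed to be a year stored as an integer, and the
lowercase whitespace matching is only a crude approximation of real
term matching.

```python
def matches(doc):
    """Approximate +song_title:foo +album_date:[1970 TO 1980] -artist_name:bob."""
    title_terms = doc["song_title"].lower().split()
    artist_terms = doc["artist_name"].lower().split()
    return (
        "foo" in title_terms                  # required clause
        and 1970 <= doc["album_date"] <= 1980  # required inclusive range
        and "bob" not in artist_terms          # prohibited clause
    )


# Hypothetical denormalized documents: song, album and artist fields together.
songs = [
    {"song_title": "Foo Blues", "album_date": 1975, "artist_name": "Alice"},
    {"song_title": "Foo Blues", "album_date": 1975, "artist_name": "Bob Smith"},
    {"song_title": "Foo Again", "album_date": 1985, "artist_name": "Alice"},
]

hits = [s for s in songs if matches(s)]
```

Every clause can be checked against a single document, which is what
makes this kind of query possible without a join.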

-Yonik
