lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Modelling relational data in Lucene Index?
Date Fri, 03 Nov 2006 14:26:15 GMT
One thing that took me a while to grasp, and that is not automatic for folks
with significant database backgrounds, is that the fields in a Lucene document
are related to those of any other document only by the meaning you, as the
programmer, give them. That is, document 1 may have fields a, b, c.
Document 2 may have fields b, e, g. There is no requirement that, in this
example, document 1 have fields e and g, or vice versa. In
other words, Lucene documents don't fit into a table model.
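
A minimal sketch of that idea in plain Java. This is not the Lucene API; the
class SchemalessDocs, its methods, and the field values are made up purely for
illustration. Each document is just a map from field name to value, and a
document that lacks the queried field simply doesn't match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for an index; hypothetical names, NOT the Lucene API.
class SchemalessDocs {

    // Return the positions of documents whose given field holds exactly this value.
    static List<Integer> search(List<Map<String, String>> docs, String field, String value) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            // A document without the field yields null here and is skipped; it is not an error.
            if (value.equals(docs.get(i).get(field))) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Document 0 has fields a, b, c; document 1 has fields b, e, g.
    static List<Map<String, String>> sampleDocs() {
        Map<String, String> doc1 = new LinkedHashMap<>();
        doc1.put("a", "apple");
        doc1.put("b", "shared");
        doc1.put("c", "cherry");

        Map<String, String> doc2 = new LinkedHashMap<>();
        doc2.put("b", "shared");
        doc2.put("e", "elder");
        doc2.put("g", "grape");

        List<Map<String, String>> docs = new ArrayList<>();
        docs.add(doc1);
        docs.add(doc2);
        return docs;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = sampleDocs();
        System.out.println(search(docs, "b", "shared")); // [0, 1]
        System.out.println(search(docs, "e", "elder"));  // [1]
    }
}
```

The only thing tying field b in one document to field b in the other is that
the code populating them uses the same name for the same kind of data.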

The reason I mention that is that I'm extremely leery of packing data that
really doesn't belong together into a single field. Plus, your searching
becomes more complicated.

In your example, what happens if a file name and a file type such as "image"
are similar enough to produce false hits? If you stored them as separate
fields in a document, you wouldn't have this kind of problem.
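
To make the false-hit concern concrete, here is a rough sketch in plain Java
(again not the Lucene API). The tokens method is a crude stand-in for an
analyzer that I'm assuming lowercases and splits on non-alphanumeric
characters; a real analyzer may tokenize differently. The class and method
names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of packed versus separate fields; NOT the Lucene API.
class PackedFieldDemo {

    // Crude analyzer stand-in: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    // Packed layout: name and type share one field, so a term can match either part.
    static boolean matchesPacked(String packedValue, String term) {
        return tokens(packedValue).contains(term);
    }

    // Separate layout: the query targets the type field, so only the type is consulted.
    static boolean matchesType(String fileType, String term) {
        return tokens(fileType).contains(term);
    }

    public static void main(String[] args) {
        // A text file that happens to be called "image.txt".
        String name = "image.txt";
        String type = "text";
        String packed = name + "<sep>" + type;

        System.out.println(matchesPacked(packed, "image")); // true: a false hit on the file name
        System.out.println(matchesType(type, "image"));     // false: the type really is "text"
    }
}
```

Under this tokenization the separator itself also turns into a searchable
term ("sep"), which is one more thing the packed layout makes you account for.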

So, if you can cleverly de-normalize your data in such a way as to satisfy
all the searches you'll ever want to perform, you can store it all in a
Lucene index and be happy. If you can't, you could use Lucene to search the
parts you *do* care about and store the rest in a database. Or, you could
just use a database. I believe it all hinges on whether you have a fixed set
of queries you can anticipate (and thus reflect in a Lucene index) or not.
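
As an illustration of de-normalizing for one anticipated query ("folders that
contain a file of type image"), here is a small plain-Java sketch. It assumes
one flattened record per file with the parent folder copied onto it, which is
only one possible layout; FlattenDemo, FileDoc, and the sample data are
invented for the example and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical flattened model of Folder -> File; NOT the Lucene API.
class FlattenDemo {

    // One flattened "document" per file; the folder name is repeated on each one.
    static final class FileDoc {
        final String folder;
        final String fileName;
        final String fileType;

        FileDoc(String folder, String fileName, String fileType) {
            this.folder = folder;
            this.fileName = fileName;
            this.fileType = fileType;
        }
    }

    // Match on the file's own type field, then collapse the hits to unique folders.
    static Set<String> foldersWithType(List<FileDoc> docs, String type) {
        Set<String> folders = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (FileDoc doc : docs) {
            if (doc.fileType.equals(type)) {
                folders.add(doc.folder);
            }
        }
        return folders;
    }

    static List<FileDoc> sampleDocs() {
        List<FileDoc> docs = new ArrayList<>();
        docs.add(new FileDoc("photos", "a.jpg", "image"));
        docs.add(new FileDoc("photos", "b.png", "image"));
        docs.add(new FileDoc("notes", "todo.txt", "text"));
        docs.add(new FileDoc("mixed", "logo.gif", "image"));
        docs.add(new FileDoc("mixed", "readme.txt", "text"));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(foldersWithType(sampleDocs(), "image")); // [photos, mixed]
    }
}
```

The folder name is stored once per file, so this trades space for a simple
query. A search that this layout did not anticipate would need a different
flattening or a trip to the database.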

Best
Erick

On 11/2/06, Rajesh parab <rajesh_parab_1@yahoo.com> wrote:
>
> Thanks for the feedback, Chris.
>
> I agree with you. The data set should be flattened out to store inside a
> Lucene index. The Folder-File was just an example. As you know, in a
> relational database, we can have more complex relationships. I understand
> that this model may not work for deeper relationships.
>
> What I am mainly interested in is a relationship just one level deep. But I
> would like to search on the additional attributes of the related object. For
> example, in the Folder-File relationship, I would like to use additional
> file attributes as search criteria, along with the file name, while searching
> for folders.
>
> The way I see it, I could have a single field for the related object and all
> its additional attributes, and use some separator while capturing this data
> inside a Lucene Field object. For example -
>
>             new Field("file", "abc.txt<sep>image");
>
> But, I am not quite sure if this model will work.
>
> BTW, I did not understand what you meant by the detached approach. Can you
> please elaborate?
>
> Regards,
> Rajesh
>
> ----- Original Message ----
> From: Chris Lu <chris.lu@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, November 2, 2006 7:57:46 PM
> Subject: Re: Modelling relational data in Lucene Index?
>
>
> For this specific question, you can create an index on files, search for
> files that are of type image, and from the matched files find the unique
> directories (this can be done in Lucene, or you can do it in Java).
>
> Of course this does not scale to deeper relationships. Usually you do
> need to flatten the database objects in order to use Lucene. It's
> just trading space for speed.
>
> I would prefer a detached approach instead of Hibernate's or EJB's
> approach, which is too tightly coupled with the rest of the system. How do
> you rebuild if the index is corrupted, you have a new Analyzer, or the
> schema evolves? How do you make it thread-safe?
>
> --
> Chris Lu
> -------------------------
> Instant Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
>
> On 11/2/06, Mark Miller <markrmiller@gmail.com> wrote:
> > Lucene is probably not the solution if you are looking for a relational
> > model. You should be using a database for that. If you want to combine
> > Lucene with a relational model, check out Hibernate and the new EJB
> > annotations that it supports...there is a cool little Lucene add-on that
> > lets you declare fields to be indexed (and how) with annotations.
> >
> > - Mark
> >
> > Rajesh parab wrote:
> > > Hi,
> > >
> > > > As I understand, Lucene has a flat structure where you can define
> > > > multiple fields inside the document. There is no relationship between
> > > > the fields.
> > >
> > > > I would like to enable index-based search for some of the components
> > > > inside a relational database. For example, let's say a "Folder" object.
> > > > The Folder object can have a relationship with a File object. The File
> > > > object, in turn, can have attributes like is image, is text file, etc.
> > > > So, the structure is
> > >
> > >     Folder -- > File
> > >              |
> > >              ------- > is image, is text file, ......
> > >
> > >
> > > > I would like to enable a search to find a Folder with a File of type
> > > > image. How can we model such relational data inside a Lucene index?
> > >
> > > Regards,
> > > Rajesh
> > >
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
