lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: a question about indexing database tables
Date Thu, 22 Feb 2007 13:33:41 GMT
OK, I was off on a tangent. We've had several discussions where people were
effectively trying to replace a RDBMS with Lucene and finding out it that
RDBMSs are very good at what they do <G>...

But in general, I'd probably approach it by doing the RDBMS work first and
indexing the result. I think this is your option (2). Yes, this will
de-normalize a bunch of your data and you'll chew up some space, but disk
space is cheap. Very cheap <G>.

One thing to remember, though, that took me a while to get used to,
especially when I had my database hat on. There's no requirement that every
document in a Lucene index have the same fields. Conceptually, you can store
*all* your tables in the same index. So a document for table one has fields
table_1_field1 table_1_field2 table_1_field3. "documents" for table two have
fields table_2_field1 table_2_field2 etc.....

These documents will never interfere with each other during searches because
they share no fields (and each query goes against a particular field).

I mention this because your maintenance will be much easier if you only have
one index <G>....

Best
Erick

On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
>
> Thanks Erick
> but we have to because we need to execute very big queries that create
> traffik network and are very very slow. but with lucene we do it in some
> milliseconds. and now we indexed our needed information by joining tables.
> it works fine, besides, it returns the exact result as we can get from
> database. we indexed about one million records.
> but let me say, we are not using it instead of database, we use it to
> generate some dynamic reports that if we did it by sql queries, it would
> take about 15 minutes.
>
> On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > don't do either one <G>.... Search this mail archive for discussions of
> > databases, there are several long threads discussing this along with
> > various
> > options on how to make this work. See particularly a mail entitled
> > *Oracle/Lucene
> > integration -status- *and any discussions participated in by Marcelo
> > Ochoa.
> >
> > But, in general, Lucene is a text search engine, NOT a RDBMS. When you
> > start
> > saying "keep all relation in order to get right result", it sounds like
> > you're trying to use Lucene as a RDBMS. It doesn't do this very well,
> > that's
> > not what it was built for. There are several options...
> > > get clever with your index such that you don't do anything like join
> > tables. This implies that you re-design your data layout, probably
> > de-normalizing lots of data, etc.
> > > Use a hybrid solution. That is, use Lucene to search text and then do
> > whatever further relational processing you need in the database. You
> need
> > to
> > store enough information in the Lucene documents to be able to query the
> > database.
> > > stick with a database if it works for you already.
> >
> > In general, it's a mis-use of lucene to try to get RDBMS behavior out of
> > it.
> > When you find yourself trying to do this, take a few minutes and ask
> > yourself if this design is appropriate, and continue only if you can
> > answer
> > in the affirmative...
> >
> > Best
> > Erick
> >
> > On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > >
> > > Hello
> > > In our application we have to index the database tables, there is two
> > way
> > > to
> > > make this
> > >
> > > 1- index each table in a separate directory and then keep all relation
> > in
> > > order to get right result. in this method, we should use filters to
> > > overcome
> > > the problem of searching on another search result.
> > > 2. joining two or more tables and index the result of join query.
> > >
> > > which approach is better, reliable, has acceptable performance.
> > >
> > > thanks
> > > --
> > > Regards,
> > > Mohammad
> > >
> >
>
>
>
> --
> Regards,
> Mohammad
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message