lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mohammad Norouzi" <mnr...@gmail.com>
Subject Re: a question about indexing database tables
Date Mon, 26 Feb 2007 05:15:42 GMT
Hi Erick
thank you and sorry for taking long my reply. I am involving in a project.

I was thinking of your idea about storing all tables in the same field. it
seems to me a good idea, but some vague issues.
first, how to create a lucene's document. did you mean, storing all tables
by joining all tables?
if no, how to determine each row to be inserted in the index file?

second, let's consider we indexed all tables as you said, how to find out
the data related in hierarchy of tables.

please have a look at following structure, I want to know whether I
understand you or not?

table1_id     table1_name table1_family   table2_id_emp  table2_address
    1                   x1               f1
1                     street1
    2                   x2               f2
1                     street2
    3                   x3               f3
2                     street6


On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> OK, I was off on a tangent. We've had several discussions where people
> were
> effectively trying to replace a RDBMS with Lucene and finding out it that
> RDBMSs are very good at what they do <G>...
>
> But in general, I'd probably approach it by doing the RDBMS work first and
> indexing the result. I think this is your option (2). Yes, this will
> de-normalize a bunch of your data and you'll chew up some space, but disk
> space is cheap. Very cheap <G>.
>
> One thing to remember, though, that took me a while to get used to,
> especially when I had my database hat on. There's no requirement that
> every
> document in a Lucene index have the same fields. Conceptually, you can
> store
> *all* your tables in the same index. So a document for table one has
> fields
> table_1_field1 table_1_field2 table_1_field3. "documents" for table two
> have
> fields table_2_field1 table_2_field2 etc.....
>
> These documents will never interfere with each other during searches
> because
> they share no fields (and each query goes against a particular field).
>
> I mention this because your maintenance will be much easier if you only
> have
> one index <G>....
>
> Best
> Erick
>
> On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> >
> > Thanks Erick
> > but we have to because we need to execute very big queries that create
> > traffik network and are very very slow. but with lucene we do it in some
> > milliseconds. and now we indexed our needed information by joining
> tables.
> > it works fine, besides, it returns the exact result as we can get from
> > database. we indexed about one million records.
> > but let me say, we are not using it instead of database, we use it to
> > generate some dynamic reports that if we did it by sql queries, it would
> > take about 15 minutes.
> >
> > On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > don't do either one <G>.... Search this mail archive for discussions
> of
> > > databases, there are several long threads discussing this along with
> > > various
> > > options on how to make this work. See particularly a mail entitled
> > > *Oracle/Lucene
> > > integration -status- *and any discussions participated in by Marcelo
> > > Ochoa.
> > >
> > > But, in general, Lucene is a text search engine, NOT a RDBMS. When you
> > > start
> > > saying "keep all relation in order to get right result", it sounds
> like
> > > you're trying to use Lucene as a RDBMS. It doesn't do this very well,
> > > that's
> > > not what it was built for. There are several options...
> > > > get clever with your index such that you don't do anything like join
> > > tables. This implies that you re-design your data layout, probably
> > > de-normalizing lots of data, etc.
> > > > Use a hybrid solution. That is, use Lucene to search text and then
> do
> > > whatever further relational processing you need in the database. You
> > need
> > > to
> > > store enough information in the Lucene documents to be able to query
> the
> > > database.
> > > > stick with a database if it works for you already.
> > >
> > > In general, it's a mis-use of lucene to try to get RDBMS behavior out
> of
> > > it.
> > > When you find yourself trying to do this, take a few minutes and ask
> > > yourself if this design is appropriate, and continue only if you can
> > > answer
> > > in the affirmative...
> > >
> > > Best
> > > Erick
> > >
> > > On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > > >
> > > > Hello
> > > > In our application we have to index the database tables, there is
> two
> > > way
> > > > to
> > > > make this
> > > >
> > > > 1- index each table in a separate directory and then keep all
> relation
> > > in
> > > > order to get right result. in this method, we should use filters to
> > > > overcome
> > > > the problem of searching on another search result.
> > > > 2. joining two or more tables and index the result of join query.
> > > >
> > > > which approach is better, reliable, has acceptable performance.
> > > >
> > > > thanks
> > > > --
> > > > Regards,
> > > > Mohammad
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Mohammad
> >
>



-- 
Regards,
Mohammad

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message