lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: a question about indexing database tables
Date Mon, 26 Feb 2007 13:54:09 GMT
You'll get no hits since there is no document in the index that has both
fields.

On 2/26/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
>
> Thank you very much Erick.
> Really I didnt know about this. I've just thought we can not add two
> different documents.
> now my question is, if I index the data in the way you said and now I have
> a
> query say, "(table1_name:john) AND (table2_address:Adam street)"
> using MultiFieldQueryParser, what is the result?
> is the result from both documents and if yes, which document will the Hits
> return?
> is the result a union of two documents or intersection?
>
> thank you very much indeed
>
> On 2/26/07, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > No, that's not what I was talking about. Remember that there's no
> > requirement in Lucene that each document have the same fields. So, you
> > have
> > something like...
> >
> > Document doc = new Document()
> > doc.add("table1_id", id);
> > doc.add("table1_name", name);
> > IndexWriter.add(doc);
> >
> >
> > Document doc = new Document();
> > doc.add("table2_id_emp", employeeId);
> > doc.add("table2_address", address);
> > IndexWriter.add(doc);
> >
> >
> > You now have two documents in the index, one with fields "table1_id",
> > "table1_name" and the other has fields "table2_id_emp",
> "table2_address".
> >
> > These two documents are entirely orthogonal. That is, they have no
> fields
> > in
> > common. Even if the values for these fields are the same (say for some
> > strange reason your name from table1 has a value of "nelson" and the
> > address
> > from table2 also has a value of "nelson". These don't interfere with
> each
> > other since searching for "table1_name:nelson" would never look in the
> > field
> > "table2_address".
> >
> > So, all your tables can be stored in the same *index* (not the same
> > document, and most certainly not in the same field). They are all
> separate
> > because no two fields are the same for rows (documents) from different
> > tables.
> >
> > The basic idea is that you index one lucene document for each row.
> >
> > That said, I can't imagine that this is all you want to do. A
> one-for-one
> > mapping of table rows to documents is almost sure to be not the best
> > design.
> > You'll probably want to de-normalize your data for easy lookup etc.
> > There'll
> > be some up-front design work to get optimal performance. Especially,
> > there's
> > no sense of performing joins in Lucene, and you shouldn't try.
> >
> > Overall, use Lucene for searching/sorting text, use your RDBMS for
> > relational things.
> >
> > Best
> > Erick
> >
> > On 2/26/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > >
> > > Hi Erick
> > > thank you and sorry for taking long my reply. I am involving in a
> > project.
> > >
> > > I was thinking of your idea about storing all tables in the same
> field.
> > it
> > > seems to me a good idea, but some vague issues.
> > > first, how to create a lucene's document. did you mean, storing all
> > tables
> > > by joining all tables?
> > > if no, how to determine each row to be inserted in the index file?
> > >
> > > second, let's consider we indexed all tables as you said, how to find
> > out
> > > the data related in hierarchy of tables.
> > >
> > > please have a look at following structure, I want to know whether I
> > > understand you or not?
> > >
> > > table1_id     table1_name table1_family
> table2_id_emp  table2_address
> > >     1                   x1               f1
> > > 1                     street1
> > >     2                   x2               f2
> > > 1                     street2
> > >     3                   x3               f3
> > > 2                     street6
> > >
> > >
> > > On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > > >
> > > > OK, I was off on a tangent. We've had several discussions where
> people
> > > > were
> > > > effectively trying to replace a RDBMS with Lucene and finding out it
> > > that
> > > > RDBMSs are very good at what they do <G>...
> > > >
> > > > But in general, I'd probably approach it by doing the RDBMS work
> first
> > > and
> > > > indexing the result. I think this is your option (2). Yes, this will
> > > > de-normalize a bunch of your data and you'll chew up some space, but
> > > disk
> > > > space is cheap. Very cheap <G>.
> > > >
> > > > One thing to remember, though, that took me a while to get used to,
> > > > especially when I had my database hat on. There's no requirement
> that
> > > > every
> > > > document in a Lucene index have the same fields. Conceptually, you
> can
> > > > store
> > > > *all* your tables in the same index. So a document for table one has
> > > > fields
> > > > table_1_field1 table_1_field2 table_1_field3. "documents" for table
> > two
> > > > have
> > > > fields table_2_field1 table_2_field2 etc.....
> > > >
> > > > These documents will never interfere with each other during searches
> > > > because
> > > > they share no fields (and each query goes against a particular
> field).
> > > >
> > > > I mention this because your maintenance will be much easier if you
> > only
> > > > have
> > > > one index <G>....
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > > > >
> > > > > Thanks Erick
> > > > > but we have to because we need to execute very big queries that
> > create
> > > > > traffik network and are very very slow. but with lucene we do it
> in
> > > some
> > > > > milliseconds. and now we indexed our needed information by joining
> > > > tables.
> > > > > it works fine, besides, it returns the exact result as we can get
> > from
> > > > > database. we indexed about one million records.
> > > > > but let me say, we are not using it instead of database, we use it
> > to
> > > > > generate some dynamic reports that if we did it by sql queries, it
> > > would
> > > > > take about 15 minutes.
> > > > >
> > > > > On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > > > > >
> > > > > > don't do either one <G>.... Search this mail archive for
> > discussions
> > > > of
> > > > > > databases, there are several long threads discussing this along
> > with
> > > > > > various
> > > > > > options on how to make this work. See particularly a mail
> entitled
> > > > > > *Oracle/Lucene
> > > > > > integration -status- *and any discussions participated in by
> > Marcelo
> > > > > > Ochoa.
> > > > > >
> > > > > > But, in general, Lucene is a text search engine, NOT a RDBMS.
> When
> > > you
> > > > > > start
> > > > > > saying "keep all relation in order to get right result", it
> sounds
> > > > like
> > > > > > you're trying to use Lucene as a RDBMS. It doesn't do this very
> > > well,
> > > > > > that's
> > > > > > not what it was built for. There are several options...
> > > > > > > get clever with your index such that you don't do anything
> like
> > > join
> > > > > > tables. This implies that you re-design your data layout,
> probably
> > > > > > de-normalizing lots of data, etc.
> > > > > > > Use a hybrid solution. That is, use Lucene to search text
and
> > then
> > > > do
> > > > > > whatever further relational processing you need in the database.
> > You
> > > > > need
> > > > > > to
> > > > > > store enough information in the Lucene documents to be able
to
> > query
> > > > the
> > > > > > database.
> > > > > > > stick with a database if it works for you already.
> > > > > >
> > > > > > In general, it's a mis-use of lucene to try to get RDBMS
> behavior
> > > out
> > > > of
> > > > > > it.
> > > > > > When you find yourself trying to do this, take a few minutes
and
> > ask
> > > > > > yourself if this design is appropriate, and continue only if
you
> > can
> > > > > > answer
> > > > > > in the affirmative...
> > > > > >
> > > > > > Best
> > > > > > Erick
> > > > > >
> > > > > > On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > > > > > >
> > > > > > > Hello
> > > > > > > In our application we have to index the database tables,
there
> > is
> > > > two
> > > > > > way
> > > > > > > to
> > > > > > > make this
> > > > > > >
> > > > > > > 1- index each table in a separate directory and then keep
all
> > > > relation
> > > > > > in
> > > > > > > order to get right result. in this method, we should use
> filters
> > > to
> > > > > > > overcome
> > > > > > > the problem of searching on another search result.
> > > > > > > 2. joining two or more tables and index the result of join
> > query.
> > > > > > >
> > > > > > > which approach is better, reliable, has acceptable
> performance.
> > > > > > >
> > > > > > > thanks
> > > > > > > --
> > > > > > > Regards,
> > > > > > > Mohammad
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Mohammad
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Mohammad
> > >
> >
>
>
>
> --
> Regards,
> Mohammad
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message