lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mohammad Norouzi" <mnr...@gmail.com>
Subject Re: a question about indexing database tables
Date Mon, 26 Feb 2007 13:33:40 GMT
Thank you very much Erick.
Really I didnt know about this. I've just thought we can not add two
different documents.
now my question is, if I index the data in the way you said and now I have a
query say, "(table1_name:john) AND (table2_address:Adam street)"
using MultiFieldQueryParser, what is the result?
is the result from both documents and if yes, which document will the Hits
return?
is the result a union of two documents or intersection?

thank you very much indeed

On 2/26/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> No, that's not what I was talking about. Remember that there's no
> requirement in Lucene that each document have the same fields. So, you
> have
> something like...
>
> Document doc = new Document()
> doc.add("table1_id", id);
> doc.add("table1_name", name);
> IndexWriter.add(doc);
>
>
> Document doc = new Document();
> doc.add("table2_id_emp", employeeId);
> doc.add("table2_address", address);
> IndexWriter.add(doc);
>
>
> You now have two documents in the index, one with fields "table1_id",
> "table1_name" and the other has fields "table2_id_emp", "table2_address".
>
> These two documents are entirely orthogonal. That is, they have no fields
> in
> common. Even if the values for these fields are the same (say for some
> strange reason your name from table1 has a value of "nelson" and the
> address
> from table2 also has a value of "nelson". These don't interfere with each
> other since searching for "table1_name:nelson" would never look in the
> field
> "table2_address".
>
> So, all your tables can be stored in the same *index* (not the same
> document, and most certainly not in the same field). They are all separate
> because no two fields are the same for rows (documents) from different
> tables.
>
> The basic idea is that you index one lucene document for each row.
>
> That said, I can't imagine that this is all you want to do. A one-for-one
> mapping of table rows to documents is almost sure to be not the best
> design.
> You'll probably want to de-normalize your data for easy lookup etc.
> There'll
> be some up-front design work to get optimal performance. Especially,
> there's
> no sense of performing joins in Lucene, and you shouldn't try.
>
> Overall, use Lucene for searching/sorting text, use your RDBMS for
> relational things.
>
> Best
> Erick
>
> On 2/26/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> >
> > Hi Erick
> > thank you and sorry for taking long my reply. I am involving in a
> project.
> >
> > I was thinking of your idea about storing all tables in the same field.
> it
> > seems to me a good idea, but some vague issues.
> > first, how to create a lucene's document. did you mean, storing all
> tables
> > by joining all tables?
> > if no, how to determine each row to be inserted in the index file?
> >
> > second, let's consider we indexed all tables as you said, how to find
> out
> > the data related in hierarchy of tables.
> >
> > please have a look at following structure, I want to know whether I
> > understand you or not?
> >
> > table1_id     table1_name table1_family   table2_id_emp  table2_address
> >     1                   x1               f1
> > 1                     street1
> >     2                   x2               f2
> > 1                     street2
> >     3                   x3               f3
> > 2                     street6
> >
> >
> > On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > >
> > > OK, I was off on a tangent. We've had several discussions where people
> > > were
> > > effectively trying to replace a RDBMS with Lucene and finding out it
> > that
> > > RDBMSs are very good at what they do <G>...
> > >
> > > But in general, I'd probably approach it by doing the RDBMS work first
> > and
> > > indexing the result. I think this is your option (2). Yes, this will
> > > de-normalize a bunch of your data and you'll chew up some space, but
> > disk
> > > space is cheap. Very cheap <G>.
> > >
> > > One thing to remember, though, that took me a while to get used to,
> > > especially when I had my database hat on. There's no requirement that
> > > every
> > > document in a Lucene index have the same fields. Conceptually, you can
> > > store
> > > *all* your tables in the same index. So a document for table one has
> > > fields
> > > table_1_field1 table_1_field2 table_1_field3. "documents" for table
> two
> > > have
> > > fields table_2_field1 table_2_field2 etc.....
> > >
> > > These documents will never interfere with each other during searches
> > > because
> > > they share no fields (and each query goes against a particular field).
> > >
> > > I mention this because your maintenance will be much easier if you
> only
> > > have
> > > one index <G>....
> > >
> > > Best
> > > Erick
> > >
> > > On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > > >
> > > > Thanks Erick
> > > > but we have to because we need to execute very big queries that
> create
> > > > traffik network and are very very slow. but with lucene we do it in
> > some
> > > > milliseconds. and now we indexed our needed information by joining
> > > tables.
> > > > it works fine, besides, it returns the exact result as we can get
> from
> > > > database. we indexed about one million records.
> > > > but let me say, we are not using it instead of database, we use it
> to
> > > > generate some dynamic reports that if we did it by sql queries, it
> > would
> > > > take about 15 minutes.
> > > >
> > > > On 2/22/07, Erick Erickson <erickerickson@gmail.com> wrote:
> > > > >
> > > > > don't do either one <G>.... Search this mail archive for
> discussions
> > > of
> > > > > databases, there are several long threads discussing this along
> with
> > > > > various
> > > > > options on how to make this work. See particularly a mail entitled
> > > > > *Oracle/Lucene
> > > > > integration -status- *and any discussions participated in by
> Marcelo
> > > > > Ochoa.
> > > > >
> > > > > But, in general, Lucene is a text search engine, NOT a RDBMS. When
> > you
> > > > > start
> > > > > saying "keep all relation in order to get right result", it sounds
> > > like
> > > > > you're trying to use Lucene as a RDBMS. It doesn't do this very
> > well,
> > > > > that's
> > > > > not what it was built for. There are several options...
> > > > > > get clever with your index such that you don't do anything like
> > join
> > > > > tables. This implies that you re-design your data layout, probably
> > > > > de-normalizing lots of data, etc.
> > > > > > Use a hybrid solution. That is, use Lucene to search text and
> then
> > > do
> > > > > whatever further relational processing you need in the database.
> You
> > > > need
> > > > > to
> > > > > store enough information in the Lucene documents to be able to
> query
> > > the
> > > > > database.
> > > > > > stick with a database if it works for you already.
> > > > >
> > > > > In general, it's a mis-use of lucene to try to get RDBMS behavior
> > out
> > > of
> > > > > it.
> > > > > When you find yourself trying to do this, take a few minutes and
> ask
> > > > > yourself if this design is appropriate, and continue only if you
> can
> > > > > answer
> > > > > in the affirmative...
> > > > >
> > > > > Best
> > > > > Erick
> > > > >
> > > > > On 2/22/07, Mohammad Norouzi <mnrz57@gmail.com> wrote:
> > > > > >
> > > > > > Hello
> > > > > > In our application we have to index the database tables, there
> is
> > > two
> > > > > way
> > > > > > to
> > > > > > make this
> > > > > >
> > > > > > 1- index each table in a separate directory and then keep all
> > > relation
> > > > > in
> > > > > > order to get right result. in this method, we should use filters
> > to
> > > > > > overcome
> > > > > > the problem of searching on another search result.
> > > > > > 2. joining two or more tables and index the result of join
> query.
> > > > > >
> > > > > > which approach is better, reliable, has acceptable performance.
> > > > > >
> > > > > > thanks
> > > > > > --
> > > > > > Regards,
> > > > > > Mohammad
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Mohammad
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > Mohammad
> >
>



-- 
Regards,
Mohammad

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message