lucene-java-user mailing list archives

From Chris Lu <chris...@gmail.com>
Subject Re: Combine data from index and db before sorting and pagination
Date Thu, 02 Sep 2010 01:07:21 GMT
If there were an API to adjust the inverted index directly, it would be much
more efficient.

I guess Mirko's problem is similar to this: There could be a "main_record"
table and "category" table. Each "main_record" has a "category".
When one "category" is changed, quite a few "main_record" rows are affected.

If we denormalize the data, which is currently the only way to get good
sorting performance, we would need to re-index all the affected documents.
However, all that re-indexing work is quite inefficient.

Let's suppose the "category" is using Field.Index.NOT_ANALYZED and
Field.Store.YES.

So the inverted index is conceptually like this:
 "category_1": doc1,doc2,doc5,doc10.
 "category_2": doc3,doc4,doc7,doc8.
If the only change is that several "category_1" records are changed to
"category_2", take doc5 and doc10 for example, then after all the re-indexing
effort, the only change is:
 "category_1": doc1,doc2.
 "category_2": doc3,doc4,doc5,doc7,doc8,doc10.
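
To make the before/after picture concrete, here is a minimal sketch in plain Java. It is a toy in-memory model (a sorted map from term to a sorted set of doc ids), not Lucene's index format, and the `move` operation is hypothetical: it is exactly the kind of direct adjustment that Lucene does not offer.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class PostingsSketch {
    // Toy model: term -> sorted doc ids. This is NOT Lucene's on-disk format.
    static Map<String, TreeSet<Integer>> build() {
        Map<String, TreeSet<Integer>> postings = new TreeMap<>();
        postings.put("category_1", new TreeSet<>(List.of(1, 2, 5, 10)));
        postings.put("category_2", new TreeSet<>(List.of(3, 4, 7, 8)));
        return postings;
    }

    // Hypothetical operation: move one doc id from one term's list to another.
    static void move(Map<String, TreeSet<Integer>> postings,
                     String from, String to, int docId) {
        TreeSet<Integer> source = postings.get(from);
        if (source == null || !source.remove(docId)) {
            throw new IllegalArgumentException(
                "doc" + docId + " is not listed under " + from);
        }
        postings.computeIfAbsent(to, t -> new TreeSet<>()).add(docId);
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> postings = build();
        move(postings, "category_1", "category_2", 5);
        move(postings, "category_1", "category_2", 10);
        // prints {category_1=[1, 2], category_2=[3, 4, 5, 7, 8, 10]}
        System.out.println(postings);
    }
}
```

In this model the whole change is two removals and two insertions, which is why re-indexing every affected document feels wasteful by comparison.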

Of course, to support this efficiently could be a big change, affecting all
the nice efficient DocDelta storage.
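
A rough illustration of why the delta storage gets in the way: doc ids in a posting list are stored as gaps from the previous id, so inserting an id in the middle changes the gaps that follow it. The sketch below is a simplification in plain Java; the class and method names are made up for illustration, and it ignores term frequencies and the variable-length byte encoding that the real format layers on top.

```java
import java.util.Arrays;

final class DocDeltaSketch {
    // Encode a sorted doc id list as gaps: the first id, then differences.
    static int[] toDeltas(int[] sortedDocIds) {
        int[] deltas = new int[sortedDocIds.length];
        int previous = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            deltas[i] = sortedDocIds[i] - previous;
            previous = sortedDocIds[i];
        }
        return deltas;
    }

    public static void main(String[] args) {
        // "category_2" before the change: doc3, doc4, doc7, doc8
        System.out.println(Arrays.toString(toDeltas(new int[] {3, 4, 7, 8})));
        // prints [3, 1, 3, 1]

        // "category_2" after doc5 and doc10 are moved in
        System.out.println(Arrays.toString(toDeltas(new int[] {3, 4, 5, 7, 8, 10})));
        // prints [3, 1, 1, 2, 1, 2]
    }
}
```

Adding doc5 splits the old gap of 3 between doc4 and doc7 into 1 and 2, so an in-place edit cannot just append a value; the encoded list has to be rewritten from the insertion point onward.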

-- 
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Wed, Sep 1, 2010 at 4:29 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> The usual first choice when using Lucene to search database data is to
> denormalize the db data into the index. Yes, it's redundant, but it's often
> a better solution than trying to use both. Synchronization can be an issue,
> but you have to deal with that anyway since you're indexing from the db.
>
> But you haven't given us any indication of how much data you're talking
> about here. Without some such detail, it's really hard to make a
> recommendation.
>
> Best
> Erick
>
> On Wed, Sep 1, 2010 at 9:30 AM, Sertic Mirko, Bedag
> <Mirko.Sertic@bedag.ch> wrote:
>
> > The data from the db is required for sorting, and one db entry matches
> > many index entries, so storing it in the index would be redundant. There
> > would also be the challenge of keeping index and db in sync. Any ideas?
> >
> > Mirko
> >
> > -----Original Message-----
> > From: Ian Lea [mailto:ian.lea@gmail.com]
> > Sent: Wednesday, September 1, 2010 15:17
> > To: java-user@lucene.apache.org
> > Subject: Re: Combine data from index and db before sorting and pagination
> >
> > If the sorting and pagination don't require data from the database,
> > just do db lookups for the hits on a page, page by page as required.
> > But if the db data is required I'd suggest storing it in the index.
> >
> >
> > --
> > Ian.
> >
> > On Wed, Sep 1, 2010 at 1:43 PM, Sertic Mirko, Bedag
> > <Mirko.Sertic@bedag.ch> wrote:
> > > Hi
> > >
> > >
> > >
> > > I need to implement sorting and pagination of Lucene search results.
> > > This is quite easy, but I have to combine data from the index with data
> > > from a database. The index has the fulltext data plus a unique
> > > identifier for a record from the database. The database stores
> > > additional data. Fulltext search is only done on the index. I need to
> > > combine the search results from the index and the additional data from
> > > the database before sorting and pagination.
> > >
> > >
> > >
> > > Is the IndexReader.document() method the right place to enrich the data
> > > from the index with data from the db? How should I implement this
> > > functionality with Lucene?
> > >
> > >
> > >
> > > Thanks in advance
> > >
> > > Mirko
> > >
> > >
> > >
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
