lucene-java-user mailing list archives

From kiwi clive <kiwi_cl...@yahoo.com>
Subject Re: Combining The results from DB and Index Regd.,
Date Tue, 13 Nov 2012 12:01:25 GMT
You could do it in page-size chunks.

Get the db to do the searching and sorting and return the top page-size records. Do the same
for the index. You can then build a RAM index that takes the db output and the index output
and holds 2*pagesize entries. Apply the same sorting mechanism and return the top page-size
records. This way the RAM index only has to be small and the database does the heavy lifting,
although at the cost of some SQL trickery :-)
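
A minimal sketch of the merge step, under some assumptions: it swaps the small RAM index for a plain in-memory list, and it assumes both sources can return the same sort key (for example a timestamp) so that one ordering applies to both. The names PageMerger, Hit and sortKey are made up for illustration and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: merges the top page from the
// database with the top page from the index.
final class PageMerger {

    /** Hypothetical result row: an id plus a sort key both sources can produce. */
    static final class Hit {
        public final String id;
        public final long sortKey;

        public Hit(String id, long sortKey) {
            this.id = id;
            this.sortKey = sortKey;
        }

        @Override
        public String toString() {
            return id + ":" + sortKey;
        }
    }

    /**
     * Takes up to pageSize hits from each source (at most 2*pageSize in total),
     * re-sorts them with one shared ordering (highest sortKey first) and
     * returns the top pageSize.
     */
    static List<Hit> mergeTopPage(List<Hit> dbTop, List<Hit> indexTop, int pageSize) {
        List<Hit> combined = new ArrayList<>(dbTop.size() + indexTop.size());
        combined.addAll(dbTop);
        combined.addAll(indexTop);
        // The same sorting mechanism is applied to the combined candidates.
        combined.sort(Comparator.comparingLong((Hit h) -> h.sortKey).reversed());
        int end = Math.min(pageSize, combined.size());
        return new ArrayList<>(combined.subList(0, end));
    }
}
```

This covers the first page only. For page n, each source would have to return its top n*pagesize records before the merge, otherwise hits can be missed.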



________________________________
 From: selvakumar netaji <vvekselva.gm@gmail.com>
To: java-user@lucene.apache.org; kiwi clive <kiwi_clive@yahoo.com> 
Sent: Tuesday, November 13, 2012 11:02 AM
Subject: Re: Combining The results from DB and Index Regd.,
 
Thanks Clive,

Can we index this way if RAM is limited? There would be two indexes, one
in the file system and one in memory, as already mentioned. If the
in-memory index reaches a threshold, can we manually force the database
indexing run that is otherwise supposed to happen automatically every day?
That would also handle the RAM constraint. Clive, are there any other
solutions?
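
The threshold idea could look roughly like the sketch below. It is only an illustration: ThresholdBuffer, maxDocs and the flushToDisk callback are hypothetical names, documents are plain strings, and the callback stands in for whatever job re-indexes the database into the file system index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Lucene: holds recently inserted documents
// in memory and forces a flush once a size threshold is reached.
final class ThresholdBuffer {
    private final int maxDocs;
    private final Consumer<List<String>> flushToDisk;
    private final List<String> pending = new ArrayList<>();

    ThresholdBuffer(int maxDocs, Consumer<List<String>> flushToDisk) {
        this.maxDocs = maxDocs;
        this.flushToDisk = flushToDisk;
    }

    /** Adds one document and triggers a flush when the threshold is hit. */
    synchronized void add(String doc) {
        pending.add(doc);
        if (pending.size() >= maxDocs) {
            flush();
        }
    }

    /** Hands a copy of the pending documents to the callback, then frees the memory. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushToDisk.accept(new ArrayList<>(pending));
        pending.clear();
    }

    synchronized int size() {
        return pending.size();
    }
}
```

The daily scheduled run can call flush() as well, so the threshold only acts as a safety valve when the in-memory side grows faster than expected.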


On Tue, Nov 13, 2012 at 4:18 PM, kiwi clive <kiwi_clive@yahoo.com> wrote:

> I have used the last solution you mention many times to good effect as you
> can sort across the two data sources and merge the results.
> Obviously it depends on your architecture, RAM and the amount of data
> you are dealing with.
>
> Clive
>
>
>
> ________________________________
>  From: selvakumar netaji <vvekselva.gm@gmail.com>
> To: java-user@lucene.apache.org
> Sent: Tuesday, November 13, 2012 3:15 AM
> Subject: Combining The results from DB and Index Regd.,
>
> Hi All,
>
>
> We are using Lucene to search data from the database in our enterprise
> application.
>
> The searches run against a single index whose documents are indexed from
> two different databases, A and B. Database A is updated at a steady rate,
> i.e. a record gets inserted every minute, whereas database B is updated
> on a weekly basis.
>
>
> The problem is with the indexing of database A. For example, if indexing
> completes at second t and a record (d1) gets inserted at second t+1,
> then a search for d1 would not find it in the index.
>
> To avoid this data loss, the search can be performed in the index and also
> in the db (for the records that are not yet in the index). The problem here
> is that we won't be able to get score-based ordering from the database, so
> there would be problems combining the results from the db and from the
> index. Is there any way to get a Lucene score for the search results from
> the db?
>
> The other alternative would be to update the index every 30 seconds (or
> even more often) so that whenever the db gets updated the index gets
> updated too. Is there any other solution that updates the index directly
> whenever the db gets updated? Can you please suggest one.
>
> The final solution I've thought of would be to have two indexes, a file
> system index and an in-memory index. The file system index would be
> updated on a daily basis and the in-memory index would be updated
> whenever the db changes. We would then search both indexes and combine
> the results, since both have Lucene scores. That way there would not be
> any data loss.
>
> Can anyone suggest any other solution to avoid this kind of data loss
> problem?
>