lucene-java-user mailing list archives

From Aaron McCurry <>
Subject Re: Lucene as a primary datastore
Date Fri, 22 Jan 2010 23:52:39 GMT
I know our situation is fairly unique, but we rebuild our indexes weekly.  The source
of our indexes is a set of data marts generated from flat files.  We do this because our data
changes too rapidly for us to keep up with the changes.  We do update the indexes at runtime, but
only with about 10% of the changes.  The other changes are reprocessed weekly.  So Lucene
is our runtime data store for search and data retrieval.  However, it is not the system of
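
A weekly rebuild like the one described above does not have to take search offline. What
follows is a minimal sketch in Java, not the setup used in this thread: it builds the new
index in a fresh directory next to the live one and then publishes it by renaming a small
pointer file. The names WeeklyRebuild, IndexBuilder, and the "current" pointer file are
hypothetical, and the toy builder writes a text file where a real one would presumably wrap
Lucene's IndexWriter. Cleanup of old index directories is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Sketch: rebuild an index in a fresh directory, then publish it with one rename. */
final class WeeklyRebuild {

    /** Hypothetical hook; a real implementation would write a Lucene index here. */
    interface IndexBuilder {
        void build(List<String> records, Path targetDir) throws IOException;
    }

    /** Name of the pointer file that tells readers which directory is live (an assumption). */
    static final String POINTER = "current";

    /** Builds a new index under root and makes it the live one. Returns the new directory. */
    static Path rebuildAndSwap(Path root, List<String> records, IndexBuilder builder) {
        try {
            Files.createDirectories(root);
            // Build next to the live index so a failed rebuild never touches it.
            Path fresh = Files.createTempDirectory(root, "index-");
            builder.build(records, fresh);

            // Write the new pointer to a temp file, then rename it over the old pointer.
            // Whether ATOMIC_MOVE replaces an existing target is implementation specific
            // per the Files.move documentation; it does on common POSIX file systems.
            Path tmp = root.resolve(POINTER + ".tmp");
            Files.write(tmp, fresh.getFileName().toString().getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, root.resolve(POINTER), StandardCopyOption.ATOMIC_MOVE);
            return fresh;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the directory readers should open, or null if nothing is published yet. */
    static Path liveIndex(Path root) {
        Path pointer = root.resolve(POINTER);
        if (!Files.exists(pointer)) {
            return null;
        }
        try {
            String name = new String(Files.readAllBytes(pointer), StandardCharsets.UTF_8).trim();
            return root.resolve(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = Paths.get(System.getProperty("java.io.tmpdir"),
                "rebuild-demo-" + System.nanoTime());
        // Toy builder: one line per record in a single file.
        IndexBuilder toy = (records, dir) -> Files.write(dir.resolve("docs.txt"), records);
        rebuildAndSwap(root, Arrays.asList("doc-1", "doc-2"), toy);
        System.out.println("live index: " + liveIndex(root));
    }
}
```

Because the source data stays outside Lucene, a bad build can simply be discarded and the
pointer left where it was.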


On Jan 21, 2010, at 12:23 AM, Otis Gospodnetic wrote:

> Guido,
> No, you should absolutely not need to constantly rebuild the index.  If you find you
> have to do that, you'll know you are doing something wrong.
> Otis
> --
> Sematext -- Solr - Lucene - Nutch
> ----- Original Message ----
>> From: Guido Bartolucci <>
>> To:
>> Sent: Wed, January 20, 2010 4:25:09 PM
>> Subject: Re: Lucene as a primary datastore
>> Thanks for the response. I understand all of what you wrote, but what
>> I care about and what I had a little trouble describing exactly in my
>> previous question is:
>> - Are all problems with Lucene obvious (e.g., you get an exception and
>> you know your data is now bad) or are there subtle corruptions that
>> just happen and because of that it makes sense to constantly rebuild
>> the index?
>> I ask this because if this isn't the case then replication isn't going
>> to help, the problems probably get copied over to the other instances
>> (unless I'm missing something).
>> guido.
>> On Wed, Jan 20, 2010 at 11:40 AM, Chris Lu wrote:
>>> I have 3 concerns about using Lucene as a primary database.
>>> 1) Lucene is stable when it's stable. But you will have Java exceptions.
>>> What would you do when FileNotFoundException or "Lucene 2.9.1 'read past
>>> EOF' IOException under system load" happens?
>>> For me, I don't think the data is safe this way. Or, you can understand all Lucene
>>> APIs and never make any mistakes.
>>> Some databases, like some versions of MySQL, can corrupt data too. That is no
>>> better, but a database is still more robust.
>>> 2) As the name suggests, a Lucene index is just an index. Like a database index,
>>> it's an auxiliary data structure. It's only fast in one way, but could be
>>> slow in other ways.
>>> 3) The more robust approach is to pull data out of the database and create a
>>> Lucene index. In case something goes wrong, you can always pull the data out
>>> again and create the index again.
>>> --
>>> Chris Lu
>>> -------------------------
>>> Instant Scalable Full-Text Search On Any Database/Application
>>> DBSight customer, a shopping comparison site, (anonymous per request) got
>>> 2.6 Million Euro funding!
>>> Guido Bartolucci wrote:
>>>> I know that the primary use case for Lucene is as an index of data
>>>> that can be reconstructed (e.g., from a relational database or from
>>>> spidering your corporate intranet).
>>>> But, I'm curious if anyone uses Lucene as their primary datastore for
>>>> their gold data. Is it good enough?
>>>> Would anyone consider (or do people already) store data in Lucene
>>>> that, if it was lost, would destroy their business? And no, I'm not
>>>> suggesting that you don't back up this data, I'm just curious if there
>>>> are problems with using Lucene in this way. Are there subtle
>>>> corruptions that might show up in Lucene that wouldn't show up in
>>>> Oracle or MySQL?
>>>> I'm considering using Lucene in this way but I haven't been able to
>>>> find any documentation describing this use case. Are there any studies
>>>> of Lucene vs MySQL running for N years comparing the corruptions and
>>>> recovery times?
>>>> Am I just ignorant and scared of Lucene and too trusting of Oracle and
>>>> MySQL?
>>>> Thanks.
>>>> -guido.
>>>> (BTW, I did find a similar question asked back in 2007 in the archives
>>>> but it doesn't really answer my question)
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail:
>>>> For additional commands, e-mail: