lucene-java-user mailing list archives

From Guido Bartolucci
Subject Re: Lucene as a primary datastore
Date Wed, 20 Jan 2010 21:25:09 GMT
Thanks for the response. I understand all of what you wrote, but what
I care about, and what I had a little trouble describing exactly in my
previous question, is:

- Are all problems with Lucene obvious (e.g., you get an exception and
you know your data is now bad), or are there subtle corruptions that
just happen, so that it makes sense to constantly rebuild the index?

I ask because if some corruptions are silent, then replication isn't
going to help: the problems would probably get copied over to the other
instances (unless I'm missing something).
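
A minimal sketch of how silent file-level damage could be caught before
it is replicated. IndexChecksums is a hypothetical helper, not part of
Lucene, and the sketch assumes index files are not rewritten in place
once written. The idea is to record a CRC32 per file after each commit
and compare again before copying the index to other instances. It only
catches bytes that changed on disk after the manifest was taken; it
cannot detect an index that was written wrong in the first place.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Hypothetical helper (not a Lucene API): spots index files whose bytes changed silently. */
final class IndexChecksums {

    /** CRC32 of an in-memory byte array. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Builds a file-name -> CRC32 manifest for every regular file directly inside indexDir. */
    static Map<String, Long> manifest(Path indexDir) {
        Map<String, Long> result = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    result.put(file.getFileName().toString(), crc32OfFile(file));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Streams a file through CRC32 so large files are never loaded whole. */
    private static long crc32OfFile(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                crc.update(buffer, 0, n);
            }
        }
        return crc.getValue();
    }

    /** Names of recorded files whose checksum is missing from, or different in, the current manifest. */
    static List<String> damaged(Map<String, Long> recorded, Map<String, Long> current) {
        List<String> bad = new ArrayList<>();
        for (Map.Entry<String, Long> e : recorded.entrySet()) {
            if (!e.getValue().equals(current.get(e.getKey()))) {
                bad.add(e.getKey());
            }
        }
        return bad;
    }
}
```

A replication job could call manifest(indexDir) just before copying and
skip the copy whenever damaged(recorded, current) is non-empty, so a
damaged master is not propagated to the other instances.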


On Wed, Jan 20, 2010 at 11:40 AM, Chris Lu wrote:
> I have 3 concerns about using Lucene as a primary database.
> 1) Lucene is stable when it's stable, but you will get Java exceptions.
> What would you do when a FileNotFoundException or "Lucene 2.9.1 'read past
> EOF' IOException under system load" happens?
> I don't think the data is safe this way, unless you understand all of the
> Lucene APIs and never make any mistakes.
> Some databases, like some versions of MySQL, can corrupt data too, so they
> are not perfect either, but a database is still more robust.
> 2) As the name suggests, a Lucene index is just an index. Like a database
> index, it's an auxiliary data structure. It's fast for one kind of access,
> but could be slow for others.
> 3) The more robust approach is to pull the data out of a database and create
> a Lucene index from it. If something goes wrong, you can always pull the
> data out again and rebuild the index.
> --
> Chris Lu
> Guido Bartolucci wrote:
>> I know that the primary use case for Lucene is as an index of data
>> that can be reconstructed (e.g., from a relational database or from
>> spidering your corporate intranet).
>> But, I'm curious if anyone uses Lucene as their primary datastore for
>> their gold data. Is it good enough?
>> Would anyone consider (or do people already) store data in Lucene
>> that, if it was lost, would destroy their business? And no, I'm not
>> suggesting that you don't back up this data, I'm just curious if there
>> are problems with using Lucene in this way. Are there subtle
>> corruptions that might show up in Lucene that wouldn't show up in
>> Oracle or MySQL?
>> I'm considering using Lucene in this way but I haven't been able to
>> find any documentation describing this use case. Are there any studies
>> of Lucene vs MySQL running for N years comparing the corruptions and
>> recovery times?
>> Am I just ignorant and scared of Lucene and too trusting of Oracle and
>> MySQL?
>> Thanks.
>> -guido.
>> (BTW, I did find a similar question asked back in 2007 in the archives
>> but it doesn't really answer my question)