lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Lucene as a primary datastore
Date Wed, 20 Jan 2010 22:16:57 GMT
It depends (tm). From what I've seen on this list, *if* the index
gets corrupted, you'll see some exceptions somewhere. They
may be head-scratchers, but you'll get exceptions.

But when I've seen this kind of thing reported, it's been because
of coding errors. Manually unlocking the IndexWriter and opening
a second one in a second process while the first is still running
comes to mind. In other words, you have to work at it to corrupt
your index.

The Lucene guys understand thoroughly that having an index
just decide to corrupt itself is completely unacceptable, and
take great pains to insure it doesn't happen.

We typically have very infrequent updates (every quarter
or more), and we've never had one just go wonky.

So I wouldn't rebuild my indexes unless
1> the application started going west and you saw exceptions
2> you want to periodically rebuild it to test that you *can*.

Of course having your disk go bad hurts, but frequent
rebuilding isn't going to help with that....

FWIW
Erick@IHopeWereOnTrackNow


On Wed, Jan 20, 2010 at 4:25 PM, Guido Bartolucci <
guido.bartolucci@gmail.com> wrote:

> Thanks for the response. I understand all of what you wrote, but what
> I care about and what I had a little trouble describing exactly in my
> previous question is:
>
> - Are all problems with Lucene obvious (e.g., you get an exception and
> you know your data is now bad) or are there subtle corruptions that
> just happen and because of that it makes sense to constantly rebuild
> the index?
>
> I ask this because if this isn't the case then replication isn't going
> to help, the problems probably get copied over to the other instances
> (unless I'm missing something).
>
> guido.
>
>
> On Wed, Jan 20, 2010 at 11:40 AM, Chris Lu <chris.lu@gmail.com> wrote:
> > I have 3 concerns of making Lucene as a primary database.
> > 1) Lucene is stable when it's stable. But you will have java exceptions.
> > What would you do when FileNotFoundException or "Lucene 2.9.1 'read past
> > EOF' IOException under system load" happens?
> > For me, I don't the data is safe this way. Or, you can understand all
> Lucene
> > APIs and never make any mistakes.
> > Some databases, like some versions of mysql, could corrupt data. No
> better,
> > but it's still more robust.
> > 2) As the name suggests, Lucene index is just an index, like database
> index,
> > it's an auxiliary data structure. It's only fast in one way, but could be
> > slow in other ways.
> > 3) The more robust approach is to pull data out of database, and create a
> > Lucene index. In case something goes wrong, you can always pull data out
> > again and create the index again.
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Scalable Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> > Lucene Database Search in 3 minutes:
> >
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> > DBSight customer, a shopping comparison site, (anonymous per request) got
> > 2.6 Million Euro funding!
> >
> >
> >
> > Guido Bartolucci wrote:
> >>
> >> I know that the primary use case for Lucene is as an index of data
> >> that can be reconstructed (e.g., from a relational database or from
> >> spidering your corporate intranet).
> >>
> >> But, I'm curious if anyone uses Lucene as their primary datastore for
> >> their gold data. Is it good enough?
> >>
> >> Would anyone consider (or do people already) store data in Lucene
> >> that, if it was lost, would destroy their business? And no, I'm not
> >> suggesting that you don't back up this data, I'm just curious if there
> >> are problems with using Lucene in this way. Are there subtle
> >> corruptions that might show up in Lucene that wouldn't show up in
> >> Oracle or MySQL?
> >>
> >> I'm considering using Lucene in this way but I haven't been able to
> >> find any documentation describing this use case. Are there any studies
> >> of Lucene vs MySQL running for N years comparing the corruptions and
> >> recovery times?
> >>
> >> Am I just ignorant and scared of Lucene and too trusting of Oracle and
> >> MySQL?
> >>
> >> Thanks.
> >>
> >> -guido.
> >>
> >> (BTW, I did find a similar question asked back in 2007 in the archives
> >> but it doesn't really answer my question)
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message