lucene-general mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: commit often and lot of data cost too much?
Date Wed, 01 Apr 2009 00:54:47 GMT
What kind of updates are these?  New documents?  Small changes to existing
documents?

Are the changing fields important for searching?

If the updates are not involved in searches, then it would be much better to
move the non-searched fields onto an alternative storage system.  That would
drive down the index update rate dramatically and leave you with a pretty
simple system.
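
To make that split concrete, here is a minimal sketch using plain Lucene
(roughly the 2.4-era API): only the fields users actually search are indexed,
while a fast-changing, non-searched field goes to a side store.  The field
names ("id", "title", "body", "viewCount") are made up for illustration, and
the HashMap is just a stand-in for whatever external store you would pick.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;

import java.util.HashMap;
import java.util.Map;

public class SplitStorageSketch {

    // stand-in for the external store holding the non-searched data
    private final Map<String, String> sideStore = new HashMap<String, String>();
    private final IndexWriter writer;

    public SplitStorageSketch(Directory dir) throws Exception {
        writer = new IndexWriter(dir, new StandardAnalyzer(),
                                 IndexWriter.MaxFieldLength.UNLIMITED);
    }

    // Full add/update: searchable text goes to Lucene, the counter goes
    // to the side store.
    public void save(String id, String title, String body, String viewCount)
            throws Exception {
        Document doc = new Document();
        doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.ANALYZED));
        doc.add(new Field("body", body, Field.Store.NO, Field.Index.ANALYZED));
        writer.updateDocument(new Term("id", id), doc);
        sideStore.put(id, viewCount);
    }

    // Cheap update: only the side store changes; the index is untouched,
    // so nothing needs to be committed or replicated.
    public void updateViewCount(String id, String viewCount) {
        sideStore.put(id, viewCount);
    }

    public void commit() throws Exception {
        writer.commit();
    }
}

With that split, a change to the non-searched field never touches the index,
so it never triggers a merge or a replication of new segments.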

If the updates *are* involved in searches, then you might consider using a
system more like Katta than Solr.  You can then create a new shard out of the
update and broadcast a mass delete to all nodes just before adding the new
shard to the system.  This has the benefit of very fast updates and good
balancing, but has the defect that you don't have persistence of your deletes
until you do a full reindex.  Your search nodes could write the updated index
back to the persistent store, but that is scary without something like Hadoop
to handle failed updates.
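
As a rough illustration of that pattern, the sketch below approximates it on
a single node with plain Lucene: build a small index out of the updated
documents, delete those ids from every existing shard, then search the old
shards and the new one together.  Katta itself would handle distributing the
shards and broadcasting the deletes; the "id" unique-key field and the
in-memory directories here are assumptions for the sake of the example.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

import java.util.List;

public class ShardUpdateSketch {

    // Build a fresh mini-index ("shard") holding only the updated documents.
    static Directory buildUpdateShard(List<Document> updatedDocs) throws Exception {
        Directory shard = new RAMDirectory();   // stand-in for a real shard directory
        IndexWriter writer = new IndexWriter(shard, new StandardAnalyzer(),
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        for (Document doc : updatedDocs) {
            writer.addDocument(doc);
        }
        writer.close();
        return shard;
    }

    // "Broadcast" the delete of the updated ids to every existing shard.
    static void deleteFromExistingShards(List<Directory> shards, List<String> ids)
            throws Exception {
        Term[] deletes = new Term[ids.size()];
        for (int i = 0; i < ids.size(); i++) {
            deletes[i] = new Term("id", ids.get(i));   // "id" is a hypothetical unique key
        }
        for (Directory shard : shards) {
            IndexWriter writer = new IndexWriter(shard, new StandardAnalyzer(),
                                                 IndexWriter.MaxFieldLength.UNLIMITED);
            writer.deleteDocuments(deletes);
            writer.commit();
            writer.close();
        }
    }

    // Search across the old shards plus the new one.
    static MultiSearcher openSearcher(List<Directory> shards, Directory newShard)
            throws Exception {
        Searchable[] searchers = new Searchable[shards.size() + 1];
        for (int i = 0; i < shards.size(); i++) {
            searchers[i] = new IndexSearcher(shards.get(i));
        }
        searchers[shards.size()] = new IndexSearcher(newShard);
        return new MultiSearcher(searchers);
    }
}

Note that the deletes only live in the shards they were applied to, which is
why a periodic full reindex is still needed for durability.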

On Tue, Mar 31, 2009 at 6:51 AM, sunnyfr <johanna.34@gmail.com> wrote:

>
> I have about 14M documents and my index is about 11G.
> At the moment I update about 30,000 documents every 20 minutes.
> Lucene is always merging data.  What would you recommend?
> Replication costs too much for the slaves; they always bring back whole new
> index directories rather than just the changed segments.
>
> Is there a way to get around this issue?  What would you recommend to people
> who need fresh updates on the slaves with a large amount of data?
> Thanks a lot,
>
>
