lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Document updates work as delete/add under the hood
Date Mon, 13 Jul 2015 15:32:28 GMT
bq: Is there any generic benchmark analysis done on the update rate of Lucene
saying that it can handle X number of document updates without any
performance issues?

_Of course_ there will be a performance hit when indexing, the question is
whether it's tolerable given your environment. How big is the doc? How
complex is the analysis chain? Heavens, you could even be using
ExtractingRequestHandler (in the generic case).

I get approximately 2,000 docs/second when running against a single
Solr instance on my MacBook Pro. I've seen some situations where that
number is as high as 10K. These numbers are pretty useless to you
though, as would be any generic benchmark. When you say
"performance issues", you've implied that there are queries being
fired at Solr at the same time, but you haven't characterized them
at all.


About all you can do is set up a stress test and measure. Here's a long
backgrounder:

https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
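As a starting point for such a stress test, here is a minimal Java sketch that times a batch of updates and reports documents per second. The class name and the `indexOne` callback are hypothetical: plug in whatever update path you are actually measuring (for example a loop around `IndexWriter.updateDocument`). It measures raw indexing throughput only; it does not simulate the concurrent query load that decides whether a given rate causes "performance issues".

```java
import java.util.function.LongConsumer;

/** Hypothetical stress-test helper: measures update throughput in docs/second. */
final class UpdateThroughput {
    private UpdateThroughput() {}

    /** Converts a document count and elapsed nanoseconds into documents per second. */
    static double docsPerSecond(long docs, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return docs * 1_000_000_000.0 / elapsedNanos;
    }

    /**
     * Calls indexOne once for every document id in [0, docs) and returns the
     * observed rate. indexOne stands in for the real update call being tested.
     */
    static double measure(long docs, LongConsumer indexOne) {
        long start = System.nanoTime();
        for (long id = 0; id < docs; id++) {
            indexOne.accept(id);
        }
        long elapsed = System.nanoTime() - start;
        // Guard against a zero reading on very fast runs or coarse timers.
        return docsPerSecond(docs, Math.max(1L, elapsed));
    }
}
```

Run it once with no query traffic and once with a realistic query load firing in parallel; the gap between the two numbers is what matters for your environment.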

Best,
Erick

On Sun, Jul 12, 2015 at 11:20 PM, chalitha udara Perera
<chalithaudara@gmail.com> wrote:
> Hi Erick,
>
> Thanks for the explanation. I am doing some experiments on off-line
> clustering of document features indexed in Lucene, and I update a few
> document fields in order to provide a different search experience,
> e.g. for text documents, insert the ID of the cluster the document belongs
> to; for images, create a bag of visual words.
> In the case of assigning cluster IDs, it is possible to use numeric DocValues.
> For cases in which I cannot use DocValues (e.g. bag of visual words), I
> will have to use conventional updates. Currently I am not really using a
> massive data set, and therefore the update rate is not a problem.
>
> Is there any generic benchmark analysis done on the update rate of Lucene
> saying that it can handle X number of document updates without any
> performance issues?
>
> Thanks,
> Chalitha
>
> On Fri, Jul 10, 2015 at 10:01 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> Well, if it's a docValues field, you can update it in place at the Lucene
>> level for certain types of simple values (numerics, strings, but not text
>> types); see: https://issues.apache.org/jira/browse/LUCENE-5189
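To illustrate why a DocValues field can change without touching the rest of the document, here is a toy model in plain Java. It is not Lucene's implementation or file format; it only assumes, as a simplification, that a segment's numeric values live in a per-document column separate from the inverted index, and that an update writes a new generation of that column instead of mutating the old one. The class `NumericColumn` is hypothetical. In Lucene 4.6 and later the real entry point for numeric fields is `IndexWriter.updateNumericDocValue(Term, String, long)`.

```java
import java.util.Arrays;
import java.util.function.IntToLongFunction;

/**
 * Toy model only (hypothetical class, not Lucene code): one segment's numeric
 * DocValues column. Existing arrays are never modified; each update produces a
 * new generation, so a reader opened earlier keeps seeing its snapshot.
 */
final class NumericColumn {
    private long[] current; // latest generation of the column
    private int generation; // number of generations written after the initial one

    NumericColumn(long[] initialValues) {
        this.current = initialValues.clone();
    }

    /** Copy-on-write update of a single document's value. */
    void update(int docId, long value) {
        long[] next = Arrays.copyOf(current, current.length);
        next[docId] = value;
        current = next;
        generation++;
    }

    /** Point-in-time view, analogous to opening a reader before later updates. */
    IntToLongFunction openReader() {
        final long[] snapshot = current;
        return docId -> snapshot[docId];
    }

    int generation() {
        return generation;
    }
}
```

Copying the whole column for every single update keeps the sketch short; a practical design would buffer many updates and write one new generation per batch.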
>>
>> In essence, the reason it's a delete/re-add is that the structure of the
>> postings lists, together with the promise that Lucene segments are
>> write-once, makes updating in place fiendishly complex and error-prone.
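A toy model in plain Java of the write-once behaviour described above. The class and method names are hypothetical and postings are ignored entirely; it only shows the bookkeeping. A flushed segment is never edited, so an "update" marks the old copy as deleted in that segment's deletion bitset and adds the new version to a fresh segment. The stale copy keeps occupying space until a merge rewrites only the live documents. Requires Java 10+ for `List.copyOf`/`Map.copyOf`.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

/** Toy model only, not Lucene code: write-once segments plus per-segment deletion bits. */
final class ToyIndex {
    private static final class Segment {
        final List<Map<String, String>> docs; // immutable once the segment is flushed
        final BitSet deleted = new BitSet();  // the only part of a segment that changes

        Segment(List<Map<String, String>> docs) {
            this.docs = List.copyOf(docs);
        }
    }

    private final List<Segment> segments = new ArrayList<>();
    private final List<Map<String, String>> buffer = new ArrayList<>();

    void addDocument(Map<String, String> doc) {
        buffer.add(Map.copyOf(doc));
    }

    /** Deletes every document whose idField equals idValue, then adds the replacement. */
    void updateDocument(String idField, String idValue, Map<String, String> doc) {
        for (Segment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (idValue.equals(s.docs.get(i).get(idField))) {
                    s.deleted.set(i); // mark only; the old document bytes stay in the segment
                }
            }
        }
        buffer.removeIf(d -> idValue.equals(d.get(idField)));
        addDocument(doc);
    }

    /** Freezes buffered documents into a new segment. */
    void flush() {
        if (!buffer.isEmpty()) {
            segments.add(new Segment(buffer));
            buffer.clear();
        }
    }

    /** Rewrites all live documents into one segment, dropping deleted copies. */
    void forceMerge() {
        flush();
        List<Map<String, String>> live = new ArrayList<>();
        for (Segment s : segments) {
            for (int i = 0; i < s.docs.size(); i++) {
                if (!s.deleted.get(i)) {
                    live.add(s.docs.get(i));
                }
            }
        }
        segments.clear();
        if (!live.isEmpty()) {
            segments.add(new Segment(live));
        }
    }

    int segmentCount() {
        return segments.size();
    }

    int deletedCount() {
        int n = 0;
        for (Segment s : segments) n += s.deleted.cardinality();
        return n;
    }

    int liveCount() {
        int n = 0;
        for (Segment s : segments) n += s.docs.size() - s.deleted.cardinality();
        return n;
    }
}
```

Changing one field of one document therefore costs a full re-add plus a deleted entry that lingers until merge, which is the "extra work" referred to below.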
>>
>> So yes, it's extra work. But do you have any evidence that your update
>> rate is such that it's prohibitive?
>>
>> Best,
>> Erick
>>
>> On Fri, Jul 10, 2015 at 2:45 AM, Gimantha Bandara <gimantha@wso2.com>
>> wrote:
>> > Ah, I misread the thread; I thought you were using two APIs to achieve
>> > the same thing as updateDocument. Yes, it is an overhead, and it is harder
>> > for the user to keep track of the fields that he doesn't need to update.
>> > There is already a Jira issue open for this [1].
>> >
>> > [1] https://issues.apache.org/jira/browse/LUCENE-4258
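For reference, this is roughly the bookkeeping callers end up doing by hand: rebuild the whole document from the old version's stored fields, apply the changed fields, and pass the result to updateDocument. The helper below is a hypothetical plain-Java sketch that works on field-name/value maps rather than Lucene `Document` objects. It assumes every field you want to keep was stored, and it fails loudly when an indexed field was not stored, since that value cannot be recovered from the index.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: rebuilds a full document for a delete/add style update. */
final class DocumentRebuilder {
    private DocumentRebuilder() {}

    /**
     * @param stored            stored field values of the existing document
     * @param indexedFieldNames every field the existing document was indexed with
     * @param changes           fields to add or overwrite
     * @return the complete replacement document
     * @throws IllegalStateException if an indexed field is neither stored nor in changes
     */
    static Map<String, String> rebuild(Map<String, String> stored,
                                       Set<String> indexedFieldNames,
                                       Map<String, String> changes) {
        Set<String> unrecoverable = new TreeSet<>();
        for (String field : indexedFieldNames) {
            if (!stored.containsKey(field) && !changes.containsKey(field)) {
                unrecoverable.add(field);
            }
        }
        if (!unrecoverable.isEmpty()) {
            throw new IllegalStateException("not stored, would be lost: " + unrecoverable);
        }
        Map<String, String> result = new LinkedHashMap<>(stored); // unchanged fields carried over
        result.putAll(changes);                                    // new or modified fields win
        return result;
    }
}
```

For Chalitha's case, `changes` would hold the new cluster ID and the bag-of-visual-words field, and everything else would be copied from the stored values.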
>> >
>> > On Fri, Jul 10, 2015 at 1:58 PM, chalitha udara Perera <
>> > chalithaudara@gmail.com> wrote:
>> >
>> >> Hi Gimantha,
>> >>
>> >> Yes. It is possible to use IndexWriter updateDocument() to update a
>> >> document. But with that method, what happens under the hood is that it
>> >> deletes the matching documents and re-indexes the new document. I need to
>> >> update only a single field. Re-indexing a new document with the updated
>> >> field plus all the other fields seems to be a big overhead. My question is:
>> >> why does Lucene do that, and is there currently a way we can avoid it?
>> >>
>> >> Thanks,
>> >> Chalitha
>> >>
>> >> On Fri, Jul 10, 2015 at 1:46 PM, Gimantha Bandara <gimantha@wso2.com>
>> >> wrote:
>> >>
>> >> > Hi Chalitha,
>> >> >
>> >> > You can simply use indexWriter.updateDocument to update the existing
>> >> > index documents.
>> >> >
>> >> > On Fri, Jul 10, 2015 at 11:38 AM, chalitha udara Perera <
>> >> > chalithaudara@gmail.com> wrote:
>> >> >
>> >> > > Hi All,
>> >> > >
>> >> > > I have a requirement for updating a Lucene index (add a single field to
>> >> > > existing docs and modify the value of another field). These documents
>> >> > > contain many other fields that do not need any modifications. But as I
>> >> > > understand it, Lucene provides a delete/add mechanism even for single
>> >> > > field value updates. I would really appreciate it if someone could
>> >> > > explain why Lucene uses delete/add for updates, as it feels like a real
>> >> > > bottleneck.
>> >> > >
>> >> > > Is there any way to do single-field updates without using delete/add?
>> >> > >
>> >> > > Thanks,
>> >> > > Chalitha
>> >> > >
>> >> > > --
>> >> > > J.M Chalitha Udara Perera
>> >> > >
>> >> > > *Department of Computer Science and Engineering,*
>> >> > > *University of Moratuwa,*
>> >> > > *Sri Lanka*
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Gimantha Bandara
>> >> > Software Engineer
>> >> > WSO2. Inc : http://wso2.com
>> >> > Mobile : +94714961919
>> >> >
>> >>
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>


