lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Items disappearing from Solr index
Date Thu, 27 Sep 2012 11:30:05 GMT
Wild shot in the dark....

What happens if you switch from StreamingUpdateSolrServer to HttpSolrServer?

What I'm wondering is if somehow you're getting a queueing problem. If you have
multiple threads defined for SUSS, it might be possible (and I'm guessing) that
the delete bit is getting sent after some of the adds. Frankly I doubt this is
the case, but this issue is so weird that I'm grasping at straws.

BTW, there's no reason to optimize twice. Actually, the new thinking is that
optimizing usually isn't necessary anyway. But if you insist on optimizing
there's no reason to do it _both_ after the deletes and after the adds, just
do it after the adds.

Best
Erick

On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue <kissuenow@gmail.com> wrote:
> #What is the field type for that field - string or text?
>
> It is a string type.
>
> Thanks.
>
> On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky <jack@basetechnology.com> wrote:
>
>> What is the field type for that field - string or text?
>>
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Kissue Kissue
>> Sent: Wednesday, September 26, 2012 1:43 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Items disappearing from Solr index
>>
>> # It is looking for documents with "Emory" in the specified field OR "Labs"
>> in the default search field.
>>
>> This does not seem to be the case. For instance issuing a deleteByQuery for
>> catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
>> catalogueId with the value: "Ncl_MacNaughtonMcGregorCoaching_vf010811".
>>
>> Thanks.
>>
>> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky <jack@basetechnology.com>
>> wrote:
>>
>>> It is looking for documents with "Emory" in the specified field OR "Labs"
>>> in the default search field.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Kissue Kissue
>>> Sent: Wednesday, September 26, 2012 7:47 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Items disappearing from Solr index
>>>
>>>
>>> I have just solved this problem.
>>>
>>> We have a field called catalogueId. One possible value for this field
>>> could be "Emory Labs". I found out that when the following delete by
>>> query is sent to Solr:
>>>
>>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
>>>
>>> [Notice that there are no quotes surrounding the catalogueId value -
>>> Emory Labs]
>>>
>>> For some reason this delete by query ends up deleting the contents of some
>>> other random catalogues too, which is the reason why we are losing items
>>> from the index. When the query is changed to:
>>>
>>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs")
>>>
>>> then it starts to correctly delete only items in the Emory Labs catalogue.
>>>
>>> So my first question is, what exactly does deleteByQuery do in the first
>>> query without the quotes? How is it determining which catalogues to
>>> delete?
>>>
>>> Secondly, shouldn't the correct behaviour be not to delete anything at all
>>> in this case since when a search is done for the same catalogueId without
>>> the quotes it just simply returns no results?
>>>
>>> Thanks.
>>>
>>>
>>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue <kissuenow@gmail.com>
>>> wrote:
>>>
>>>> Hi Erick,
>>>>
>>>> Thanks for your reply. Yes, I am using delete by query. I am currently
>>>> logging the number of items to be deleted before handing off to Solr. And
>>>> from the Solr logs I can see it deleted exactly that number. I will verify
>>>> further.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson <erickerickson@gmail.com>
>>>> wrote:
>>>>
>>>>> How do you delete items? By ID or by query?
>>>>>
>>>>> My guess is that one of two things is happening:
>>>>> 1> your delete process is deleting too much data.
>>>>> 2> your index process isn't indexing what you think.
>>>>>
>>>>> I'd add some logging to the SolrJ program to see what
>>>>> it thinks it has deleted or added to the index and go from there.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue <kissuenow@gmail.com>
>>>>> wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer
>>>>> > to index and delete items from solr.
>>>>> >
>>>>> > I basically index items from the db into solr every night. Existing
>>>>> > items can be marked for deletion in the db and a delete request sent
>>>>> > to solr to delete such items.
>>>>> >
>>>>> > My process runs as follows every night:
>>>>> >
>>>>> > 1. Check if items have been marked for deletion and delete from solr.
>>>>> > I commit and optimize after the entire solr deletion runs.
>>>>> > 2. Index any new items to solr. I commit and optimize after all the
>>>>> > new items have been added.
>>>>> >
>>>>> > Recently i started noticing that huge chunks of items that have not
>>>>> > been marked for deletion are disappearing from the index. I checked
>>>>> > the solr logs and the logs indicate that it is deleting exactly the
>>>>> > number of items requested but still a lot of other items disappear
>>>>> > from the index from time to time. Any ideas what might be causing
>>>>> > this or what i am doing wrong.
>>>>> >
>>>>> >
>>>>> > Thanks.
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
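
The fix reported in the thread (wrapping the catalogueId value in quotes so that "Emory Labs" is matched as a single value instead of being split at the space) could be sketched roughly as below. This is only an illustration: SolrPhraseQuery and its of() method are hypothetical names, not part of SolrJ, and the sketch assumes the field is a string type, as stated in the thread, and that only backslashes and double quotes need escaping inside a quoted value.

```java
// Hypothetical helper, not part of SolrJ: builds a field:"value" query string
// so that a value containing spaces is treated as one phrase.
final class SolrPhraseQuery {

    private SolrPhraseQuery() {}

    /** Returns field:"value", escaping backslashes and double quotes in value. */
    static String of(String field, String value) {
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            // Assumption: inside a quoted phrase only \ and " need escaping.
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Prints the two catalogue queries mentioned in the thread.
        System.out.println(of("catalogueId", "Emory Labs"));
        System.out.println(of("catalogueId", "PEARL LINGUISTICS LTD"));
    }
}
```

The resulting string could then be handed to the delete call used in the thread, for example getSolrServer().deleteByQuery(SolrPhraseQuery.of("catalogueId", "Emory Labs")). Logging the generated query string before sending it, as Erick suggested, makes a missing quote easy to spot.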
