From: Erick Erickson
To: solr-user@lucene.apache.org
Subject: Re: Items disappearing from Solr index
Date: Thu, 27 Sep 2012 07:30:05 -0400

Wild shot in the dark.... What happens if you switch from
StreamingUpdateSolrServer to HttpSolrServer?

What I'm wondering is if somehow you're getting a queueing problem. If you
have multiple threads defined for SUSS, it might be possible (and I'm
guessing) that the delete bit is getting sent after some of the adds.
Frankly I doubt this is the case, but this issue is so weird that I'm
grasping at straws.

BTW, there's no reason to optimize twice. Actually, the new thinking is that
optimizing usually isn't necessary anyway. But if you insist on optimizing
there's no reason to do it _both_ after the deletes and after the adds, just
do it after the adds.

Best
Erick

On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue wrote:
> #What is the field type for that field - string or text?
>
> It is a string type.
>
> Thanks.
>
> On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky wrote:
>
>> What is the field type for that field - string or text?
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Kissue Kissue
>> Sent: Wednesday, September 26, 2012 1:43 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Items disappearing from Solr index
>>
>> # It is looking for documents with "Emory" in the specified field OR "Labs"
>> in the default search field.
>>
>> This does not seem to be the case.
>> For instance, issuing a deleteByQuery for
>> catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
>> catalogueId with the value: "Ncl_MacNaughtonMcGregorCoaching_vf010811".
>>
>> Thanks.
>>
>> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky wrote:
>>
>>> It is looking for documents with "Emory" in the specified field OR "Labs"
>>> in the default search field.
>>>
>>> -- Jack Krupansky
>>>
>>> -----Original Message----- From: Kissue Kissue
>>> Sent: Wednesday, September 26, 2012 7:47 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Items disappearing from Solr index
>>>
>>> I have just solved this problem.
>>>
>>> We have a field called catalogueId. One possible value for this field
>>> could be "Emory Labs". I found out that when the following delete by
>>> query is sent to solr:
>>>
>>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
>>> [Notice that there are no quotes surrounding the catalogueId value -
>>> Emory Labs]
>>>
>>> For some reason this delete by query ends up deleting the contents of some
>>> other random catalogues too, which is the reason why we are losing items
>>> from the index. When the query is changed to:
>>>
>>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"),
>>> then it starts to correctly delete only items in the Emory Labs catalogue.
>>>
>>> So my first question is, what exactly does deleteByQuery do in the first
>>> query without the quotes? How is it determining which catalogues to
>>> delete?
>>>
>>> Secondly, shouldn't the correct behaviour be not to delete anything at all
>>> in this case, since when a search is done for the same catalogueId without
>>> the quotes it simply returns no results?
>>>
>>> Thanks.
>>>
>>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue wrote:
>>>
>>>> Hi Erick,
>>>>
>>>> Thanks for your reply. Yes, I am using delete by query.
>>>> I am currently
>>>> logging the number of items to be deleted before handing off to solr. And
>>>> from the solr logs I can see it deleted exactly that number. I will verify
>>>> further.
>>>>
>>>> Thanks.
>>>>
>>>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote:
>>>>
>>>>> How do you delete items? By ID or by query?
>>>>>
>>>>> My guess is that one of two things is happening:
>>>>> 1> your delete process is deleting too much data.
>>>>> 2> your index process isn't indexing what you think.
>>>>>
>>>>> I'd add some logging to the SolrJ program to see what
>>>>> it thinks it has deleted or added to the index and go from there.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue wrote:
>>>>> > Hi,
>>>>> >
>>>>> > I am running Solr 3.5, using SolrJ and using StreamingUpdateSolrServer
>>>>> > to index and delete items from solr.
>>>>> >
>>>>> > I basically index items from the db into solr every night. Existing
>>>>> > items can be marked for deletion in the db and a delete request sent
>>>>> > to solr to delete such items.
>>>>> >
>>>>> > My process runs as follows every night:
>>>>> >
>>>>> > 1. Check if items have been marked for deletion and delete from solr.
>>>>> > I commit and optimize after the entire solr deletion runs.
>>>>> > 2. Index any new items to solr. I commit and optimize after all the
>>>>> > new items have been added.
>>>>> >
>>>>> > Recently I started noticing that huge chunks of items that have not
>>>>> > been marked for deletion are disappearing from the index. I checked
>>>>> > the solr logs and the logs indicate that it is deleting exactly the
>>>>> > number of items requested, but still a lot of other items disappear
>>>>> > from the index from time to time. Any ideas what might be causing
>>>>> > this or what I am doing wrong?
>>>>> >
>>>>> > Thanks.
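
For anyone finding this thread later, the quoted-versus-unquoted deleteByQuery difference discussed above can be sketched in plain Java. This is only a minimal illustration, not code from the original poster: the class name CatalogueDeleteQuery and the helpers escapePhrase and phraseQuery are hypothetical, and it assumes the standard Lucene query syntax, where a value containing a space has to be wrapped in double quotes to stay attached to its field. The resulting string is what would be handed to deleteByQuery.

```java
// Hypothetical helper (not part of SolrJ): builds a field:"value" phrase query
// so that a value with spaces is not split into separate clauses.
final class CatalogueDeleteQuery {
    private CatalogueDeleteQuery() {}

    // Inside a quoted phrase, backslash and double quote need a leading backslash.
    static String escapePhrase(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 2);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Wraps the escaped value in double quotes and prefixes the field name.
    static String phraseQuery(String field, String value) {
        return field + ":\"" + escapePhrase(value) + "\"";
    }

    public static void main(String[] args) {
        // Naive concatenation: the part after the space is no longer tied to
        // catalogueId, which matches Jack's explanation in the thread.
        System.out.println("catalogueId" + ":" + "Emory Labs");
        // prints: catalogueId:Emory Labs

        // Quoted phrase: the whole value stays attached to catalogueId.
        System.out.println(phraseQuery("catalogueId", "Emory Labs"));
        // prints: catalogueId:"Emory Labs"
    }
}
```

Note that in the Java snippets quoted in the thread, the double quotes around "Emory Labs" are only Java string delimiters; the query string itself must contain the quote characters, which is what phraseQuery adds.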