From: 5ton3
To: solr-user@lucene.apache.org
Date: Fri, 31 Oct 2014 05:11:02 -0700 (PDT)
Message-ID: <1414757462338-4166822.post@n3.nabble.com>
Subject: The exact same query gets executed n times for the nth row when retrieving body (plaintext) from BLOB column with Tika Entity Processor

Hi!
I'm not sure whether this is a problem or whether I just don't understand the debug response, but it seems somewhat odd to me.

The "main" entity can have multiple BLOB documents. I'm using the Tika Entity Processor to retrieve the body (plaintext) from these documents and put the result in a multivalued field, "filedata". The data-config looks like this:

It seems to work properly, but when I debug the data import, the query on TABLE2 for the BLOB column ("FILEDATA_BIN") appears to be executed once for document #1, which is correct, but twice for document #2, three times for document #3, and so on. I.e. for document #1:

And for document #2:

The result seems correct, i.e. it doesn't duplicate the filedata. But why does it query the DB twice for document #2?

Any ideas? Maybe something is wrong in my config?

--
View this message in context: http://lucene.472066.n3.nabble.com/The-exact-same-query-gets-executed-n-times-for-the-nth-row-when-retrieving-body-plaintext-from-BLOB-r-tp4166822.html
Sent from the Solr - User mailing list archive at Nabble.com.
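
For reference, a setup like the one described above is commonly wired roughly as follows. This is only a sketch and not the actual config from this post. TABLE2, FILEDATA_BIN, the "main" entity and the "filedata" field come from the description; everything else (TABLE1, the ID and PARENT_ID columns, the entity names "attachment" and "tika", the data source names, and the JDBC driver/url placeholders) is assumed for illustration.

```xml
<dataConfig>
  <!-- Assumed JDBC source; driver and url are placeholders, not real values -->
  <dataSource name="db" type="JdbcDataSource"
              driver="your.jdbc.Driver" url="jdbc:yourdb://host/dbname" />
  <!-- Lets a child entity read a BLOB column of its parent row as a stream -->
  <dataSource name="fieldReader" type="FieldStreamDataSource" />

  <document>
    <!-- "main" entity: one Solr document per row (TABLE1 and ID are assumed) -->
    <entity name="main" dataSource="db" query="SELECT ID FROM TABLE1">

      <!-- One row per BLOB belonging to the current main row
           (PARENT_ID is an assumed foreign key column) -->
      <entity name="attachment" dataSource="db"
              query="SELECT FILEDATA_BIN FROM TABLE2 WHERE PARENT_ID = '${main.ID}'">

        <!-- Tika extracts plain text from the BLOB of the current attachment row -->
        <entity name="tika" processor="TikaEntityProcessor"
                dataSource="fieldReader"
                dataField="attachment.FILEDATA_BIN" format="text">
          <!-- TikaEntityProcessor exposes the extracted body as column "text" -->
          <field column="text" name="filedata" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a layout like this, the multivalued "filedata" field receives one value per TABLE2 row returned for the current "main" row.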