From: Julian Perry
Organization: Limitless Internet Solutions
Date: Fri, 27 Mar 2015 13:11:32 +1300
To: solr-user@lucene.apache.org
Subject: Re: Build index from Oracle, adding fields

On 27/03/2015 12:42, Shawn Heisey wrote:
> If that's not practical, then the only real option you have is to drop
> back to one entity, and build a single SELECT statement (using JOIN and
> some form of CONCAT) that will gather all the information from all the
> tables at the same time, and combine multiple values together into one
> SQL result field with some kind of delimiter. Then you can use the
> RegexTransformer's "splitBy" functionality to turn the concatenated data
> back into multiple values for your multi-valued field. Database servers
> tend to be REALLY good at JOIN operations, so the database would be
> doing the heavy lifting.

I did try that in fact (and do it with one of my other indexes).
However, with this index the sub-select can return 200 rows of
200 characters - and that blows up in Oracle as the concatenated
field is over 4000 characters long (and the work-around for that
is to use CLOBs - but that has its own performance problems).

Currently I am doing this by exporting a CSV file and processing
it with a C program - and then reading the CSV with Solr :(

--
Cheers
Jules.
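
For illustration only, the export-and-post-process workaround described above could be sketched roughly as follows. The idea is to let the database return plain joined rows (one row per value, ordered by document id) and do the concatenation outside Oracle, so that a combined field of 200 x 200 = 40,000 characters never has to fit into a 4000-character SQL value. This is a sketch in Python rather than the C program mentioned in the message; the function name `rows_to_csv`, the column names `id` and `tags`, and the `|` delimiter are all hypothetical, and it assumes the values themselves never contain the delimiter.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def rows_to_csv(rows, delimiter="|"):
    """Collapse (doc_id, value) rows into one CSV line per document.

    `rows` must already be sorted by doc_id (e.g. via ORDER BY in the
    SELECT), because groupby only merges adjacent rows with the same key.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Hypothetical column names; use whatever the Solr schema defines.
    writer.writerow(["id", "tags"])
    for doc_id, group in groupby(rows, key=itemgetter(0)):
        # Skip NULLs that an outer join may produce for documents
        # with no child rows.
        values = [value for _, value in group if value is not None]
        writer.writerow([doc_id, delimiter.join(values)])
    return out.getvalue()


if __name__ == "__main__":
    sample = [(1, "red"), (1, "green"), (2, "blue")]
    print(rows_to_csv(sample), end="")
```

The resulting file could then be loaded through Solr's CSV update handler, assuming its per-field `f.tags.split=true` and `f.tags.separator=|` parameters are used to turn the delimited column back into a multi-valued field.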