From: P Williams
Date: Wed, 13 Nov 2013 11:55:45 -0700
Subject: Using data-config.xml from DIH in SolrJ
To: solr-user@lucene.apache.org

Hi All,

I'm building a utility (a Java jar) that creates SolrInputDocuments and sends them to an HttpSolrServer using the SolrJ API. The goal is an efficient way to create documents from a large directory of files (where multiple files make up one Solr document) and send them to a remote Solr instance for update and commit.

I've already solved the problem using the DataImportHandler (DIH), so I have a data-config.xml that describes the templated fields and the crosswalk from the source(s) to the schema. The original data can't always be co-located with the Solr server, which is why I'm looking for another option.

I've also solved the problem using Ant and XSLT to create a temporary (and, unfortunately, potentially large) document that the UpdateHandler will accept. I couldn't think of a solution that took advantage of the XSLT support in the UpdateHandler, because each document is created from multiple files. Our current, dated Java-based solution significantly outperforms the Ant/XSLT approach in terms of disk usage and time, so I've rejected that approach and gone back to the drawing board.

Does anyone have suggestions on how I might reuse my DIH configuration in the SolrJ context without reinventing the wheel (or DIH, in this case)? If I'm doing something ridiculous, I hope you'll point that out too.
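To make the "multiple files make one Solr document" case concrete, here is a minimal sketch of just the grouping step in plain Java. It assumes a hypothetical file naming convention of `<docId>.<field>.<ext>` (for example `rec1.title.txt`), which is an illustration only and not the actual layout of the source data. The class and method names are made up for this sketch. Each resulting field map stands in for what would become one SolrInputDocument; the sketch does not read data-config.xml and does not call SolrJ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Sketch only: groups source files into one field map per document. */
final class MultiFileDocumentSketch {

    /**
     * Assumed (hypothetical) naming convention: "<docId>.<field>.<ext>".
     * Returns docId -> (field -> file name). Names that do not match are skipped.
     */
    static Map<String, Map<String, String>> groupByDocument(List<String> fileNames) {
        Map<String, Map<String, String>> grouped = new TreeMap<>();
        for (String name : fileNames) {
            String[] parts = name.split("\\.");
            if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty()) {
                continue; // not part of the assumed convention
            }
            grouped.computeIfAbsent(parts[0], id -> new TreeMap<>()).put(parts[1], name);
        }
        return grouped;
    }

    /** Reads every grouped file in dir and returns docId -> (field -> content). */
    static Map<String, Map<String, String>> buildDocuments(Path dir) {
        List<String> names = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(Files::isRegularFile)
                 .forEach(p -> names.add(p.getFileName().toString()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Map<String, String>> docs = new TreeMap<>();
        groupByDocument(names).forEach((id, fields) -> {
            Map<String, String> doc = new TreeMap<>();
            fields.forEach((field, fileName) -> doc.put(field, read(dir.resolve(fileName))));
            doc.put("id", id); // set last so the document id always wins
            docs.put(id, doc);
        });
        return docs;
    }

    /** Reads a file as trimmed UTF-8 text. */
    private static String read(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a SolrJ client, each inner map could then be copied field by field into a document and sent in batches. The field templating and crosswalk that data-config.xml describes are not covered by this sketch.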

Thanks,
Tricia