From: "Jason Rutherglen (JIRA)"
To: solr-dev@lucene.apache.org
Date: Thu, 3 Sep 2009 13:36:57 -0700 (PDT)
Subject: [jira] Updated: (SOLR-1395) Integrate Katta

[ https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Rutherglen updated SOLR-1395:
-----------------------------------

Attachment:
KATTA-SOLR.patch, SOLR-1395.patch

This is our first cut at integrating Katta with Solr. The KattaClientTest test case creates a Katta cluster locally, places a couple of cores/shards into the cluster, and then executes a query that returns the correct number of results. It takes roughly 30 seconds to 1.5 minutes to run; hopefully that can be reduced.

Today, Solr shards map to Solr servers. Here we map shards to cores, so a single server (a "node" in Katta parlance) can host multiple shards. We assume the shards exist in Hadoop HDFS. Katta copies the shards to a local Solr server to make them searchable (and incrementally updatable).

h3. Instructions for Installation

* Check out Katta trunk: "svn co https://katta.svn.sourceforge.net/svnroot/katta/trunk kattatrunk". Download KATTA-SOLR.patch into kattatrunk, then run "patch -p 0 -i KATTA-SOLR.patch", "ant jar", and "ant jar-test".
* Check out Solr trunk: "svn co http://svn.apache.org/repos/asf/lucene/solr/trunk solrtrunk". Copy the following files from kattatrunk to solrtrunk/lib:
** lib/log4j-1.2.13.jar
** lib/zookeeper-3.1.1.jar
** lib/hadoop-core-0.19.0.jar
** build/katta-core-0.6-dev.jar
** build/test-katta-core-0.6-dev.jar
* Download SOLR-1395.patch into solrtrunk and run "patch -p 0 -i SOLR-1395.patch".
* From solrtrunk, run the test: "ant test-core -Dtestcase=KattaClientTest"

h3. General Notes

* SearchHandler's HttpCommComponent has been abstracted out. There is a CommComponent interface, and AbstractCommComponent implements the generic multithreaded ShardRequest -> ShardResponse logic. EmbeddedSearchHandler executes requests on a set of local cores, HttpCommComponent implements requests over HTTP, and KattaCommComponent distributes requests using Katta's Hadoop RPC mechanism.
* The patch enables all of Solr's distributed request types. All current distributed requests should work as-is, without modification.
* Shards/Solr cores may be managed dynamically and administered remotely from a centralized location (whereas today Solr typically requires SSHing into servers and manually editing files, etc.).
* Solr Katta has built-in failover, which is tested in KattaClientFailoverTest.
* When a shard is deployed to a Solr server, its schema and solrconfig are deployed with it. This raises the question of how updates to the solrconfig and schema should be deployed. Redeploying solrconfig is fairly simple, whereas a schema change implies recreating the entire shard.
* Is there an easy way to interface with Hadoop-based index creation (i.e., something as easy as Solr's HTTP-based update command)?

The patch was created by Jason Venner and Jason Rutherglen.

> Integrate Katta
> ---------------
>
> Key: SOLR-1395
> URL: https://issues.apache.org/jira/browse/SOLR-1395
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 1.4
> Reporter: Jason Rutherglen
> Priority: Minor
> Fix For: 1.5
>
> Attachments: KATTA-SOLR.patch, SOLR-1395.patch
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> We'll integrate Katta into Solr so that:
> * Distributed search uses Hadoop RPC
> * Shard/SolrCore distribution and management
> * Zookeeper based failover
> * Indexes may be built using Hadoop
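The General Notes describe a split between generic multithreaded ShardRequest -> ShardResponse logic (AbstractCommComponent) and transport-specific subclasses (HTTP, embedded cores, Katta's Hadoop RPC), plus built-in failover. The following is a minimal Java sketch of that split under stated assumptions; it is not the patch's code. All names here (ShardReq, ShardResp, AbstractFanOutComm, nodesFor, send, submitAll, InMemoryComm) are hypothetical, the request/response types are simplified stand-ins for Solr's ShardRequest/ShardResponse, and failover is modeled as simply trying each replica in order, which may not match how Katta actually does it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical, simplified stand-ins for Solr's ShardRequest/ShardResponse.
final class ShardReq {
    final String shard;
    final String query;
    ShardReq(String shard, String query) { this.shard = shard; this.query = query; }
}

final class ShardResp {
    final String shard;
    final String node;     // the node that actually answered
    final long numFound;
    ShardResp(String shard, String node, long numFound) {
        this.shard = shard; this.node = node; this.numFound = numFound;
    }
}

// Hypothetical generic fan-out logic, in the role the patch gives AbstractCommComponent:
// subclasses supply only replica lookup and the transport call.
abstract class AbstractFanOutComm {
    private final ExecutorService pool;

    AbstractFanOutComm(ExecutorService pool) { this.pool = pool; }

    // Replica nodes hosting a shard, in the order they should be tried.
    protected abstract List<String> nodesFor(String shard);

    // Transport-specific call (HTTP, embedded core, Hadoop RPC, ...).
    protected abstract ShardResp send(String node, ShardReq req) throws Exception;

    // Assumed failover policy: try each replica until one answers.
    private ShardResp sendWithFailover(ShardReq req) throws Exception {
        Exception last = null;
        for (String node : nodesFor(req.shard)) {
            try {
                return send(node, req);
            } catch (Exception e) {
                last = e; // remember the failure and try the next replica
            }
        }
        throw new Exception("no live replica for shard " + req.shard, last);
    }

    // Send all shard requests in parallel, then wait for every response in request order.
    List<ShardResp> submitAll(List<ShardReq> reqs) throws InterruptedException, ExecutionException {
        List<Future<ShardResp>> futures = new ArrayList<>();
        for (ShardReq r : reqs) {
            Callable<ShardResp> task = () -> sendWithFailover(r);
            futures.add(pool.submit(task));
        }
        List<ShardResp> out = new ArrayList<>();
        for (Future<ShardResp> f : futures) {
            out.add(f.get()); // throws ExecutionException if every replica of a shard failed
        }
        return out;
    }
}

// Hypothetical in-memory transport for demonstration: each node holds a hit count per
// shard, and nodes in `down` simulate failed servers.
final class InMemoryComm extends AbstractFanOutComm {
    private final Map<String, List<String>> replicas;   // shard -> nodes
    private final Map<String, Map<String, Long>> hits;  // node -> (shard -> numFound)
    private final Set<String> down;

    InMemoryComm(ExecutorService pool, Map<String, List<String>> replicas,
                 Map<String, Map<String, Long>> hits, Set<String> down) {
        super(pool);
        this.replicas = replicas;
        this.hits = hits;
        this.down = down;
    }

    @Override
    protected List<String> nodesFor(String shard) {
        return replicas.getOrDefault(shard, Collections.emptyList());
    }

    @Override
    protected ShardResp send(String node, ShardReq req) throws Exception {
        if (down.contains(node)) throw new Exception(node + " is down");
        Long n = hits.getOrDefault(node, Collections.emptyMap()).get(req.shard);
        if (n == null) throw new Exception(req.shard + " is not deployed on " + node);
        return new ShardResp(req.shard, node, n);
    }
}
```

With shard s1 replicated on node1 and node2 and node1 marked down, a request for s1 is answered by node2; if every replica of a shard is down, submitAll surfaces the failure as an ExecutionException. Swapping InMemoryComm for an HTTP- or RPC-backed subclass would leave the fan-out and failover loop untouched, which appears to be the motivation for pulling that logic into a shared abstract class.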