From: Chris Hostetter
To: solr-user
Date: Tue, 8 Apr 2008 23:00:33 -0700 (PDT)
Subject: Re: Solr + Complex Legacy Schema -- Best Practices?

: I just was wondering, has anybody dealt with trying to "translate" the
: data from a big, legacy DB schema to a Solr installation?
: What I mean ...

There's really no general answer to that question -- it all comes down to what you want to query on, and what kinds of results you want to get out.

If you want your queries to result in lists of "products" then you should have one Document per product. If you also want to be able to query on the text of user reviews, then you need to flatten all the user reviews for each product into the Document for that product.

Sometimes you'll want two types of Documents: one Document per product, containing all the text of all the user reviews, and one Document per user review, with the Product information duplicated in each. Then you can search for...

  q=reviewtext:solr&fq=doctype:product&fq=producttype:camera

...to get a list of all the products that are cameras and have the word solr in the text of *a* review, or you can search for...

  q=reviewtext:solr&fq=doctype:review&fq=producttype:camera

...to get a list of all the reviews that contain the word solr and are about products that are cameras.

Your use cases and goals will be different than everyone else's.

-Hoss
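
For what it's worth, the two-Document-type flattening described above might be sketched roughly like this in Python. This is only an illustration under assumptions: the input row shapes, the `id` and `name` fields, and the helper names `build_docs` and `build_query` are hypothetical, and treating `reviewtext` as a multi-valued field is a guess about the schema. Only `doctype`, `producttype` and `reviewtext` come from the queries in the message.

```python
from urllib.parse import urlencode


def build_docs(products, reviews):
    """Flatten relational rows into product Documents and review Documents.

    The row layout (dicts with "id", "name", "producttype", "product_id",
    "text") is a made-up stand-in for whatever the legacy schema provides.
    """
    reviews_by_product = {}
    for review in reviews:
        reviews_by_product.setdefault(review["product_id"], []).append(review)

    docs = []
    for product in products:
        product_reviews = reviews_by_product.get(product["id"], [])

        # One Document per product, carrying the text of all its reviews.
        docs.append({
            "id": f"product-{product['id']}",
            "doctype": "product",
            "name": product["name"],
            "producttype": product["producttype"],
            "reviewtext": [r["text"] for r in product_reviews],
        })

        # One Document per review, with the product information duplicated.
        for review in product_reviews:
            docs.append({
                "id": f"review-{review['id']}",
                "doctype": "review",
                "name": product["name"],
                "producttype": product["producttype"],
                "reviewtext": [review["text"]],
            })
    return docs


def build_query(doctype, producttype, word):
    """Build the q/fq parameter string for one of the two searches."""
    params = [
        ("q", f"reviewtext:{word}"),
        ("fq", f"doctype:{doctype}"),
        ("fq", f"producttype:{producttype}"),
    ]
    # Keep ":" unescaped so the string reads like the queries in the message.
    return urlencode(params, safe=":")


if __name__ == "__main__":
    products = [{"id": 1, "name": "Acme Cam", "producttype": "camera"}]
    reviews = [
        {"id": 10, "product_id": 1, "text": "indexed my photos with solr"},
        {"id": 11, "product_id": 1, "text": "nice lens"},
    ]
    for doc in build_docs(products, reviews):
        print(doc)
    print(build_query("product", "camera", "solr"))
    print(build_query("review", "camera", "solr"))
```

The duplication is deliberate: each review Document repeats `producttype` so the `fq=producttype:camera` filter works without any join back to the product.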