Message-ID: <42A2899D.6060303@yahoo.com>
Date: Sun, 05 Jun 2005 01:11:57 -0400
From: Phillip Rhodes
To: java-user@lucene.apache.org
Subject: What *is* a lucene document?

I understand that "Documents are the primary retrievable units from a Lucene query".

But I don't know whether I want 12 documents in the Lucene index that represent the same business object, or whether I should place 12 different business documents within the Lucene index.

Here is the background: I want to index a product catalog (some data is in a database and some is on the filesystem, with a cross-reference between the two). Each product is associated with attributes, categories, and one or more PDF/MS Word documents, HTML descriptions, images, etc. A product could have 12 different files associated with it.

Is it okay if I create one document for each asset that I want to return from a search, and add information to each document tying it back to the product that it is associated with? Is that the right approach?

Thanks, it's keeping me up at night.

BTW, I am working on a release of a professional-grade ecommerce suite that is open source (Apache license), and I wouldn't mind help on the Lucene/search stuff. There's plenty more for me to do: 120+ tables, going to prod for a client this weekend (without search ;)

Contact me!
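
A minimal sketch of the one-document-per-asset idea described above, assuming plain Java maps as stand-ins for index documents. Nothing here is Lucene API: the class AssetIndexSketch, its methods, and the field names productId, productName and assetPath are all hypothetical and only illustrate how each asset document could carry a key back to its product.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java stand-in for the indexing scheme; none of these types are Lucene classes.
class AssetIndexSketch {

    // Build one "document" per asset: a flat map of field name -> value.
    // The field names are hypothetical and chosen only for illustration.
    static List<Map<String, String>> buildDocs(String productId, String productName,
                                               List<String> assetPaths) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (String path : assetPaths) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("productId", productId);     // ties the asset back to its product
            doc.put("productName", productName); // product data repeated on every asset document
            doc.put("assetPath", path);          // the unit a search would return
            docs.add(doc);
        }
        return docs;
    }

    // Collapse asset-level hits into the distinct products they belong to,
    // preserving the order in which the hits arrived.
    static Set<String> productsForHits(List<Map<String, String>> hits) {
        Set<String> ids = new LinkedHashSet<>();
        for (Map<String, String> hit : hits) {
            ids.add(hit.get("productId"));
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs =
                buildDocs("P-100", "Widget", Arrays.asList("manual.pdf", "spec.doc"));
        System.out.println(docs.size() + " documents for products " + productsForHits(docs));
    }
}
```

The trade-off in this layout is that product-level data is duplicated on every asset document, and a query that matches several assets of one product returns several hits that have to be grouped by the product key afterwards.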