From: DM Smith
Date: Fri, 11 Nov 2005 11:29:35 -0500
To: java-dev@lucene.apache.org
Subject: Re: Basic Question on Documents and File Format

Ashwin Satyanarayana wrote:

> Hello,
>
> I am new to Lucene. I was trying to use Lucene with TREC-6 data. The
> TREC-6 dataset, used in 1997, contains many input files. Each input file
> has multiple documents (some files contain over 200 documents) tagged by
> DOCNO. The result given by Lucene to a query is a list of files and not
> documents.
>
> Q1) Is there a way of getting the query results in terms of documents
> within the files rather than files (without modifying the code)?

In Lucene, a Document object is the unit of search, storage, and indexing. It may or may not correspond to a user's view of files or documents.

> Q2) If the above is not possible, what would be the best way to modify
> the code?

To achieve what you want, I think you need to store and/or index each of your documents as a Lucene Document. You may also want to store the file name and document identifier as Lucene fields in the Lucene Document.

> Thanks and Regards,
> Ashwin

Questions on how to use Lucene should be addressed to the Lucene users mailing list. This list is for developers working on Lucene itself.
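As a rough illustration of the suggestion above, here is a minimal sketch of the splitting step only. It assumes each TREC document is wrapped in <DOC>...</DOC> and carries its identifier in a <DOCNO> tag; the class and method names (TrecSplitter, TrecDoc, split) are hypothetical and not part of Lucene. Each record it returns would then be added to the index as one Lucene Document, with the file name and DOCNO kept as fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: splits one TREC input file
// into per-document records so each can be indexed separately.
class TrecSplitter {

    /** One TREC document: the unit that would become one Lucene Document. */
    static final class TrecDoc {
        public final String fileName; // source file, worth keeping as a stored field
        public final String docno;    // contents of the <DOCNO> tag
        public final String body;     // everything between <DOC> and </DOC>

        TrecDoc(String fileName, String docno, String body) {
            this.fileName = fileName;
            this.docno = docno;
            this.body = body;
        }
    }

    // Assumption: documents are delimited by <DOC> ... </DOC>.
    private static final Pattern DOC =
            Pattern.compile("<DOC>(.*?)</DOC>", Pattern.DOTALL);
    // Assumption: the identifier sits in <DOCNO> ... </DOCNO>, possibly padded.
    private static final Pattern DOCNO =
            Pattern.compile("<DOCNO>\\s*(.*?)\\s*</DOCNO>", Pattern.DOTALL);

    /** Returns one record per <DOC> block found in the file contents. */
    static List<TrecDoc> split(String fileName, String fileContents) {
        List<TrecDoc> docs = new ArrayList<>();
        Matcher doc = DOC.matcher(fileContents);
        while (doc.find()) {
            String body = doc.group(1).trim();
            Matcher no = DOCNO.matcher(body);
            // An empty DOCNO marks a block with no identifier tag.
            String docno = no.find() ? no.group(1) : "";
            docs.add(new TrecDoc(fileName, docno, body));
        }
        return docs;
    }

    public static void main(String[] args) {
        String sample =
                "<DOC>\n<DOCNO> A-1 </DOCNO>\n<TEXT>first</TEXT>\n</DOC>\n"
              + "<DOC>\n<DOCNO> A-2 </DOCNO>\n<TEXT>second</TEXT>\n</DOC>\n";
        for (TrecDoc d : split("sample.txt", sample)) {
            // In an indexer, this is where one Lucene Document per record
            // would be built and added to the index.
            System.out.println(d.fileName + " " + d.docno);
        }
    }
}
```

With the file name and DOCNO stored on every indexed record, a search hit can be reported as a document within a file rather than as the file itself.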