Date: Mon, 19 Feb 2007 11:05:08 -0500
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: Fields

See below.

On 2/19/07, Kainth, Sachin wrote:
>
> Hi all,
>
> I have a few questions regarding indexing documents.
>
> 1. With my experience of indexing documents with Lucene so far I have
> done things like:
>
> Doc.Add(Field.Text("album", Album));
>
> Where Album is a string representing an album name. Now with this sort
> of indexing what I do is a search such as:
>
> "album:Thriller"
>
> a) Does this mean that I cannot do a search across all fields by
> submitting the query:
>
> "Thriller"? In other words, by submitting this query would my code
> search all fields?

No. If you just submit "Thriller", you'll only search the default field.
See QueryParser for the default field.

> b) Is there a way in which I can index elements of a document without
> naming the field? What would the impact of such a use of the indexing
> capabilities of Lucene be?

I don't think this makes sense in Lucene terms. All elements in a document
have a field. If you need an aggregate, you can index everything into one
field, which gives you the same result. Do note, however, that there's no
requirement that all documents have the same fields.

> 2. Is there a limit to the number of
> a) named fields per document that I can store

I think there is, but it's absurdly high. Don't worry about this....

> b) non-named fields per document that I can store

Zero, since I don't think you can.

> 3.
>
> a) Is it possible in Lucene to conduct searches that are very complex
> such as:
>
> ((album = Thriller AND artist = (Michael OR Jackson)) OR (date between X
> AND Y)) AND (label = sony OR Epic) etc...

Yes.

> b) For such a query what are the performance penalties compared to a
> simple search involving 1 term?

In the immortal words of Mr. Hatcher... it depends. You'll really just have
to experiment and find out. As a rough upper limit, the cost can probably be
approximated by the sum of the costs of the individual queries. The real
killer is wildcards.....

The real question isn't "what is the effect on performance", it's "is the
performance good enough for my application". That varies as the
characteristics of the database change. I would argue that a 1M index will
process arbitrarily complex queries "fast enough". The same may not be true
for a 100G index. So this question is really unanswerable in the abstract.

> Cheers
>
> Sachin
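
To make the answers to 1a, 1b and 3a above a little more concrete, here is a
toy in-memory sketch in plain Java. It is not Lucene's API: ToyIndex, the
"all" field name, and the doc(), search(), and() and or() helpers are all
made up for illustration, and the query handling only understands a bare term
or "field:term". It shows three things under those assumptions: a bare term
goes to the default field only, copying every value into one aggregate field
gives a search-everything effect, and nested AND/OR clauses are just set
operations on the per-term results.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy stand-in for an index. NOT Lucene's API; every name here is hypothetical.
class ToyIndex {
    // Assumed name for the catch-all field; any field name would do.
    static final String ALL = "all";

    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    private int nextDocId = 0;

    // Convenience: build a field -> text map from alternating names and values.
    static Map<String, String> doc(String... namesAndValues) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            fields.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return fields;
    }

    // Index one document. Each value goes into its own field and is also
    // copied into the catch-all field. Documents need not share fields.
    int addDocument(Map<String, String> fields) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            indexText(e.getKey(), e.getValue(), docId);
            indexText(ALL, e.getValue(), docId);
        }
        return docId;
    }

    // Lowercase the text, split on non-word characters, record each term.
    private void indexText(String field, String text, int docId) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(field, f -> new HashMap<>())
                    .computeIfAbsent(term, t -> new TreeSet<>())
                    .add(docId);
        }
    }

    // Single-term search. "field:term" targets that field; a bare term is
    // looked up in defaultField only, never in every field.
    Set<Integer> search(String query, String defaultField) {
        String field = defaultField;
        String term = query;
        int colon = query.indexOf(':');
        if (colon >= 0) {
            field = query.substring(0, colon);
            term = query.substring(colon + 1);
        }
        Map<String, Set<Integer>> terms =
                postings.getOrDefault(field, Collections.emptyMap());
        return terms.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // AND of two clauses: documents present in both result sets.
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // OR of two clauses: documents present in either result set.
    static Set<Integer> or(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }
}
```

With one document whose album is "Thriller" and another whose title is
"Thriller night", search("Thriller", "album") finds only the first, while
search("Thriller", ToyIndex.ALL) finds both. A nested query in the style of
3a is then built by combining search() results with and() and or().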