Date: Tue, 17 Dec 2002 15:10:45 -0800 (Pacific Standard Time)
From: "Joshua O'Madadhain"
To: Lucene Users List
Subject: Re: Empty phrase search
In-Reply-To: <02ef01c2a59b$6c27d960$3edea8c0@MinhsWindowsBox>
Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm
List-Id: "Lucene Users List"

Minh:

Assuming that the fields aren't binary (and thus capable of having any
arbitrary string in them), you should be able to come up with a short
string that will never appear in the field ("emptystring", "xyxyqu23",
etc.) even if there is no single character that would work as a marker.
As an extra layer of insurance, you could throw out any documents whose
field only contained that string _as a substring_.  This may not be
completely bulletproof, but it's pretty close.
:)

Joshua

jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.

On Tue, 17 Dec 2002, Minh Kama Yie wrote:

> Yep, thought of that and having argued it out with the lead on this, it
> would be suitable in my opinion but even I would concede that it is a hack
> in our case since there is no limit to the variety of characters that could
> appear as the value for the field which may be an empty string. Hence there
> isn't the 100% guarantee that there will never be the circumstance when this
> special character or String of characters we choose to represent empty
> fields will appear as a valid value.
>
> Thanks anyway though.
>
> Minh
>
> ----- Original Message -----
> From: "Joshua O'Madadhain"
> To: "Lucene Users List"
> Sent: Tuesday, December 17, 2002 6:03 PM
> Subject: Re: Empty phrase search
>
>
> > Minh:
> >
> > Why not just use a special character or string (one that won't appear in a
> > non-empty field) to represent an empty field? It doesn't have to appear
> > in the user interface, if that's a concern; you could convert a query
> > including "FieldA:NULL" (or whatever) into a query containing
> > "FieldA:emptyfield" automatically.
> >
> > This allows you to finesse the entire issue of adding something to
> > Lucene--which may be for the best anyway, since this is really just a
> > special case of looking for fields whose contents have a specific
> > characteristic.
> >
> > Good luck--
> >
> > Joshua O'Madadhain
> >
> > jmadden@ics.uci.edu...Obscurium Per Obscurius...www.ics.uci.edu/~jmadden
> > Joshua O'Madadhain: Information Scientist, Musician, Philosopher-At-Tall
> > It's that moment of dawning comprehension that I live for--Bill Watterson
> > My opinions are too rational and insightful to be those of any organization.
> >
> > On Tue, 17 Dec 2002, Minh Kama Yie wrote:
> >
> > > Thanks for that Peter.
> > >
> > > Unfortunately I'm not looking for "all" documents but rather documents where
> > > the fields can be empty.
> > > Hence using the universal field wouldn't quite work.
> > > The "empty:true" approach is interesting however it effectively doubles the
> > > number of indexable fields for a document.
> > >
> > > Need to assess whether or not we want to support this feature I guess.
> > >
> > > Currently, considering Lucene's architecture this feature needs to be
> > > inherently supported rather than worked around with various fields I think
> > > for it to be used/done properly...
> > >
> > > Would anyone be able to point me in the general direction of where to look
> > > in the Lucene code to attempt this?
> > > Hopefully this way I can give a proper cost/benefit analysis as to whether
> > > or not to support this feature...and if all goes well, release something
> > > back for the great work you guys do....
> > >
> > >
> > > ----- Original Message -----
> > > From: "Peter Carlson"
> > > To: "Lucene Users List"
> > > Sent: Tuesday, December 17, 2002 4:25 PM
> > > Subject: Re: Empty phrase search
> > >
> > >
> > > > I don't think so.
> > > >
> > > > One approach to look for everything, or not something is to add a field
> > > > to each document which is a constant value. Like a field named exists
> > > > and a value of true.
> > > >
> > > > Then you can do search like
> > > >
> > > > exists:true NOT microsoft
> > > >
> > > > This will find all documents without the term microsoft in them.
> > > >
> > > > Just to have it find documents with nothing might be a little tricky.
> > > > You might want to put a field in the document which indicates the size
> > > > or something like that. Or just create an empty field and look for
> > > >
> > > > empty:true
> > > >
> > > > I hope this rambling helps
> > > >
> > > > --Peter
> > > >
> > > > On Monday, December 16, 2002, at 03:24 PM, Minh Kama Yie wrote:
> > > >
> > > > > Hi guys,
> > > > >
> > > > > Just wondering if lucene indexes empty strings and if so, how to
> > > > > search for this using the query language?
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Minh Kama Yie
> > > > >
> > > > > This message is intended only for the named recipient.
> > > > > If you are not the intended recipient you are notified that
> > > > > disclosing, copying, distributing or taking any action
> > > > > in reliance on the contents of this information is strictly
> > > > > prohibited.
> > > > >
> > > >
> > > > --
> > > > To unsubscribe, e-mail:
> > > > For additional commands, e-mail:
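
For readers of the archive, the marker-string workaround discussed in this thread could be sketched roughly as follows. This is a library-free illustration, not Lucene code: the class EmptyFieldMarker and the methods toIndexedValue, rewriteQuery and keepExactMatches are hypothetical names, and it assumes "emptyfield" never occurs as a real field value, which is exactly the guarantee Minh points out cannot be made in every case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;

/**
 * Sketch of the marker-string idea from the thread.
 * All names here are hypothetical; none of them is part of Lucene.
 */
final class EmptyFieldMarker {
    // Assumption: this string never appears as a genuine field value.
    static final String EMPTY_MARKER = "emptyfield";

    // Index time: store the marker in place of a null or empty value.
    static String toIndexedValue(String rawValue) {
        return (rawValue == null || rawValue.isEmpty()) ? EMPTY_MARKER : rawValue;
    }

    // Query time: turn a user-facing "SomeField:NULL" clause into
    // "SomeField:emptyfield", so the marker never shows up in the UI.
    static String rewriteQuery(String userQuery) {
        String replacement = "$1:" + Matcher.quoteReplacement(EMPTY_MARKER);
        return userQuery.replaceAll("(\\w+):NULL\\b", replacement);
    }

    // Extra insurance: keep only hits whose stored value is exactly the
    // marker, discarding values that merely contain it as a substring.
    static List<String> keepExactMatches(List<String> storedValues) {
        List<String> exact = new ArrayList<String>();
        for (String value : storedValues) {
            if (EMPTY_MARKER.equals(value)) {
                exact.add(value);
            }
        }
        return exact;
    }
}
```

In a real index the value returned by toIndexedValue would be what gets added to the document for that field, and the rewritten query string would be handed to whatever query parser the application already uses. Compared with Peter's "empty:true" suggestion, this avoids a second field per indexed field, at the cost of reserving one string value.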