Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 36093 invoked from network); 29 Jul 2009 17:05:52 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 29 Jul 2009 17:05:51 -0000 Received: (qmail 28142 invoked by uid 500); 29 Jul 2009 17:05:50 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 28087 invoked by uid 500); 29 Jul 2009 17:05:50 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 28077 invoked by uid 99); 29 Jul 2009 17:05:50 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Jul 2009 17:05:50 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of markrmiller@gmail.com designates 209.85.211.177 as permitted sender) Received: from [209.85.211.177] (HELO mail-yw0-f177.google.com) (209.85.211.177) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Jul 2009 17:05:39 +0000 Received: by ywh7 with SMTP id 7so61241ywh.21 for ; Wed, 29 Jul 2009 10:05:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=8Xqgi23bvaJBHHOyX4cOG7SiSHAUhArO4TjF2u4KhS4=; b=VMoM7VR6EUKu3AcZgVCVOMaskpq1/woIQqWHQVyPVe/QPY/lDEg44LXrta/ZLl7JxP jgJOsaDxQtC5Q7wTMr7P0h7NejcYwwYu1/NIEDa1g+j5TqwihT21Lto/DWxyiDHXGQJc itHzbdnafWDLWJxwhGQG4JIbixktE5dKqyRg0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=YOWSPIRKeUArEG7NdtG/JKBjB9olmq26Zq1IFa3AiNaUYpOTHFSPJGnTAeaNZU8v95 gKhrYpTlYmHdw7x2alHM+Su0eB8TXIecpWSOAkCDPrZRnIIgykaQk75uAFl8AidXa9Mo OeSZPIX75jIeYRm0Jd48zMVk5w08SJcie3xfA= MIME-Version: 1.0 Received: by 10.100.251.7 with SMTP id y7mr12127247anh.178.1248887118915; Wed, 29 Jul 2009 10:05:18 -0700 (PDT) In-Reply-To: <24723404.post@talk.nabble.com> References: <24667109.post@talk.nabble.com> <359a92830907280925t431f452asf79be663cdc1fb8a@mail.gmail.com> <24703547.post@talk.nabble.com> <2D127F11DC79714E9B6A43AC9458147F1BD65BD7@suex07-mbx-03.ad.syr.edu> <24711445.post@talk.nabble.com> <2D127F11DC79714E9B6A43AC9458147F1BD65BE3@suex07-mbx-03.ad.syr.edu> <24723404.post@talk.nabble.com> Date: Wed, 29 Jul 2009 17:05:18 +0000 Message-ID: Subject: Re: Multiline Regex with Lucene From: Mark Miller To: java-user@lucene.apache.org Content-Type: multipart/alternative; boundary=0016368e1a5e1667f3046fdb312a X-Virus-Checked: Checked by ClamAV on apache.org --0016368e1a5e1667f3046fdb312a Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit >>I came across qsol where in the paragraphseperator and sentence seperator >>can be specified and string can be searched within the paragraph. Qsol does this by using SpanQuerys. First you inject special marker tokens as your paragraph/sentence markers, then you use a SpanNotQuery that looks for a SpanNearQuery that doesn't intersect with a SpanTermQuery containing the special marker term. -- - Mark http://www.lucidimagination.com On Wed, Jul 29, 2009 at 4:56 PM, ba3 wrote: > > Hi Steve, > > I went through the article. Thanks for the link. The span query mentions > the > position i.e n positions from the terms. The problem was like this : > > Lucene was made by Doug cutting > > If Doug is found between the words Lucene and cutting then it is a hit. > [there can be any number of positions which is unknown]. If the number of > positions are known then the spans could be used. > > I came across qsol where in the paragraphseperator and sentence seperator > can be specified and string can be searched within the paragraph. > > Can you give your comments. > > - Rgds > Ba3 > > > > > > Steven A Rowe wrote: > > > > Hi ba3, > > > > Did you read the Lucid Imagination article I linked to?: > > > > http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/ > > > > > > It has examples, including specifying the term indicating the end of the > > span. > > > > If the article doesn't do it for you, I need more information to be able > > to help. Can you give an example of what you want to do? > > > > Thanks, > > Steve > > > >> -----Original Message----- > >> From: ba3 [mailto:sbadhrinath@gmail.com] > >> Sent: Tuesday, July 28, 2009 10:39 PM > >> To: java-user@lucene.apache.org > >> Subject: RE: Multiline Regex with Lucene > >> > >> > >> Hi Steve, > >> > >> In case of span queries, the span first query can specify the start of > >> the > >> span, is it possible to specify the term [not the position] indicating > >> the > >> end of the span ? > >> > >> -- Regards > >> Ba3 > >> > >> > >> Steven A Rowe wrote: > >> > > >> > Hi ba3, > >> > > >> > Check out the list of "Direct Known Subclasses" from the SpanQuery > >> > javadocs to see what's available: > >> > > >> > > >> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/spans/ > >> SpanQuery.html > >> > > >> > SpanRegexQuery may be what you're looking for: > >> > > >> > > >> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/regex/ > >> SpanRegexQuery.html > >> > > >> > > >> > Steve > >> > > >> >> -----Original Message----- > >> >> From: ba3 [mailto:sbadhrinath@gmail.com] > >> >> Sent: Tuesday, July 28, 2009 12:53 PM > >> >> To: java-user@lucene.apache.org > >> >> Subject: Re: Multiline Regex with Lucene > >> >> > >> >> > >> >> Hi, > >> >> > >> >> Thanks for the pointers. I will try the span queries. > >> >> But can span query support regexp as a term ? > >> >> > >> >> Also for more details in the problem : > >> >> The problem is like this: > >> >> find a search string inside a block of statements. > >> >> The block starts with a string and ends with a character. > >> >> > >> >> -- Regards > >> >> Ba3 > >> >> > >> >> > >> >> > >> >> Erick Erickson wrote: > >> >> > > >> >> > I doubt you're thinking in terms of tokens. Your inputstream is > >> broken > >> >> up > >> >> > into tokens (think of them as words, > >> >> > depending upon the analyzer) and regex searchers are > >> >> > confined to those *tokens*. So the concept of a multi-line > >> >> > regex in a search is kind of ...odd... > >> >> > > >> >> > You could possibly index your input as UN_TOKENIZED, but > >> >> > I really have no clue what Lucene would do with that. I think > >> >> > you're off in uncharted territory here. > >> >> > > >> >> > Perhaps a better thing would be for you to explain *why* you > >> >> > want to do this and perhaps folks can come up with some > >> >> > suggestions, I suspect this may be an XY problem, see > >> >> > http://www.perlmonks.org/index.pl?node_id=542341 > >> >> > > >> >> > Best > >> >> > Erick > >> >> > > >> >> > On Sun, Jul 26, 2009 at 9:52 AM, ba3 > >> wrote: > >> >> > > >> >> >> > >> >> >> I was trying to do a regex search with the lucene and > >> >> >> JavaUtilRegexCapabilities. > >> >> >> The code used is : > >> >> >> RegexQuery query = new RegexQuery(new > >> >> >> Term("contents","(?m)hello.*(\r[^#]*)This is to be > >> >> >> searched.*(\r[^#]*)#")); > >> >> >> query.setRegexImplementation(new JavaUtilRegexCapabilities()); > >> >> >> > >> >> >> I verified the regex in : http://www.gskinner.com/RegExr/ [with > >> the > >> >> >> multi > >> >> >> line checked] > >> >> >> In lucene though there are no hits. Can you please point me in > >> the > >> >> right > >> >> >> direction > >> >> >> > >> >> >> -- Rgds > >> >> >> Ba3 > >> >> >> > >> >> >> Regex : > >> >> >> hello.*(\r[^#]*)This is to be searched.*(\r[^#]*)# > >> >> >> > >> >> >> Content : > >> >> >> hello world > >> >> >> This is to be searched > >> >> >> # > >> >> >> Test line should not be selected > >> >> >> hello > >> >> >> This should not work > >> >> >> some other lines > >> >> >> # > >> >> >> Not to be selected > >> >> >> hello world > >> >> >> Some lines > >> >> >> This is to be searched > >> >> >> Some lines > >> >> >> # > >> >> >> hello earth > >> >> >> some lines > >> >> >> # > >> >> >> -- > >> >> >> View this message in context: > >> >> >> > >> >> http://www.nabble.com/Multiline-Regex-with-Lucene- > >> tp24667109p24667109.html > >> >> >> Sent from the Lucene - Java Users mailing list archive at > >> Nabble.com. > >> >> >> > >> >> >> > >> >> >> ----------------------------------------------------------------- > >> ---- > >> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > >> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org > >> >> >> > >> >> >> > >> >> > > >> >> > > >> >> > >> >> -- > >> >> View this message in context: http://www.nabble.com/Multiline-Regex- > >> with- > >> >> Lucene-tp24667109p24703547.html > >> >> Sent from the Lucene - Java Users mailing list archive at > >> Nabble.com. > >> >> > >> >> > >> >> -------------------------------------------------------------------- > >> - > >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > >> >> For additional commands, e-mail: java-user-help@lucene.apache.org > >> > > >> > > >> > --------------------------------------------------------------------- > >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > >> > For additional commands, e-mail: java-user-help@lucene.apache.org > >> > > >> > > >> > > >> > >> -- > >> View this message in context: http://www.nabble.com/Multiline-Regex- > >> with-Lucene-tp24667109p24711445.html > >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. > >> > >> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > >> For additional commands, e-mail: java-user-help@lucene.apache.org > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > > For additional commands, e-mail: java-user-help@lucene.apache.org > > > > > > > > -- > View this message in context: > http://www.nabble.com/Multiline-Regex-with-Lucene-tp24667109p24723404.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org > For additional commands, e-mail: java-user-help@lucene.apache.org > > --0016368e1a5e1667f3046fdb312a--