lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Seid Mohammed <seidy...@gmail.com>
Subject Re: RegexQuery Incomplete Results
Date Tue, 12 May 2009 14:47:10 GMT
Thanks
it works


On 5/12/09, Mark Miller <markrmiller@gmail.com> wrote:
> Use JavaUtilRegexCapabilities or put the Jakarata RegEx jar on your
> classpath: http://jakarta.apache.org/regexp/index.html
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Seid Mohammed wrote:
>> I need it similar functionality, but while running the above code it
>> breaks after outputing the following
>> ========================================================================
>> Added Knowing yourself
>> Added Old clinic
>> Added INSIDE
>> Added Not INSIDE
>>
>> Default
>> RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
>>
>> org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
>> 0 hits for text:.in
>> 2 hits for text:.*in
>> 0 hits for text:.IN
>> 2 hits for text:.*IN
>> org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
>> Exception in thread "main" java.lang.NoClassDefFoundError:
>> org/apache/regexp/RE
>> 	at
>> org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
>> 	at
>> org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
>> 	at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
>> 	at
>> org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
>> 	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
>> 	at org.apache.lucene.search.Query.weight(Query.java:94)
>> 	at org.apache.lucene.search.Hits.<init>(Hits.java:76)
>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:50)
>> 	at org.apache.lucene.search.Searcher.search(Searcher.java:40)
>> 	at Regex2.main(Regex2.java:43)
>> Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
>> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
>> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>> 	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>> 	... 10 more
>> ===================================================================
>>
>> thanks a lot
>>
>> On 5/11/09, Huntsman84 <tpgarcia84@gmail.com> wrote:
>>
>>> That's it!!!
>>>
>>> The problem was with the regular expression, the one I need is ".*IN"!!
>>>
>>> Thank you so much, I was turning mad... =)
>>>
>>>
>>> Ian Lea wrote:
>>>
>>>> The little self-contained program below runs regex queries for a few
>>>> regexps against a few phrases for both the java.util and jakarta
>>>> regexp packages.
>>>>
>>>> Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is
>>>>
>>>> Added Knowing yourself
>>>> Added Old clinic
>>>> Added INSIDE
>>>> Added Not INSIDE
>>>>
>>>> Default
>>>> RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
>>>>
>>>> org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
>>>> 0 hits for text:.in
>>>> 2 hits for text:.*in
>>>> 0 hits for text:.IN
>>>> 2 hits for text:.*IN
>>>> org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
>>>> 2 hits for text:.in
>>>> 2 hits for text:.*in
>>>> 1 hits for text:.IN
>>>> 2 hits for text:.*IN
>>>>
>>>> Hope that helps.
>>>>
>>>> --
>>>> Ian.
>>>>
>>>>
>>>> import org.apache.lucene.index.*;
>>>> import org.apache.lucene.store.*;
>>>> import org.apache.lucene.document.*;
>>>> import org.apache.lucene.analysis.*;
>>>> import org.apache.lucene.analysis.standard.*;
>>>> import org.apache.lucene.search.*;
>>>> import org.apache.lucene.search.regex.*;
>>>>
>>>> public class luctest {
>>>>
>>>>     public static void main(String[] _args) throws Exception {
>>>> 	RAMDirectory rdir = new RAMDirectory();
>>>> 	IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(),
>>>> true);
>>>> 	String[] docterms = { "Knowing yourself",
>>>> 			      "Old clinic",
>>>> 			      "INSIDE",
>>>> 			      "Not INSIDE" };
>>>>
>>>> 	for (String s : docterms) {
>>>> 	    Document d = new Document();
>>>> 	    d.add(new Field("text",
>>>> 			    s,
>>>> 			    Field.Store.YES,
>>>> 			    Field.Index.NOT_ANALYZED));
>>>> 	    writer.addDocument(d);
>>>> 	    System.out.printf("Added %s\n", s);
>>>> 	}
>>>> 	writer.close();
>>>>
>>>> 	IndexSearcher searcher = new IndexSearcher(rdir);
>>>> 	String[] queries = { ".in", ".*in", ".IN", ".*IN" };
>>>> 	RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
>>>> 				      new JakartaRegexpCapabilities() };
>>>> 	RegexQuery qx = new RegexQuery(new Term("x", "x"));
>>>> 	System.out.printf("\nDefault RegexCapabilities=%s\n\n",
>>>> 			  qx.getRegexImplementation());
>>>> 	for (RegexCapabilities rcap : rcaps) {
>>>> 	    System.out.println(rcap);
>>>> 	    for (String s : queries) {
>>>> 		Term t = new Term("text", s);
>>>> 		RegexQuery q = new RegexQuery(t);
>>>> 		q.setRegexImplementation(rcap);
>>>> 		Hits h = searcher.search(q);
>>>> 		System.out.printf("%s hits for %s\n",
>>>> 				  h.length(),
>>>> 				  q.toString());
>>>> 	    }
>>>> 	}
>>>>     }
>>>> }
>>>>
>>>>
>>>> On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarcia84@gmail.com>
>>>> wrote:
>>>>
>>>>> The RegexQuery class uses that package, and for that reason the
>>>>> expression
>>>>> matches.
>>>>>
>>>>> If my records contained only one word each, this code would work, but
I
>>>>> need
>>>>> to apply that regular expression to a phrase...
>>>>>
>>>>>
>>>>> Ian Lea wrote:
>>>>>
>>>>>> The default regex package is java.util.regex and I can't see anywhere
>>>>>> that you tell it to use the Jakarta regexp package.  So I don't think
>>>>>> that ".in" will match.  Also, you are storing your contents field
as
>>>>>> NOT_ANALYZED so you will need to be wary of case sensitivity.  Maybe
>>>>>> this is what you want, but maybe not.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ian.
>>>>>>
>>>>>>
>>>>>> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarcia84@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> This is the code for searching:
>>>>>>>
>>>>>>> String index = "index";
>>>>>>> String field = "contents";
>>>>>>> IndexReader reader = IndexReader.open(index);
>>>>>>> Searcher searcher = new IndexSearcher(reader);
>>>>>>>
>>>>>>> System.out.println("Enter query: ");
>>>>>>> String line = ".IN.";//in jakarta regexp this is like * IN *
>>>>>>> RegexQuery rxquery = new RegexQuery(new Term(field,line));
>>>>>>> Hits hits = searcher.search(rxquery);
>>>>>>>
>>>>>>> if(hits!=null){
>>>>>>>    for(int k = 0; k<100 && k<hits.length(); k++){
>>>>>>>        if(hits.doc(k)!=null)
>>>>>>>
>>>>>>>  System.out.println(hits.doc(k).getField("contents").stringValue());
>>>>>>>    }
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And this is the part of creating the index:
>>>>>>>
>>>>>>>
>>>>>>> File directory = new File("index");
>>>>>>> IndexWriter writer = new IndexWriter(directory, new
>>>>>>> StandardAnalyzer(),
>>>>>>> true,
>>>>>>>                            IndexWriter.MaxFieldLength.LIMITED);
>>>>>>> List<String> records = getRecords();//returns a list of
record values
>>>>>>> from
>>>>>>> database, all of them are phrases
>>>>>>> Iterator<String> i = records.iterator();
>>>>>>> while(i.hasNext()){
>>>>>>>           Document doc = new Document();
>>>>>>>           doc.add(new Field(field, i.next(), Field.Store.YES,
>>>>>>> Field.Index.NOT_ANALYZED));
>>>>>>>        writer.addDocument(doc);
>>>>>>> }
>>>>>>> writer.optimize();
>>>>>>> writer.close();
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> This code works as I want but just matching with the first word
of
>>>>>>> the
>>>>>>> phrase. I think the problem is the index building, but I don't
know
>>>>>>> how
>>>>>>> to
>>>>>>> fix it...
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thank you so much!!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Steven A Rowe wrote:
>>>>>>>
>>>>>>>> On 5/8/2009 at 9:13 AM, Ian Lee wrote:
>>>>>>>>
>>>>>>>>> I'm surprised that it matches either - don't you need
".*in" where
>>>>>>>>> .*
>>>>>>>>> means match any character zero or more times?  See the
javadoc for
>>>>>>>>> java.util.regex.Pattern, or for Jakarta Regexp if you
are using
>>>>>>>>> that
>>>>>>>>> package.
>>>>>>>>>
>>>>>>>>> Unless you're an expert in regexps it is probably worth
playing
>>>>>>>>> with
>>>>>>>>> them outside your lucene code to start with e.g. with
simple
>>>>>>>>> String.matches(regexp) calls.  They can take some getting
used to.
>>>>>>>>> And try to avoid anything with backslashes if you can!
>>>>>>>>>
>>>>>>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>>>>>>> implementation) actually uses Matcher.lookingAt(), which
is
>>>>>>>> equivalent
>>>>>>>> to
>>>>>>>> prepending a "^" anchor to the beginning of the pattern,
so if
>>>>>>>> Huntsman84
>>>>>>>> is using the default implementation, then I agree with Ian:
I'm
>>>>>>>> surprised
>>>>>>>> it matches either.
>>>>>>>>
>>>>>>>> However, the Jakarta Regexp implementation uses RE.match(),
which
>>>>>>>> does
>>>>>>>> *not* require a beginning-of-string match.
>>>>>>>>
>>>>>>>> Hunstman84, are you using the Jakarta Regexp implementation?
 If so,
>>>>>>>> then
>>>>>>>> like you, I'm surprised it's not matching both :).
>>>>>>>>
>>>>>>>> It would be useful to see some real code, including how you
index
>>>>>>>> your
>>>>>>>> records.
>>>>>>>>
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarcia84@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I am using RegexQuery for searching in a set of records
wich are
>>>>>>>>>> phrases of several words each. My aim is to find
any phrase that
>>>>>>>>>> contains the given group of letters (e.g. "in").
For that case,
>>>>>>>>>> I am building the query with the regular expression
".in.", so it
>>>>>>>>>> should return all phrases with contain "in", but
the search only
>>>>>>>>>> matches with the first word of the phrase.
>>>>>>>>>>
>>>>>>>>>> For example, if my records are "Knowing yourself"
and "Old
>>>>>>>>>> clinic", the correct search would return 2 matches,
but it only
>>>>>>>>>> matches with "Knowing yourself".
>>>>>>>>>>
>>>>>>>>>> How could I fix this?
>>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23482532.html
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>
>>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23486350.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>>
>>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
"RABI ZIDNI ILMA"

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message