lucene-java-user mailing list archives

From Seid Mohammed <seidy...@gmail.com>
Subject Re: RegexQuery Incomplete Results
Date Tue, 12 May 2009 14:21:24 GMT
I need similar functionality, but when I run the code above it
breaks after outputting the following:
========================================================================
Added Knowing yourself
Added Old clinic
Added INSIDE
Added Not INSIDE

Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0

org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
0 hits for text:.in
2 hits for text:.*in
0 hits for text:.IN
2 hits for text:.*IN
org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
	at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
	at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
	at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
	at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
	at org.apache.lucene.search.Query.weight(Query.java:94)
	at org.apache.lucene.search.Hits.<init>(Hits.java:76)
	at org.apache.lucene.search.Searcher.search(Searcher.java:50)
	at org.apache.lucene.search.Searcher.search(Searcher.java:40)
	at Regex2.main(Regex2.java:43)
Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
	... 10 more
===================================================================

thanks a lot
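
The NoClassDefFoundError for org/apache/regexp/RE in the trace above indicates that the Jakarta Regexp classes could not be loaded at run time, most likely because the jakarta-regexp jar (Ian's run used version 1.5) is not on the classpath. A minimal sketch for checking this before running the regex queries; the class name ClasspathCheck and the helper isLoadable are illustrative and not part of Lucene:

```java
class ClasspathCheck {
    // Returns true if the named class can be located by this class's loader.
    // 'false' skips static initialization: we only test for presence.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String re = "org.apache.regexp.RE"; // class named in the stack trace
        System.out.println(re + (isLoadable(re) ? " is loadable"
                                                : " is NOT on the classpath"));
    }
}
```

If the class is reported as missing, adding the jar to the runtime classpath should resolve the error, for example `java -cp lucene-core-2.4.1.jar:lucene-regex-2.4.1.jar:jakarta-regexp-1.5.jar:. Regex2` (jar file names are assumed here; adjust them to your setup).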

On 5/11/09, Huntsman84 <tpgarcia84@gmail.com> wrote:
>
> That's it!!!
>
> The problem was with the regular expression, the one I need is ".*IN"!!
>
> Thank you so much, I was turning mad... =)
>
>
> Ian Lea wrote:
>>
>> The little self-contained program below runs regex queries for a few
>> regexps against a few phrases for both the java.util and jakarta
>> regexp packages.
>>
>> Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is
>>
>> Added Knowing yourself
>> Added Old clinic
>> Added INSIDE
>> Added Not INSIDE
>>
>> Default RegexCapabilities=org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
>>
>> org.apache.lucene.search.regex.JavaUtilRegexCapabilities@0
>> 0 hits for text:.in
>> 2 hits for text:.*in
>> 0 hits for text:.IN
>> 2 hits for text:.*IN
>> org.apache.lucene.search.regex.JakartaRegexpCapabilities@0
>> 2 hits for text:.in
>> 2 hits for text:.*in
>> 1 hits for text:.IN
>> 2 hits for text:.*IN
>>
>> Hope that helps.
>>
>> --
>> Ian.
>>
>>
>> import org.apache.lucene.index.*;
>> import org.apache.lucene.store.*;
>> import org.apache.lucene.document.*;
>> import org.apache.lucene.analysis.*;
>> import org.apache.lucene.analysis.standard.*;
>> import org.apache.lucene.search.*;
>> import org.apache.lucene.search.regex.*;
>>
>> public class luctest {
>>
>>     public static void main(String[] _args) throws Exception {
>> 	RAMDirectory rdir = new RAMDirectory();
>> 	IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
>> 	String[] docterms = { "Knowing yourself",
>> 			      "Old clinic",
>> 			      "INSIDE",
>> 			      "Not INSIDE" };
>>
>> 	for (String s : docterms) {
>> 	    Document d = new Document();
>> 	    d.add(new Field("text",
>> 			    s,
>> 			    Field.Store.YES,
>> 			    Field.Index.NOT_ANALYZED));
>> 	    writer.addDocument(d);
>> 	    System.out.printf("Added %s\n", s);
>> 	}
>> 	writer.close();
>>
>> 	IndexSearcher searcher = new IndexSearcher(rdir);
>> 	String[] queries = { ".in", ".*in", ".IN", ".*IN" };
>> 	RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
>> 				      new JakartaRegexpCapabilities() };
>> 	RegexQuery qx = new RegexQuery(new Term("x", "x"));
>> 	System.out.printf("\nDefault RegexCapabilities=%s\n\n",
>> 			  qx.getRegexImplementation());
>> 	for (RegexCapabilities rcap : rcaps) {
>> 	    System.out.println(rcap);
>> 	    for (String s : queries) {
>> 		Term t = new Term("text", s);
>> 		RegexQuery q = new RegexQuery(t);
>> 		q.setRegexImplementation(rcap);
>> 		Hits h = searcher.search(q);
>> 		System.out.printf("%s hits for %s\n",
>> 				  h.length(),
>> 				  q.toString());
>> 	    }
>> 	}
>>     }
>> }
>>
>>
>> On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarcia84@gmail.com> wrote:
>>>
>>> The RegexQuery class uses that package, and for that reason the
>>> expression matches.
>>>
>>> If my records contained only one word each, this code would work, but I
>>> need to apply that regular expression to a phrase...
>>>
>>>
>>> Ian Lea wrote:
>>>>
>>>> The default regex package is java.util.regex and I can't see anywhere
>>>> that you tell it to use the Jakarta regexp package.  So I don't think
>>>> that ".in" will match.  Also, you are storing your contents field as
>>>> NOT_ANALYZED so you will need to be wary of case sensitivity.  Maybe
>>>> this is what you want, but maybe not.
>>>>
>>>>
>>>> --
>>>> Ian.
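
Because the field is NOT_ANALYZED, the indexed term keeps its original case, so a lower-case pattern will not match upper-case text. A small sketch with plain java.util.regex (outside Lucene) illustrating the point. The inline (?i) flag is standard java.util.regex syntax; whether it behaves the same inside a given RegexQuery setup is not tested here, and the class name CaseDemo is illustrative:

```java
import java.util.regex.Pattern;

class CaseDemo {
    // True if the pattern matches starting at the first character of the text.
    static boolean startsWithMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        String term = "INSIDE"; // stored verbatim when the field is not analyzed
        System.out.println(startsWithMatch(".*in", term));     // case-sensitive
        System.out.println(startsWithMatch(".*IN", term));     // same case as the term
        System.out.println(startsWithMatch("(?i).*in", term)); // case-insensitive flag
    }
}
```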
>>>>
>>>>
>>>> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarcia84@gmail.com>
>>>> wrote:
>>>>>
>>>>> This is the code for searching:
>>>>>
>>>>> String index = "index";
>>>>> String field = "contents";
>>>>> IndexReader reader = IndexReader.open(index);
>>>>> Searcher searcher = new IndexSearcher(reader);
>>>>>
>>>>> System.out.println("Enter query: ");
>>>>> String line = ".IN.";//in jakarta regexp this is like * IN *
>>>>> RegexQuery rxquery = new RegexQuery(new Term(field,line));
>>>>> Hits hits = searcher.search(rxquery);
>>>>>
>>>>> if(hits!=null){
>>>>>    for(int k = 0; k<100 && k<hits.length(); k++){
>>>>>        if(hits.doc(k)!=null)
>>>>>            System.out.println(hits.doc(k).getField("contents").stringValue());
>>>>>    }
>>>>> }
>>>>>
>>>>>
>>>>>
>>>>> And this is the part of creating the index:
>>>>>
>>>>>
>>>>> File directory = new File("index");
>>>>> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
>>>>>                            true, IndexWriter.MaxFieldLength.LIMITED);
>>>>> // getRecords() returns a list of record values from the database;
>>>>> // all of them are phrases
>>>>> List<String> records = getRecords();
>>>>> Iterator<String> i = records.iterator();
>>>>> while(i.hasNext()){
>>>>>     Document doc = new Document();
>>>>>     doc.add(new Field(field, i.next(), Field.Store.YES,
>>>>>                       Field.Index.NOT_ANALYZED));
>>>>>     writer.addDocument(doc);
>>>>> }
>>>>> writer.optimize();
>>>>> writer.close();
>>>>>
>>>>>
>>>>>
>>>>> This code works as I want, but it only matches the first word of the
>>>>> phrase. I think the problem is in how the index is built, but I don't
>>>>> know how to fix it...
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> Thank you so much!!
>>>>>
>>>>>
>>>>>
>>>>> Steven A Rowe wrote:
>>>>>>
>>>>>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>>>>>> I'm surprised that it matches either - don't you need ".*in" where
>>>>>>> .* means match any character zero or more times?  See the javadoc
>>>>>>> for java.util.regex.Pattern, or for Jakarta Regexp if you are using
>>>>>>> that package.
>>>>>>>
>>>>>>> Unless you're an expert in regexps it is probably worth playing
>>>>>>> with them outside your lucene code to start with e.g. with simple
>>>>>>> String.matches(regexp) calls.  They can take some getting used to.
>>>>>>> And try to avoid anything with backslashes if you can!
>>>>>>
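
One caveat when experimenting with String.matches as suggested above: it succeeds only if the whole string matches the pattern, so results can differ from a search that accepts a match on just part of the term. A quick sketch using the phrases from this thread (the class name MatchesDemo and the helper wholeMatch are illustrative):

```java
class MatchesDemo {
    // String.matches requires the entire input to match the pattern.
    static boolean wholeMatch(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        String[] phrases = { "Knowing yourself", "Old clinic" };
        String[] regexps = { ".in.", ".*in", ".*in.*" };
        for (String re : regexps) {
            for (String p : phrases) {
                System.out.printf("%-7s vs \"%s\" -> %b%n", re, p, wholeMatch(re, p));
            }
        }
    }
}
```

Of these three patterns only ".*in.*" matches both phrases in full, because neither phrase is four characters long or ends in "in".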
>>>>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>>>>> implementation) actually uses Matcher.lookingAt(), which is equivalent
>>>>>> to prepending a "^" anchor to the beginning of the pattern, so if
>>>>>> Huntsman84 is using the default implementation, then I agree with Ian:
>>>>>> I'm surprised it matches either.
>>>>>>
>>>>>> However, the Jakarta Regexp implementation uses RE.match(), which does
>>>>>> *not* require a beginning-of-string match.
>>>>>>
>>>>>> Huntsman84, are you using the Jakarta Regexp implementation?  If so,
>>>>>> then like you, I'm surprised it's not matching both :).
>>>>>>
>>>>>> It would be useful to see some real code, including how you index your
>>>>>> records.
>>>>>>
>>>>>> Steve
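
The difference Steve describes between anchored and unanchored matching can be tried outside Lucene with plain java.util.regex. This is only a sketch: Matcher.lookingAt() is the call he names for the default implementation, while Matcher.find() is used here as a stand-in for the unanchored behaviour of Jakarta's RE.match(), which is an assumption for illustration. The class name AnchorDemo and its helpers are hypothetical:

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // The four unanalyzed terms indexed by the test program in this thread.
    static final String[] TERMS = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };

    // Anchored: the match must begin at the first character of the term.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Unanchored: the match may begin anywhere in the term.
    static boolean anywhereMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    // Counts how many terms match under the chosen matching mode.
    static int countHits(String regex, boolean anchored) {
        int hits = 0;
        for (String term : TERMS) {
            if (anchored ? prefixMatch(regex, term) : anywhereMatch(regex, term)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[] queries = { ".in", ".*in", ".IN", ".*IN" };
        for (String q : queries) {
            System.out.printf("%-5s anchored=%d unanchored=%d%n",
                              q, countHits(q, true), countHits(q, false));
        }
    }
}
```

With anchoring, ".in" can only match when "in" occupies the second and third characters of the term, which is why a leading ".*" is needed to find "in" further along a phrase.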
>>>>>>
>>>>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarcia84@gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > Hi,
>>>>>>> >
>>>>>>> > I am using RegexQuery for searching in a set of records which are
>>>>>>> > phrases of several words each. My aim is to find any phrase that
>>>>>>> > contains the given group of letters (e.g. "in"). For that case,
>>>>>>> > I am building the query with the regular expression ".in.", so it
>>>>>>> > should return all phrases that contain "in", but the search only
>>>>>>> > matches with the first word of the phrase.
>>>>>>> >
>>>>>>> > For example, if my records are "Knowing yourself" and "Old
>>>>>>> > clinic", the correct search would return 2 matches, but it only
>>>>>>> > matches with "Knowing yourself".
>>>>>>> >
>>>>>>> > How could I fix this?
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23482532.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23486350.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>


-- 
"RABI ZIDNI ILMA"
