lucene-java-user mailing list archives

From Huntsman84 <tpgarci...@gmail.com>
Subject RE: RegexQuery Incomplete Results
Date Mon, 11 May 2009 08:00:32 GMT

This is the code for searching:

String index = "index";
String field = "contents";
IndexReader reader = IndexReader.open(index);
Searcher searcher = new IndexSearcher(reader);

System.out.println("Enter query: ");
String line = ".IN.";//in jakarta regexp this is like * IN *
RegexQuery rxquery = new RegexQuery(new Term(field,line));
Hits hits = searcher.search(rxquery);

if (hits != null) {
    for (int k = 0; k < 100 && k < hits.length(); k++) {
        if (hits.doc(k) != null)
            System.out.println(hits.doc(k).getField("contents").stringValue());
    }
}



And this is the part of creating the index:


File directory = new File("index");
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);
// returns a list of record values from database, all of them are phrases
List<String> records = getRecords();
Iterator<String> i = records.iterator();
while (i.hasNext()) {
    Document doc = new Document();
    doc.add(new Field(field, i.next(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
}
writer.optimize();
writer.close();



This code works as I want, except that it only matches against the first
word of each phrase. I think the problem is in how the index is built, but
I don't know how to fix it...

Any ideas?

Thank you so much!!



Steven A Rowe wrote:
> 
> On 5/8/2009 at 9:13 AM, Ian Lee wrote:
>> I'm surprised that it matches either - don't you need ".*in" where .*
>> means match any character zero or more times?  See the javadoc for
>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>> package.
>> 
>> Unless you're an expert in regexps it is probably worth playing with
>> them outside your lucene code to start with e.g. with simple
>> String.matches(regexp) calls.  They can take some getting used to.
>> And try to avoid anything with backslashes if you can!
> 
> The java.util.regex.Pattern implementation (the default RegexQuery
> implementation) actually uses Matcher.lookingAt(), which is equivalent to
> prepending a "^" anchor to the beginning of the pattern, so if Huntsman84
> is using the default implementation, then I agree with Ian: I'm surprised
> it matches either.  
> 
> However, the Jakarta Regexp implementation uses RE.match(), which does
> *not* require a beginning-of-string match.  
> 
> Huntsman84, are you using the Jakarta Regexp implementation?  If so, then
> like you, I'm surprised it's not matching both :).
> 
> It would be useful to see some real code, including how you index your
> records.
> 
> Steve
> 
>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarcia84@gmail.com>
>> wrote:
>> >
>> > Hi,
>> >
>> > I am using RegexQuery for searching in a set of records which are
>> > phrases of several words each. My aim is to find any phrase that
>> > contains the given group of letters (e.g. "in"). For that case,
>> > I am building the query with the regular expression ".in.", so it
>> > should return all phrases which contain "in", but the search only
>> > matches with the first word of the phrase.
>> >
>> > For example, if my records are "Knowing yourself" and "Old
>> > clinic", the correct search would return 2 matches, but it only
>> > matches with "Knowing yourself".
>> >
>> > How could I fix this?
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
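
The anchoring difference Steve describes, and Ian's suggestion to try patterns outside Lucene first, can be sketched with plain java.util.regex. This is only an illustration, not the RegexQuery internals: the class name RegexAnchoringDemo and its helper methods are hypothetical, and Matcher.find() is used here as a stand-in for an unanchored match; it is not the Jakarta Regexp API. The two records are the examples from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Lucene or of the code in this thread.
final class RegexAnchoringDemo {

    // True if the pattern matches a prefix of the input (anchored at the start only).
    static boolean lookingAt(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.lookingAt();
    }

    // True if the pattern matches anywhere in the input (unanchored search).
    // Assumption: stands in for a "no beginning-of-string requirement" match.
    static boolean find(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String[] records = {"Knowing yourself", "Old clinic"};
        String[] patterns = {".in.", ".*in.*"};
        for (String regex : patterns) {
            for (String record : records) {
                // String.matches() requires the whole input to match the pattern.
                System.out.printf("%-8s %-20s lookingAt=%-5b find=%-5b matches=%b%n",
                        regex, "\"" + record + "\"",
                        lookingAt(regex, record), find(regex, record),
                        record.matches(regex));
            }
        }
    }
}
```

With ".in.", lookingAt() and String.matches() reject both records because neither phrase has "in" as its second and third characters, while find() accepts both. With ".*in.*", all three calls accept both records, which is why Ian's ".*" prefix matters when the match is anchored at the start.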

-- 
View this message in context: http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

