lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: RegexQuery Incomplete Results
Date Mon, 11 May 2009 09:04:08 GMT
The default regex package is java.util.regex and I can't see anywhere
that you tell it to use the Jakarta regexp package.  So I don't think
that ".in" will match.  Also, you are storing your contents field as
NOT_ANALYZED so you will need to be wary of case sensitivity.  Maybe
this is what you want, but maybe not.
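A minimal, Lucene-free sketch of both points, assuming (as Steve Rowe explains further down the thread) that the default java.util.regex implementation tests each indexed term with `Matcher.lookingAt()`, and that each NOT_ANALYZED record is indexed as a single term with its original case. `termMatches` is a hypothetical helper, not a Lucene API; the `(?i)` inline flag is standard java.util.regex syntax for case-insensitive matching.

```java
import java.util.regex.Pattern;

public class CaseSensitivityDemo {
    // Hypothetical helper (not a Lucene API): tests one indexed term the way
    // the default java.util.regex-based RegexQuery is described as doing it,
    // i.e. with Matcher.lookingAt(), which is anchored at the start of the term.
    static boolean termMatches(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    public static void main(String[] args) {
        // With Field.Index.NOT_ANALYZED each record is one term, case preserved.
        String[] terms = {"Knowing yourself", "Old clinic"};
        String[] regexes = {".in", ".IN.", ".*in", ".*IN", "(?i).*IN"};
        for (String term : terms) {
            for (String regex : regexes) {
                System.out.println("\"" + term + "\" ~ " + regex + " -> "
                        + termMatches(regex, term));
            }
        }
    }
}
```

Under these assumptions, ".IN." and ".*IN" match neither record (wrong case), ".in" fails because of the start anchor, and ".*in" or "(?i).*IN" match both.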


--
Ian.


On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarcia84@gmail.com> wrote:
>
> This is the code for searching:
>
> String index = "index";
> String field = "contents";
> IndexReader reader = IndexReader.open(index);
> Searcher searcher = new IndexSearcher(reader);
>
> System.out.println("Enter query: ");
> String line = ".IN.";//in jakarta regexp this is like * IN *
> RegexQuery rxquery = new RegexQuery(new Term(field,line));
> Hits hits = searcher.search(rxquery);
>
> if(hits!=null){
>    for(int k = 0; k<100 && k<hits.length(); k++){
>        if(hits.doc(k)!=null)
>            System.out.println(hits.doc(k).getField("contents").stringValue());
>    }
> }
>
>
>
> And this is the part of creating the index:
>
>
> File directory = new File("index");
> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
>                                      true, IndexWriter.MaxFieldLength.LIMITED);
> List<String> records = getRecords(); // returns a list of record values from the database; all of them are phrases
> Iterator<String> i = records.iterator();
> while(i.hasNext()){
>     Document doc = new Document();
>     doc.add(new Field(field, i.next(), Field.Store.YES,
>                       Field.Index.NOT_ANALYZED));
>     writer.addDocument(doc);
> }
> writer.optimize();
> writer.close();
>
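To see why the indexing choice matters, here is a hedged simulation with no Lucene dependency. It assumes a document matches when any one of its terms passes an anchored `lookingAt()` test, and it uses a crude lowercase-and-split tokenizer as a stand-in for an analyzer; the real `StandardAnalyzer` does more (for example, stop-word removal). All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class TermGranularityDemo {
    // Field.Index.NOT_ANALYZED: the whole value is one term, case preserved.
    static List<String> notAnalyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        terms.add(value);
        return terms;
    }

    // Crude stand-in for an analyzer (NOT the real StandardAnalyzer):
    // lowercase the value and split it on runs of non-letters.
    static List<String> roughlyAnalyzedTerms(String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // A document "matches" if any of its terms passes the anchored lookingAt() test.
    static boolean docMatches(String regex, List<String> terms) {
        Pattern pattern = Pattern.compile(regex);
        for (String term : terms) {
            if (pattern.matcher(term).lookingAt()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String record = "Old clinic";
        System.out.println("NOT_ANALYZED terms: " + notAnalyzedTerms(record));
        System.out.println("analyzed terms:     " + roughlyAnalyzedTerms(record));
        System.out.println("cl.* on NOT_ANALYZED: " + docMatches("cl.*", notAnalyzedTerms(record)));
        System.out.println("cl.* on analyzed:     " + docMatches("cl.*", roughlyAnalyzedTerms(record)));
    }
}
```

With a single NOT_ANALYZED term, only patterns that match from the start of the whole phrase (its first word) can succeed, which is consistent with the "only the first word" symptom reported in this thread; with per-word terms, "cl.*" also finds "Old clinic".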
>
>
> This code works as I want, but it only matches the first word of the
> phrase. I think the problem is in how the index is built, but I don't
> know how to fix it...
>
> Any ideas?
>
> Thank you so much!!
>
>
>
> Steven A Rowe wrote:
>>
>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>> I'm surprised that it matches either - don't you need ".*in" where .*
>>> means match any character zero or more times?  See the javadoc for
>>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>>> package.
>>>
>>> Unless you're an expert in regexps it is probably worth playing with
>>> them outside your lucene code to start with e.g. with simple
>>> String.matches(regexp) calls.  They can take some getting used to.
>>> And try to avoid anything with backslashes if you can!
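Following that advice, a small stand-alone harness for trying patterns with `String.matches()`. Note that `String.matches()` requires the pattern to match the entire string, which is stricter than the `lookingAt()` behaviour Steve describes for RegexQuery, so treat it as a way to learn regex syntax rather than an exact model of Lucene. The class and method names are hypothetical.

```java
public class RegexPlayground {
    // String.matches(regex) succeeds only if the regex matches the WHOLE string.
    static boolean matchesWhole(String record, String regex) {
        return record.matches(regex);
    }

    public static void main(String[] args) {
        String[] records = {"Knowing yourself", "Old clinic"};
        String[] regexes = {".in", ".in.", ".*in", ".*in.*"};
        for (String regex : regexes) {
            for (String record : records) {
                System.out.println("\"" + record + "\".matches(\"" + regex + "\") = "
                        + matchesWhole(record, regex));
            }
        }
    }
}
```

Only ".*in.*" matches both records here, because it allows any characters before and after "in".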
>>
>> The java.util.regex.Pattern implementation (the default RegexQuery
>> implementation) actually uses Matcher.lookingAt(), which is equivalent to
>> prepending a "^" anchor to the beginning of the pattern, so if Huntsman84
>> is using the default implementation, then I agree with Ian: I'm surprised
>> it matches either.
>>
>> However, the Jakarta Regexp implementation uses RE.match(), which does
>> *not* require a beginning-of-string match.
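The difference can be seen with java.util.regex alone. This sketch contrasts `Matcher.lookingAt()` (anchored at the start, as described above for the default implementation) with `Matcher.find()`, which searches anywhere in the input; `find()` is used only as an analogy for an unanchored matcher like Jakarta's `RE.match()`, not as a description of the Jakarta code. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class AnchoringDemo {
    // Anchored at index 0 (like a leading "^"); need not consume the whole term.
    static boolean lookingAt(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Unanchored: succeeds if the regex matches anywhere in the term.
    static boolean find(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        for (String term : new String[] {"Knowing yourself", "Old clinic"}) {
            System.out.println(term + ": lookingAt(.in)=" + lookingAt(".in", term)
                    + ", find(.in)=" + find(".in", term));
        }
    }
}
```

Both records fail the anchored test and pass the unanchored one, which lines up with the expectation that the default implementation should match neither record while an unanchored matcher should match both.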
>>
>> Huntsman84, are you using the Jakarta Regexp implementation?  If so, then
>> like you, I'm surprised it's not matching both :).
>>
>> It would be useful to see some real code, including how you index your
>> records.
>>
>> Steve
>>
>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarcia84@gmail.com>
>>> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I am using RegexQuery for searching in a set of records which are
>>> > phrases of several words each. My aim is to find any phrase that
>>> > contains the given group of letters (e.g. "in"). For that case,
>>> > I am building the query with the regular expression ".in.", so it
>>> > should return all phrases that contain "in", but the search only
>>> > matches the first word of the phrase.
>>> >
>>> > For example, if my records are "Knowing yourself" and "Old
>>> > clinic", the correct search would return 2 matches, but it only
>>> > matches "Knowing yourself".
>>> >
>>> > How could I fix this?
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> --
> View this message in context: http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
