lucene-java-user mailing list archives

From Huntsman84 <tpgarci...@gmail.com>
Subject Re: RegexQuery Incomplete Results
Date Mon, 11 May 2009 12:39:09 GMT

The RegexQuery class uses that package (Jakarta Regexp), and that is why the
expression matches.

If my records contained only one word each, this code would work, but I need
to apply that regular expression to a whole phrase...
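
To make the anchoring difference discussed further down concrete, here is a minimal sketch using only java.util.regex and the two sample records from my first message. The class and helper names are made up for illustration. It does not touch Lucene and does not show what RegexQuery does internally with Jakarta Regexp; it only contrasts a match anchored at the start (Matcher.lookingAt()) with a match anywhere in the string (Matcher.find()).

```java
import java.util.regex.Pattern;

public class RegexAnchoringDemo {

    // True if the regex matches a prefix of the text (match must start at index 0).
    static boolean matchesAtStart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    // True if the regex matches any substring of the text.
    static boolean matchesAnywhere(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String[] records = {"Knowing yourself", "Old clinic"};
        String[] patterns = {".in.", ".*in", ".IN."};

        for (String pattern : patterns) {
            for (String record : records) {
                System.out.println(pattern + " on \"" + record + "\": "
                        + "atStart=" + matchesAtStart(pattern, record)
                        + ", anywhere=" + matchesAnywhere(pattern, record));
            }
        }
    }
}
```

With an anchored match, ".in." fails on both records because the second character of each phrase is not "i", while ".*in" succeeds on both. With an unanchored match, ".in." finds "wing" and "lini". The upper-case ".IN." matches neither record in this sketch because java.util.regex is case-sensitive by default.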


Ian Lea wrote:
> 
> The default regex package is java.util.regex and I can't see anywhere
> that you tell it to use the Jakarta regexp package.  So I don't think
> that ".in" will match.  Also, you are storing your contents field as
> NOT_ANALYZED so you will need to be wary of case sensitivity.  Maybe
> this is what you want, but maybe not.
> 
> 
> --
> Ian.
> 
> 
> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarcia84@gmail.com> wrote:
>>
>> This is the code for searching:
>>
>> String index = "index";
>> String field = "contents";
>> IndexReader reader = IndexReader.open(index);
>> Searcher searcher = new IndexSearcher(reader);
>>
>> System.out.println("Enter query: ");
>> String line = ".IN.";//in jakarta regexp this is like * IN *
>> RegexQuery rxquery = new RegexQuery(new Term(field,line));
>> Hits hits = searcher.search(rxquery);
>>
>> if(hits!=null){
>>    for(int k = 0; k<100 && k<hits.length(); k++){
>>        if(hits.doc(k)!=null)
>>            System.out.println(hits.doc(k).getField("contents").stringValue());
>>    }
>> }
>>
>>
>>
>> And this is the part of creating the index:
>>
>>
>> File directory = new File("index");
>> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true,
>>                            IndexWriter.MaxFieldLength.LIMITED);
>> // returns a list of record values from the database; all of them are phrases
>> List<String> records = getRecords();
>> Iterator<String> i = records.iterator();
>> while(i.hasNext()){
>>     Document doc = new Document();
>>     doc.add(new Field(field, i.next(), Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     writer.addDocument(doc);
>> }
>> }
>> writer.optimize();
>> writer.close();
>>
>>
>>
>> This code works as I want, but it only matches against the first word of the
>> phrase. I think the problem is in how the index is built, but I don't know how
>> to fix it...
>>
>> Any ideas?
>>
>> Thank you so much!!
>>
>>
>>
>> Steven A Rowe wrote:
>>>
>>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>>> I'm surprised that it matches either - don't you need ".*in" where .*
>>>> means match any character zero or more times?  See the javadoc for
>>>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>>>> package.
>>>>
>>>> Unless you're an expert in regexps it is probably worth playing with
>>>> them outside your lucene code to start with e.g. with simple
>>>> String.matches(regexp) calls.  They can take some getting used to.
>>>> And try to avoid anything with backslashes if you can!
>>>
>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>> implementation) actually uses Matcher.lookingAt(), which is equivalent to
>>> prepending a "^" anchor to the beginning of the pattern, so if Huntsman84
>>> is using the default implementation, then I agree with Ian: I'm surprised
>>> it matches either.
>>>
>>> However, the Jakarta Regexp implementation uses RE.match(), which does
>>> *not* require a beginning-of-string match.
>>>
>>> Huntsman84, are you using the Jakarta Regexp implementation?  If so, then
>>> like you, I'm surprised it's not matching both :).
>>>
>>> It would be useful to see some real code, including how you index your
>>> records.
>>>
>>> Steve
>>>
>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarcia84@gmail.com>
>>>> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I am using RegexQuery for searching in a set of records which are
>>>> > phrases of several words each. My aim is to find any phrase that
>>>> > contains the given group of letters (e.g. "in"). For that case,
>>>> > I am building the query with the regular expression ".in.", so it
>>>> > should return all phrases that contain "in", but the search only
>>>> > matches against the first word of the phrase.
>>>> >
>>>> > For example, if my records are "Knowing yourself" and "Old
>>>> > clinic", the correct search would return 2 matches, but it only
>>>> > matches with "Knowing yourself".
>>>> >
>>>> > How could I fix this?
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
> 

-- 
View this message in context: http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23482532.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

