lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: [RegexQuery] how to check what words were founded in particulary Documents ?
Date Sat, 21 Jul 2007 18:00:27 GMT
StandardAnalyzer lowercases terms. The term you pass to the term enum 
has an upper case character. You really need to be careful if you are 
going to use an analyzer to store and then a term enum to cycle 
through...the term you pass the term enum must match the output of the 
analyzer.

- Mark

mhzmark@interia.eu wrote:
> To begin with, thanks for Yours quick response.
>
> I am still trying write this code and I've already written this one:
>
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.store.*;
> import org.apache.lucene.index.*;
> import org.apache.lucene.document.*;
> import org.apache.lucene.search.*;
> import org.apache.lucene.search.regex.*;
>
> public class TmpMain {
> 	public static void main(String[] args) throws Exception {	
> 		RAMDirectory directory = new RAMDirectory(); 
> 		IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), 
>                                                       true); 
> 		Document doc = new Document(); 
> 		doc.add(new Field("field", "Cat Cot Cit Cet", Field.Store.YES, 
>                                     Field.Index.TOKENIZED)); 
> 		writer.addDocument(doc); 
> 		writer.optimize();
> 		writer.close(); 
> 		IndexReader reader=IndexReader.open(directory);
> 		RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field", 
> 				"C.t"), new JavaUtilRegexCapabilities());
> 	/*
> 	//Second Version:
> 		WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new 
>                                           Term("field", "C?t"));
> 	*/
> 		while(regexTermEnum.next()) {
> 			System.out.println("Next:");
> 			System.out.println(regexTermEnum.term().text());
> 		}
> 		System.out.println("End.");		
> 	}
> }
>
>
> But the output of this code is:
> End.
>
> As you can see above I have tried change line:
> 	RegexTermEnum regexTermEnum=new RegexTermEnum(reader, new Term("field", 
> 		"C.t"), new JavaUtilRegexCapabilities());
> on
> 	WildcardTermEnum regexTermEnum=new WildcardTermEnum(reader, new Term("field", 
> 		"C?t"));
>
> but the result is the same :/
> What am I doing wrong ?
>
>
>
>
>   
>> Erick - you're not missing anything, except that the original poster  
>> is after RegexQuery, not WildcardQuery.  Both work basically the same  
>> way, except in the pattern matching capabilities.
>>
>> 	Erik
>>
>> On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
>>     
>>> Erik:
>>>
>>> Well, you wrote the book <G>. But I thought something like this
>>> would work
>>>
>>> TermDocs td = reader.termDocs();
>>> WildcardTermEnum we = new WildcardTermEnum(reader, new term("field",
>>> "c*t"));
>>> while (we.next()) {
>>>  td.seek(we);
>>>  while (td.next()) {
>>>     report document contains term;
>>>  }
>>> }
>>>
>>> Although I admit I haven't tried it, so I could be totally off  
>>> base. What
>>> am I missing?
>>>
>>> Erick
>>>
>>> On 7/20/07, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>>       
>>>> Erick - I think you're mixing things up with WildcardQuery.
>>>> RegexQuery does support all regex capabilities (depending on the
>>>> underlying regex matcher used).
>>>>
>>>> A couple of techniques you could use to achieve the goal:
>>>>
>>>>         * Use RegexTermEnum, though that'll give you the terms  
>>>> across the
>>>> entire index, so maybe in your use case you could index a single
>>>> document into a RAMDirectory and RegexTermEnum on it.
>>>>
>>>>         * Try out SpanRegexQuery and use getSpans() to get the exact
>>>> matches.
>>>>
>>>> Erik
>>>>
>>>>
>>>>
>>>> On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
>>>>
>>>>         
>>>>> First, the period (.) isn't part of the syntax, so make sure you  
>>>>>           
>>>> look
>>>>         
>>>>> more carefully at the Lucene syntax...
>>>>>
>>>>> Then, you might be able to use WildcardTermEnum to find
>>>>> the terms that match and TermDocs to find the documents
>>>>> that contain those terms.
>>>>>
>>>>> There's nothing built into Lucene to do this out of the box, you
>>>>> have to roll your own.
>>>>>
>>>>> Best
>>>>> Erick
>>>>>
>>>>> On 20 Jul 2007 21:27:40 +0200, mhzmark@interia.eu  
>>>>>           
>>>> <mhzmark@interia.eu>
>>>>         
>>>>> wrote:
>>>>>           
>>>>>> Hello.
>>>>>>
>>>>>> Let assume that I have this code in my application:
>>>>>>
>>>>>>    (...)
>>>>>>    Query query = new RegexQuery(new Term("field", "C.T"));;
>>>>>>    // searching...
>>>>>>    (...)
>>>>>>
>>>>>> And now, I would like to know if my application founded "cat" or
>>>>>> "cot" or
>>>>>> something else. How can I check what was founded by my  
>>>>>>             
>>>> application ?
>>>>         
>>>>>> I would like to write application like this:
>>>>>>    INPUT -> regular expression
>>>>>>    OUTPUT -> file  ---> word
>>>>>>
>>>>>> example: INPUT = "C.T"
>>>>>>          OUTPUT =
>>>>>>                   a.txt --> CAT
>>>>>>                   a.txt --> COT
>>>>>>                   b.txt --> CAT
>>>>>>                   b.txt --> CAT
>>>>>>                   b.txt --> COT
>>>>>>                   (...)
>>>>>>
>>>>>> So, how to check what words were founded in particulary Documents
>>>>>> after
>>>>>> searching? I see that Hits class contains only founded  
>>>>>>             
>>>> documents and
>>>>         
>>>>>> nothing more (I am new in this technology so I can be wrong...)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  
>>>>>>             
>>>> ---------------------------------------------------------------------
>>>>         
>>>>>> -
>>>>>> Dowiedz sie, co naprawde podnieca kobiety. Wiecej wiesz,  
>>>>>>             
>>>> latwiej je
>>>>         
>>>>>> oczarujesz
>>>>>>
>>>>>>             
>>>>>>>>> http://link.interia.pl/f1b17
>>>>>>>>>                   
>>>>>>  
>>>>>>             
>>>> ---------------------------------------------------------------------
>>>>         
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>>
>>>>>>             
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>>
>>>>         
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>     
>
>
> ----------------------------------------------------------------------
> Najbogatsza Polka 
> Zobacz >>> http://link.interia.pl/f1ae2
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>   

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message