lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: [RegexQuery] how to check what words were founded in particulary Documents ?
Date Sat, 21 Jul 2007 01:00:51 GMT
Erick - you're not missing anything, except that the original poster  
is after RegexQuery, not WildcardQuery.  Both work basically the same  
way, except in the pattern matching capabilities.

	Erik
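
A rough illustration of the shared two-step idea (enumerate the terms that match the pattern, then walk each term's documents). This is only a sketch under loud assumptions: it uses a toy in-memory inverted index and java.util.regex, not the Lucene API, and the names RegexTermReport, index and report are hypothetical. In Lucene the first step would be the job of RegexTermEnum (or WildcardTermEnum) and the second the job of TermDocs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Toy stand-in for an index; NOT the Lucene API. All names are hypothetical.
class RegexTermReport {

    // Build a tiny inverted index: term -> document names, one entry per occurrence.
    static TreeMap<String, List<String>> index(Map<String, String> docs) {
        TreeMap<String, List<String>> postings = new TreeMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            for (String token : e.getValue().toLowerCase().split("\\W+")) {
                if (token.isEmpty()) continue;
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        return postings;
    }

    // Step 1: keep only terms that fully match the regex (the term-enum role).
    // Step 2: walk each matching term's postings (the TermDocs role).
    static List<String> report(TreeMap<String, List<String>> postings, String regex) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : postings.entrySet()) {
            if (!p.matcher(e.getKey()).matches()) continue;
            for (String doc : e.getValue()) {
                lines.add(doc + " --> " + e.getKey().toUpperCase());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "The cat sat on the cot.");
        docs.put("b.txt", "Cat, cat, and a cot; also a coat.");
        for (String line : report(index(docs), "c.t")) {
            System.out.println(line);
        }
    }
}
```

The output is grouped by term rather than by file, because the terms are enumerated first; sorting the lines afterwards would give the per-file listing the original poster asked for.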

On Jul 20, 2007, at 5:45 PM, Erick Erickson wrote:
> Erik:
>
> Well, you wrote the book <G>. But I thought something like this
> would work
>
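> TermDocs td = reader.termDocs();
> WildcardTermEnum we =
>     new WildcardTermEnum(reader, new Term("field", "c*t"));
> // The enum is already positioned on the first match (or null if none),
> // so process the current term before calling next().
> do {
>   if (we.term() == null) break;
>   td.seek(we);
>   while (td.next()) {
>     // report that this document contains the term
>     System.out.println(td.doc() + " contains " + we.term().text());
>   }
> } while (we.next());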
>
> Although I admit I haven't tried it, so I could be totally off base.
> What am I missing?
>
> Erick
>
> On 7/20/07, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>
>> Erick - I think you're mixing things up with WildcardQuery.
>> RegexQuery does support all regex capabilities (depending on the
>> underlying regex matcher used).
>>
>> A couple of techniques you could use to achieve the goal:
>>
>>         * Use RegexTermEnum, though that'll give you the terms
>>           across the entire index, so maybe in your use case you
>>           could index a single document into a RAMDirectory and run
>>           RegexTermEnum on it.
>>
>>         * Try out SpanRegexQuery and use getSpans() to get the exact
>>           matches.
>>
>> Erik
>>
>>
>>
>> On Jul 20, 2007, at 4:10 PM, Erick Erickson wrote:
>>
>> > First, the period (.) isn't part of the syntax, so make sure you
>> > look more carefully at the Lucene syntax...
>> >
>> > Then, you might be able to use WildcardTermEnum to find
>> > the terms that match and TermDocs to find the documents
>> > that contain those terms.
>> >
>> > There's nothing built into Lucene to do this out of the box; you
>> > have to roll your own.
>> >
>> > Best
>> > Erick
>> >
>> > On 20 Jul 2007 21:27:40 +0200, mhzmark@interia.eu
>> > <mhzmark@interia.eu> wrote:
>> >>
>> >> Hello.
>> >>
>> >> Let's assume that I have this code in my application:
>> >>
>> >>    (...)
>> >>    Query query = new RegexQuery(new Term("field", "C.T"));
>> >>    // searching...
>> >>    (...)
>> >>
>> >> Now I would like to know whether my application found "cat" or
>> >> "cot" or something else. How can I check what my application
>> >> found?
>> >>
>> >> I would like to write an application like this:
>> >>    INPUT -> regular expression
>> >>    OUTPUT -> file  ---> word
>> >>
>> >> example: INPUT = "C.T"
>> >>          OUTPUT =
>> >>                   a.txt --> CAT
>> >>                   a.txt --> COT
>> >>                   b.txt --> CAT
>> >>                   b.txt --> CAT
>> >>                   b.txt --> COT
>> >>                   (...)
>> >>
>> >> So, how can I check which words were found in particular
>> >> Documents after searching? I see that the Hits class contains
>> >> only the found documents and nothing more (I am new to this
>> >> technology, so I may be wrong...)
>> >>
>> >>
>> >>
>> >> ------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>