lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Search Expansion - more
Date Mon, 05 Apr 2004 19:49:41 GMT
On Apr 5, 2004, at 2:53 PM, hgadm@cswebmail.com wrote:
> Hi Erik,
>
> I am really desperate because I cannot make the
> problem clear to you, and I urgently need help now
> as well.

Don't feel so desperate. Mocking up a simple main() example should
be easy.  If it is not, that indicates too much complexity.
It should be easy to mock up something that isolates the
indexing of a single document (maybe only one "subject" field) with
"blah blah host defense blah blah" in it, and go from there.  I'm
afraid we've made things more complex than they need to be, and only a
simple example of the situation will help.  I simply cannot devote the
necessary time to understanding your elaborate description below,
sorry.
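Below is a minimal sketch of that single-document experiment. It uses plain Java and no Lucene classes, so it runs anywhere. A toy map from each lowercased word to its positions stands in for the index. The class and method names (SubjectPhraseCheck, index, matchesPhrase) are hypothetical. The tokenizing rule only roughly approximates StandardAnalyzer (assumption). The sketch shows why "host defense" needs phrase matching: both words must occur, in order, at consecutive positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for a one-document, one-field index. This is NOT Lucene.
class SubjectPhraseCheck {

    // Maps each word of the "subject" text to the positions where it occurs.
    // Lowercasing and splitting on non-alphanumerics only roughly
    // approximates an analyzer (assumption: no stop-word handling).
    static Map<String, List<Integer>> index(String subject) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int pos = 0;
        for (String tok : subject.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (tok.isEmpty()) continue;  // a leading separator yields one empty token
            postings.computeIfAbsent(tok, k -> new ArrayList<>()).add(pos++);
        }
        return postings;
    }

    // True if the words occur in this order at consecutive positions,
    // i.e. the exact-phrase rule rather than "all words somewhere".
    static boolean matchesPhrase(Map<String, List<Integer>> postings, String... words) {
        if (words.length == 0) return false;
        List<Integer> starts = postings.get(words[0]);
        if (starts == null) return false;
        for (int start : starts) {
            boolean all = true;
            for (int i = 1; i < words.length && all; i++) {
                List<Integer> p = postings.get(words[i]);
                all = p != null && p.contains(start + i);
            }
            if (all) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index("blah blah host defense blah blah");
        System.out.println(matchesPhrase(postings, "host", "defense"));  // prints true
        System.out.println(matchesPhrase(postings, "defense", "host"));  // prints false
    }
}
```

A real version would index the same text with Lucene's IndexWriter and run a PhraseQuery against it. The toy only illustrates the matching rule such a test would check.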

Please try to create such an example - it will help you understand
things better too, not just me.  Narrowing things down to very
simple examples (look at most of Lucene's test suite to get an idea),
as a main() or, even better, as JUnit tests, helps you tinker with the
design easily and clearly.
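In the same spirit, Holger's question below (is myquery[n] a single term or a phrase?) can be tested in a small main() before any Lucene code is involved. This is a plain-Java sketch under stated assumptions:

- Each entry is split on whitespace and lowercased. This assumes that roughly matches what StandardAnalyzer produced at index time; its stop-word removal is not modeled.
- The names ClausePlanner, Clause and plan are hypothetical.
- It needs Java 16 or later for the record.

An entry with one word becomes a term clause. An entry with several words becomes an ordered phrase clause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Plain-Java model of the per-entry decision; no Lucene classes are used.
class ClausePlanner {

    // One planned clause on one field: a single word, or an ordered phrase.
    record Clause(String field, List<String> words) {
        boolean isPhrase() { return words.size() > 1; }

        @Override
        public String toString() {
            return isPhrase()
                    ? "PHRASE " + field + ":\"" + String.join(" ", words) + "\""
                    : "TERM " + field + ":" + words.get(0);
        }
    }

    // Lowercases each entry and splits it on whitespace (assumption: close
    // enough to the index-time analysis). Blank entries are skipped.
    static List<Clause> plan(String field, String[] myquery) {
        List<Clause> clauses = new ArrayList<>();
        for (String entry : myquery) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) continue;
            List<String> words = Arrays.asList(
                    trimmed.toLowerCase(Locale.ROOT).split("\\s+"));
            clauses.add(new Clause(field, words));
        }
        return clauses;
    }

    public static void main(String[] args) {
        String[] myquery = {"immunity", "Host Defense", "  ", "antigen"};
        for (Clause c : plan("subject", myquery)) {
            System.out.println(c);
        }
    }
}
```

In Lucene terms, a TERM clause would become a TermQuery. A PHRASE clause would become a PhraseQuery built by adding one Term per word, in order, on the "subject" field. Each would then be added to the BooleanQuery as an optional clause. The exact BooleanQuery.add(...) signature has changed between Lucene releases, so check the javadoc for the version in use. Lowercasing matters because TermQuery and PhraseQuery terms bypass the analyzer, so they must match the tokens exactly as they were indexed.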

Simplicity - it's the only way to true understanding.  :)

	Erik


> Creating a sample application would be possible (and
> is the next step). I call Lucene as a web service (I
> could, however, try to wrap the WS function in a
> main() and create an application for you to run from
> the command line).
>
> However please allow me once again to try to explain:
>
> I have lots of small XML files that I want to show only
> if their <subject> tag contains certain keywords /
> keyphrases.
>
> They have been indexed using StandardAnalyzer.
>
> As search criteria I pass terms from a domain
> ontology, to see which XML files match these terms
> within <subject>.
>
> I started using QueryParser:
>     Query query = QueryParser.parse(line, "name", analyzer);
> where 'line' was simply a whitespace-delimited line of
> concepts.
>
> This worked fine; I could even search for keyphrases
> by joining the words with an underscore, e.g. host_defense.
>
> However, it produced an error if the user chose a very
> high concept level in the domain ontology, resulting in
> more than 200 terms being put into the query string.
>
> As you pointed out, the limitation was obviously
> QueryParser (which I could reproduce), so you suggested
> bypassing QueryParser by constructing a BooleanQuery
> from TermQuery clauses.
>
> This worked and could handle more than 800 (!) terms
> without errors (I could not test more), but because I
> was using TermQuery I lost the ability to search for
> phrases, e.g. 'host defense'.
>
> After your last response, the only question remaining
> for me is the syntax for adding a PhraseQuery on the
> <subject> field. I could not make sense of the sparse
> description in the API docs.
>
> Why am I using the array myquery[]? Well, it's simply
> what passes the large number of query terms to the
> web service. I thought that by using a string array I
> could keep each search term intact, especially when it
> represents a phrase rather than a single term, e.g.
> myquery[n]="host defense".
>
> I would need something that recognises whether the entry
> in myquery[n] is a single term (then adding it to the
> boolean query with TermQuery as usual) OR a phrase (then
> adding it with PhraseQuery, for which I do not know the
> syntax).
> Maybe PhraseQuery can also handle single terms; then I
> would only need that.
>
> Thanks for your help, Erik
>
> -Holger
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

