lucene-dev mailing list archives

From hg...@cswebmail.com
Subject Re: Search Expansion - more
Date Tue, 06 Apr 2004 17:01:09 GMT
Hi Erik, all,

I will write a small application (using RAM indexing)
soon. For now I have implemented my application using
the QueryParser, breaking the search string down into
pieces of 100 keywords/keyphrases. This is still fast
enough for my application, as searches with more than
100 terms are quite infrequent.
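
A minimal sketch of the batching described above, assuming the
expanded terms are already available as a list and that keyphrases
are already underscore-linked. The names TermBatcher, partition,
toQueryLine and MAX_TERMS_PER_QUERY are hypothetical and not part of
Lucene; the step that parses and searches each batch is only
indicated in a comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): splits expanded search terms into batches. */
final class TermBatcher {

    /** Batch size taken from the message: 100 keywords/keyphrases per query string. */
    static final int MAX_TERMS_PER_QUERY = 100;

    /** Splits terms into consecutive batches of at most batchSize entries, preserving order. */
    static List<List<String>> partition(List<String> terms, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < terms.size(); start += batchSize) {
            int end = Math.min(start + batchSize, terms.size());
            // Copy the sublist so each batch is independent of the input list.
            batches.add(new ArrayList<>(terms.subList(start, end)));
        }
        return batches;
    }

    /** Joins one batch into a whitespace-delimited query line. */
    static String toQueryLine(List<String> batch) {
        return String.join(" ", batch);
    }

    public static void main(String[] args) {
        // Stand-in for the terms produced by the search expansion.
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            terms.add("term" + i);
        }
        for (List<String> batch : partition(terms, MAX_TERMS_PER_QUERY)) {
            String line = toQueryLine(batch);
            // Assumption: each line would be handed to the query parser and
            // searched separately; that step is omitted here.
            System.out.println(batch.size() + " terms, " + line.length() + " characters");
        }
    }
}
```

With this approach the caller would still have to merge the hits of
the separate searches and drop duplicates, since one document can
match terms from more than one batch.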

The reason why I abandoned the BooleanQuery plus
StandardAnalyzer combination:
BooleanQuery can 'digest' more than 800 terms as input
from my search expansion. However, bringing
StandardAnalyzer into the equation gives the same
problem as with QueryParser: it produced a 'line too
long' error when more than ~150 terms were supplied.

Thanks for all your support this time. I've learned a
lot about Lucene, and in the end I have a working
application.

-Holger

On Mon, 5 Apr 2004 15:49:41 -0400, Erik Hatcher wrote:

> 
> On Apr 5, 2004, at 2:53 PM, hgadm@cswebmail.com wrote:
> > Hi Erik,
> >
> > I am really desperate because I cannot clarify the
> > problem to you - and I am really desperate for help
> > now as well.
> 
> Don't feel so desperate.... mocking up a simple main()
> example should be easy.  If it is not, then it indicates
> too much complexity.  However, it should be easy to mock
> up something that isolates the indexing of a single
> document (maybe only one "subject" field) with
> "blah blah host defense blah blah" in it, and go from
> there.  I'm afraid we've made things more complex than
> they need to be, and only a simple example of the
> situation will help.  I simply cannot devote the
> necessary time into understanding your elaborate
> description below, sorry.
> 
> Please try to create such an example - it will help you
> understand things better too, not just me.  Narrowing
> things down to very simple examples (look at most of
> Lucene's test suite to get an idea), as main() or even
> better, JUnit tests, helps you tinker with design easily
> and clearly.
> 
> Simplicity - it's the only way to true understanding. :)
> 
> 	Erik
> 
> 
> > Creating a sample application would be possible (and
> > the next step). I call Lucene as a web service (could
> > however try to wrap the WS function with a main() and
> > create an application for you to run from the command
> > line).
> >
> > However please allow me once again to try to explain:
> >
> > I have lots of small xml files that I want to show
> > only depending on whether their <subject> tag contains
> > certain keywords / keyphrases.
> >
> > They have been indexed using StandardAnalyzer.
> >
> > As search criterion I pass on terms from a domain
> > ontology to see what XML files match these terms
> > within <subject>.
> >
> > I started using QueryParser:
> > Query query = QueryParser.parse(line, "name", analyzer);
> > where 'line' was simply a whitespace-delimited line of
> > concepts
> >
> > Worked fine, even could search for keyphrases by
> > linking the words with underscore, e.g. host_defense.
> >
> > Did produce an error however if the user chooses a
> > very high concept level in the domain ontology,
> > resulting in more than 200 terms being put into the
> > query string.
> >
> > As you pointed out the limitation was obviously the
> > QueryParser (which I could reproduce) so you suggested
> > to bypass QueryParser by constructing a boolean query
> > using TermQuery.
> >
> > This worked and could take more than 800 (!) terms
> > without errors (could not test more) but because of
> > using TermQuery I lost the functionality to search for
> > phrases, e.g. 'host defense'.
> >
> > After your last response the only question that
> > remains to me is the syntax for adding a PhraseQuery on
> > field <subject>. I could not make sense of the sparse
> > description in the apidoc for that.
> >
> > Why am I using the array myquery[]? Well it's simply
> > the one that passes on the massive amount of query
> > terms to the web service. I thought by using a string
> > array I could maintain the aspect of each search term,
> > especially when they represent phrases and not single
> > terms, e.g. myquery[n]="host defense"
> >
> > I would need something that recognises whether the
> > term in myquery[n] is a single term (then adding to the
> > boolean search with TermQuery as usual) OR whether it
> > is a phrase, then adding with PhraseQuery (for which I
> > do not know the syntax).
> > Maybe the PhraseQuery can also add single terms as
> > well - then I would only need this.
> >
> > Thanks for your help, Erik
> >
> > -Holger

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

