lucene-dev mailing list archives

From Tavi Nathanson <tavi.nathan...@gmail.com>
Subject Complex Query Parsing and Tokenization: ANTLR, JavaCC, Solr
Date Tue, 27 Apr 2010 02:17:29 GMT
Hey everyone,

My organization uses our own homebrew QueryParser class, unrelated to
Lucene's JavaCC-based QueryParser, to parse our queries. We don't currently
use anything from Solr. Our QueryParser class has gotten quite cumbersome,
and I'm looking into alternatives. Grammar-based parsing seems like the way
to go, but I've got some questions:

- ANTLR seems to be very well-supported and well-liked, but I see that
Lucene's QueryParser and StandardTokenizer use JavaCC. Does anyone have
experience writing a Lucene or Solr parser using ANTLR? Any thoughts on
whether it would be helpful to stick with JavaCC, or problematic to use
ANTLR, in light of Lucene's default usage of JavaCC?
- Any experience using ANTLR for tokenization?
- I was told that Solr might be componentizing its query parsing so that we
could use that component instead of a homebrew grammar-based solution, but I
haven't found anything written about it. I don't know
much about Solr's query parsing, other than what I saw looking at
QParser.java and QParserPlugin.java: it seems that one can plug in any
parser needed. That doesn't really help us, as our goal is to simplify our
parsing logic. Is there any way to structure our query parsing logic without
needing to write a grammar from scratch, whether it's a Solr component or
something else?
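
To make the trade-off concrete, here's a hypothetical toy sketch (all names
are mine, not from Lucene or Solr) of the kind of hand-written recursive-descent
logic our homebrew parser accumulates. A grammar file fed to ANTLR or JavaCC
generates roughly this, which is why grammar-based parsing looks appealing once
the hand-written version gets cumbersome:

```java
import java.util.ArrayList;
import java.util.List;

// Toy recursive-descent parser for a minimal query grammar:
//   query  := clause (("AND" | "OR") clause)*
//   clause := term | "(" query ")"
// Produces a prefix-notation string just to show the parse structure.
public class ToyQueryParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    public ToyQueryParser(String input) {
        // Naive tokenizer: make parentheses standalone tokens, split on whitespace.
        String spaced = input.replace("(", " ( ").replace(")", " ) ");
        for (String tok : spaced.trim().split("\\s+")) {
            tokens.add(tok);
        }
    }

    public String parseQuery() {
        String left = parseClause();
        while (pos < tokens.size() && (peek().equals("AND") || peek().equals("OR"))) {
            String op = tokens.get(pos++);      // consume the operator
            String right = parseClause();
            left = "(" + op + " " + left + " " + right + ")";
        }
        return left;
    }

    private String parseClause() {
        if (peek().equals("(")) {
            pos++;                              // consume "("
            String inner = parseQuery();
            pos++;                              // consume ")"
            return inner;
        }
        return tokens.get(pos++);               // a plain term
    }

    private String peek() {
        return tokens.get(pos);
    }

    public static void main(String[] args) {
        // prints "(AND foo (OR bar baz))"
        System.out.println(new ToyQueryParser("foo AND (bar OR baz)").parseQuery());
    }
}
```

With a generated parser, the grammar above would live in a `.g` or `.jj` file
and the precedence/recursion bookkeeping would be the generator's problem, not
ours.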

In a nutshell, I'm trying to get a sense of the best practices in this
situation (namely, custom query parsing that's getting very complex) before
I dive into implementing a solution.

Thanks!
Tavi
