lucene-java-user mailing list archives

From Don Gilbert <>
Subject Re: Another way to handle large numeric range queries
Date Wed, 09 Jun 2004 17:05:57 GMT


Thanks for the comments.  

> I'm particularly interested in the XPath stuff I saw in LGQueryParser.   

   * xpathFieldParse
   'xpath' parser: takes param allfields[]; the query or field[] may use
    wild-card notation, e.g.  *.start  annotation.*.text
    with either '/' or '.' as the field separator

This is an *unfinished* attempt to support XPath-style queries, with
wild-cards for parts of the path, when you have indexed XML data, such as

  query: /annotation/*/text:term

I had to put this aside when I saw that pulling the XPath fields
out of a query string would take a fair amount of thought and code.
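The missing piece described above (pulling XPath-style field paths out of a query string and expanding their wild-cards against allfields[]) might be sketched in plain Java as follows. This is not the xpathFieldParse code: the class and method names (`expandFieldPattern`, `rewriteQuery`), the clause regex, and the rule that `*` matches exactly one path segment are all assumptions made for illustration, and only simple, unquoted `field:term` clauses are recognized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XPathFieldQueryDemo {

    // Hypothetical clause syntax: optional leading '/', then segments made of
    // word characters or a lone '*', separated by '/' or '.', then ':' and a
    // simple unquoted term.
    private static final Pattern CLAUSE = Pattern.compile(
        "(/?(?:\\w+|\\*)(?:[/.](?:\\w+|\\*))*):(\\S+)");

    /** Drops a leading '/' and uses '.' as the only separator. */
    static String normalize(String path) {
        String p = path.startsWith("/") ? path.substring(1) : path;
        return p.replace('/', '.');
    }

    /** Returns the names in allFields that match the pattern; '*' = one segment. */
    static List<String> expandFieldPattern(String pattern, String[] allFields) {
        StringBuilder regex = new StringBuilder();
        String[] segments = normalize(pattern).split("\\.", -1);
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) regex.append("\\.");
            // '*' may match any single segment; other segments match literally
            regex.append(segments[i].equals("*") ? "[^.]+" : Pattern.quote(segments[i]));
        }
        Pattern compiled = Pattern.compile(regex.toString());
        List<String> matches = new ArrayList<>();
        for (String field : allFields) {
            if (compiled.matcher(normalize(field)).matches()) matches.add(field);
        }
        return matches;
    }

    /**
     * Rewrites each path:term clause into an OR of concrete field:term clauses.
     * Text between clauses is copied unchanged; a clause whose path matches
     * no field is also left unchanged.
     */
    static String rewriteQuery(String query, String[] allFields) {
        Matcher m = CLAUSE.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            List<String> parts = new ArrayList<>();
            for (String f : expandFieldPattern(m.group(1), allFields)) {
                parts.add(f + ":" + m.group(2));
            }
            String replacement = parts.isEmpty() ? m.group(0)
                : parts.size() == 1 ? parts.get(0)
                : "(" + String.join(" OR ", parts) + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String[] allFields = {
            "annotation.gene.text", "annotation.protein.text",
            "annotation.gene.start", "name"
        };
        System.out.println(rewriteQuery("/annotation/*/text:kinase AND name:dpp", allFields));
    }
}
```

With these made-up field names, the query `/annotation/*/text:kinase AND name:dpp` becomes `(annotation.gene.text:kinase OR annotation.protein.text:kinase) AND name:dpp`. Because `*` stands for a single segment here, `*.start` would not match the three-segment `annotation.gene.start`; whether the wild-card should span several segments is exactly the kind of design question left open above.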

> > -- NumberField formats field for indexing
> *whew* - that is one complex piece of code.  I like the DebugFilter  

Mostly it is just a collection of small, 10-line classes, packaged
as inner classes (I hate Java's insistence on one public class per file :)
Some of the complexity there is because the standard Lucene analyzer
won't work for biology data (which uses lots of symbols, mixed upper/lowercase,
etc.), and this code lets one build an analyzer/indexer
tuned to the different data types in each field.
The configuration for parsing a given biology database includes
statements like:

## field tokenizers - base class CharTokenizer, applied before filters

## field filters - base class TokenFilter, used only if fieldtype=Text or UnStored

## fieldrecoder classes manipulate data before indexing, possibly creating new fields
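How these three kinds of settings fit together can be illustrated with a small, self-contained Java sketch that does not use Lucene at all. Everything in it is hypothetical: the names `recode`, `tokenize` and `analyze`, the whitespace tokenizer, the lowercase filter and the `location` recoding are illustrations, not the actual configuration or classes. The field kinds are modeled on Lucene 1.x's Keyword, Text and UnStored fields (UnIndexed is left out because it is never analyzed), and the sketch assumes that only the tokenized kinds go through the tokenizer and filters, so a Keyword value stays whole.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class FieldPipelineDemo {

    // Modeled on Lucene 1.x field kinds; only TEXT and UNSTORED are analyzed here.
    enum FieldType { KEYWORD, TEXT, UNSTORED }

    // Hypothetical recoder: runs before indexing and may add new fields.
    // A "location" value such as "120..450" also yields start and end fields.
    static Map<String, String> recode(Map<String, String> record) {
        Map<String, String> out = new LinkedHashMap<>(record);
        String loc = record.get("location");
        if (loc != null && loc.contains("..")) {
            String[] parts = loc.split("\\.\\.", 2);
            out.put("location.start", parts[0]);
            out.put("location.end", parts[1]);
        }
        return out;
    }

    // Hypothetical tokenizer: splits on whitespace only, so symbols such as
    // '-' stay inside tokens like "TGF-beta".
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Tokenizer first, then each filter in order, for TEXT/UNSTORED fields;
    // a KEYWORD value is kept whole as a single term.
    static List<String> analyze(String value, FieldType type,
                                List<UnaryOperator<String>> filters) {
        List<String> result = new ArrayList<>();
        if (type == FieldType.KEYWORD) {
            result.add(value);
            return result;
        }
        for (String token : tokenize(value)) {
            for (UnaryOperator<String> f : filters) token = f.apply(token);
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("id", "dpp");
        record.put("description", "Decapentaplegic TGF-beta ligand");
        record.put("location", "120..450");

        Map<String, FieldType> types = new LinkedHashMap<>();
        types.put("description", FieldType.TEXT); // other fields default to KEYWORD

        List<UnaryOperator<String>> filters = new ArrayList<>();
        filters.add(s -> s.toLowerCase(Locale.ROOT)); // example filter only

        for (Map.Entry<String, String> e : recode(record).entrySet()) {
            FieldType type = types.getOrDefault(e.getKey(), FieldType.KEYWORD);
            System.out.println(e.getKey() + " " + analyze(e.getValue(), type, filters));
        }
    }
}
```

Keeping the recoder, tokenizer and filter chain as separate per-field pieces is what lets each field of a biology record be handled differently, e.g. a case-sensitive identifier field next to a lowercased free-text description.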

This method then builds the TokenStream from these field-specific parsers,
falling back to default classes if a lookup fails:
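  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = null;
    try {
      // use the tokenizer configured for this field
      result = getTokenizer(fieldName, reader);
    } catch (Exception e) {
      // lookup failed: fall back to Lucene's StandardTokenizer
      result = new org.apache.lucene.analysis.standard.StandardTokenizer(reader);
    }
    try {
      // wrap the tokens in this field's configured filter chain
      result = getFilter(fieldName, result);
    } catch (Exception e) {
      // lookup failed: fall back to LowerDataFilter
      LowerDataFilter ldf = new LowerDataFilter();
      ldf.setInput(result);
      result = ldf;
    }
    return result;
  }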

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
