lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Don Gilbert <gilbe...@bio.indiana.edu>
Subject Re: Another way to handle large numeric range queries
Date Wed, 09 Jun 2004 17:05:57 GMT

Erik,

Thanks for the comments.  

> I'm particularly interested in the XPath stuff I saw in LGQueryParser.   

   * xpathFieldParse
   'xpath' parser: param allfields[], with query or field[] possibly
    having wild-card notation:   *.start  annotation.*.text 
    allowing '/' and '.' field separator

This is an *unfinished* attempt to support xpath style queries with
wild-cards or parts when you have indexed XML data, such as 

  query: /annotation/*/text:term
       
I had to put this aside when I saw the problem of pulling the xpath
fields from a query string would take a fair amount of thought and code.

> > BioDataAnalyzer.java -- NumberField formats field for indexing
> 
> *whew* - that is one complex piece of code.  I like the DebugFilter  

Mostly it is just a collection of small 10 line classes, packaged
as inner classes (I hate java's insistence on 1 file/class :)
Some of the complexity there is because the standard lucene analyzer
won't work for biology data (which uses a lot of symbols, upper/lowercase,
etc.) and this code allows one to build an analyzer/indexer 
which is tuned to different types in each field of data.
The configuration for a given biology database parsing includes 
statements like:

## field tokenizers - base CharTokenizer, work before Filters 
tokenizer.SYM=org.eugenes.index.BiodataAnalyzer$DataTokenizer

## field filters - base TokenFilter, only are used if fieldtype=Text or UnStored
tokenfilter.BLOC.start=org.eugenes.index.BiodataAnalyzer$NumberFilter

## fieldrecoder classes manipulate data before indexing, maybe making new fields
fieldrecoder.BLOC=LucegeneIndexers$Location_FieldRecoder

This method then generates TokenStream using such field-specific parsers, 
  public TokenStream tokenStream( String fieldName, Reader reader) {
    TokenStream result = null;
    try { result= getTokenizer(fieldName, reader);  }
    catch (Exception e) {
      result = new org.apache.lucene.analysis.standard.StandardTokenizer(reader);
      }
    try { result= getFilter(fieldName, result); }
    catch (Exception e) {
      LowerDataFilter ldf= new LowerDataFilter(); 
      ldf.setInput(result); result= ldf;
      }
    return result;
  }


-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@indiana.edu--http://marmot.bio.indiana.edu/

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message