lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Beginner: Specific indexing
Date Fri, 05 Sep 2008 17:46:13 GMT

: Interesting if you are not going to use an analyser... what then ? I'm
: thinking of using javacc, because I oversimplified somewhat the 3 field
: string structure, so I need a kind of small grammar for that.

Well, the specifics of "what else" is in your files is going to be the 
biggest factor in deciding how to find the bits of info you need.  

Let me try to put in perspective for you how your question sounded to me, 
as someone unfamiliar with your specific problem.  the question sounded 
equivilent to if someone had asked;


"I have a bunch of XML files, some of these XML files contain syntax that 
loks like this...
   <property name="${keyword}" min="${x}" max="${y}" />
where ${x} and ${y} are small numbers, and ${keyword} is from a fixed list 
of words.  My idea is to simply build a TokenFilter that will look for 
those... do I have it right ?"

...and i would say: "Not really.  Use an XML parser to parse your XML and 
extract your structured data, then add them to your Lucene Document."

You're files may not be XML, but the basic premise is the same; use 
whatever code makes the most sense to parse whatever file format you are 
dealing with given what you know aboutthe files (not just the parts you 
want, but the other parts as well)


Where an Analyzer might make sense is if you want to do processing on 
those bits of data after you find them ... stemming your keywords, or 
mapping them to synonyms, etc...


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message