lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: StandardAnalyzer vs KeywordAnalyzer in Luke
Date Tue, 02 Dec 2008 18:23:38 GMT
elguillelmo wrote:
> Kai_testing Middleton wrote:
>> The nutch analyzer is NutchDocumentAnalyzer.  Does anyone know how to add
>> this to the Luke classpath?  I tried this kind of thing but it didn't work
> I'm trying to work out the same thing, to no avail. Would anybody be able to
> detail how to add Nutch's Analyzer to the Luke's classpath?
> What I'm doing at the moment is:
> java -classpath lukeall-0.8.1.jar:/path/to/nutchAnalyzer.jar
> org.getopt.luke.Luke

Well ... It could be done, but not easily.

First, NutchDocumentAnalyzer is dependent on other Nutch classes (so you 
need nutch-${version}.jar) but they in turn depend on Hadoop (so you 
need hadoop-core*.jar), which in turn depends on a dozen or so other 
jars ... All of this needs to be added to the classpath.

Second, this analyzer doesn't have a no-args constructor; it needs a 
Hadoop Configuration argument. Luke can handle only no-args or single 
String arg constructors. I would have to change the way Analyzers are 
instantiated in Luke so that you can pass an existing instance (e.g. one 
that you created in the scripting plugin context).
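
To illustrate the constructor limitation, here is a minimal sketch of reflection-based instantiation that tries a no-args constructor first and then a single-String constructor. It is not Luke's actual code; the class AnalyzerLoaderSketch, the method instantiate, and the three stand-in analyzer classes are hypothetical, and java.util.Properties only stands in for a Hadoop Configuration argument.

```java
import java.util.Properties;

public class AnalyzerLoaderSketch {

    // Hypothetical stand-ins; these are not Lucene, Luke, or Nutch classes.
    public static class NoArgAnalyzer {
        public NoArgAnalyzer() {}
    }

    public static class StringArgAnalyzer {
        public final String arg;
        public StringArgAnalyzer(String arg) { this.arg = arg; }
    }

    // Mimics an analyzer whose only constructor takes a configuration object.
    public static class ConfigArgAnalyzer {
        public ConfigArgAnalyzer(Properties conf) {}
    }

    /**
     * Tries the no-args constructor, then the single-String constructor.
     * Returns null when the class offers neither.
     */
    public static Object instantiate(Class<?> cls, String arg) {
        try {
            try {
                return cls.getConstructor().newInstance();
            } catch (NoSuchMethodException e) {
                // No public no-args constructor; try the String one.
                return cls.getConstructor(String.class).newInstance(arg);
            }
        } catch (NoSuchMethodException e) {
            // Neither supported constructor exists.
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Constructor failed for " + cls, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(instantiate(NoArgAnalyzer.class, "x") != null);
        System.out.println(instantiate(StringArgAnalyzer.class, "x") != null);
        System.out.println(instantiate(ConfigArgAnalyzer.class, "x") != null);
    }
}
```

A class like ConfigArgAnalyzer falls through both attempts, which is the situation NutchDocumentAnalyzer is in.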

Third, NutchDocumentAnalyzer uses CommonGrams, which in turn _requires_ 
the presence of a common-grams.utf8 resource on the classpath.
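
A quick way to check whether such a resource is visible is to ask the class loader for it. The sketch below is only an illustration: the class ResourceCheckSketch and the method onClasspath are hypothetical names, and it assumes the resource sits at the classpath root under the name given above, which may not match how Nutch actually looks it up.

```java
public class ResourceCheckSketch {

    /** Returns true if the named resource can be found by the class loader. */
    public static boolean onClasspath(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the system class loader if no context loader is set.
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        // Assumed resource name, taken from the message above.
        String name = args.length > 0 ? args[0] : "common-grams.utf8";
        System.out.println(name + " on classpath: " + onClasspath(name));
    }
}
```

Run it with the same -classpath value you pass to Luke to see whether the resource would be found.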

To summarize: unless you want to get your hands dirty with Luke 
internals, it can't be done.

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
Contact: info at sigram dot com
