lucene-java-user mailing list archives

From "jim shirreffs" <j...@verizon.net>
Subject Re: Indexing help needed
Date Fri, 25 May 2007 19:30:24 GMT
Thanks for the advice. I just don't see where in the Lucene code I should 
plug in OOParcer.

I've walked the code in LIUS and Nutch (moving on to Solr) trying to find 
common objects. If I can find common objects in Lucene and Nutch I'll know 
where to plug in.


The Lucene objects look like this:

IndexWriter
    Analyzer
        StandardAnalyzer
    Document
        Reader
            FileReader
            StringReader
    DocumentWriter


But when I search through the Nutch or LIUS code I cannot find these objects. 
LIUS uses reflection, so I'm not going to find anything in the code, but 
unfortunately the liusConfig.xml is incomplete and I cannot find the class 
names for the OpenOffice stuff in it.

This is all very frustrating, since it should be relatively easy to add 
support for unsupported formats. The Lucene code is very nice, the LIUS code 
less so. It seems Lucene is set up to drop in new file formats; I just do not 
know where to drop them in or what kind of objects need to be dropped in.
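
For reference, here is a minimal sketch of the kind of object that could be dropped in. It assumes the files are OpenDocument (.odt style) zip archives whose text lives in content.xml. The class name OpenDocumentText is hypothetical and is not taken from Lucene, Nutch or LIUS. It only extracts plain text. In Lucene 2.x that text (or a Reader over it) would then normally go into a Field on a Document passed to IndexWriter.addDocument, the same way a FileReader would.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (an assumption for illustration, not an existing API).
class OpenDocumentText {

    /** Returns the plain text found in content.xml of an OpenDocument zip stream. */
    static String extract(InputStream odf) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"content.xml".equals(entry.getName())) {
                    continue; // skip styles, metadata, images, ...
                }
                final StringBuilder text = new StringBuilder();
                DefaultHandler handler = new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        // End each paragraph or heading with a newline.
                        if ("text:p".equals(qName) || "text:h".equals(qName)) {
                            text.append('\n');
                        }
                    }

                    @Override
                    public InputSource resolveEntity(String publicId, String systemId) {
                        // Never fetch external DTDs; substitute an empty one.
                        return new InputSource(new StringReader(""));
                    }
                };
                // The default factory is not namespace aware, so qName is "text:p" etc.
                SAXParserFactory.newInstance().newSAXParser().parse(zip, handler);
                return text.toString();
            }
        }
        throw new IOException("no content.xml found; not an OpenDocument file?");
    }
}
```

Wrapping the result as new StringReader(OpenDocumentText.extract(in)) gives a Reader that fits the Document / Reader slot in the object list above. Spans, tabs and metadata are ignored here to keep the sketch short.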

Oh well, I guess I will code up a Reader that just spits out "Here I am" a few 
hundred times and see what happens. LOL.
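
A stub Reader of that kind could be as small as the sketch below. The class name HereIAmReader is made up for illustration. It extends java.io.Reader, so it can be handed to anything that accepts a FileReader or StringReader.

```java
import java.io.Reader;

// Hypothetical stub: emits "Here I am " a fixed number of times, then end of stream.
class HereIAmReader extends Reader {
    private final String data;
    private int pos = 0;

    HereIAmReader(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("Here I am ");
        }
        data = sb.toString();
    }

    @Override
    public int read(char[] cbuf, int off, int len) {
        if (pos >= data.length()) {
            return -1; // nothing left to read
        }
        int n = Math.min(len, data.length() - pos);
        data.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() {
        // nothing to release
    }
}
```

For example, new HereIAmReader(300) yields the phrase three hundred times before reporting end of stream.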


Thank you for the reply and advice.

jim s



----- Original Message ----- 
From: "Andrzej Bialecki" <ab@getopt.org>
To: <java-user@lucene.apache.org>
Sent: Friday, May 25, 2007 1:10 PM
Subject: Re: Indexing help needed


> jim shirreffs wrote:
>
>> Thanks to all that try to help me out
>>
>> Jim S
>>
>> P.S. If I get it working I will be happy to email post the code.
>
> If you looked at the code in Nutch, you can take most of the parse-oo 
> plugin verbatim, because all this plugin does is it extracts the text 
> content and metadata from OO files.
>
>
>
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 



