lucene-java-user mailing list archives

From "jim shirreffs" <>
Subject Re: Indexing help needed
Date Fri, 25 May 2007 19:30:24 GMT
Thanks for the advice. I just don't see where in the Lucene code I should 
plug in OOParcer.

I've walked the code in LIUS and Nutch (moving on to Solr) trying to find 
common objects. If I can find common objects in Lucene and Nutch I'll know 
where to plug in.

Lucene Objects looks like this


But when I search through the Nutch or LIUS code I cannot find these objects. 
LIUS uses reflection, so I'm not going to find anything in the code, and 
unfortunately the liusConfig.xml is incomplete: I cannot find the class 
names for the OpenOffice stuff in it.

This is all very frustrating, since it should be relatively easy to add 
support for unsupported formats. The Lucene code is very nice, the LIUS code 
less so. It seems Lucene is set up to accept new file formats; I just do not 
know where to drop them in or what kind of objects need to be dropped in.
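
A rough sketch of one possible answer, under stated assumptions: Lucene indexes text rather than file formats, so a format parser usually sits in front of it and hands over plain text. The class below is hypothetical (the name OdfTextSketch is not from Lucene, Nutch or LIUS). It assumes an OpenDocument file, which is a zip archive containing a content.xml entry, and it assumes the conventional "text:" prefix on paragraph and heading elements. It uses only the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: pulls plain text out of an OpenDocument zip stream. */
public class OdfTextSketch {

    public static String extractText(InputStream odf) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(odf)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // The document body lives in the content.xml entry.
                if (!"content.xml".equals(entry.getName())) {
                    continue;
                }
                final StringBuilder text = new StringBuilder();
                DefaultHandler handler = new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        text.append(ch, start, length);
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        // Assumption: paragraphs and headings use the "text:" prefix.
                        if ("text:p".equals(qName) || "text:h".equals(qName)) {
                            text.append('\n');
                        }
                    }
                };
                // The default factory is not namespace-aware, so qName is the raw tag name.
                SAXParserFactory.newInstance().newSAXParser().parse(zip, handler);
                return text.toString();
            }
        }
        throw new IOException("no content.xml entry found");
    }
}
```

The returned string would then become a field value on a Lucene Document, for example wrapped in a StringReader and passed to the Field(String, Reader) constructor, with a field name such as "contents" chosen by the caller. Older OpenOffice 1.x files that declare a DOCTYPE may need an EntityResolver, which this sketch does not handle.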

Oh well, I guess I will code up a Reader that just spits out "Here I am" a few 
hundred times and see what happens. LOL.
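
A minimal sketch of such a test Reader, using only the JDK. The class name HereIAmReader and the create method are made up for illustration; the idea is simply to have a known stream of text to feed wherever a java.io.Reader is accepted.

```java
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical stub: a Reader that yields "Here I am " a fixed number of times. */
public class HereIAmReader {

    public static Reader create(int repetitions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repetitions; i++) {
            sb.append("Here I am ");
        }
        // StringReader is enough here; no custom Reader subclass is needed.
        return new StringReader(sb.toString());
    }
}
```

Calling HereIAmReader.create(300) gives a Reader that could stand in for real parser output while experimenting.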

Thank you for the reply and advice.

jim s

----- Original Message ----- 
From: "Andrzej Bialecki" <>
To: <>
Sent: Friday, May 25, 2007 1:10 PM
Subject: Re: Indexing help needed

> jim shirreffs wrote:
>> Thanks to all that try to help me out
>> Jim S
>> P.S. If I get it working I will be happy to email post the code.
> If you looked at the code in Nutch, you can take most of the parse-oo 
> plugin verbatim, because all this plugin does is it extracts the text 
> content and metadata from OO files.
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>  Contact: info at sigram dot com
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
