jackrabbit-dev mailing list archives

From "Simon Gash" <Simon.G...@gossinteractive.com>
Subject RE: Two problems
Date Thu, 16 Jun 2005 09:28:23 GMT
Not sure if you are interested, but I'm using a PDF text filter class
provided by the Apache Slide team. It's available from the archives, or I
can post it again if you want.


-----Original Message-----
From: Miklos Pocsaji [mailto:miklos.pocsaji@gmail.com] 
Sent: 16 June 2005 10:24
To: jackrabbit-dev@incubator.apache.org
Subject: Two problems


Started working with Jackrabbit a month ago and I ran into two problems:

1. I saw a post here saying that somebody is working on the
time-consuming startup. Is there an improvement? Even with only a few
hundred megabytes of stored data, startup time (repository creation is really

2. I started writing a TextFilter which knows how to extract text from
PDF (I implemented the TextFilter interface). It is simple: I only have
to return a java.io.Reader from which Jackrabbit extracts the text.
The obvious and ugly method would be to extract the text to a String and
then return a StringReader, but this would require a lot of memory. I
decided to use a PipedReader-PipedWriter pair: a separate thread writes
the text to a PipedWriter and I return the PipedReader instance from the
doFilter() method. It seems that Jackrabbit won't read through the
passed stream immediately. I see my writing thread stop, then, after
performing a search, it throws an exception saying that the other end of
the pipe is closed...
I do not know if my approach is correct, so if somebody could, please
let me know whether this could work somehow. I'm thinking about examining
the source itself but if somebody could help me I can spare a lot of

Thank you in advance,
Miklos Pocsaji.
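
For reference, the piped approach described in point 2 can be sketched
roughly as below. This is only an illustration using java.io, not
Jackrabbit's TextFilter API: the class name, the openReader() and readAll()
helpers, and the list of text chunks (standing in for text pulled out of a
PDF piece by piece) are all made up for the example. One thing to keep in
mind is that the java.io pipe classes keep track of the threads on each end,
so a pipe can be reported as broken or closed if the thread on the other
side has already exited. That may be related to the exception above, but
this has not been checked against the Jackrabbit source.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.util.Arrays;
import java.util.List;

class PipedTextExtraction {

    /**
     * Returns a Reader that is fed by a background thread.
     * The chunk list is a hypothetical stand-in for text extracted
     * from a PDF one piece at a time.
     */
    static Reader openReader(final List<String> chunks) throws IOException {
        final PipedWriter writer = new PipedWriter();
        // Connect both ends before the producer thread starts writing.
        final PipedReader reader = new PipedReader(writer);

        Thread producer = new Thread(() -> {
            try {
                for (String chunk : chunks) {
                    // Blocks while the pipe buffer is full, that is,
                    // until the consumer reads some characters.
                    writer.write(chunk);
                }
            } catch (IOException e) {
                // Reached if the consumer closed the reader or the
                // reading thread is no longer alive.
            } finally {
                try {
                    writer.close(); // signals end of stream to the reader
                } catch (IOException ignored) {
                    // nothing sensible left to do here
                }
            }
        }, "text-producer");
        producer.setDaemon(true);
        producer.start();
        return reader;
    }

    /** Drains the reader into a String; used here only to show consumption. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        reader.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = openReader(Arrays.asList("Hello ", "PDF ", "text"));
        System.out.println(readAll(r));
    }
}
```

The memory saving comes from the bounded pipe buffer: the producer cannot
run far ahead of the consumer. The price is that the producer thread stays
blocked for as long as the consumer delays reading, so this only pays off if
whoever receives the Reader eventually drains and closes it.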
