jackrabbit-dev mailing list archives

From "Simon Gash" <Simon.G...@gossinteractive.com>
Subject RE: Two problems
Date Thu, 16 Jun 2005 10:11:51 GMT
This is the link to the posted message in the archive; I found it very
useful. You also need to know how to register it with Jackrabbit as a
TextFilterService so that Jackrabbit can find it at startup and then
apply it to documents of the right MIME type. Don't forget that your
node needs to include the type nt:resource.

Simon 

-----Original Message-----
From: Miklos Pocsaji [mailto:miklos.pocsaji@gmail.com] 
Sent: 16 June 2005 11:06
To: Simon Gash
Subject: Re: Two problems

Can you post it again, please? Thank you.

Miklos Pocsaji.

On 6/16/05, Simon Gash <Simon.Gash@gossinteractive.com> wrote:
> Not sure if you are interested, but I'm using a PDF text filter class 
> provided by the Apache Slide team. It's available from the archives, or 
> I can post it again if you want.
> 
> Simon
> 
> -----Original Message-----
> From: Miklos Pocsaji [mailto:miklos.pocsaji@gmail.com]
> Sent: 16 June 2005 10:24
> To: jackrabbit-dev@incubator.apache.org
> Subject: Two problems
> 
> Hi!
> 
> I started working with Jackrabbit a month ago and ran into two problems:
> 
> 1. I saw a post here saying that somebody is working on the
> time-consuming startup. Is there any improvement? Even with only a few
> hundred megabytes of stored data, startup (repository creation) is
> really slow.
> 
> 2. I started writing a TextFilter that extracts text from PDF (I
> implemented the TextFilter interface). It is simple: I only have to
> return a java.io.Reader from which Jackrabbit reads the text. The
> obvious but ugly method would be to extract the text to a String and
> then return a StringReader, but this would require a lot of memory. I
> decided to use PipedReader/PipedWriter: a separate thread writes the
> text to a PipedWriter and I return the PipedReader instance from the
> doFilter() method. It seems that Jackrabbit does not read through the
> returned stream immediately. I see my writing thread stop, and then,
> after performing a search, it throws an exception saying that the other
> end of the pipe is closed...
> I do not know whether my approach is correct, so could somebody please
> tell me whether this could work somehow? I'm thinking about examining
> the source myself, but if somebody could help me it would save me a lot
> of time.
> 
> Thank you in advance,
> Miklos Pocsaji.
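
The PipedReader/PipedWriter approach described in the message above can be
sketched with plain java.io classes. This is only an illustration, not
Jackrabbit's TextFilter API: the class PipedTextDemo and the methods
extractText() and readAll() are hypothetical names, and the "pages" array
stands in for text pulled out of a PDF. It shows two behaviours of
java.io pipes that may explain the symptoms: the writing thread blocks
once the pipe buffer is full until somebody reads, and a write fails with
an IOException if the reader has been closed or the reading thread has
died.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;

class PipedTextDemo {

    // Hypothetical stand-in for a filter method: returns a Reader that is
    // fed by a background thread instead of building one large String.
    public static Reader extractText(final String[] pages) throws IOException {
        final PipedWriter writer = new PipedWriter();
        PipedReader reader = new PipedReader(writer);

        Thread producer = new Thread(() -> {
            try {
                for (String page : pages) {
                    // Blocks while the pipe buffer is full and nobody reads.
                    writer.write(page);
                }
            } catch (IOException e) {
                // Raised when the reader was closed or its thread has died.
            } finally {
                try {
                    // Closing signals end-of-stream to the reading side.
                    writer.close();
                } catch (IOException ignored) {
                    // Nothing more to do for this sketch.
                }
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive for an unread pipe
        producer.start();
        return reader;
    }

    // Consumer side: drains the Reader until end-of-stream.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[256];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] pages = {"page one ", "page two ", "page three"};
        System.out.println(readAll(extractText(pages)));
    }
}
```

The sketch works because the consumer drains the Reader right away, in a
thread that stays alive. If the caller holds on to the Reader and reads it
later, or from a thread that has since ended, the producer can stall or
fail as described above; in that case buffering the text first (for
example in a temporary file) is a safer, if less elegant, alternative.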
