Message-ID: <42B1501C.5000201@gmx.net>
Date: Thu, 16 Jun 2005 12:10:36 +0200
From: Marcel Reutegger <marcel.reutegger@gmx.net>
To: jackrabbit-dev@incubator.apache.org
Subject: Re: Two problems
References: <139eaef50506160223189d474d@mail.gmail.com>
In-Reply-To: <139eaef50506160223189d474d@mail.gmail.com>

Miklos Pocsaji wrote:
> 2. I started writing a TextFilter which knows how to extract text from
> PDF (I implemented the TextFilter interface). It is simple: I only
> have to return a java.io.Reader from which Jackrabbit extracts the text.
> The obvious but ugly approach would be to extract the text into a String
> and then return a StringReader, but this would require a lot of memory. I
> decided to use PipedReader/PipedWriter instead: a separate thread writes
> the text to a PipedWriter, and I return the PipedReader instance from the
> doFilter() method. It seems that Jackrabbit doesn't read through the
> returned stream immediately.
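A minimal sketch of the piped approach in plain java.io. It deliberately does not use Jackrabbit's real TextFilter signature: `openTextReader` and `readAll` are hypothetical helpers, and the `chunks` iterable stands in for whatever PDF library produces the text. The producer thread writes each chunk and closes the writer in a finally block after the last one.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.util.List;

public class PipedTextFilterSketch {

    // Hypothetical helper: 'chunks' stands in for a PDF extractor that
    // produces text piece by piece instead of as one big String.
    public static Reader openTextReader(Iterable<String> chunks) throws IOException {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader); // connect both ends

        Thread producer = new Thread(() -> {
            try {
                for (String chunk : chunks) {
                    writer.write(chunk); // blocks while the pipe buffer is full
                }
            } catch (IOException e) {
                // e.g. the reader was closed early; stop writing
            } finally {
                try {
                    writer.close(); // marks end-of-stream for the reader
                } catch (IOException ignored) {
                }
            }
        }, "pdf-text-producer");
        producer.setDaemon(true); // do not keep the JVM alive if nobody reads
        producer.start();
        return reader; // what a doFilter()-style method would hand back
    }

    // Reads a Reader to the end and closes it.
    public static String readAll(Reader r) throws IOException {
        StringWriter out = new StringWriter();
        char[] buf = new char[256];
        int n;
        while ((n = r.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        r.close();
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = openTextReader(List.of("Hello ", "from ", "a PDF"));
        System.out.println(readAll(r)); // prints "Hello from a PDF"
    }
}
```

Closing from the producer thread matters here: if that thread exits without closing the writer, java.io.PipedReader throws an IOException once its buffer is empty instead of signalling end-of-stream.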

Indexing of nodes is buffered in Jackrabbit. This means that nodes may 
not be added to the index until a query is issued, so the Reader returned 
by doFilter() is not necessarily read right away.
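A toy model of the deferred reading, to show why a late reader can still get all the text. `ToyIndex` and `pipedText` are hypothetical and are not how Jackrabbit's index works; the index merely queues readers when a node is added and drains them when a query is issued. The text is larger than PipedReader's default 1024-char buffer, so the producer thread waits inside write() until the query reads it.

```java
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;

public class DeferredIndexingSketch {

    // Toy model, NOT Jackrabbit's implementation: readers are queued on
    // add and only consumed when a query forces a flush.
    public static class ToyIndex {
        private final List<Reader> pending = new ArrayList<>();
        private final List<String> indexed = new ArrayList<>();

        public void addNode(Reader text) {
            pending.add(text); // nothing is read yet
        }

        public List<String> query() throws IOException {
            for (Reader r : pending) { // flush: read each pending reader now
                indexed.add(drain(r));
            }
            pending.clear();
            return indexed;
        }

        private static String drain(Reader r) throws IOException {
            StringWriter out = new StringWriter();
            char[] buf = new char[512];
            int n;
            while ((n = r.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            r.close();
            return out.toString();
        }
    }

    // Hypothetical producer: writes 'text' from a background thread and
    // closes the writer once everything has been written.
    public static Reader pipedText(String text) throws IOException {
        PipedReader reader = new PipedReader();
        PipedWriter writer = new PipedWriter(reader);
        Thread t = new Thread(() -> {
            try {
                writer.write(text); // waits while the 1024-char pipe is full
            } catch (IOException e) {
                // reader side gone; give up
            } finally {
                try { writer.close(); } catch (IOException ignored) { }
            }
        });
        t.setDaemon(true);
        t.start();
        return reader;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append('x');
        }
        ToyIndex index = new ToyIndex();
        index.addNode(pipedText(sb.toString())); // producer fills the pipe, then waits
        List<String> hits = index.query();       // only now is the text read
        System.out.println(hits.get(0).length()); // prints 5000
    }
}
```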

As far as I can see, you have to make sure that the PipedWriter is not 
closed until the PipedReader is closed.

regards
  marcel