lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: indexing images of a document
Date Tue, 10 Jun 2008 19:29:54 GMT
Hi Bernd,

On 06/10/2008 at 2:40 PM, Bernd Mueller wrote:
> Actually it is not about searching the binary data. I just
> wanna have the whole document (together with the "image stuff")
> to be stored in the index. So when I got a hit of a search on a
> document that I can extract the document with its images (and
> the images' meta information).
> 
> Nice would be some data structure to be stored in the
> "image"-field of the document (maybe a HashSet?). So I
> could extract the found documents with all their image information
> from the index for easy visualization purpose of the search results.

Lucene field values are flat - the kind of complex object serialization and de-serialization
you're talking about is not directly supported.

That said, if you manage the packing and unpacking yourself, Lucene can help.  For example,
if you serialize the image data and its metadata as a YAML file, with the binary data encoded
as Base64 or something similar, you should be able to store and retrieve it as a String. 
Or maybe you could store a .jar file in a binary field - that would probably be simplest.

Steve

> Steven A Rowe wrote:
> > Hi Bernd,
> > 
> > It's still not clear what you want to do.  What will a search look like?
> > 
> > On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
> > > I will try to explain what I mean with image stuff. An image in
> > > xml-documents is usually an url to the location where the image is
> > > stored. Additionally, such an image tag contains information like
> > > offset, width and height. I am thinking about storing the images
> > > (binary data) and meta-information (offset, width, height) in one
> > > field instead of using these tagging information as values for a field
> > > in the index.
> > 
> > You seem to be saying that you want to store the binary image along
> > with its metadata all in one field - is this correct?  Why do you want
> > to do this?
> > 
> > > I guess, your last statement "Lucene doesn't index binary data..."
> > > indicates that this isn't possible...
> > 
> > While Lucene will not *index* binary data, it can *store* it.
> > 
> > Steve
> > 
> > > Erick Erickson wrote:
> > > > You add as many fields to the document while indexing as you need to
> > > > correctly contain your "image stuff".
> > > > 
> > > > If this answer seems cryptic, it's as clear as your problem statement
> > > > <G>. To give you a meaningful answer, we need a much clearer problem
> > > > statement. What is "image stuff", what format is it in, and what do
> > > > you want to do with it?
> > > > 
> > > > Lucene doesn't index binary data...
> > > > 
> > > > Best
> > > > Erick
> > > > 
> > > > On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
> > > > <bernd.mueller@scai.fhg.de> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > I have XML-documents containing image information. These images
> > > > > should be indexed with the document by having one additional field
> > > > > with the image stuff. Could anyone please give me some hints how
I
> > > > > can manage this?
> > > > > 
> > > > > Regards,
> > > > > Bernd

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message