lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Stored fields: decompression slows down in my scenario ... any idea for a workaround?
Date Sun, 23 Jun 2013 17:40:19 GMT
Hi,

For this type of processing, use the new DocValues field types. They are like the FieldCache,
but persisted to disk. Several datatypes exist, and each gives you random access by document
number. DocValues are organized as column-stride fields, meaning that each field (column) is a
separate data structure with random access, like a big array persisted on disk.
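
To make the "big array" picture concrete, here is a minimal plain-Java sketch of a single
column holding one fixed-length byte[] feature per document. It is not Lucene's API: the
FeatureColumn class, its methods, and the L1 distance are hypothetical stand-ins chosen for
illustration, and everything is kept in memory rather than persisted. In Lucene itself the
matching field type for byte[] values would be BinaryDocValuesField.

```java
/**
 * Hypothetical, in-memory stand-in for one column-stride field.
 * Assumption: every document stores a value of the same length.
 */
final class FeatureColumn {
    private final byte[] data;      // all values stored back to back
    private final int valueLength;  // bytes per document
    private final int docCount;

    FeatureColumn(int docCount, int valueLength) {
        this.docCount = docCount;
        this.valueLength = valueLength;
        this.data = new byte[docCount * valueLength];
    }

    /** Stores the value for one document at its fixed offset. */
    void set(int docId, byte[] value) {
        if (value.length != valueLength) {
            throw new IllegalArgumentException("value must have length " + valueLength);
        }
        System.arraycopy(value, 0, data, docId * valueLength, valueLength);
    }

    /** L1 distance between a stored value and the query, read in place without copying. */
    int l1Distance(int docId, byte[] query) {
        if (query.length != valueLength) {
            throw new IllegalArgumentException("query must have length " + valueLength);
        }
        int offset = docId * valueLength;
        int sum = 0;
        for (int i = 0; i < valueLength; i++) {
            sum += Math.abs(data[offset + i] - query[i]);
        }
        return sum;
    }

    /** Linear scan over all documents; returns the doc number closest to the query. */
    int nearest(byte[] query) {
        int best = -1;
        int bestDistance = Integer.MAX_VALUE;
        for (int doc = 0; doc < docCount; doc++) {
            int distance = l1Distance(doc, query);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = doc;
            }
        }
        return best;
    }
}
```

The point of the layout is that a value is found by arithmetic on the document number alone,
so a linear scan touches only this one column and never has to decompress a whole stored
document.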

Stored Fields should *only* ever be used to display search results!

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: mathias.lux@gmail.com [mailto:mathias.lux@gmail.com] On Behalf Of
> Mathias Lux
> Sent: Sunday, June 23, 2013 7:27 PM
> To: java-user@lucene.apache.org
> Subject: Stored fields: decompression slows down in my scenario ... any idea
> for a workaround?
> 
> Hi!
> 
> I'm managing the development of LIRE
> (https://code.google.com/p/lire/), an image search toolbox based on Lucene.
> While optimizing different search routines for global image features, I
> took a look at the CPU usage, i.e. to see whether my new distance
> function is faster than the old one :)
> 
> Unfortunately, I found out that the decompression routine for stored fields
> accounted for nearly 60% of the search time (see
> http://www.semanticmetadata.net/?p=1092).
> 
> So what I basically do is open each document in an index sequentially,
> compute its distance to a query feature, and maintain my result list. The
> image features are in stored fields as byte[] arrays. I optimized quite a lot to
> get them really small and fast to parse and store.
> 
> I know that this is not the way Lucene is intended to be used; I have been
> working with Lucene for years now :) And just to assure you: approximate
> indexing and local feature search are based on terms, ... and fast.
> But linear search makes up an important part of LIRE, so I'd be glad to get
> some suggestions on how either to disable compression or to sneak in
> byte[] data with some textual data that is "fast as hell" to read.
> 
> cheers,
>   Mathias
> 
> ps. I know that it'd be possible to write it to a data file, put it into memory
> and gain a lot of speed. But of course I'd prefer to maintain "just one" index
> and not two of them :)
> 
> --
> Dr. Mathias Lux
> Assistant Professor, Klagenfurt University, Austria
> http://tinyurl.com/mlux-itec
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



