lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <>
Subject Re: What's the best way to store metadata?
Date Mon, 16 Feb 2009 21:52:21 GMT

On Feb 11, 2009, at 1:05 PM, Mandy G√ľnther wrote:

> Hi all,
> I want to use Lucene in my project but I have the following problem:
> My goal is to store metadata in the index.
> Her is an example of the stuff I need to index
> 1; My; determiner;			
> 2; project; noun; 1
> 3; uses; verb; 1
> 4; Lucene; noun; 1
> 6; I; noun;		
> 7; need; verb; 6
> 8; help; noun; 6
> It's a file containing rows with 4 columns.
> The first column is a sequential number. The second column is the text
> that need to be indexed. In my original file it's more than one word.
> The third column
> classifies the second column and the fourth column tells me where the
> sentence of the word in the second column starts.
> I will have large datafiles(up to 30GB) to index. Indexing and
> searching performance is very important.
> I tried to store the meta data(column no. 3 and 4) over the field  
> names, e.g.
> doc.add(new Field("1|determiner", My, ...));
> doc.add(new Field("1|noun", project, ...));
> ...
> doc.add(new Field("2|noun", I, ...));
> But that's not a classy way!
> I was thinking about using Payloads. But the API says the Payload
> Status is experimantal and the performance is getting bad using
> payloads.
> I am not even sure if Payloads are meant to be used to store that  
> kind of data.
> Is it possible to store that kind of metadata as payload?

I think it makes sense to store these as payloads.  You might even  
look at the 2.9-dev trunk version for the new AttributeSource  
capabilities that allow you to, essentially, strongly type payloads  
and provide custom serialization capabilities.

As for the "experimental" phase of payloads, I'd say they are here to  
stay, it's just that the exact methods may change.

How do you plan to search using the info?  Note, that the new  
AttributeSource stuff is not yet searchable, whereas the Payload stuff  
is.  Performance, I would think, should be roughly equivalent to using  
Spans, but don't quote me on it.


Grant Ingersoll

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message