lucene-solr-dev mailing list archives

From "mike.schultz" <>
Subject Re: capturing field length into a stored document field
Date Fri, 04 Sep 2009 16:32:53 GMT

Sorry, wrong list.

mike.schultz wrote:
> For various statistics I collect from an index it's important for me to
> know the length (measured in tokens) of a document field.  I can get that
> information to some degree from the "norms" for the field but a) the
> resolution isn't that great, and b) more importantly, if boosts are used
> it's almost impossible to get lengths from this.
> Here are two ideas I was thinking about that maybe someone can comment on.
> 1) Use copyField to copy the field in question, fieldA, to an additional field,
> fieldALength, which has an extra filter that just counts the tokens and
> only outputs a token representing the length of the field.  This has the
> disadvantage of retokenizing basically the whole document (because the
> field in question is basically the body).  Plus I would think littering
> the term space with these tokens might be bad for performance, though I'm
> not sure.
> 2) Add a filter to the field in question which again counts the tokens. 
> This filter allows the regular tokens to be indexed as usual but somehow
> manages to get the token-count into a stored field of the document.  This
> has the advantage of not having to retokenize the field and instead of
> littering the token space, the count becomes docdata for each doc.  Can
> this be done?  Maybe using a ThreadLocal to temporarily store the count?
> Thanks.
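
The pass-through counting in idea 2 can be sketched in plain Java. This is
only an illustration under stated assumptions: it uses no Lucene or Solr
classes, and CountingTokenIterator, getCount, and TokenCountDemo are
hypothetical names standing in for a real token filter and the code that
drives it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token filter: passes every token through
// unchanged and counts how many it has emitted.
class CountingTokenIterator implements Iterator<String> {
    private final Iterator<String> input;
    private int count = 0;

    CountingTokenIterator(Iterator<String> input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    @Override
    public String next() {
        String token = input.next(); // throws NoSuchElementException when exhausted
        count++;
        return token;
    }

    // Equals the field length in tokens only once the stream is fully consumed.
    int getCount() {
        return count;
    }
}

class TokenCountDemo {
    public static void main(String[] args) {
        // Assumed, already-tokenized content of fieldA.
        List<String> tokens = Arrays.asList("the", "quick", "brown", "fox");
        CountingTokenIterator counting = new CountingTokenIterator(tokens.iterator());
        while (counting.hasNext()) {
            counting.next(); // a real indexer would consume each token here
        }
        System.out.println("fieldALength=" + counting.getCount());
    }
}
```

The count is final only after the last token has been consumed, so the open
question in idea 2 remains how to hand that number back to the code that
builds the document; a ThreadLocal is one possible carrier for that hand-off.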

Sent from the Solr - Dev mailing list archive at
