lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Index of Lucene
Date Sat, 23 Aug 2008 13:45:52 GMT

It is in fact 1 byte per field (that stores norms), per document.  So  
if you have 7 fields in the doc that store norms, that uses up 7 bytes.

And, because the storage is non-sparse, even documents which don't  
have a given field X will still use up 1 byte, if field X stores norms.

Also, beware when disabling norms: you must disable norms for every  
single occurrence of that field in any document in your index.  If  
even one document exists that did not disable norms for that field  
then that will "spread" to all other docs, during segment merging.

Mike

Otis Gospodnetic wrote:

> Is that really 1 byte for each document?  Not 1 byte for each field  
> of each document?
>
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Doron Cohen <cdoronc@gmail.com>
>> To: java-user@lucene.apache.org
>> Sent: Monday, August 18, 2008 12:11:28 AM
>> Subject: Re: Index of Lucene
>>
>> Norms information comes mainly from lengths of documents - allowing  
>> the
>> search time scoring to take into account the effect of document  
>> lengths
>> (actually
>> field length within a document). In practice, norms stored within  
>> the index
>> may include
>> other information, such as index time boosts - for a document, for  
>> a field.
>> A single
>> byte is stored for each field, - so for this the actual value is  
>> compressed.
>> At search
>> time, norms are loaded into memory, and so consume 1 byte for each  
>> document.
>> It is possible to disable norms for a field while indexing. This is
>> explained
>> better in the javadoc for Similarity, and here:
>> http://lucene.apache.org/java/2_3_2/scoring.html
>>
>> Doron
>>
>> On Mon, Aug 18, 2008 at 5:59 AM, blazingwolf7 wrote:
>>
>>>
>>> Hi,
>>>
>>> I am currently using Lucene for indexing. After a index a file, I  
>>> will use
>>> LUKE to open it and check the index. And there is 1 part that I am  
>>> curious
>>> about. In Luke, under the Document tab, I randomly select a  
>>> document and
>>> display it. At the bottom will be 4 columns, Field, ITSVopLBC,  
>>> Norm and
>>> String Value.
>>>
>>> I am wondering, what is Norm for? And where is it created during  
>>> indexing
>>> time? Which method calculates it?
>>>
>>> Could anyone advise me on this? Thanks for the help
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Index-of-Lucene-tp19025490p19025490.html
>>> Sent from the Lucene - Java Users mailing list archive at  
>>> Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message