lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: indexing size
Date Thu, 09 Sep 2004 07:18:18 GMT
Dmitry Serebrennikov wrote:

> Niraj Alok wrote:
>
>> Hi PA,
>>
>> Thanks for the detail ! Since we are using lucene to store the data 
>> also, I
>> guess I would not be able to use it.
>>  
>>
> By the way, I could be wrong, but I think the 35% figure you 
> referenced in the your first e-mail actually does not include any 
> stored fields. The deal with 35% was, I think, to illustrate that 
> index data structures used for searching by Lucene are efficient. But 
> Lucene does nothing special about stored content - no compression or 
> anything like that. So you end up with the pure size of your data plus 
> the 35% of the indexed data.

There will be a patch available to the end of this week, which allows 
you to store binary values compressed within a lucene index. It means 
that you will be able to store and retrieve whole documents within 
lucene in a very efficient way ;-)

regards
bernhard

>
>
> Cheers.
> Dmitry.
>
>> Regards,
>> Niraj
>> ----- Original Message -----
>> From: "petite_abeille" <petite_abeille@mac.com>
>> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>> Sent: Wednesday, September 01, 2004 1:14 PM
>> Subject: Re: indexing size
>>
>>
>>  
>>
>>> Hi Niraj,
>>>
>>> On Sep 01, 2004, at 06:45, Niraj Alok wrote:
>>>
>>>   
>>>
>>>> If I make some of them Field.Unstored, I can see from the javadocs
>>>> that it
>>>> will be indexed and tokenized but not stored. If it is not stored, how
>>>> can I
>>>> use it while searching?
>>>>     
>>>
>>> The different type of fields don't impact how you do your search. This
>>> is always the same.
>>>
>>> Using Unstored fields simply means that you use Lucene as a pure index
>>> for search purpose only, not for storing any data.
>>>
>>> Specifically, the assumption is that your original data lives somewhere
>>> else, outside of Lucene. If this assumption is true, then you can index
>>> everything as Unstored with the addition of one Keyword per document.
>>> The Keyword field holds some sort of unique identifier which allows you
>>> to retrieve the original data if necessary (e.g. a primary key, an URI,
>>> what not).
>>>
>>> Here is an example of this approach:
>>>
>>> (1) For indexing, check the indexValuesWithID() method
>>>
>>> http://cvs.sourceforge.net/viewcvs.py/zoe/ZOE/Frameworks/SZObject/
>>> SZIndex.java?view=markup
>>>
>>> Note the addition of a Field.Keyword for each document and the use of
>>> Field.UnStored for everything else
>>>
>>> (2) For fetching, check objectsWithSpecificationAndHitsInStore()
>>>
>>> http://cvs.sourceforge.net/viewcvs.py/zoe/ZOE/Frameworks/SZObject/
>>> SZFinder.java?view=markup
>>>
>>> HTH.
>>>
>>> Cheers,
>>>
>>> PA.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>>
>>>   
>>
>>
>>  
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message