lucene-dev mailing list archives

From Tim Smith <tsm...@attivio.com>
Subject Re: Stored fields access
Date Thu, 25 Feb 2010 20:52:07 GMT
yeah,

i would like to see a more "term-vector"/SAX-like api for extracting
values, one that requires no extra object overhead

pass in a "collector" whose methods are called as fields are encountered
(it can return false if walking the document should stop, or some enum
for more options)
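
A minimal sketch of what such a callback-style api could look like. Every name here (StoredFieldCollector, StoredFieldWalker, walk) is hypothetical and not part of Lucene, and the "index" is just an in-memory list so the example runs on its own. It only illustrates the control flow: the walker pushes each stored field to the collector, and the collector returns false to stop early.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface; NOT a Lucene API. */
interface StoredFieldCollector {
    /** Called once per stored field. Return false to stop walking the document. */
    boolean field(String name, String value);
}

/** Hypothetical stand-in for a stored-fields reader, backed by in-memory arrays. */
final class StoredFieldWalker {
    // Each document is a flat array: name0, value0, name1, value1, ...
    private final List<String[]> docs = new ArrayList<>();

    /** Stores a document and returns its id. */
    int addDocument(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        docs.add(namesAndValues.clone());
        return docs.size() - 1;
    }

    /** Pushes fields of one document to the collector; returns how many were visited. */
    int walk(int docId, StoredFieldCollector collector) {
        String[] fields = docs.get(docId);
        int visited = 0;
        for (int i = 0; i < fields.length; i += 2) {
            visited++;
            if (!collector.field(fields[i], fields[i + 1])) {
                break; // collector asked to stop
            }
        }
        return visited;
    }
}

final class CollectorDemo {
    public static void main(String[] args) {
        StoredFieldWalker walker = new StoredFieldWalker();
        int doc = walker.addDocument("id", "42", "title", "hello", "body", "long text");

        // Grab only the title and stop; the "body" field is never visited.
        String[] title = new String[1];
        int visited = walker.walk(doc, (name, value) -> {
            if (name.equals("title")) {
                title[0] = value;
                return false;
            }
            return true;
        });
        System.out.println(title[0] + " after " + visited + " fields");
    }
}
```

The caller decides where values end up (here a one-element array), so no Document or Field wrapper objects are needed. A real version would presumably also need typed callbacks for binary values, which this sketch leaves out.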

i just throw away the lucene Document and Field objects when i'm done
with them anyway (well i'll cache them in an LRU cache for later reuse,
but i could do smarter things if i didn't need the lucene Document
object in the first place)

 -- Tim

Earwin Burrfoot wrote:
> Missed that, I have a heap of unread Jira mails :/
>
> Okay, you're reusing Document object and the list inside. To reuse
> Fieldable instances you'd have to do some very awkward things.
> More awkward things are required to extract your longed-for values
> from the Document.
> To add insult to injury, Document and Fieldable define a boatload of
> stuff that is used at indexation-time, but has zero meaning at
> search-time.
> This is just broken, quickly-hacked-together API.
>
> 2010/2/25 Tim Smith <tsmith@attivio.com>:
>   
>> I created LUCENE-2276 a couple of days ago to at least allow reusing
>> Document objects (didn't see any interest from anyone though)
>>
>>  -- Tim
>>
>> Erick Erickson wrote:
>>
>> OK, never mind <G>....
>> Erick
>>
>> On Thu, Feb 25, 2010 at 1:48 PM, Earwin Burrfoot <earwin@gmail.com> wrote:
>>     
>>> My issue is with extra objects created in the process. Field selection
>>> can be handled with, well, FieldSelector.
>>>
>>> 2010/2/25 Erick Erickson <erickerickson@gmail.com>:
>>>       
>>>> Does LazyLoading address this? I'm assuming your issue is
>>>> that the default behavior loads the entire document regardless
>>>> of whether you actually want all the fields.....
>>>> Erick
>>>>
>>>> On Thu, Feb 25, 2010 at 7:52 AM, Earwin Burrfoot <earwin@gmail.com>
>>>> wrote:
>>>>         
>>>>> I'm thinking, should Lucene introduce a new interface to read stored
>>>>> document fields?
>>>>>
>>>>> Current 'Document document(int n)' mechanism is barely usable due to
>>>>> overhead involved. While I believe underlying index structure works
>>>>> pretty fast (if it fits in memory, as is the case for most
>>>>> performance-concerned installations), there's no adequate access to it
>>>>> and people are forced to introduce contraptions like LinkedIn's
>>>>> payload-assisted luceneId<->appId mapping or similar caches we
>>>>> employ.
>>>>>
>>>>> What I am thinking about is something along the lines of existing
>>>>> iterators like TermDocs/TermPositions. Iterate over docs, then iterate
>>>>> over fields stored for each, extract data, ???, profit.
>>>>> Comments?
>>>>>
>>>>> --
>>>>> Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
>>>>> Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
>>>>> ICQ: 104465785
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>>>>
>>>>>           
>>>>         
>>>

