lucene-dev mailing list archives

From John Wang <john.w...@gmail.com>
Subject Re: custom segment files
Date Fri, 18 Sep 2009 00:14:24 GMT
Sure.

A simple example:

Say you have a type of field with fixed-length data per doc, e.g. 8 bytes.
It might be good to store it in its own segment file as:
<numdocs><v1><v2>....<vn>

So if you have 1000 docs, your segment file is 8,004 bytes (8 bytes × 1000
docs, plus 4 bytes for <numdocs>).
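A minimal Java sketch of that layout, assuming big-endian encoding (java.nio.ByteBuffer's default) and an in-memory byte[] standing in for the segment file. FixedLongSegment and its methods are hypothetical names, not Lucene API:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the <numdocs><v1><v2>...<vn> layout: a 4-byte doc
// count followed by one 8-byte long per document. Not a Lucene API.
final class FixedLongSegment {
    static final int HEADER_BYTES = Integer.BYTES; // <numdocs>
    static final int VALUE_BYTES = Long.BYTES;     // one <v> per doc

    // Serialize one value per doc, in docID order.
    static byte[] write(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + values.length * VALUE_BYTES);
        buf.putInt(values.length);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Read the <numdocs> header.
    static int numDocs(byte[] segment) {
        return ByteBuffer.wrap(segment).getInt(0);
    }

    // Random access: a doc's value sits at a fixed offset, no parsing needed.
    static long get(byte[] segment, int docId) {
        return ByteBuffer.wrap(segment).getLong(HEADER_BYTES + docId * VALUE_BYTES);
    }
}
```

Because every value sits at offset 4 + 8 × docId, a lookup is a single fixed-offset read.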

Merging would be rather trivial as well.
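Merging can be sketched the same way. Assuming surviving docs are renumbered in segment order with deleted docs dropped, merging is just a copy of the live values plus a new count. FixedLongMerger, encode and merge are hypothetical, and java.util.BitSet stands in for the deleted-docs bits:

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

// Hypothetical merge for the fixed-width <numdocs><v1>...<vn> layout.
// Assumes surviving docs are renumbered in order (segment by segment,
// skipping deleted docs), so the merge is a straight copy of live values.
final class FixedLongMerger {
    static final int HEADER = Integer.BYTES;
    static final int WIDTH = Long.BYTES;

    // Helper to build a segment file from values (for illustration).
    static byte[] encode(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER + values.length * WIDTH);
        buf.putInt(values.length);
        for (long v : values) buf.putLong(v);
        return buf.array();
    }

    // deletes[i] holds the deleted docIDs of segments[i], or null if none.
    static byte[] merge(byte[][] segments, BitSet[] deletes) {
        // First pass: count live docs to size the output.
        int live = 0;
        for (int s = 0; s < segments.length; s++) {
            int n = ByteBuffer.wrap(segments[s]).getInt(0);
            for (int doc = 0; doc < n; doc++) {
                if (deletes[s] == null || !deletes[s].get(doc)) live++;
            }
        }
        // Second pass: write the new header, then copy live values in order.
        ByteBuffer out = ByteBuffer.allocate(HEADER + live * WIDTH);
        out.putInt(live);
        for (int s = 0; s < segments.length; s++) {
            ByteBuffer in = ByteBuffer.wrap(segments[s]);
            int n = in.getInt(0);
            for (int doc = 0; doc < n; doc++) {
                if (deletes[s] == null || !deletes[s].get(doc)) {
                    out.putLong(in.getLong(HEADER + doc * WIDTH));
                }
            }
        }
        return out.array();
    }
}
```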

Doing this right now involves storing the value in a payload, which pays
the cost of parsing a byte[] into, say, a long for every doc.
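For comparison, a sketch of the per-doc decode the payload route implies, assuming the long was written big-endian into an 8-byte payload (that payload layout is an assumption; PayloadDecode is a hypothetical helper, not Lucene API):

```java
// Hypothetical helper showing the byte[] <-> long conversion a payload-based
// approach needs for every doc. Big-endian byte order is an assumption.
final class PayloadDecode {
    // Reassemble a long from 8 bytes starting at offset.
    static long toLong(byte[] payload, int offset) {
        long v = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            v = (v << 8) | (payload[offset + i] & 0xFFL);
        }
        return v;
    }

    // Encode a long into a fresh 8-byte payload.
    static byte[] toBytes(long v) {
        byte[] b = new byte[Long.BYTES];
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            b[i] = (byte) v;
            v >>>= 8;
        }
        return b;
    }
}
```

With a dedicated fixed-width file, the same value is read in place at a known offset instead of being copied out and reassembled per doc.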

I think this problem is orthogonal to LUCENE-1458.

There are other use cases, so I thought it might be a good idea to abstract
it out, since at a high level they are all rather similar:

start
write per doc
end
merge
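The four steps could be captured as a plugin contract like the one below. This is a hypothetical interface (PerDocSegmentWriter is not part of Lucene) with a fixed-width long implementation; its merge ignores deletions to stay short:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plugin contract following the start / write per doc / end /
// merge lifecycle. Not an existing Lucene API.
interface PerDocSegmentWriter<T> {
    void start(int expectedDocs);        // called once before the first doc
    void writeDoc(int docId, T value);   // called per doc, in docID order
    byte[] end();                        // finish and return the file bytes
    byte[] merge(List<byte[]> segments); // combine files of merged segments
}

// Example plugin storing one long per doc as <numdocs><v1>...<vn>.
final class FixedLongWriter implements PerDocSegmentWriter<Long> {
    private final ArrayList<Long> pending = new ArrayList<>();

    public void start(int expectedDocs) {
        pending.clear();
        pending.ensureCapacity(expectedDocs);
    }

    public void writeDoc(int docId, Long value) {
        if (docId != pending.size()) {
            throw new IllegalStateException("docs must arrive in docID order");
        }
        pending.add(value);
    }

    public byte[] end() {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + pending.size() * Long.BYTES);
        buf.putInt(pending.size());
        for (long v : pending) buf.putLong(v);
        return buf.array();
    }

    // Ignores deletions: sums the counts and concatenates the value sections.
    public byte[] merge(List<byte[]> segments) {
        int total = 0;
        for (byte[] s : segments) total += ByteBuffer.wrap(s).getInt(0);
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + total * Long.BYTES);
        out.putInt(total);
        for (byte[] s : segments) out.put(s, Integer.BYTES, s.length - Integer.BYTES);
        return out.array();
    }
}
```

Keeping merge on the same object as the per-doc writer means each custom file type owns its merge logic, rather than having it live in one central merger.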

Hopefully I am describing it clearly.

Thanks

-John


On Thu, Sep 17, 2009 at 10:35 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> I'm actively working on LUCENE-1458, to enable different codecs for
> reading/writing the terms dict and doc/freq/prox/payload postings.
> I'm working now towards getting PforDelta working...
>
> However, that change doesn't [yet] do anything for norms, stored
> fields nor term vectors.
>
> Can you describe more details about what kinds of customization you're
> looking to do?
>
> Mike
>
> On Thu, Sep 17, 2009 at 10:00 AM, John Wang <john.wang@gmail.com> wrote:
> > Hi guys:
> >
> >      I am trying to figure out how to add the ability to create custom
> > segment files. Hopefully it is possible to create a plugin framework
> > where one can provide some sort of callback to add to a segment given a
> > doc, and provide some sort of merge logic. This is in light of the
> > flexible indexing effort.
> >
> >      After digging thru the latest trunk code in that area, I see a
> > Writer/WriterPerThread pattern for different types of segment files, e.g.
> > Stored data, norms, inverted doc, etc.
> >
> >      Do you think it is a good idea to consolidate them? Are there
> > intricacies, such as cross-dependencies between different types of
> > writers?
> >
> >      Merge logic seems to be in the SegmentMerger class. To do this, it
> > seems it would be good to separate it out per writer type.
> >
> >       I am still trying to understand the code, any help is greatly
> > appreciated.
> >
> > Thoughts?
> >
> > Thanks
> >
> > -John
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
