lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: External strings sort and case folding.
Date Tue, 14 Jun 2011 12:24:08 GMT
In theory, you could use the codec API directly, adding "chunks" of
pre-sorted terms, and then fake up a SegmentInfo to make it look like
some kind of degenerate segment, and then merge them?

But it's gonna be a lot of work to do that :)

Merging FSTs sounds cool!
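As a rough illustration of the merge Dawid describes in the quoted message below (building a new FST by adding sequences in order from one source or the other), the core step is a two-way merge of two sorted term streams. This is a minimal plain-Java sketch. It assumes each source yields distinct terms already sorted in unsigned byte order, which is the order an FST builder needs its inputs in. The class `SortedMerge` and its methods are hypothetical. A plain `Consumer<byte[]>` stands in for the builder's add call, and merging the FSTs' outputs is not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public final class SortedMerge {

    /**
     * Two-way merge of term sources that are each sorted in unsigned byte
     * order with no internal duplicates. Every distinct term is passed to
     * sink exactly once, in sorted order (hypothetical stand-in for an FST builder).
     */
    public static void merge(Iterator<byte[]> a, Iterator<byte[]> b, Consumer<byte[]> sink) {
        byte[] x = next(a);
        byte[] y = next(b);
        while (x != null || y != null) {
            int cmp;
            if (x == null) {
                cmp = 1;            // a is exhausted: take from b
            } else if (y == null) {
                cmp = -1;           // b is exhausted: take from a
            } else {
                cmp = Arrays.compareUnsigned(x, y);
            }
            if (cmp <= 0) {
                sink.accept(x);
                if (cmp == 0) {
                    y = next(b);    // same term in both sources: drop b's copy
                }
                x = next(a);
            } else {
                sink.accept(y);
                y = next(b);
            }
        }
    }

    private static byte[] next(Iterator<byte[]> it) {
        return it.hasNext() ? it.next() : null;
    }

    /** Convenience wrapper: UTF-8 encodes string terms, merges them, decodes the result. */
    public static List<String> mergeStrings(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        merge(utf8(a).iterator(), utf8(b).iterator(),
              term -> out.add(new String(term, StandardCharsets.UTF_8)));
        return out;
    }

    private static List<byte[]> utf8(List<String> terms) {
        List<byte[]> encoded = new ArrayList<>();
        for (String t : terms) {
            encoded.add(t.getBytes(StandardCharsets.UTF_8));
        }
        return encoded;
    }
}
```

For example, `mergeStrings(List.of("apple", "cherry"), List.of("banana", "cherry", "date"))` yields `[apple, banana, cherry, date]`. Comparing raw UTF-8 bytes rather than Java `String`s keeps the order consistent with unsigned byte order even for characters outside the BMP.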

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 14, 2011 at 8:18 AM, Dawid Weiss
<dawid.weiss@cs.put.poznan.pl> wrote:
>> So actually it would work if you just enum'd the terms yourself, after
>> indexing and optimizing.  And this does amount to an external sort, I
>> think!
>
> Yep. I was just curious if there's a way to do it without the overhead
> of creating fields, documents, etc. If I have a spare minute I'll try
> to write a merge sort from disk splits. It'd be neat to write FST
> merging too (so that, given two FSTs, you could merge them into one by
> creating a new FST and adding sequences in order from one or the other
> source).
>
> Dawid
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
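Dawid's "merge sort from disk splits" can be sketched in plain Java without creating any fields or documents. Sort fixed-size chunks in memory, spill each chunk to a temporary file, then do a k-way merge of those runs with a priority queue. This is a minimal sketch, not Lucene code. It assumes one UTF-8 term per line, and the class `ExternalSort` and its `maxLinesInMemory` parameter are hypothetical. Terms are compared in Unicode code point order, which for valid text matches the unsigned UTF-8 byte order used for Lucene terms.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public final class ExternalSort {
    /** Unicode code point order; for valid text this equals UTF-8 byte order. */
    static final Comparator<String> CODE_POINT_ORDER = (s, t) -> {
        int i = 0, j = 0;
        while (i < s.length() && j < t.length()) {
            int a = s.codePointAt(i), b = t.codePointAt(j);
            if (a != b) return Integer.compare(a, b);
            i += Character.charCount(a);
            j += Character.charCount(b);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(s.length() - i, t.length() - j);
    };

    /** Sorts the lines of input into output, holding at most maxLinesInMemory lines at once. */
    public static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        if (maxLinesInMemory < 1) throw new IllegalArgumentException("maxLinesInMemory must be >= 1");
        List<Path> runs = new ArrayList<>();
        try {
            // Phase 1: split the input into sorted runs on disk.
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                List<String> chunk = new ArrayList<>();
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == maxLinesInMemory) {
                        runs.add(writeRun(chunk));
                        chunk.clear();
                    }
                }
                if (!chunk.isEmpty()) runs.add(writeRun(chunk));
            }
            // Phase 2: k-way merge of all runs into the output.
            mergeRuns(runs, output);
        } finally {
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    private static Path writeRun(List<String> chunk) throws IOException {
        chunk.sort(CODE_POINT_ORDER);
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        List<BufferedReader> readers = new ArrayList<>();
        PriorityQueue<Head> heap = new PriorityQueue<>((p, q) -> CODE_POINT_ORDER.compare(p.line, q.line));
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) heap.add(new Head(first, reader));
            }
            // Repeatedly emit the smallest head line and refill from the same run.
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(smallest.line);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) heap.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
        }
    }

    /** Current line of one run, plus the reader it came from. */
    private static final class Head {
        final String line;
        final BufferedReader reader;
        Head(String line, BufferedReader reader) { this.line = line; this.reader = reader; }
    }
}
```

Calling `ExternalSort.sort(in, out, 2)` on a file containing pear, apple, fig, banana, apple writes apple, apple, banana, fig, pear. `maxLinesInMemory` trades heap usage against the number of temporary runs open during the merge. Duplicates are kept, so a consumer that needs distinct terms (such as an FST builder) would skip repeats while reading the output.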
