lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Earwin Burrfoot <>
Subject Re: FST and FieldCache?
Date Thu, 19 May 2011 08:24:48 GMT
You cannot get a string out of automaton by its ordinal without
storing additional data.
The string is stored there not as a single arc, but as a sequence of
them (basically.. err.. as a string),
so referencing them is basically writing the string asis. Space
savings here come from sharing arcs between strings.

Though, it's possible to do if you associate an additional number with
each node. (I invented some way, shared it with Mike and forgot.. good
grief :/)

Perfect hashing, on the other hand, is like a Map<String, Integer>
that accepts a predefined set of N strings and returns an int in
0..N-1 interval.
And it can't do the reverse lookup, by design, that's a lossy
compression for all good perfect hashing algos.
So, it's irrelevant here, huh?

On Thu, May 19, 2011 at 08:53, David Smiley (
<> wrote:
> I've been pondering how to reduce the size of FieldCache entries when there
> are a large number of Strings. I'd like to facet on such a field with Solr
> but with less memory.  As I understand it, FSTs are a highly compressed
> representation of a set of Strings (among other possibilities).  The
> fieldCache would need to point to an FST entry (an "arc"?) using something
> small, say an integer.  Is there a way to point to an FST entry with an
> integer, and then somehow with relative efficiency construct the String from
> the arcs to get there?
> ~ David Smiley
> -----
>  Author:
> --
> View this message in context:
> Sent from the Lucene - Java Developer mailing list archive at
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Kirill Zakharenko/Кирилл Захаренко
Phone: +7 (495) 683-567-4
ICQ: 104465785

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message