incubator-cassandra-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Rutherglen <jason.rutherg...@gmail.com>
Subject Re: Cassandra in memory key index
Date Sat, 09 Jun 2012 14:47:32 GMT
> does FST provide a predecessor lookup function

Yes [1].  seekCeil, seekFloor, seekExact are the methods provided for
seeking.

1.
https://github.com/apache/lucene-solr/blob/trunk/lucene/core/src/java/org/apache/lucene/util/fst/BytesRefFSTEnum.java

On Fri, Jun 8, 2012 at 4:03 PM, Pavel Yaskevich <povel.y@gmail.com> wrote:

> Yeah, that is why I wrote "if possible" :) Also, does FST provide a
> predecessor lookup function, wasn't clear from the blog post?
>
> On Friday 8 June 2012 at 22:53, Jason Rutherglen wrote:
>
> > Yeah that's fine, however if there isn't a Java implementation that's a
> lot
> > of extra work. The FST approach should be a clear quick and easy win. The
> > current system of in heap keys and a binary search is what the FST
> replaced
> > in Lucene. There are plenty of references to the improvement.
> >
> > On Fri, Jun 8, 2012 at 3:50 PM, Pavel Yaskevich <povel.y@gmail.com(mailto:
> povel.y@gmail.com)> wrote:
> >
> > > I would vote, if possible, to compare it with y-fast trie [1] (it
> doesn't
> > > seem to be available java implementation unfortunately) by means of
> memory
> > > efficiency and lookup performance. As we use big integer tokens the
> main
> > > benefit from that trie could be O(log log M) predecessor lookup and
> compact
> > > in-memory size.
> > >
> > > [1] https://en.wikipedia.org/wiki/Y-fast_trie
> > >
> > > Best Regards
> > > --
> > > Pavel Yaskevich
> > >
> > >
> > > On Friday 8 June 2012 at 22:19, Jason Rutherglen wrote:
> > >
> > > > Ok looks like the IndexSummary encapsulates everything, I can start
> with
> > > > hacking that.
> > > >
> > > > On Fri, Jun 8, 2012 at 11:50 AM, Jason Rutherglen <
> > > > jason.rutherglen@gmail.com (mailto:jason.rutherglen@gmail.com)>
> wrote:
> > > >
> > > > > The Cassandra integration is probably beyond the time I have
> available.
> > > > > If the locations in the code that need to be rewritten to use the
> FST
> > > > >
> > > >
> > > >
> > >
> > > are
> > > > > known, and a patch simply 'plugs-in' the FST, that would be much
> > > >
> > >
> > > easier.
> > > > > Eg, I don't know how Cassandra stores the current key index for
> > > >
> > >
> > > example...
> > > > >
> > > > > Basically I can write FST serializing, deserializing, and key
> lookup
> > > code
> > > > > fairly easy by basing it on Lucene's terms dict.
> > > > >
> > > > >
> > > > > On Fri, Jun 8, 2012 at 6:00 AM, Radim Kolar <hsn@filez.com(mailto:
> hsn@filez.com) (mailto:
> > > hsn@filez.com (mailto:hsn@filez.com))> wrote:
> > > > >
> > > > > >
> > > > > > If you are interested I can help, I used the FST on a Hadoop
> project
> > > > > > > to implement a fast map side range join.
> > > > > >
> > > > > >
> > > > > > create JIRA item with patch attached, i will test it.
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message