cassandra-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: Lucandra Limitations
Date Thu, 27 Jan 2011 20:46:50 GMT
The latest iteration of Lucandra, called Solandra, creates localized
sub-indexes of size N and spreads them around the Cassandra ring. Then,
using Solr, it searches all of the sub-indexes in parallel behind the
scenes. This approach should give you what you need, and it would be great
to have such a large dataset for testing out the limits of Solandra.
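
For a sense of the mechanism, this is ordinary Solr distributed search under
the hood; the minimal SolrJ sketch below shows the equivalent hand-rolled
query. The host names, index path, and field name are hypothetical, and
Solandra manages the shard list for you, so you would not normally set it
yourself.

    import java.net.MalformedURLException;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardedSearch {
        public static void main(String[] args)
                throws MalformedURLException, SolrServerException {
            // Any node can coordinate the query (hypothetical host/index).
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://node1:8983/solandra/uris");

            SolrQuery query = new SolrQuery("uri:\"http://example.org/r/42\"");
            // Fan the query out to each sub-index; Solr merges the results.
            // (Hypothetical shard list -- Solandra builds this internally.)
            query.set("shards",
                      "node1:8983/solandra/uris,node2:8983/solandra/uris");

            QueryResponse response = server.query(query);
            System.out.println("hits: " + response.getResults().getNumFound());
        }
    }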

Solandra is here: http://github.com/tjake/lucandra

-Jake

On Thu, Jan 27, 2011 at 3:30 PM, David G. Boney <
dboney1@semanticartifacts.com> wrote:

> I am new to Lucene and Lucandra.
>
> My use case is that I have a trillion URIs to index with Lucene. Each URI
> is either a resource or a literal in an RDF graph. Each URI is a document
> for Lucene.
>
> If I were using Lucene, my understanding is that it would create a segment
> and stuff URIs into that segment until it hit either the document limit,
> around 2 billion, or the maximum size of the index. Let's say, for the sake
> of argument, that I only store 1 billion URIs per segment; then I would
> have 1,000 segments to index my URIs.
>
> Lucandra does not support segments. How would I index a trillion URIs?
> Based on the comments below, I could only have around 2 billion URIs, or
> documents, per index. Would I have to create separate indexes to store all
> the URIs? In the case where I store only 1 billion URIs per index, would I
> have to create 1,000 indexes? Since these are indexes and not segments,
> which Lucene would have handled itself, do I have to run my search against
> each index? Lucene supports the ability to create multiple IndexSearchers
> and combine them in a MultiSearcher, as in the sketch below.
>
> Is this the right way to view the problem?
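>
> Something like this minimal Lucene 3.x sketch is what I have in mind; the
> index paths and field name are hypothetical:
>
>     import java.io.File;
>
>     import org.apache.lucene.index.Term;
>     import org.apache.lucene.search.IndexSearcher;
>     import org.apache.lucene.search.MultiSearcher;
>     import org.apache.lucene.search.Searchable;
>     import org.apache.lucene.search.TermQuery;
>     import org.apache.lucene.search.TopDocs;
>     import org.apache.lucene.store.FSDirectory;
>
>     public class UriSearch {
>         public static void main(String[] args) throws Exception {
>             // One IndexSearcher per on-disk index (1,000 in my case).
>             Searchable[] shards = new Searchable[] {
>                 new IndexSearcher(FSDirectory.open(new File("/data/idx0"))),
>                 new IndexSearcher(FSDirectory.open(new File("/data/idx1")))
>             };
>             // MultiSearcher runs the query over every sub-index and
>             // merges the hits into a single ranked result set.
>             MultiSearcher searcher = new MultiSearcher(shards);
>             TopDocs hits = searcher.search(
>                 new TermQuery(new Term("uri", "http://example.org/r/42")), 10);
>             System.out.println("total hits: " + hits.totalHits);
>             searcher.close();
>         }
>     }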
>
> -------------
> Sincerely,
> David G. Boney
> dboney1@semanticartifacts.com
> http://www.semanticartifacts.com
>
>
>
>
> On Jan 27, 2011, at 12:45 PM, Jake Luciani wrote:
>
> Yes, but that's also the Lucene limit:
> http://lucene.apache.org/java/3_0_1/fileformats.html#Limitations
>
> "Lucene uses a Java int to refer to document numbers, and the index file
> format uses an Int32"
>
>
>
> On Thu, Jan 27, 2011 at 1:40 PM, David G. Boney <
> dboney1@semanticartifacts.com> wrote:
>
>> I was reviewing the Lucandra schema presented on the below page at
>> Datastax:
>>
>> http://www.datastax.com/docs/0.7/data_model/lucandra
>>
>> In the TermInfo Super Column Family, docID is the key for a supercolumn.
>> Does this imply that the maximum number of documents that can be indexed
>> for a term with Lucandra is two billion, the maximum number of columns in
>> a row?
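>>
>> As I read that page, the layout is roughly the following (my paraphrase,
>> not the exact schema):
>>
>>     TermInfo (super column family)
>>         row key:      term
>>         supercolumn:  docID
>>         subcolumns:   per-document term data (e.g., positions)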
>>
>> -------------
>> Sincerely,
>> David G. Boney
>> dboney1@semanticartifacts.com
>> http://www.semanticartifacts.com
>>
>>
>>
>>
>>
>
>
