cassandra-user mailing list archives

From Mohammed Guller <moham...@glassbeam.com>
Subject RE: Lucene index plugin for Apache Cassandra
Date Fri, 12 Jun 2015 16:21:10 GMT
The plugin looks cool. Thank you for open sourcing it.

Does it support faceting and other Solr functionality?

Mohammed

From: Andres de la Peña [mailto:adelapena@stratio.com]
Sent: Friday, June 12, 2015 3:43 AM
To: user@cassandra.apache.org
Subject: Re: Lucene index plugin for Apache Cassandra

I really appreciate your interest.

Well, the first recommendation is not to use it unless you need it, because a properly
denormalized Cassandra model is almost always preferable to indexing. Lucene indexing is a good option
when there is no viable denormalization alternative. This is the case for range queries over
multiple dimensions, full-text search, or perhaps complex boolean predicates. It's also appropriate
for Spark/Hadoop jobs that map a small fraction of the total rows in a given table,
if you can pay the cost of indexing.

Lucene indexes run inside C*, so users should closely monitor the amount of memory used. It's
also a good idea to put the Lucene directory files on a separate disk from those used by C*
itself. Additionally, you should expect the write throughput of indexed tables to be appreciably
reduced, maybe to a few thousand rows per second.

It's really hard to estimate the amount of resources needed by the index because of the great
variety of indexing and querying options that Lucene offers, so the only thing we can suggest
is to find the optimal setup for your use case empirically.

2015-06-12 12:00 GMT+02:00 Carlos Rolo <rolo@pythian.com>:
Seems like an interesting tool!
What operational recommendations would you make to users of this tool (extra hardware capacity,
extra metrics to monitor, etc.)?

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

On Fri, Jun 12, 2015 at 11:07 AM, Andres de la Peña <adelapena@stratio.com> wrote:
Unfortunately, we haven't published any benchmarks yet, but we plan to do so as soon
as possible. However, you can expect behavior similar to that of Elasticsearch or Solr,
with some overhead due to the need to index both Cassandra's row key and the partition's
token. You can also take a look at this presentation<http://planetcassandra.org/video-presentations/vp/cassandra-summit-europe-2014/vd/stratio-advanced-search-and-top-k-queries-in-cassandra/>
to see how cluster distribution is done.

2015-06-12 0:45 GMT+02:00 Ben Bromhead <ben@instaclustr.com>:
Looks awesome, do you have any examples/benchmarks of using these indexes for various cluster
sizes e.g. 20 nodes, 60 nodes, 100s+?

On 10 June 2015 at 09:08, Andres de la Peña <adelapena@stratio.com> wrote:
Hi all,

With the release of Cassandra 2.1.6, Stratio is glad to present its open source Lucene-based
implementation of C* secondary indexes<https://github.com/Stratio/cassandra-lucene-index>
as a plugin that can be attached to Apache Cassandra. Before this release, the Lucene index
was distributed inside a fork of Apache Cassandra, with all the difficulties that implies. The
fork is now discontinued, and new users should use the recently created plugin, which
maintains all the features of Stratio Cassandra<https://github.com/Stratio/stratio-cassandra>.

Stratio's Lucene index extends Cassandra's functionality to provide near real-time distributed
search engine capabilities like those of Elasticsearch or Solr, including full-text search,
free multivariable search, relevance queries and field-based sorting. Each node
indexes its own data, so high availability and scalability are guaranteed.
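
To illustrate what "each node indexes its own data" can mean for a relevance query, here is a
minimal Python sketch. It is not the plugin's code: the function name merge_top_k, the node names
and the scores are all hypothetical, and it simply assumes that every node returns its own best
hits and that a coordinator merges them into a global top-k.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical hit shape: (relevance score, row key).
Hit = Tuple[float, str]


def merge_top_k(per_node_hits: Dict[str, List[Hit]], k: int) -> List[Hit]:
    """Merge each node's local hits into a global top-k, best score first."""
    all_hits = (hit for hits in per_node_hits.values() for hit in hits)
    return heapq.nlargest(k, all_hits)


# Made-up local results from three nodes, each searching only its own index.
hits = {
    "node1": [(0.91, "row-a"), (0.40, "row-b")],
    "node2": [(0.87, "row-c"), (0.15, "row-d")],
    "node3": [(0.95, "row-e")],
}
print(merge_top_k(hits, 3))
```

Under these assumptions, adding a node adds one more list to merge, while each local index only
has to cover the rows that node owns.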

We hope this will be useful to the Apache Cassandra community.

Regards,

--

Andrés de la Peña

Stratio <http://www.stratio.com/>
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 352 59 42 // @stratiobd<https://twitter.com/StratioBD>



--

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr<http://twitter.com/instaclustr> | (650) 284 9692


