spark-user mailing list archives

From Costin Leau <costin.l...@gmail.com>
Subject Re: solr in spark
Date Wed, 29 Apr 2015 15:13:00 GMT
On 4/29/15 6:02 PM, Jeetendra Gangele wrote:
> Thanks for the detailed explanation. My only worry is that searching all combinations of company
> names through ES looks hard.
>

I'm not sure what makes you think "ES looks hard". Have you tried browsing the Elasticsearch
reference [1] or the definitive guide [2]?

[1] http://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
[2] http://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
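To make the "combinations of company names" point concrete: Elasticsearch's `shingle` token filter indexes runs of adjacent words as extra tokens, so multi-word fragments of a name can match without you enumerating the combinations by hand. Below is a minimal pure-Python sketch of the kind of tokens such a filter produces, assuming simple whitespace tokenization and lowercasing. `shingles` is a hypothetical helper, not an Elasticsearch API, and its parameters only loosely mirror the filter's `min_shingle_size`, `max_shingle_size` and `output_unigrams` settings.

```python
def shingles(text, min_size=2, max_size=2, output_unigrams=True):
    """Return the tokens a shingle-style filter would emit for `text`.

    Hypothetical helper for illustration: lowercases, splits on whitespace,
    then, position by position, emits the single word (optionally) followed
    by every run of `min_size`..`max_size` adjacent words starting there.
    """
    words = text.lower().split()
    tokens = []
    for start in range(len(words)):
        if output_unigrams:
            tokens.append(words[start])
        for size in range(min_size, max_size + 1):
            # Only emit a shingle if enough words remain after `start`.
            if start + size <= len(words):
                tokens.append(" ".join(words[start:start + size]))
    return tokens


print(shingles("Tata Consultancy Services", max_size=3))
# ['tata', 'tata consultancy', 'tata consultancy services',
#  'consultancy', 'consultancy services', 'services']
```

Because the combinations are produced once at index time, a query only has to be analyzed the same way for its word pairs and triples to line up with the indexed ones.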

> In Solr we define everything in XML files, like all the attributes of WordDelimiterFilterFactory
> and ShingleFilterFactory. How do we do this in Elasticsearch?
>

See the links above, the IRC channel or the mailing list. I don't want to derail this thread any longer,
so I'll wrap up by pointing to two of the many resources that pop up on Google: a blog post on
shingles and a post from Found.no on text analysis and shingles:

https://www.elastic.co/blog/searching-with-shingles
https://www.found.no/foundation/text-analysis-part-1/#optimizing-phrase-searches-with-shingles
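As a rough answer to the Solr question above: in Elasticsearch, the equivalent of a Solr XML analyzer chain is declared as JSON in the index settings, using the built-in `word_delimiter` and `shingle` token filters. The sketch below only builds the request bodies in Python. The analyzer and filter names (`company_name`, `company_word_delimiter`, `company_shingles`), the field name `name` and the parameter values are illustrative assumptions. The settings body would be sent when creating an index (for example `PUT /companies`, index name hypothetical); the mapping that assigns the analyzer to the `name` field is left out because its syntax differs between Elasticsearch versions.

```python
import json


def company_index_body():
    """Create-index body with a custom analyzer for company names.

    The names "company_name", "company_word_delimiter" and
    "company_shingles" are hypothetical; "word_delimiter", "shingle",
    "lowercase" and "whitespace" are built-in Elasticsearch types.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Plays roughly the role of Solr's WordDelimiterFilterFactory.
                    "company_word_delimiter": {
                        "type": "word_delimiter",
                        "split_on_case_change": True,
                        "catenate_words": True,
                    },
                    # Plays roughly the role of Solr's ShingleFilterFactory.
                    "company_shingles": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 3,
                        "output_unigrams": True,
                    },
                },
                "analyzer": {
                    "company_name": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        # Delimiter runs before lowercase so case changes are still visible.
                        "filter": [
                            "company_word_delimiter",
                            "lowercase",
                            "company_shingles",
                        ],
                    }
                },
            }
        }
    }


def company_query(text):
    """A full-text match query on the hypothetical "name" field."""
    return {"query": {"match": {"name": text}}}


print(json.dumps(company_index_body(), indent=2))
print(json.dumps(company_query("Tata Consultancy")))
```

The `match` query analyzes the search text with the field's analyzer, so once `name` is mapped to `company_name`, incoming names are split and shingled the same way as the indexed ones.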

If you need more help, do reach out to the Elasticsearch mailing list:
https://www.elastic.co/community

Cheers,

>
>
> On 29 April 2015 at 20:03, Costin Leau <costin.leau@gmail.com> wrote:
>
>     Disclaimer: I'm an employee of Elastic (the company behind Elasticsearch) and lead of the
>     Elasticsearch Hadoop integration.
>
>     Some things to clarify on the Elasticsearch side:
>
>     1. Elasticsearch is a distributed, real-time search and analytics engine. Search is just one
>     aspect of it, and it can work with any type of data (text, image encodings, etc.): GitHub,
>     Wikipedia and Stack Overflow are popular examples of well-known websites powered by
>     Elasticsearch. You can find plenty of use cases and information about this on the website [1].
>
>     2. Elasticsearch is stand-alone and can run on the same machines as other services or on
>     separate ones. In fact, one can run _multiple_ Elasticsearch nodes (and thus clusters) on the
>     _same_ machine. For best performance, dedicated hardware (as Nick suggested) is the way to go.
>
>     3. The Elasticsearch Spark integration has been available for over a year through Map/Reduce,
>     and through the native (Scala and Java) API since Q3 last year. Plenty of features are
>     available, all fully documented here [2]. Better yet, there's a talk by yours truly from Spark
>     Summit East [3] that focuses entirely on this topic.
>
>     4. elasticsearch-hadoop is certified by Databricks, Cloudera, Hortonworks and MapR, and supports
>     both Spark core and Spark SQL 1.0-1.3. There are binaries for Scala 2.10 and 2.11. And for what
>     it's worth, it provided one of the first (if not the first) implementations of the DataSource API
>     outside Databricks, which means not only using Elasticsearch in a declarative fashion but also
>     having push-down support for operators.
>
>     Hopefully these materials will get you started with Spark and Elasticsearch and also clarify
>     some of the misconceptions about Elasticsearch.
>
>     Cheers,
>
>     [1] https://www.elastic.co/products/elasticsearch
>     [2] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/reference.html
>     [3] http://spark-summit.org/east/2015/talk/using-spark-and-elasticsearch-for-real-time-data-analysis
>
>
>     On 4/28/15 8:16 PM, Nick Pentreath wrote:
>
>         Depends on your use case and search volume. Typically you'd have a dedicated ES cluster if
>         your app is doing a lot of real-time indexing and search.
>
>         If it's only for Spark integration, then you could colocate ES and Spark.
>
>
>
>         On Tue, Apr 28, 2015 at 6:41 PM, Jeetendra Gangele <gangele397@gmail.com> wrote:
>
>              Thanks for the reply.
>
>              Will the Elasticsearch index live within my cluster, or do I need separate hosts for
>              Elasticsearch?
>
>
>              On 28 April 2015 at 22:03, Nick Pentreath <nick.pentreath@gmail.com> wrote:
>
>                  I haven't used Solr for a long time, and haven't used Solr in Spark.
>
>                  However, why do you say "Elasticsearch is not a good option ..."? ES absolutely
>                  supports full-text search and not just filtering and grouping (in fact, its original
>                  purpose was, and still is, text search, though filtering, grouping and aggregation
>                  are heavily used).
>                  http://www.elastic.co/guide/en/elasticsearch/guide/master/full-text-search.html
>
>
>
>                  On Tue, Apr 28, 2015 at 6:27 PM, Jeetendra Gangele <gangele397@gmail.com> wrote:
>
>                      Has anyone tried using Solr inside Spark? Below is the project describing it:
>                      https://github.com/LucidWorks/spark-solr
>
>                      I have a requirement in which I want to index 20 million company names and then
>                      search them as and when new data comes in. The output should be a list of
>                      companies matching the query.
>
>                      Spark has Elasticsearch integration, but for this purpose Elasticsearch is not a
>                      good option, since this is purely a text search problem.
>
>                      Elasticsearch is good for filtering and grouping.
>
>                      Has anybody used Solr inside Spark?
>
>                      Regards
>                      jeetendra
>
>
>
>
>
>     --
>     Costin
>
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>     For additional commands, e-mail: user-help@spark.apache.org
>
>
>
>
>

-- 
Costin



