spark-issues mailing list archives

From "Takeshi Yamamuro (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25232) Support Full-Text Search in Spark SQL
Date Sat, 25 Aug 2018 04:10:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592442#comment-16592442 ]

Takeshi Yamamuro commented on SPARK-25232:
------------------------------------------

Isn't RLIKE enough? IMO, those databases load the data and build indexes on it for efficiency,
whereas Spark runs queries in situ, directly over the raw data.
So I'm not sure that the index approach is suitable for Spark.
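
To make the trade-off above concrete, here is a minimal sketch in plain Python (not Spark code) that contrasts a regex scan over raw rows, which is roughly what an RLIKE filter does, with a keyword lookup against a prebuilt inverted index. The sample rows and the helper names `scan_match`, `build_index`, and `index_match` are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample rows standing in for a text column.
docs = [
    "Spark runs queries over raw data",
    "MySQL builds a full-text index",
    "PostgreSQL supports text search",
]

def scan_match(rows, pattern):
    """In-situ approach: test every row against a regex on each query."""
    rx = re.compile(pattern)
    return [i for i, row in enumerate(rows) if rx.search(row)]

def build_index(rows):
    """Index approach: one-time pass mapping each lowercased word to row ids."""
    index = defaultdict(set)
    for i, row in enumerate(rows):
        for word in re.findall(r"\w+", row.lower()):
            index[word].add(i)
    return index

def index_match(index, keyword):
    """Look up a single keyword without rescanning the rows."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(docs)
print(scan_match(docs, r"(?i)\btext\b"))
print(index_match(index, "text"))
```

Both calls return the same row ids here; the difference is that the scan touches every row on every query, while the index pays its cost once up front and must be stored and kept in sync with the data.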

Anyway, I think you need to follow the SPIP process for a new feature proposal: https://spark.apache.org/improvement-proposals.html


> Support Full-Text Search in Spark SQL
> -------------------------------------
>
>                 Key: SPARK-25232
>                 URL: https://issues.apache.org/jira/browse/SPARK-25232
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Lijie Xu
>            Priority: Major
>
> Full-text search (i.e., keyword search) is widely used in search engines and relational
> databases, such as the MATCH() ... AGAINST operator in MySQL (https://dev.mysql.com/doc/en/fulltext-search.html),
> text queries in Oracle (https://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#g1016054),
> and text search in PostgreSQL (https://www.postgresql.org/docs/9.5/static/textsearch.html).
> However, it is not natively supported in Spark SQL. We propose an approach to implement
> full-text search in Spark SQL.
> Our proposed approach is detailed at [https://github.com/JerryLead/Misc/blob/master/FullTextSearch/Full-text-issue-2018.pdf]
> and the prototype is available at [https://github.com/bigdata-iscas/SparkFullTextQuery/tree/like_explorer]




