spark-dev mailing list archives

From Shagun Sodhani <sshagunsodh...@gmail.com>
Subject Re: Exception when using some aggregate operators
Date Wed, 28 Oct 2015 03:33:51 GMT
Yup, avg works fine. So we have alternative functions to use in place of the
functions pointed out earlier. But my question remains: are those original
aggregate functions not supposed to be used, am I using them the wrong way,
or is it a bug, as I asked in my first mail?

On Wed, Oct 28, 2015 at 3:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Have you tried using avg in place of mean?
>
> (1 to 5).foreach { i =>
>   val df = (1 to 1000).map(j => (j, s"str$j")).toDF("a", "b")
>   df.save(s"/tmp/partitioned/i=$i")
> }
>     sqlContext.sql("""
>     CREATE TEMPORARY TABLE partitionedParquet
>     USING org.apache.spark.sql.parquet
>     OPTIONS (
>       path '/tmp/partitioned'
>     )""")
> sqlContext.sql("""select avg(a) from partitionedParquet""").show()
>
> Cheers
>
> On Tue, Oct 27, 2015 at 10:12 AM, Shagun Sodhani <sshagunsodhani@gmail.com>
> wrote:
>
>> So I tried @Reynold's suggestion. I could get countDistinct and
>> sumDistinct running, but mean and approxCountDistinct do not work. (I
>> guess I am using the wrong syntax for approxCountDistinct.) For mean, I
>> think the registry entry is missing. Can someone clarify that as well?
>>
>> On Tue, Oct 27, 2015 at 8:02 PM, Shagun Sodhani <sshagunsodhani@gmail.com>
>> wrote:
>>
>>> Will try in a while when I get back. I assume this applies to all
>>> functions other than mean. Also, countDistinct is defined along with all
>>> other SQL functions, so I don't get the "distinct is not part of the
>>> function name" part.
>>> On 27 Oct 2015 19:58, "Reynold Xin" <rxin@databricks.com> wrote:
>>>
>>>> Try
>>>>
>>>> count(distinct columnname)
>>>>
>>>> In SQL distinct is not part of the function name.
>>>>
>>>> On Tuesday, October 27, 2015, Shagun Sodhani <sshagunsodhani@gmail.com>
>>>> wrote:
>>>>
>>>>> Oops, it seems I made a mistake. The error message is: Exception in
>>>>> thread "main" org.apache.spark.sql.AnalysisException: undefined function
>>>>> countDistinct
>>>>> On 27 Oct 2015 15:49, "Shagun Sodhani" <sshagunsodhani@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi! I was trying out some aggregate functions in Spark SQL and I
>>>>>> noticed that certain aggregate operators are not working. This includes:
>>>>>>
>>>>>> approxCountDistinct
>>>>>> countDistinct
>>>>>> mean
>>>>>> sumDistinct
>>>>>>
>>>>>> For example using countDistinct results in an error saying
>>>>>> *Exception in thread "main" org.apache.spark.sql.AnalysisException:
>>>>>> undefined function cosh;*
>>>>>>
>>>>>> I had a similar issue with cosh operator
>>>>>> <http://apache-spark-developers-list.1001551.n3.nabble.com/Exception-when-using-cosh-td14724.html>
>>>>>> as well some time back and it turned out that it was not registered in
>>>>>> the registry:
>>>>>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
>>>>>>
>>>>>>
>>>>>> *I think it is the same issue again and would be glad to send over
>>>>>> a PR if someone can confirm if this is an actual bug and not some
>>>>>> mistake on my part.*
>>>>>>
>>>>>>
>>>>>> Query I am using: SELECT countDistinct(`age`) as `data` FROM `table`
>>>>>> Spark Version: 10.4
>>>>>> SparkSql Version: 1.5.1
>>>>>>
>>>>>> I am using the standard example of (name, age) schema (though I am
>>>>>> setting age as Double and not Int as I am trying out maths functions).
>>>>>>
>>>>>> The entire error stack can be found here
>>>>>> <http://pastebin.com/G6YzQXnn>.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>
>
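
For readers following the thread, here is a minimal sketch of the distinction being discussed: names such as countDistinct, sumDistinct and mean are available as DataFrame functions in org.apache.spark.sql.functions, while in a SQL string the equivalent is written with the DISTINCT keyword inside the call (and avg in place of mean, as suggested above). It assumes the Spark 1.5-era SQLContext API used in this thread. The object name AggregateSketch, the table name people, and the sample rows are hypothetical, chosen only for illustration. Whether the missing SQL names count as a bug is the open question of the thread and is not settled by this sketch.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{countDistinct, mean, sumDistinct}

// Hypothetical driver object; assumes Spark 1.5.x on the classpath.
object AggregateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aggregate-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up rows in the (name, age) shape from the thread, with age as Double.
    val people = Seq(("alice", 30.0), ("bob", 25.0), ("carol", 30.0)).toDF("name", "age")
    people.registerTempTable("people")

    // DataFrame API: the camel-case helpers are ordinary Scala functions.
    people.agg(countDistinct("age"), sumDistinct("age"), mean("age")).show()

    // SQL string: DISTINCT is a keyword inside the call, and avg stands in for mean.
    sqlContext.sql(
      "SELECT count(DISTINCT age), sum(DISTINCT age), avg(age) FROM people"
    ).show()

    sc.stop()
  }
}
```

Later Spark releases replace SQLContext and registerTempTable with newer entry points, so the setup lines would need adjusting there; the DataFrame-versus-SQL contrast is the part the sketch is meant to show.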
