pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-5288) Improve performance of PigTextRawBytesComparator
Date Fri, 11 Aug 2017 05:27:02 GMT

     [ https://issues.apache.org/jira/browse/PIG-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-5288:
------------------------------------
    Attachment: PIG-5288-1.patch

> Improve performance of PigTextRawBytesComparator
> ------------------------------------------------
>
>                 Key: PIG-5288
>                 URL: https://issues.apache.org/jira/browse/PIG-5288
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.18.0
>
>         Attachments: PIG-5288-1.patch
>
>
> Came across this stack trace for a group-by while investigating a different performance issue.
> {code}
> "TezChild" #22 daemon prio=5 os_prio=0 tid=0x00007fa935495000 nid=0x7c3e runnable [0x00007fa91d354000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:412)
>         at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:579)
>         at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:802)
>         at org.apache.hadoop.io.Text.decode(Text.java:412)
>         at org.apache.hadoop.io.Text.decode(Text.java:389)
>         at org.apache.hadoop.io.Text.toString(Text.java:280)
>         at org.apache.pig.impl.io.NullableText.getValueAsPigType(NullableText.java:46)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigTextRawComparator.compare(PigTextRawComparator.java:95)
>         at org.apache.tez.runtime.library.common.ValuesIterator.readNextKey(ValuesIterator.java:188)
>         at org.apache.tez.runtime.library.common.ValuesIterator.access$300(ValuesIterator.java:47)
>         at org.apache.tez.runtime.library.common.ValuesIterator$1$1.next(ValuesIterator.java:143)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POShuffleTezLoad.getNextTuple(POShuffleTezLoad.java:218)
> {code}
> Converting to String and then comparing is wasteful (a result of extending from PigTextRawBytesComparator, which is used in sorting). PigCharArrayWritableComparator, the equivalent used in MapReduce, does not do this; it compares the values directly as Text.
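
To illustrate the cost difference described above, here is a minimal standalone sketch. It is not the code from PIG-5288-1.patch and does not use the Pig or Hadoop classes; the class and method names (Utf8CompareSketch, compareViaString, compareRawBytes) are hypothetical. It assumes both keys are valid UTF-8 byte sequences and contrasts decoding each key to a String before comparing with comparing the encoded bytes directly.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names are hypothetical and this is not the patch.
final class Utf8CompareSketch {

    // Slow path: decode both UTF-8 byte arrays into Strings, then compare.
    // Each call allocates two Strings and runs the charset decoder twice.
    static int compareViaString(byte[] a, byte[] b) {
        String sa = new String(a, StandardCharsets.UTF_8);
        String sb = new String(b, StandardCharsets.UTF_8);
        return sa.compareTo(sb);
    }

    // Fast path: lexicographic comparison of unsigned bytes, with no
    // decoding and no allocation.
    static int compareRawBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat bytes as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] k1 = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] k2 = "apricot".getBytes(StandardCharsets.UTF_8);
        System.out.println(Integer.signum(compareViaString(k1, k2)));
        System.out.println(Integer.signum(compareRawBytes(k1, k2)));
    }
}
```

For a group-by, only equality between adjacent keys matters, and both paths agree on equality for valid UTF-8 input. The two orderings can differ for some keys containing characters outside the Basic Multilingual Plane, because String.compareTo orders by UTF-16 code units while the byte comparison orders by code point.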



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
