From user-return-401-archive-asf-public=cust-asf.ponee.io@arrow.apache.org Sun Apr 26 23:46:20 2020 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 46FBC18062B for ; Mon, 27 Apr 2020 01:46:20 +0200 (CEST) Received: (qmail 26337 invoked by uid 500); 26 Apr 2020 23:46:19 -0000 Mailing-List: contact user-help@arrow.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@arrow.apache.org Delivered-To: mailing list user@arrow.apache.org Received: (qmail 26327 invoked by uid 99); 26 Apr 2020 23:46:19 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 26 Apr 2020 23:46:19 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id EA9C8C05F8 for ; Sun, 26 Apr 2020 23:46:18 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0 X-Spam-Level: X-Spam-Status: No, score=0 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-he-de.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id aLnEQ_WzlLAb for ; Sun, 26 Apr 2020 23:46:16 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::630; helo=mail-ej1-x630.google.com; envelope-from=emkornfield@gmail.com; receiver= Received: from mail-ej1-x630.google.com (mail-ej1-x630.google.com [IPv6:2a00:1450:4864:20::630]) by mx1-he-de.apache.org (ASF Mail Server at mx1-he-de.apache.org) with ESMTPS id 09F6D7DDD4 for ; Sun, 26 Apr 2020 23:46:15 +0000 (UTC) Received: by mail-ej1-x630.google.com with SMTP id k8so12634122ejv.3 for ; Sun, 26 Apr 2020 16:46:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:reply-to:from:date:message-id :subject:to; bh=5OzK4aA43hgXjgY05ps3/fa4L5+scQMDRP3wS+OsLNM=; b=N5msL7w3zJaG+P8PfTiZLdLm5vcCHnweHhxpvn2v5tK0vaOReswyqOdPcauShnpDjx 75L1YFF0YXU+YguRQ7v3cMMb/zOY/pQHOVBrRWfIwOfdf9lSgZwLyyty5ozE9rWIiky1 XqEG6awQu+NYfnl4DWLJiShTCcWraNkjhUqVbEkW7TPOU8ryrNdAW3P8InsIwQteLa/9 pU/M0415Y63nEUq3bBZX+aTV9bzDsPxB51a1y+bZBEpA8213HC/cw4ouTeWL9qG7Yye7 Vg77TiqYUjGLpiMaN0sMcjTpt9sh22YHHAVOg60pzx3P65SoRr+o5KGMmMFVprtTJSv7 V7DQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:reply-to :from:date:message-id:subject:to; bh=5OzK4aA43hgXjgY05ps3/fa4L5+scQMDRP3wS+OsLNM=; b=q+e7dKcaptgAlGHShn5C0/Xb7AfhEh3s31gwcAK2ICyv7RNsqeOSUzcCJLXqXaag0O ounDyPr7ufms8Ofe+pcK6ocqUsylrTWOFZ9f1i3QJ7aZhNYbmnbz8ufjeZJWG5uzXc5K pVlR2ISCqm5GhMvYArfbncW0eR2jRJAS4RjgJTcipk9XKjOGZ+yvq+6BNagM8HXLSsN8 aurQ0xLqOS51VL95vcA+fNSR9IN6DLuaW2sDpkNLjm5ltZMTgC4/+TUVJ6qX6vFVbMnb SwdMgjwd8uuvRtWRizwR98EMouKwuj54Xs5G2wcXxPBZksNnTfTpd/twOxqFT5csgKIU k9TQ== X-Gm-Message-State: AGi0PubzXNHehXt4hZrPbq2wo0Cz/CZLU+nBCNNnqefhTB/JRumYbf3e xwquGYavqNw/24xHKaAypRiMdyt2ZdlSwFbddQQHXxPnLZQ= X-Google-Smtp-Source: APiQypIEYwbzc5CRrFdp++t7gNdXXW07arIyQNopIuc+6e6glFcpDwhDWDBJbVp6CGtEUw9FvwG2DDi8AgQLKJ9woZU= X-Received: by 2002:a17:906:412:: with SMTP id d18mr16152177eja.54.1587944775170; Sun, 26 Apr 2020 16:46:15 -0700 (PDT) MIME-Version: 1.0 References: <5dfd5a11-d6c6-7f0f-02b0-ad5b08256dd9@crcdata.cz> In-Reply-To: <5dfd5a11-d6c6-7f0f-02b0-ad5b08256dd9@crcdata.cz> Reply-To: emkornfield@gmail.com From: Micah Kornfield Date: Sun, 26 Apr 2020 16:46:02 -0700 Message-ID: Subject: Re: Java: Sorting with QuickSort algorithm? To: user@arrow.apache.org Content-Type: multipart/alternative; boundary="0000000000003f3e7e05a43a34b1" --0000000000003f3e7e05a43a34b1 Content-Type: text/plain; charset="UTF-8" Hi Martin, Sorry for the late reply. For stable sort there is StableVectorComparator [1] that works with quicksort which should provide a stable sort when used with QuickSort (are you seeing a bug in that?). Are you suggesting implementing a different stable sort algorithm? Thanks, Micah https://github.com/apache/arrow/blob/f7fb49cfa19fe2d39dd54a426b1288d33342faf5/java/algorithm/src/main/java/org/apache/arrow/algorithm/sort/StableVectorComparator.java On Sun, Apr 12, 2020 at 1:01 AM Martin Janda wrote: > I found, that Arrow [Java] sorting uses QuickSort algorithm for vectors > > 15 items. QuickSort is unstable. It means that it doesn't provide > expected results. I think that stable sort algorithm is better. It keeps > order of indices for IndexSorter and preserves order for StructVectors. > > Stable sort algorithm produces same result when 1x sorted with > CombinedComparators and 2x sorted with split comparators. > > Martin > > --0000000000003f3e7e05a43a34b1 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Martin,
Sorry for the late reply.=C2=A0 For stable = sort there is=C2=A0StableVectorComparator [1] that works with quicksort whi= ch should provide a stable sort when used with QuickSort (are you seeing a = bug in that?).=C2=A0 Are you suggesting implementing a different stable sor= t algorithm?

Thanks,
Micah

On S= un, Apr 12, 2020 at 1:01 AM Martin Janda <jandam@crcdata.cz> wrote:
I found, that Arrow [Java] sorting uses QuickSort a= lgorithm for vectors
=C2=A0> 15 items. QuickSort is unstable. It means that it doesn't pr= ovide
expected results. I think that stable sort algorithm is better. It keeps order of indices for IndexSorter and preserves order for StructVectors.

=C2=A0=C2=A0 Stable sort algorithm produces same result when 1x sorted with=
CombinedComparators and 2x sorted with split comparators.

=C2=A0=C2=A0 Martin

--0000000000003f3e7e05a43a34b1--