kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: No order by in kudu java api
Date Thu, 01 Sep 2016 16:13:35 GMT
On Thu, Sep 1, 2016 at 7:11 AM, Amit Adhau <amit.adhau@globant.com> wrote:

> Thanks Todd, we will try that; hopefully it will not affect
> performance.
>
> We are using hash partitioning for our table. Can you suggest any other
> config flags that we should look into to improve scan performance? In
> the past we used some of the flags that you suggested in your Kudu
> insert performance blog post, and they helped with our Kudu writes.
>

Are you using a single Java client to read large amounts of data? If so,
note that you're getting a single-threaded read, so you are most likely not
limited by the server side. What you could consider is using the ScanToken
API to retrieve a bunch of scan tokens for your query, and then feed them
into a thread pool, starting a new scanner for each token. That should give
you parallelism on the client side.
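
A minimal sketch of that thread-pool pattern, written against the JDK only so
it stands alone. The class name ParallelScan and the scanOneToken callback are
hypothetical. With a real Kudu client, the tokens would presumably be the list
returned by client.newScanTokenBuilder(table).build(), and the callback would
open a scanner with token.intoScanner(client) and drain it with
hasMoreRows()/nextRows(); those call names are assumptions to verify against
the client version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelScan {
    /**
     * Runs one scan task per token on a fixed-size thread pool and
     * concatenates the per-token results in token order.
     *
     * T stands in for a scan token and R for whatever is copied out of
     * each row; both are placeholders, not Kudu types.
     */
    public static <T, R> List<R> scanAll(List<T> tokens,
                                         Function<T, List<R>> scanOneToken,
                                         int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per token; each task opens and drains its own scanner.
            List<Future<List<R>>> futures = new ArrayList<>();
            for (T token : tokens) {
                Callable<List<R>> task = () -> scanOneToken.apply(token);
                futures.add(pool.submit(task));
            }
            // Wait for every task and collect its rows.
            List<R> rows = new ArrayList<>();
            for (Future<List<R>> future : futures) {
                rows.addAll(future.get());
            }
            return rows;
        } finally {
            pool.shutdown();
        }
    }
}
```

The combined list follows token order, which is not a sort order on any
column, so a sort is still needed afterwards if ordering matters.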

-Todd

> Thanks,
> Amit
>
> On Aug 31, 2016 10:36 PM, "Todd Lipcon" <todd@cloudera.com> wrote:
>
>> Hi Amit,
>>
>> That's correct, there is no "order by" support in the Java API, because
>> this is an arbitrarily complex operation. Imagine a table with a trillion
>> rows, and asking for "order by" from a Java client. It would have to either
>> download and sort the entire table on your client node (which is
>> infeasible) or would have to somehow ask the servers to perform a huge
>> shuffle and sort, which isn't something Kudu's designed to do.
>>
>> The recommendation is:
>> - if you're just needing to sort small sets of rows, then grab the whole
>> result set and use a normal Java-based sort (Collections.sort)
>> - if you're needing to sort a large number of rows, use something like
>> Impala or Spark SQL to perform the sort.
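
For the small-result-set case, a sketch of the client-side sort might look
like the following. The Row class and its name/score fields are hypothetical
stand-ins for values copied out of each scanned row; only Collections.sort and
Comparator are real JDK APIs here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ClientSideSort {
    /** Hypothetical holder for the column values copied out of one scanned row. */
    public static final class Row {
        public final String name;
        public final long score;

        public Row(String name, long score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Returns a new list sorted by ascending score; the input list is left untouched. */
    public static List<Row> sortByScore(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        Collections.sort(sorted, Comparator.comparingLong((Row r) -> r.score));
        return sorted;
    }
}
```

This only makes sense when the whole result set fits comfortably in client
memory; larger sorts belong in an engine such as Impala or Spark SQL.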
>>
>> -Todd
>>
>> On Wed, Aug 31, 2016 at 8:06 AM, Amit Adhau <amit.adhau@globant.com>
>> wrote:
>>
>>> Hi Kudu Team,
>>>
>>> Using the Kudu Java API, we want to sort the data in a Kudu table by a
>>> table column, but we have not found any option for this in the API.
>>> Can you please help us with this?
>>>
>>> --
>>> Thanks & Regards,
>>>
>>> *Amit Adhau* | Data Architect
>>>
>>> *GLOBANT* | IND:+91 9821518132
>>>
>>> The information contained in this e-mail may be confidential. It has
>>> been sent for the sole use of the intended recipient(s). If the reader of
>>> this message is not an intended recipient, you are hereby notified that any
>>> unauthorized review, use, disclosure, dissemination, distribution or
>>> copying of this communication, or any of its contents,
>>> is strictly prohibited. If you have received it by mistake please let
>>> us know by e-mail immediately and delete it from your system. Many
>>> thanks.
>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
