jackrabbit-oak-issues mailing list archives

From "Davide Giannella (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (OAK-6671) Enable support for custom types in ExternalSort
Date Fri, 29 Sep 2017 10:10:07 GMT

     [ https://issues.apache.org/jira/browse/OAK-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-6671.
---------------------------------

Bulk close for 1.7.8

> Enable support for custom types in ExternalSort
> -----------------------------------------------
>
>                 Key: OAK-6671
>                 URL: https://issues.apache.org/jira/browse/OAK-6671
>             Project: Jackrabbit Oak
>          Issue Type: Technical task
>          Components: commons
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>             Fix For: 1.8, 1.7.8
>
>         Attachments: OAK-6671-v1.patch
>
>
> ExternalSort currently sorts the file content as strings. For some cases we need to sort
> the content in a custom way, which is currently facilitated via Comparator support. However,
> in this mode we need to deserialize each line into the required format to enable custom
> comparison, which adds overhead.
> For example, consider a file with the following content
> {noformat}
> /apps|{"8":"dat:2016-07-01T15:14:37.241+05:30","71":["nam:rep:AccessControllable"],"9":"admin","0":"nam:sling:Folder"}
> /apps/assets|{"8":"dat:2016-07-01T15:37:38.598+05:30","9":"admin","0":"nam:nt:folder"}
> {noformat}
> This needs to be sorted on the basis of path, and that too on a per-element basis. Currently,
> sorting a 50Gb file having 130M lines takes 30 for a batch of 8M. Most of the time is spent
> extracting the path structure. This can be avoided if ExternalSort supports mapping each line
> to a custom type and retains that type for the sorting phase.
> This would add a slight memory overhead for cases where this feature is used. For the normal
> case no overhead would be present.
> I will come up with a patch.
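
The idea in the description above can be sketched in plain Java. This is only an
illustration under stated assumptions, not the ExternalSort API or the attached patch:
the class TypedSortSketch, the holder PathLine and all method names are hypothetical,
and the sort runs in memory on a small list instead of on file batches. It contrasts a
Comparator that re-parses the path on every comparison with a variant that maps each
line to a typed holder once and keeps that holder for the sorting phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from Oak or from the patch.
class TypedSortSketch {

    // Hypothetical holder: path elements parsed once, plus the original line.
    static final class PathLine {
        final String[] elements;
        final String line;

        PathLine(String line) {
            this.elements = pathElements(line);
            this.line = line;
        }
    }

    // Extracts the path (the text before the first '|') and splits it into elements.
    static String[] pathElements(String line) {
        int sep = line.indexOf('|');
        String path = sep < 0 ? line : line.substring(0, sep);
        List<String> out = new ArrayList<>();
        for (String element : path.split("/")) {
            if (!element.isEmpty()) {
                out.add(element);
            }
        }
        return out.toArray(new String[0]);
    }

    // Compares two paths element by element; a shorter prefix sorts first.
    static int compareElements(String[] a, String[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Comparator-only mode: every comparison parses both lines again.
    static List<String> sortWithComparator(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort((l1, l2) -> compareElements(pathElements(l1), pathElements(l2)));
        return copy;
    }

    // Typed mode: parse each line once, sort the holders, then emit the lines.
    static List<String> sortTyped(List<String> lines) {
        List<PathLine> typed = new ArrayList<>(lines.size());
        for (String line : lines) {
            typed.add(new PathLine(line));
        }
        typed.sort((p1, p2) -> compareElements(p1.elements, p2.elements));
        List<String> out = new ArrayList<>(typed.size());
        for (PathLine p : typed) {
            out.add(p.line);
        }
        return out;
    }
}
```

Both methods produce the same order; the typed variant parses each line once rather than
once per comparison, at the cost of holding the parsed elements in memory for the batch,
which matches the slight memory overhead mentioned in the description.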



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
