hawq-dev mailing list archives

From hornn <...@git.apache.org>
Subject [GitHub] incubator-hawq pull request: HAWQ-178: Add JSON plugin support in ...
Date Mon, 01 Feb 2016 18:16:34 GMT
Github user hornn commented on the pull request:

    https://github.com/apache/incubator-hawq/pull/302#issuecomment-178107477
  
    It sounds very similar to CSV with quoted data, which is not splittable. The way we handle that today is to ensure each file is processed by a single accessor, even if it actually consists of multiple splits (see the HdfsTextMulti profile and [QuotedLineBreakAccessor](https://github.com/apache/incubator-hawq/blob/master/pxf/pxf-hdfs/src/main/java/org/apache/hawq/pxf/plugins/hdfs/QuotedLineBreakAccessor.java)). The problem, of course, is that we lose parallelism and, with it, performance.
    @tzolov, if it requires too much rewriting, I agree that we can do it in two stages: first the splittable case (one record per line), and then the more complex cases.
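
    To illustrate the difference between the two cases, here is a minimal Python sketch. It is not PXF code: the helper names, the sample data, and the split sizes are all made up for illustration, and the split-ownership rule (a line belongs to the split in which it starts) is an assumption modeled loosely on how line-oriented readers usually behave. It shows that one-record-per-line input can be read split by split without losing anything, while CSV with quoted line breaks parses differently once a split boundary falls inside a quoted field.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical sample inputs, invented for this sketch.
JSON_LINES = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'
QUOTED_CSV = 'id,note\n1,"first line\nsecond line"\n2,plain\n'


def split_ranges(length: int, size: int) -> List[Tuple[int, int]]:
    """Cut [0, length) into consecutive [start, end) ranges of at most `size`."""
    return [(s, min(s + size, length)) for s in range(0, length, size)]


def read_split_lines(data: str, start: int, end: int) -> List[str]:
    """Return the lines that start inside [start, end).

    A split that does not begin at offset 0 skips the partial line it lands
    in, and every split reads past `end` to finish its last line.
    """
    pos = start
    if start > 0:
        # Searching from start - 1 keeps a line that begins exactly at `start`.
        nl = data.find("\n", start - 1)
        if nl == -1:
            return []
        pos = nl + 1
    lines = []
    while pos < end and pos < len(data):
        nl = data.find("\n", pos)
        if nl == -1:
            nl = len(data)
        lines.append(data[pos:nl])
        pos = nl + 1
    return lines


def lines_per_split(data: str, size: int) -> List[List[str]]:
    """Read every split independently, as parallel readers would."""
    return [read_split_lines(data, s, e) for s, e in split_ranges(len(data), size)]


def parse_csv_whole(data: str) -> List[List[str]]:
    """Parse the whole file with one reader (the single-accessor approach)."""
    return list(csv.reader(io.StringIO(data)))


def parse_csv_by_split(data: str, size: int) -> List[List[str]]:
    """Parse each split on its own, with no knowledge of the other splits."""
    rows: List[List[str]] = []
    for lines in lines_per_split(data, size):
        rows.extend(csv.reader(lines))
    return rows


if __name__ == "__main__":
    # One record per line: the splits together yield every record exactly once.
    print(lines_per_split(JSON_LINES, 12))
    # Quoted line break: the boundary at offset 22 falls inside a quoted field,
    # so the per-split rows differ from the whole-file rows.
    print(parse_csv_whole(QUOTED_CSV))
    print(parse_csv_by_split(QUOTED_CSV, 22))
```

    In the quoted case, a reader that only sees its own split cannot tell whether the first line it encounters starts a record or continues a quoted field, which is why the whole file goes to a single reader at the cost of parallelism.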
    @adamjshook, good to hear from you :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
