beam-commits mailing list archives

From "Simon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-5628) Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object
Date Fri, 12 Oct 2018 14:33:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-5628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16647969#comment-16647969 ]

Simon commented on BEAM-5628:
-----------------------------

This error can be traced back to the _create_generator function (io/vcfio.py, line 318). A comment
there notes that PyVCF makes explicit str() calls when parsing INFO fields, which fail on UTF-8
decoded strings, so the existing (Python 2) code encodes each line back to UTF-8. Under Python 3,
that encoding step hands PyVCF a bytes object, which its str-based _row_pattern cannot split,
hence the TypeError in the traceback below.
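The mismatch can be reproduced in isolation with the standard re module. This is a minimal sketch: the tab pattern is a stand-in (an assumption, not PyVCF's actual _row_pattern, which is only known to be a str pattern from the traceback), and the sample VCF line is illustrative.

```python
import re

# Stand-in for PyVCF's row-splitting regex (assumption): a pattern compiled from a str.
row_pattern = re.compile('\t')

line_str = '20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14\n'
# What the Python 2-oriented encode step produces: a bytes object.
line_bytes = line_str.encode('utf-8')

# Splitting a str with a str pattern works.
fields = row_pattern.split(line_str.rstrip())
print(fields[:3])

# Splitting bytes with a str pattern raises the same TypeError as the traceback.
error_message = None
try:
    row_pattern.split(line_bytes.rstrip())
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

On Python 3 the first print shows the first three tab-separated fields and the second prints the error text seen in the report.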

Removing the encoding step, however, causes some tests to hang, so this may be related to
BEAM-5623.
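Skipping the encoding only on Python 3 can be expressed with a version check. This is a hypothetical sketch: the helper name prepare_line_for_pyvcf does not come from vcfio.py, and, as noted above, dropping the encoding on Python 3 reportedly leaves some tests hanging, so it is an experiment rather than a confirmed fix.

```python
import sys


def prepare_line_for_pyvcf(line):
    """Hypothetical helper: re-encode a VCF line to UTF-8 only on Python 2.

    On Python 2, PyVCF's explicit str() calls on INFO fields fail on
    UTF-8 decoded strings, so the line is encoded back to bytes. On
    Python 3, PyVCF splits lines with str regex patterns, so the line
    is returned unchanged as str.
    """
    if sys.version_info[0] == 2:
        return line.encode('utf-8')
    return line


# Usage: on Python 3 the str line passes through untouched.
prepared = prepare_line_for_pyvcf('20\t14370\trs6054257\n')
print(type(prepared).__name__)
```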

Does anyone have additional insights?

> Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object
> --------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-5628
>                 URL: https://issues.apache.org/jira/browse/BEAM-5628
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Assignee: Simon
>            Priority: Major
>
> ERROR: test_read_after_splitting (apache_beam.io.vcfio_test.VcfSourceTest)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio_test.py", line 336, in test_read_after_splitting
>     split_records.extend(source_test_utils.read_from_source(*source_info))
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/source_test_utils.py", line 101, in read_from_source
>     for value in reader:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 264, in read_records
>     for line in record_iterator:
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/apache_beam/io/vcfio.py", line 330, in __next__
>     record = next(self._vcf_reader)
>   File "/usr/local/google/home/valentyn/projects/beam/clean_head/beam/sdks/python/target/.tox/py3/lib/python3.5/site-packages/vcf/parser.py", line 543, in __next__
>     row = self._row_pattern.split(line.rstrip())
> TypeError: cannot use a string pattern on a bytes-like object



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
