any23-dev mailing list archives

From lewismc
Subject [GitHub] any23 pull request: ANY23-247 FIX Attribute name itemscope associa...
Date Fri, 25 Mar 2016 22:24:23 GMT
Github user lewismc commented on the pull request:
    I agree. Stepping through this in the debugger made me think the same.
    I think it is different if Any23 is to be a PURE implementation... But that
    is clearly not the case. Any23 fits in best when it can be used to extract
    semantics from any old crap input that it is fed. Parsers and extractors
    *should not* fail when there is a piece of crap input HTML. Currently,
    that's exactly what happens and it is extremely limiting.
    I would like to propose that this PR is committed to master as is. We then
    open a brand new issue which acts on exactly your comments: refactoring out
    the content extractor, reusing the input stream which has been fixed, etc.
    Any thoughts, Peter? Thanks for the quick response.
    On Friday, March 25, 2016, Peter Ansell wrote:
    > The system does seem a little too complex for our purposes and isn't
    > usable because of that.
    > Removing generics would be the first step IMO as there are too many
    > rawtypes definitions which indicate generics are being used badly.
    > ContentExtractor may be able to be completely removed instead of being
    > refitted into the process after that and the parser should always be set to
    > parse as far as practical for our purposes.
    > It is a little strange that there isn't a buffered, markable, InputStream
    > provided for all of the steps to reuse as necessary rather than pushing a
    > raw InputStream or other source into different extractors.
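
To make the buffered, markable InputStream idea above concrete, here is a minimal sketch of what sharing one stream across several steps could look like. It uses only `java.io.BufferedInputStream` with `mark`/`reset`. The `Step` interface, `runAll` and the class name are hypothetical stand-ins for illustration and are not existing Any23 types. It also assumes each step reads no more than `readLimit` bytes, otherwise `reset()` may throw an `IOException`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkableStreamSketch {

    /** Hypothetical stand-in for one processing step; not an Any23 interface. */
    interface Step {
        String run(InputStream in) throws IOException;
    }

    /** Runs every step against the same bytes by rewinding a single buffered stream. */
    static List<String> runAll(InputStream raw, int readLimit, List<Step> steps) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        List<String> results = new ArrayList<>();
        for (Step step : steps) {
            in.mark(readLimit);        // remember the current position
            results.add(step.run(in)); // the step may consume part or all of the stream
            in.reset();                // rewind so the next step starts from the same place
        }
        return results;
    }

    /** Reads at most n bytes, e.g. for content sniffing. */
    static String readPrefix(InputStream in, int n) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while (sb.length() < n && (b = in.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    /** Reads the stream to the end and decodes it as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Two illustrative steps sharing one stream: a prefix sniff and a full read. */
    static List<String> demo() throws IOException {
        byte[] html = "<html><p itemscope>x</p></html>".getBytes(StandardCharsets.UTF_8);
        List<Step> steps = Arrays.<Step>asList(
                in -> readPrefix(in, 5),
                in -> readAll(in));
        return runAll(new ByteArrayInputStream(html), 1024, steps);
    }

    public static void main(String[] args) throws IOException {
        for (String result : demo()) {
            System.out.println(result);
        }
    }
}
```

The point of the sketch is that the second step still sees the full document even though the first step already consumed part of it, so no step needs its own copy of the raw source.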
