flink-user mailing list archives

From Derek VerLee <derekver...@gmail.com>
Subject Enriching data from external source with cache
Date Fri, 29 Sep 2017 18:39:20 GMT

    My basic problem will probably sound familiar: I need to enrich
    incoming data using a REST call to an external system that holds
    slowly evolving metadata. Some cache-based lag is acceptable, so, to
    reduce load on the external system and to process more efficiently,
    I would like to implement a cache. The cache would be keyed by the
    same key that I am already using in a keyBy in the job.

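    The requirement above (one cache entry per key, with bounded staleness
    instead of always-fresh data) can be sketched in plain Java. This is a
    hypothetical, framework-free illustration, not a Flink API: the loader
    function stands in for the REST call, and TtlCache, ttlMillis and the
    injectable clock are made-up names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical per-key cache with a time-to-live: an entry is served from memory
// until it is older than ttlMillis, which bounds how stale the enrichment can get.
// Not thread-safe; meant to be used from a single operator thread.
class TtlCache<K, V> {
    // A cached value plus the clock time at which it was loaded.
    private static final class Entry<T> {
        final T value;
        final long loadedAt;

        Entry(T value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // stand-in for the (blocking) REST call
    private final long ttlMillis;          // maximum acceptable staleness
    private final LongSupplier clock;      // injectable so tests can control time
    private int loads = 0;

    TtlCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && now - cached.loadedAt < ttlMillis) {
            return cached.value; // fresh enough: no call to the external system
        }
        V fresh = loader.apply(key); // miss or expired: reload the metadata
        loads++;
        entries.put(key, new Entry<>(fresh, now));
        return fresh;
    }

    // Number of times the external system was called.
    int loadCount() {
        return loads;
    }
}
```

    In a real job the clock would simply be System::currentTimeMillis. A
    synchronous cache like this fits a plain keyed map function (approach 4
    below); making the lookups asynchronous is the harder part.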
    Please correct me if I'm wrong:

    * Keyed state would be great for storing my metadata "cache", and
    Async I/O is ideal for pulling from the external system, but an
    AsyncFunction cannot access keyed state ("Exception: State is not
    supported in rich async functions."), and operators cannot share
    state with each other.

    This leaves me wondering: since side inputs are not here yet, what
    is the best (and perhaps most idiomatic) way to approach my problem?
    I'd rather keep changes to existing systems minimal for this
    iteration and just minimize the impact on them during peaks as best
    I can. Systemic refactoring and re-architecture will be coming soon
    (so I'm happy to hear thoughts on that as well).

    Approaches considered:
    1. An AsyncFunction with a transient Guava cache. Not ideal, but
    maybe good enough to get by.
    2. Using compound message types (oh, if only Java had real algebraic
    data types...) and sending cache-miss messages from some
    CacheEnrichmentMapper (keyed) to some AsyncCacheLoader (not keyed),
    which then feeds cache updates back to the former via an iteration.
    I don't know why this couldn't work, but it feels like a hot mess
    unless there is a clean way to do it that I am not thinking of.
    3. One user on a similar thread mentioned loading the metadata as
    another DataStream and then using joins, but I'm confused about how
    this would work. It seems to me that joins happen on windows, and
    windows pertain to (some notion of) time, so what would my notion of
    time be for the slowly changing (in some cases maybe years-old)
    metadata?
    4. Forget about Async I/O.
    5. Implement my own "async I/O" using a process function or
    similar. Is this a valid pattern?
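    Approach 1 can be sketched roughly as follows, assuming the cache is an
    ordinary instance field (for example a transient field created in open()
    of a RichAsyncFunction) rather than Flink state. This is plain Java, not
    Guava or Flink code: AsyncMetadataCache and remoteLookup are hypothetical
    names, and remoteLookup stands in for a non-blocking REST client. Caching
    the future instead of the value means a burst of events for one key
    triggers a single remote call; failed lookups are evicted so they can be
    retried. Expiry is left out for brevity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-instance async cache: concurrent misses for the same key share
// one in-flight future. The cache lives in memory only; it is not checkpointed.
class AsyncMetadataCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, CompletableFuture<V>> remoteLookup; // stand-in for the REST client
    private final AtomicInteger remoteCalls = new AtomicInteger();

    AsyncMetadataCache(Function<K, CompletableFuture<V>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    CompletableFuture<V> get(K key) {
        // Only the first request for a key starts a remote call; later ones reuse its future.
        CompletableFuture<V> future = cache.computeIfAbsent(key, k -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(k);
        });
        // Attached outside computeIfAbsent so the removal never runs inside the map update.
        // If the lookup failed, drop it so the next request for this key retries.
        future.whenComplete((value, error) -> {
            if (error != null) {
                cache.remove(key, future);
            }
        });
        return future;
    }

    int remoteCallCount() {
        return remoteCalls.get();
    }
}
```

    Inside asyncInvoke, the returned future's result would then be handed to
    the collector Flink passes in (its exact type differs between Flink
    versions). Because the cache is not Flink state, it is lost on restart
    and simply refills from the external system.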
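    For approach 2, sealed interfaces combined with records (Java 17 and
    later) come close to the algebraic data types wished for above. The
    sketch below is hypothetical: it models only the keyed
    CacheEnrichmentMapper's logic, with plain maps standing in for Flink
    keyed state, and all type names (Msg, Event, CacheUpdate, Out, Enriched,
    CacheMiss, CacheEnrichmentLogic) are made up. The AsyncCacheLoader and the
    iteration that feeds CacheUpdate messages back are not shown. Events for
    an uncached key are buffered, only the first one emits a CacheMiss, and
    the returning CacheUpdate fills the cache and releases the buffer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Input to the keyed mapper: a data event or a cache update fed back by the loader.
sealed interface Msg permits Event, CacheUpdate {}
record Event(String key, String payload) implements Msg {}
record CacheUpdate(String key, String metadata) implements Msg {}

// Output of the keyed mapper: an enriched record or a request for the async loader.
sealed interface Out permits Enriched, CacheMiss {}
record Enriched(String key, String payload, String metadata) implements Out {}
record CacheMiss(String key) implements Out {}

// Hypothetical mapper logic; the two maps play the role of per-key state.
class CacheEnrichmentLogic {
    private final Map<String, String> metadata = new HashMap<>();
    private final Map<String, List<Event>> pending = new HashMap<>();

    List<Out> process(Msg msg) {
        List<Out> out = new ArrayList<>();
        if (msg instanceof Event e) {
            String meta = metadata.get(e.key());
            if (meta != null) {
                out.add(new Enriched(e.key(), e.payload(), meta)); // cache hit
            } else {
                List<Event> buffered = pending.computeIfAbsent(e.key(), k -> new ArrayList<>());
                if (buffered.isEmpty()) {
                    out.add(new CacheMiss(e.key())); // ask the loader only once per key
                }
                buffered.add(e); // hold the event until its metadata arrives
            }
        } else if (msg instanceof CacheUpdate u) {
            metadata.put(u.key(), u.metadata());
            List<Event> released = pending.remove(u.key());
            if (released != null) {
                for (Event ev : released) {
                    out.add(new Enriched(ev.key(), ev.payload(), u.metadata()));
                }
            }
        }
        return out;
    }
}
```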
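    Approach 5 boils down to one constraint: callbacks of a non-blocking
    request complete on other threads, which must not touch operator state
    directly. A hypothetical plain-Java sketch (HandRolledAsync, request and
    drain are made-up names; remoteLookup stands in for a non-blocking REST
    client) hands completed results back through a thread-safe queue that
    only the operator thread drains. Nothing notifies a process function when
    a future completes, so drain() would have to be driven from the
    operator's own callbacks, for example on each incoming element or on a
    processing-time timer. Unlike Flink's built-in Async I/O operator, this
    sketch does not checkpoint in-flight requests, so they would be lost on
    failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Hypothetical hand-rolled async lookup: callback threads only enqueue results,
// and the operator thread consumes them via drain().
class HandRolledAsync<K, V> {
    // (key, value) pairs written by callback threads, read by the operator thread.
    private final ConcurrentLinkedQueue<Map.Entry<K, V>> completed = new ConcurrentLinkedQueue<>();
    private final Function<K, CompletableFuture<V>> remoteLookup; // stand-in for the REST client

    HandRolledAsync(Function<K, CompletableFuture<V>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Called on the operator thread: fire the request and return immediately.
    // Failed lookups are silently dropped in this sketch.
    void request(K key) {
        remoteLookup.apply(key).thenAccept(value -> completed.add(Map.entry(key, value)));
    }

    // Called on the operator thread: collect every lookup that has finished so far.
    List<Map.Entry<K, V>> drain() {
        List<Map.Entry<K, V>> ready = new ArrayList<>();
        Map.Entry<K, V> next;
        while ((next = completed.poll()) != null) {
            ready.add(next);
        }
        return ready;
    }
}
```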
