accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-4744) Using RFile API with cache and multiple files hides data
Date Mon, 20 Nov 2017 21:12:00 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ACCUMULO-4744:
-------------------------------------
    Labels: pull-request-available  (was: )

> Using RFile API with cache and multiple files hides data
> --------------------------------------------------------
>
>                 Key: ACCUMULO-4744
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4744
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.8.0, 1.8.1
>            Reporter: Keith Turner
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.8.2
>
>
> Noticed this bug in the source code while working on ACCUMULO-4641.  When using the RFile
> API introduced in 1.8 to read from multiple files with caching enabled, some data may not
> be seen.  This happens because the code internally gives all input sources the same cache
> id, so index and data blocks from different files collide in the cache.
> This bug does not occur when reading data through a tserver; it affects only the RFile API.
> {code:java}
>   Scanner scanner =
>       RFile.newScanner()
>           .from(file1, file2, file3)  // multiple input files
>           .withFileSystem(localFs)
>           .withIndexCache(1000000)    // index cache enabled
>           .withDataCache(10000000)    // data cache enabled
>           .build();
> {code}
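
The collision described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses a plain in-memory map rather than Accumulo's actual block cache, and the names `CacheIdCollisionDemo`, `BlockCache`, `getOrLoad`, and `readFirstBlocks` are hypothetical, as is the `cacheId:offset` key layout. It shows how a cache id shared by every input source makes later files return the first file's block, and how a distinct id per source avoids that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CacheIdCollisionDemo {

    /** Toy block cache keyed by "cacheId:blockOffset" (hypothetical; not Accumulo's cache). */
    static final class BlockCache {
        private final Map<String, String> blocks = new HashMap<>();

        /** Returns the cached block for this key, caching blockOnDisk on a miss. */
        String getOrLoad(String cacheId, long offset, String blockOnDisk) {
            return blocks.computeIfAbsent(cacheId + ":" + offset, key -> blockOnDisk);
        }
    }

    /** Reads the block at offset 0 of each file through one cache and returns what was seen. */
    static List<String> readFirstBlocks(List<String> fileContents, boolean uniqueIds) {
        BlockCache cache = new BlockCache();
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < fileContents.size(); i++) {
            // Assumption: with the bug, every source shares one id; the alternative is one id per source.
            String cacheId = uniqueIds ? "source-" + i : "source";
            seen.add(cache.getOrLoad(cacheId, 0L, fileContents.get(i)));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("rows-a", "rows-b", "rows-c");
        System.out.println(readFirstBlocks(files, false)); // prints [rows-a, rows-a, rows-a]
        System.out.println(readFirstBlocks(files, true));  // prints [rows-a, rows-b, rows-c]
    }
}
```

With the shared id, the second and third files hit the entry cached for the first file, so their own data is never read. That matches the symptom in the report, where data is hidden only when caching is enabled and more than one file is read.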



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
