incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: New API Prototype
Date Sun, 07 Oct 2012 13:55:11 GMT
I have added an HDFS implementation of the server and, as expected, the
performance of the file system has dropped off.

The following numbers are from a server implementation using the local
file system, which in turn uses MMapDirectory under the covers.

The query was "*" (a MatchAllDocuments query), and the entire result
set was then fetched.

{"totalResults":1000000}
Tuple count [286210] at [57241.12993482499/s] Attribute Count [2862100] at [572411.2993482499/s]
Tuple count [606270] at [64011.38549069929/s] Attribute Count [6062700] at [640113.854906993/s]
Tuple count [925900] at [63924.92606124217/s] Attribute Count [9259000] at [639249.2606124217/s]
Tuple count [1000000] at [64905.374795692594/s] Attribute Count [10000000] at [649053.747956926/s]

The following numbers are from a server implementation using the
HDFS file system API (but only accessing the local file system; no
actual HDFS cluster instance was used).

The query was "*" (a MatchAllDocuments query), and the entire result
set was then fetched.

{"totalResults":1000000}
Tuple count [61790] at [12357.340118037697/s] Attribute Count [617900] at [123573.40118037697/s]
Tuple count [138720] at [15385.501509751082/s] Attribute Count [1387200] at [153855.01509751083/s]
Tuple count [214840] at [15221.147556947826/s] Attribute Count [2148400] at [152211.47556947827/s]
Tuple count [290650] at [15161.184328283138/s] Attribute Count [2906500] at [151611.84328283137/s]
Tuple count [371090] at [16086.352757477636/s] Attribute Count [3710900] at [160863.52757477635/s]
Tuple count [450490] at [15879.272729308997/s] Attribute Count [4504900] at [158792.72729308996/s]
Tuple count [528940] at [15689.861929215023/s] Attribute Count [5289400] at [156898.61929215022/s]
Tuple count [607220] at [15654.841541725911/s] Attribute Count [6072200] at [156548.41541725912/s]
Tuple count [686760] at [15907.258721743568/s] Attribute Count [6867600] at [159072.58721743568/s]
Tuple count [765430] at [15732.524289221672/s] Attribute Count [7654300] at [157325.24289221672/s]
Tuple count [845510] at [16015.708514105043/s] Attribute Count [8455100] at [160157.08514105043/s]
Tuple count [925930] at [16083.21513910121/s] Attribute Count [9259300] at [160832.1513910121/s]
Tuple count [1000000] at [15705.643960027472/s] Attribute Count [10000000] at [157056.43960027472/s]
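To put a number on the gap, the tuple rate from the final progress line of each run can be compared directly. This is a minimal sketch that assumes only the log-line format shown in the two runs; the class and method names are illustrative, not part of the Blur code base.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares the final "Tuple count" progress lines of the two benchmark runs.
// Illustrative helper only; it relies solely on the log format quoted here.
final class ThroughputComparison {

    // Matches e.g. "Tuple count [1000000] at [64905.374795692594/s]"
    private static final Pattern TUPLE_RATE =
        Pattern.compile("Tuple count \\[(\\d+)\\] at \\[([0-9.]+)/s\\]");

    // Extracts the tuples-per-second rate from one progress line.
    static double tupleRate(String line) {
        Matcher m = TUPLE_RATE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return Double.parseDouble(m.group(2));
    }

    public static void main(String[] args) {
        // Final lines of the MMapDirectory run and the HDFS API run.
        String mmap = "Tuple count [1000000] at [64905.374795692594/s] "
            + "Attribute Count [10000000] at [649053.747956926/s]";
        String hdfs = "Tuple count [1000000] at [15705.643960027472/s] "
            + "Attribute Count [10000000] at [157056.43960027472/s]";
        double slowdown = tupleRate(mmap) / tupleRate(hdfs);
        System.out.printf("HDFS API path is %.2fx slower%n", slowdown);
    }
}
```

With these two lines the ratio comes out at about 4.13x.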

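You can see the dramatic drop-off in performance that switching to the
HDFS file system API alone produced: the full result set came back at
roughly 15,700 tuples/s instead of roughly 64,900 tuples/s, about a 4x
slowdown. The next step is to add the directory caching (block cache)
back into the implementation.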
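The idea behind a block cache is to keep recently read file blocks in memory so that repeated reads are not routed back through the file system API. The following is a minimal, single-threaded LRU sketch built on `java.util.LinkedHashMap` in access order; the 8 KB block size, the `file#blockIndex` key format and the `BlockLoader` interface are assumptions for illustration, and this is not Blur's actual block cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache of file blocks keyed by (file name, block index).
// Hypothetical sketch: not thread-safe and not Blur's real implementation.
final class LruBlockCache {

    static final int BLOCK_SIZE = 8192; // assumed block size in bytes

    // Hypothetical loader standing in for a read through the file system.
    interface BlockLoader {
        byte[] load(String file, long blockIndex);
    }

    private final Map<String, byte[]> blocks;
    private long hits;
    private long misses;

    LruBlockCache(final int maxBlocks) {
        // accessOrder = true: iteration runs from least to most recently used,
        // so removeEldestEntry evicts the least recently used block.
        this.blocks = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    // Returns the block containing byte `position` of `file`, loading it on a miss.
    byte[] getBlock(String file, long position, BlockLoader loader) {
        long blockIndex = position / BLOCK_SIZE;
        String key = file + "#" + blockIndex;
        byte[] block = blocks.get(key);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.load(file, blockIndex);
        blocks.put(key, block);
        return block;
    }

    long hits() { return hits; }
    long misses() { return misses; }
    int cachedBlocks() { return blocks.size(); }
}
```

With room for two blocks, reading offsets 0, 100, 8192, 16384 and then 0 again gives one hit (offset 100 falls in block 0) and four misses, because block 0 was evicted when block 2 was loaded.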

Also, I am using Apache Hadoop version r1.0.3.

Aaron


On Thu, Oct 4, 2012 at 10:39 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> I have created a new prototype API for Blur, and it is vastly simpler.
> I have also created a small embedded server that implements this API.
> I have attached the Thrift definition file for quick evaluation.  If
> you have some time, please take a look at the project and let me know
> what you think.  Thanks!
>
> Aaron
>
> git branch new-api-prototype
>
> Project src/blur-new-api-prototype
>
>
> ///////////// Thrift definition
> namespace java org.apache.blur.thrift.generated
>
> exception BlurException {
>   1:string message,
>   2:string stackTraceStr
> }
>
> enum TYPE {
>   STRING, BOOL, SHORT, INT, LONG, FLOAT, DOUBLE, BINARY
> }
>
> struct Attribute {
>   1:string name,
>   2:binary value,
>   3:TYPE type
> }
>
> struct Tuple {
>   1:list<Attribute> attributes
> }
>
> struct Session {
>   1:string sessionId
> }
>
> service BlurTuple {
>
>   Session openReadSession() throws (1:BlurException e)
>   void executeQuery(1:Session session, 2:string query) throws (1:BlurException e)
>   list<Tuple> nextMetaDataResults(1:Session session, 2:i32 batchSize) throws (1:BlurException e)
>   list<Tuple> nextResults(1:Session session, 2:i32 batchSize) throws (1:BlurException e)
>   void closeReadSession(1:Session session) throws (1:BlurException e)
>
>   Session openWriteSession() throws (1:BlurException e)
>   void writeTuples(1:Session session, 2:list<Tuple> tuples) throws (1:BlurException e)
>   void commitWriteSession(1:Session session) throws (1:BlurException e)
>   void rollbackWriteSession(1:Session session) throws (1:BlurException e)
>
> }
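The read side of the prototype API is a session protocol: open a read session, execute a query, pull results in batches with nextResults, then close the session. The sketch below drives that sequence against a hand-written in-memory stand-in rather than the Thrift-generated client, so it runs without a server. The `ReadApi` interface, string session IDs, the simplified `Tuple` class and the rule that an empty batch means the results are exhausted are all hypothetical; the Thrift file does not say how exhaustion is reported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Simplified stand-in for the Thrift Tuple: attribute name -> string value.
final class Tuple {
    final Map<String, String> attributes;
    Tuple(Map<String, String> attributes) { this.attributes = attributes; }
}

// Hypothetical mirror of the read-side calls of the BlurTuple service.
interface ReadApi {
    String openReadSession();
    void executeQuery(String session, String query);
    List<Tuple> nextResults(String session, int batchSize);
    void closeReadSession(String session);
}

// Hypothetical in-memory server: every query matches all stored tuples.
final class InMemoryReadApi implements ReadApi {
    private final List<Tuple> store;
    private final Map<String, Integer> cursors = new HashMap<>();

    InMemoryReadApi(List<Tuple> store) { this.store = store; }

    public String openReadSession() {
        String id = UUID.randomUUID().toString();
        cursors.put(id, 0);
        return id;
    }

    public void executeQuery(String session, String query) {
        cursors.put(session, 0); // reset the cursor; the query text is ignored
    }

    public List<Tuple> nextResults(String session, int batchSize) {
        int from = cursors.get(session);
        int to = Math.min(from + batchSize, store.size());
        cursors.put(session, to);
        return new ArrayList<>(store.subList(from, to));
    }

    public void closeReadSession(String session) { cursors.remove(session); }

    int openSessions() { return cursors.size(); }
}

final class ReadSessionClient {
    // Drains the result set batch by batch and always closes the session.
    static long countAll(ReadApi api, String query, int batchSize) {
        String session = api.openReadSession();
        try {
            api.executeQuery(session, query);
            long count = 0;
            List<Tuple> batch;
            while (!(batch = api.nextResults(session, batchSize)).isEmpty()) {
                count += batch.size();
            }
            return count;
        } finally {
            api.closeReadSession(session);
        }
    }
}
```

The `try`/`finally` keeps the session from leaking on the server side if a batch fetch fails part way through; a write session (openWriteSession, writeTuples, then commitWriteSession or rollbackWriteSession) would follow the same open/use/close shape.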
