Message-ID: <4404d283f2f23e27ab024e60025cf140e51e1886.camel@tu-dresden.de>
Subject: RamDirectory vs MemoryIndex vs MMapDirectory for In-Memory-Index
From: Matthias Müller
To: java-user@lucene.apache.org
Date: Tue, 25 Sep 2018 10:43:53 +0200

Hi,

Lucene provides different storage options for in-memory indexes. I found three structures that would qualify for the task:

* RAMDirectory (which I currently use for prototyping, but I wonder whether it is the ideal choice for my task)
* MemoryIndex, which claims to have better performance and resource use for small documents
* MMapDirectory, which should outperform RAMDirectory for huge indices (what is "huge"?)

My plan is to periodically index some properties (string codes, longs, lat/lng points) of the contents of a larger database with Lucene for quicker lookups (compared to slow SQL queries). What would be the most efficient (or intended) storage option for such an index in terms of lookup speed and CPU/memory use?

Below [1] is a brief summary of the index contents, and I hope these figures are sufficient to get a recommendation. But I am also happy to study more detailed documentation on the matter.

- Matthias

[1]: Summary of index contents and intended use

* Total documents: 500,000 to 1,000,000; may grow to 10,000,000 records in the medium term.
* Document fields (all of them single-valued):
  * String (9x), usually 1-10 characters long, mostly recurring values (5% distinct)
  * LongPoint (4x): two fields contain mostly distinct values, one contains mostly recurring values (5-10% distinct), and one acts as a primary key
  * LatLonPoint (1x), 30% distinct
* Refresh interval: 1 to 5 minutes (I currently create a fresh index instance on each update and discard the old one)
* Most queries are range queries and exact matches on several properties; sometimes I need to retrieve the property fields of a single document based on a primary key value.
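
The refresh approach mentioned in the summary (build a fresh index on each update, then discard the old one) could be sketched roughly as follows. This is only an illustration under assumptions: `SnapshotHolder` and `RefreshDemo` are hypothetical names, not Lucene API, and a plain `Map` stands in for the real index so that the sketch needs nothing beyond the JDK. In practice the snapshot type would presumably be whatever read-side object wraps the freshly built index.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical holder: readers always see one complete snapshot; a refresh swaps in a new one. */
final class SnapshotHolder<T> {
    private final AtomicReference<T> current;

    SnapshotHolder(T initial) {
        this.current = new AtomicReference<>(Objects.requireNonNull(initial));
    }

    /** Returns the snapshot that is live right now. */
    T get() {
        return current.get();
    }

    /** Builds a fresh snapshot off to the side, publishes it atomically, and returns the old one. */
    T refresh(Supplier<T> builder) {
        // The slow part (re-indexing) happens before the swap, so readers are never blocked.
        T fresh = Objects.requireNonNull(builder.get());
        // After the swap the caller may close or discard the returned old snapshot.
        return current.getAndSet(fresh);
    }
}

final class RefreshDemo {
    public static void main(String[] args) {
        // Assumption: a Map stands in for the index; its keys play the role of the primary key field.
        SnapshotHolder<Map<Long, String>> holder = new SnapshotHolder<>(Map.of(1L, "A1"));

        // Periodic refresh: rebuild from the database (simulated here), then swap.
        Map<Long, String> old = holder.refresh(() -> Map.of(1L, "A1", 2L, "B2"));

        System.out.println(old.size() + " -> " + holder.get().size()); // prints "1 -> 2"
    }
}
```

With this shape, queries that started against the old snapshot can finish undisturbed, while new queries pick up the fresh one as soon as the swap completes. Releasing the old snapshot's resources is left to the caller.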