incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Apache Drill : lucene format plugin
Date Tue, 11 Aug 2015 12:42:28 GMT
On Mon, Aug 10, 2015 at 4:55 PM, rahul challapalli <challapallirahul@gmail.com> wrote:

> Hi,
>
> I am writing a Lucene format plugin for Apache Drill. Once this is in
> place, we can extend it to run SQL queries on top of Blur tables from
> Apache Drill. Below are 2 problems I am trying to resolve. I posted them to
> the Lucene dev list but did not receive any response. Trying it out here :)
>
> 1. Creating individual segment readers: Currently I am using the code
> below to create segment readers. I am trying to find out the minimum
> information that I need to serialize so that I can create a specific
> SegmentReader. I am trying to avoid serializing a lot of low-level state
> information or creating all segment readers everywhere.
>
> segmentInfos =
> SegmentInfos.readLatestCommit(FSDirectory.open(Paths.get(selectionRoot)));
> segmentsFilename = segmentInfos.getSegmentsFileName();
> segmentReaders = new ArrayList<SegmentReader>();
> for (SegmentCommitInfo sci : segmentInfos.asList()) {
>   segmentReaders.add(new SegmentReader(sci, IOContext.READ));
> }
>

Here is the Blur input format for reading index segments directly via
MapReduce:

https://github.com/apache/incubator-blur/blob/master/blur-mapred/src/main/java/org/apache/blur/mapreduce/lib/BlurInputFormat.java

This is the method that you may be most interested in looking at:

https://github.com/apache/incubator-blur/blob/master/blur-mapred/src/main/java/org/apache/blur/mapreduce/lib/BlurInputFormat.java#L396

Check out BlurInputSplit for the low-level bits that are needed to
reopen the data on the input side.
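
As a rough illustration of the kind of minimal state such a split could
carry, here is a sketch in plain Java. The SegmentSplit class and its three
fields (indexDir, segmentsFileName, segmentName) are hypothetical names
chosen for this example; they are not the actual fields of BlurInputSplit,
so check the linked source for what Blur really serializes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical descriptor for one index segment; NOT Blur's BlurInputSplit.
final class SegmentSplit {
  final String indexDir;         // directory holding the index (e.g. selectionRoot)
  final String segmentsFileName; // commit file the split was planned against
  final String segmentName;      // name of the single segment to open

  SegmentSplit(String indexDir, String segmentsFileName, String segmentName) {
    this.indexDir = indexDir;
    this.segmentsFileName = segmentsFileName;
    this.segmentName = segmentName;
  }

  // Writes the three strings in a fixed order.
  byte[] toBytes() {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeUTF(indexDir);
      out.writeUTF(segmentsFileName);
      out.writeUTF(segmentName);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  // Reads the strings back in the same order they were written.
  static SegmentSplit fromBytes(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      String dir = in.readUTF();
      String segmentsFile = in.readUTF();
      String segment = in.readUTF();
      return new SegmentSplit(dir, segmentsFile, segment);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

On the reading side, assuming the Lucene 5.x API used in the quoted
snippet, the descriptor could presumably be resolved by re-reading that
commit with SegmentInfos.readCommit(directory, segmentsFileName), picking
the SegmentCommitInfo whose info.name equals segmentName, and constructing
only that one SegmentReader.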


>
>
> 2. Serializing a Lucene query object: I translated the SQL filter condition
> into a Lucene query object (partly done). Now I am trying to serialize the
> Lucene query object into a string and then deserialize it back *without
> analyzing* the serialized string representation. Any pointers on this part?
>

Here's a mostly working package for converting Lucene query objects to
Writables for serialization.

https://github.com/apache/incubator-blur/tree/master/blur-query/src/main/java/org/apache/blur/lucene/serializer
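
The general idea of structural serialization (write a type tag plus the
query's fields, then rebuild the object from them, so no analyzer ever sees
the text) can be sketched in plain Java as follows. QueryNode, TermNode and
BoolNode are stand-in classes invented for this example; they are not
Lucene or Blur classes, and the byte layout is an assumption, not the
format the Blur serializer package uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in query tree for illustration; NOT Lucene or Blur classes.
abstract class QueryNode {
  static final byte TERM = 1;
  static final byte BOOL = 2;

  // Each node writes a type tag followed by its own fields.
  abstract void write(DataOutputStream out) throws IOException;

  // Reads the tag and rebuilds the matching node; term text is taken verbatim.
  static QueryNode read(DataInputStream in) throws IOException {
    byte tag = in.readByte();
    if (tag == TERM) {
      String field = in.readUTF();
      String text = in.readUTF();
      return new TermNode(field, text);
    }
    if (tag == BOOL) {
      boolean and = in.readBoolean();
      int count = in.readInt();
      List<QueryNode> children = new ArrayList<>();
      for (int i = 0; i < count; i++) {
        children.add(read(in));
      }
      return new BoolNode(and, children);
    }
    throw new IOException("Unknown query tag: " + tag);
  }

  static byte[] toBytes(QueryNode query) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      query.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  static QueryNode fromBytes(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return read(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

// Exact-match clause: field and text are stored as-is, never tokenized.
final class TermNode extends QueryNode {
  final String field;
  final String text;

  TermNode(String field, String text) {
    this.field = field;
    this.text = text;
  }

  @Override
  void write(DataOutputStream out) throws IOException {
    out.writeByte(TERM);
    out.writeUTF(field);
    out.writeUTF(text);
  }

  @Override
  public String toString() {
    return field + ":" + text;
  }
}

// AND/OR combination of child clauses, written recursively.
final class BoolNode extends QueryNode {
  final boolean and;
  final List<QueryNode> children;

  BoolNode(boolean and, List<? extends QueryNode> children) {
    this.and = and;
    this.children = new ArrayList<>(children);
  }

  @Override
  void write(DataOutputStream out) throws IOException {
    out.writeByte(BOOL);
    out.writeBoolean(and);
    out.writeInt(children.size());
    for (QueryNode child : children) {
      child.write(out);
    }
  }

  @Override
  public String toString() {
    List<String> parts = new ArrayList<>();
    for (QueryNode child : children) {
      parts.add(child.toString());
    }
    return "(" + String.join(and ? " AND " : " OR ", parts) + ")";
  }
}
```

Because the term text travels as raw bytes, a value containing a space
comes back as one term instead of being split by an analyzer, which is the
property a parse-the-string round trip would not guarantee.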

Hope this helps!

Aaron


>
> Any help is greatly appreciated
>
> - Rahul
>
