lucene-general mailing list archives

From "Uwe Schindler" <>
Subject September 2009 Hadoop/Lucene/Solr/UIMA/katta/Mahout Get Together Berlin
Date Thu, 10 Sep 2009 11:53:03 GMT

I am cross-posting this here; Isabel Drost is managing the meetup. This time it is
more about Hadoop, but there is also a talk about the new Lucene 2.9 release
(presented by me). As far as I know, Simon Willnauer will also be there:

I would like to announce the September 2009 Hadoop Get Together at the
newthinking store in Berlin.

When: 29. September 2009 at 5:00pm
Where: newthinking store, Tucholskystr. 48, Berlin, Germany

As always there will be slots of 20min each for talks on your Hadoop topic.
After each talk there will be plenty of time to discuss. You can order drinks
directly at the bar in the newthinking store. If you like, you can order
pizza. There are quite a few good restaurants nearby, so we can go there
after the official part.

Talks scheduled so far:
Thorsten Schuett, Solving Puzzles with MapReduce: MapReduce is most often
used for data mining and filtering large datasets. In this talk we will show
that it is also useful for a completely different problem domain: solving
puzzles. Based on MapReduce, we can implement massively parallel
breadth-first and heuristic search. MapReduce will take care of the hard
problems, like parallelization, disk and error handling, while we can
concentrate on the puzzle. Throughout the talk we will use the sliding
puzzle as our example.

Thilo Götz, Text analytics on jaql: Jaql (JSON query language) is a query
language for JavaScript Object Notation that runs on top of Apache Hadoop.
It was primarily designed for large scale analysis of semi-structured data.
I will give an introduction to jaql and describe our experiences using it
for text analytics tasks. Jaql is open source and available from

Uwe Schindler, Lucene 2.9 Developments: Numeric Search, Per-Segment- and
Near-Real-Time Search, new TokenStream API: Uwe Schindler presents some new
additions to Lucene 2.9. In the first half he will talk about fast numerical
and date range queries (NumericRangeQuery, formerly TrieRangeQuery) and
their usage in geospatial search applications like the Publishing Network
for Geoscientific & Environmental Data (PANGAEA). In the second half of his
talk, Uwe will highlight various improvements to the internal search
implementation for near-real-time search. Finally, he will present the new
TokenStream API, based on AttributeSource/Attributes, which makes indexing
more pluggable. Future developments in the Flexible Indexing area will make
use of it. Uwe will
show a Tokenizer that uses custom attributes to index XML files into various
document fields based on XML element names as a possible use-case.

We would like to invite you, the visitor, to also tell your Hadoop story. If
you like, you can bring slides; there will be a beamer.

A big thanks goes to the newthinking store for providing a room in the
center of Berlin for us.

See the Upcoming page:

Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
