lucene-general mailing list archives

From Mirko Kämpf <mirko.kae...@gmail.com>
Subject Re: Using Lucene for File Name indexing / searching only
Date Fri, 12 Oct 2012 08:30:54 GMT
Hi Bernd,

I did something like what you describe, using Solrj (
http://wiki.apache.org/solr/Solrj) and ajax-solr for the frontend (
https://github.com/evolvingweb/ajax-solr).
The plan of the project is to have a Solr instance that does the same
thing as the NameNode in HDFS, but on a more user-oriented file level. As
soon as files are more or less randomly distributed over multiple computers,
we can use checksums to find duplicate files and interpret them as a kind
of instant backup. The idea is simple: knowing what is already duplicated in
different places tells you which files are not yet secure. Based on this,
more efficient backup strategies with lower bandwidth needs can be created.
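
As a rough illustration of the checksum idea (a minimal sketch, not the IWD
code): the class and method names below are made up for this example, and
SHA-256 is only an assumed choice of checksum. Files are grouped by checksum,
and every file whose checksum occurs only once has no known copy yet.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of IWD: finds files that exist only once.
public class DuplicateFinder {

    /** Returns the SHA-256 checksum of a file as a lowercase hex string. */
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // Reading the stream is what feeds the digest.
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Groups all regular files below the given root folders by checksum. */
    static Map<String, List<Path>> groupByChecksum(List<Path> roots)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<Path>> byChecksum = new TreeMap<>();
        for (Path root : roots) {
            List<Path> files;
            try (Stream<Path> walk = Files.walk(root)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            for (Path file : files) {
                byChecksum.computeIfAbsent(sha256Hex(file), k -> new ArrayList<>()).add(file);
            }
        }
        return byChecksum;
    }

    /** Files whose checksum occurs exactly once are "not yet secure". */
    static List<Path> notYetDuplicated(Map<String, List<Path>> byChecksum) {
        List<Path> result = new ArrayList<>();
        for (List<Path> group : byChecksum.values()) {
            if (group.size() == 1) {
                result.add(group.get(0));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        List<Path> roots = new ArrayList<>();
        for (String arg : args) {
            roots.add(Paths.get(arg));
        }
        for (Path file : notYetDuplicated(groupByChecksum(roots))) {
            System.out.println("no copy found: " + file);
        }
    }
}
```

In a distributed setup the checksums would be computed on each machine and
only the checksum and path sent to the central index; the local folder walk
here just stands in for that.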

What is your project context? Is it a private activity or a business-related
project? I wrote the code over the last two years in the
context of a small research project, but I did not get the funding I needed,
so it was pushed back a little.

Are you interested in joining forces? I would like to share my existing
code. The project name is IWD (intelligent WebDrive). It would tell you
details about all your files in remote storage services (Google Drive,
Dropbox, NFS shares), but also about offline files on USB drives or even CDs /
DVDs. You can also connect to mailboxes, FTP and WebDAV servers, etc. It is
comparable to Google Desktop Search, but for a distributed environment, not
just one machine. It scans your folders recursively and sends the
data to the Solr server. It is able to send just metadata or the full
content to be parsed on the Solr side. A client-side Tika parser is also
included. The project is not yet published, as I thought the idea
would not be so common. But if you are interested, let's join up on GitHub.
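
To give an idea of the metadata-only mode, here is a minimal sketch of the
recursive scan. It is not the IWD code: the class name and the field names
(id, name, size, lastModified) are assumptions made for this example, and the
step that hands each record to Solr (for example through Solrj) is left out.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical scanner: collects one metadata record per regular file.
public class FileMetadataScanner {

    /** Walks the folder tree below root and returns one record per file. */
    static List<Map<String, Object>> scan(Path root) throws IOException {
        final List<Map<String, Object>> records = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile()) {
                    Map<String, Object> record = new LinkedHashMap<>();
                    // Assumed field names; a real schema may differ.
                    record.put("id", file.toAbsolutePath().toString());
                    record.put("name", file.getFileName().toString());
                    record.put("size", attrs.size());
                    record.put("lastModified", attrs.lastModifiedTime().toMillis());
                    records.add(record);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException e) {
                // Skip unreadable entries instead of aborting the whole scan.
                return FileVisitResult.CONTINUE;
            }
        });
        return records;
    }

    public static void main(String[] args) throws IOException {
        for (Map<String, Object> record : scan(java.nio.file.Paths.get(args[0]))) {
            System.out.println(record);
        }
    }
}
```

Each record is a plain map so that it can be turned into whatever document
type the indexing side expects.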

Best wishes
Mirko

***********************************
*
*  Mirko Kämpf
*  +49 176 206 35199
*
*  *Trainer@*Cloudera Inc.
*  mirko@cloudera.com
*
*
*  *PhD Student@*Martin-Luther Universität Halle-Wittenberg
*  mirko.kaempf@physik.uni-halle.de
*
***********************************



2012/10/12 Lee <leegee@gmail.com>

> Hi Bernd
>
> If all you want to do is search files by name, Lucene is not the tool for
> you.
>
> Lucene is a very powerful document search library.
>
> You would have to write code against Lucene to build your tool; it is not
> plug-and-play.
>
> It would be quicker and more efficient to write your own routine to list
> the directory and run a regular expression on the file names.
>
> Even if the directory listing is extremely large (which is a bad idea
> anyway), or needs to be watched, it will still be easier to write your own
> tool than use Lucene.
>
> I hope that helps
> Lee
>
>
>
> On 12/10/2012 09:32, Bernd Kappler wrote:
>
> > Hi Danil,
> >
> >
> > thanks for your quick response. Indeed: a library is what I am
> > looking for - since I want to make use of it from within a server
> > based java application. When I read the Lucene documentation, I
> > became confident that Lucene *can* do this job. My question is more:
> > is it the best library for this task? Since this uses only a subset
> > of the functionality Lucene was designed for, Lucene might not be
> > optimized for this. Do you know if there are other Java libraries
> > available that are designed and optimized for searching files by
> > name? Or do you think that Lucene - while being optimized for
> > indexing file contents - will also do a great job on this specific
> > subtask?
> >
> > Thanks and regards
> >
> >
> > Bernd
> >
> > On 10/12/2012 09:13 AM, Danil ŢORIN wrote:
> >> Lucene is just a library.
> >>
> >> You will have to write the tool, and it's up to you what data you
> >> choose to index and how to query the index you created. It can be
> >> filename, partial filename, prefix or even regexp on filename, it's
> >> all up to you...
> >>
> >>
> >> On Fri, Oct 12, 2012 at 10:00 AM, Bernd Kappler
> >> <bernd.kappler@genedata.com> wrote:
> >>> Hi,
> >>>
> >>> I am looking for a Java-based tool for creating an index of all
> >>> files in a specific directory that allows me to find the files by
> >>> name - similar to UNIX's locate tool.
> >>>
> >>> From the general documentation I have the impression that Lucene
> >>> could do this. But I also noticed that the primary usage scenario
> >>> is indexing file contents - and it might just be overkill to
> >>> use Lucene for this subtask.
> >>>
> >>> Based on your experience: is lucene the right tool for this or
> >>> would you recommend something different?
> >>>
> >>> Thanks and regards
> >>>
> >>>
> >>> Bernd
> >
> >
> >
>
>
>
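
For comparison, the plain approach Lee describes in the quoted message (list
the directory and run a regular expression on the file names) could look
roughly like the sketch below. The class and method names are made up for this
example; it only looks at one directory level and does not build an index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: matches file names in one directory against a regex.
public class FileNameSearch {

    /** Returns the sorted names of regular files in dir whose name contains a match. */
    static List<String> findByName(Path dir, String regex) throws IOException {
        Pattern pattern = Pattern.compile(regex);
        List<String> matches = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                if (Files.isRegularFile(entry) && pattern.matcher(name).find()) {
                    matches.add(name);
                }
            }
        }
        Collections.sort(matches); // directory order is unspecified, so sort
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FileNameSearch <directory> <regex>
        for (String name : findByName(Paths.get(args[0]), args[1])) {
            System.out.println(name);
        }
    }
}
```

Every call rescans the directory, which is the trade-off against keeping an
index as locate or Lucene would.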
