hbase-user mailing list archives

From Wukang Lin <vboylin1...@gmail.com>
Subject Re: Makes search indexes
Date Tue, 03 Dec 2013 17:24:21 GMT
Hi James,
  This looks like a problem of searching non-standardized documents. I think
Solr (or a similar search engine) may meet your requirements.
  Good luck.

2013/12/3 James Pettyjohn <jamesp@scientology.net>

> Hi, a general strategy and schema approach question.
> I've got a lot of different data in a relational db that I'm trying to
> make searchable. One example is searching for people by email address.
> I have 6 tables that might hold one, tens of millions of records,
> and none of it is standardized. So it's mixed case and a field may have
> multiple emails in it, or something which isn't an email address at all.
> To do that as a one off isn't too bad but the data will be added to,
> and PKs will get phased out and split into multiple PKs etc. Also I
> want this on a number of other fields too that will need different
> transformations applied to the data and come from their own set of
> tables.
> I could do this a number of ways but I'm not satisfied with any of them,
> and I doubt that such a generic problem has no existing tools that are
> at least somewhat suited to this task.
> The best tools for this may not be HBase but I'd like to
> put my HBase cluster to work on this and have it available to
> MR jobs.
> Best, James
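
Whichever engine ends up holding the index, the email case James describes needs a normalization step first. The following is only a rough sketch in Python, not something from the thread: the function names, the `table:pk` source format, and the deliberately loose regular expression are all assumptions made for illustration. It pulls every address-like token out of a free-text field, lower-cases it, and pairs it with the row it came from, so the normalized address could serve as an index key.

```python
import re

# Assumption: a deliberately loose pattern, not a full RFC 5322 validator.
# It accepts "local@domain.tld"-shaped tokens and ignores everything else.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(raw):
    """Return the distinct lower-cased addresses found in a free-text field."""
    if not raw:
        return []
    found = []
    for match in EMAIL_RE.findall(raw):
        email = match.lower()
        if email not in found:  # keep first-seen order, drop duplicates
            found.append(email)
    return found


def index_entries(table, pk, raw):
    """Build (normalized_email, source_row) pairs for one source field.

    The "table:pk" source format is hypothetical; any stable reference
    back to the relational row would do.
    """
    source = f"{table}:{pk}"
    return [(email, source) for email in extract_emails(raw)]
```

For example, `index_entries("customers", 42, "Bob@Example.COM; alice@example.org")` yields one pair per address, both pointing back at `customers:42`, while a field with no address-like token yields an empty list. Keeping the source reference alongside the key is one way to cope with PKs being phased out or split later: the affected entries can be found and rewritten without re-parsing the raw fields.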
