hbase-dev mailing list archives

From "edward yoon" <edw...@udanax.org>
Subject Re: Linear algebra in a JVM [was Re: Jython]
Date Thu, 07 Feb 2008 07:06:15 GMT
Actually, most of my Hadoop applications are built for numerical analysis.
So I tried to make a generalized matrix input/output format,
https://issues.apache.org/jira/browse/HADOOP-2515, structured as
Map<row, Map<column, cell>>, after reviewing the code and discussing it
with Gary Bradski.
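A minimal sketch of a Map<row, Map<column, cell>> layout follows. It assumes integer row and column indices and double cell values; the actual HADOOP-2515 format may differ. The class name SparseMatrix is hypothetical. Only non-zero cells are stored, so a missing entry reads as zero.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a Map<row, Map<column, cell>> matrix layout.
// Only non-zero cells are stored; absent cells read as 0.0.
class SparseMatrix {
    private final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    void set(int row, int col, double value) {
        if (value == 0.0) {
            // Drop explicit zeros (and empty rows) so the structure stays sparse.
            Map<Integer, Double> r = rows.get(row);
            if (r != null) {
                r.remove(col);
                if (r.isEmpty()) {
                    rows.remove(row);
                }
            }
            return;
        }
        rows.computeIfAbsent(row, k -> new HashMap<>()).put(col, value);
    }

    double get(int row, int col) {
        Map<Integer, Double> r = rows.get(row);
        return r == null ? 0.0 : r.getOrDefault(col, 0.0);
    }

    // Number of stored (non-zero) cells.
    int nonZeroCount() {
        int n = 0;
        for (Map<Integer, Double> r : rows.values()) {
            n += r.size();
        }
        return n;
    }

    public static void main(String[] args) {
        SparseMatrix m = new SparseMatrix();
        m.set(0, 0, 1.5);
        m.set(2, 7, -3.0);
        System.out.println(m.get(2, 7));      // -3.0
        System.out.println(m.get(5, 5));      // 0.0
        System.out.println(m.nonZeroCount()); // 2
    }
}
```

The outer map is keyed by row, so a whole row can be fetched in one lookup. This mirrors how a row-oriented store such as HBase would hand back a row.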

But if I made a new matrix file structure on Hadoop HDFS, I think it
would end up resembling what HBase already does. So I think Hadoop +
HBase is a good fit for matrix management and operations.

"It (BigTable) presents the abstraction of a 2-dimensional
table of data cells, with different versions over time making
up a third dimension." -- Failure Trends in a Large Disk Drive Population, 2007

This means that Bigtable is used for analytical processing over an
arbitrary set of elements selected by a query, not for relational data
processing.
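The quoted passage describes cells addressed by (row, column) with timestamped versions as a third dimension. A toy in-memory model of that data layout follows: each cell keeps its versions sorted by timestamp, and a read returns the newest version at or before a requested time. This is not the HBase client API, and the class VersionedTable is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the Bigtable data model: (row, column, timestamp) -> value.
// Not the HBase API; for illustration only.
class VersionedTable {
    private final Map<String, Map<String, NavigableMap<Long, String>>> cells = new HashMap<>();

    void put(String row, String column, long timestamp, String value) {
        cells.computeIfAbsent(row, r -> new HashMap<>())
             .computeIfAbsent(column, c -> new TreeMap<>())
             .put(timestamp, value);
    }

    // Newest version with timestamp <= asOf, or null if there is none.
    String get(String row, String column, long asOf) {
        Map<String, NavigableMap<Long, String>> cols = cells.get(row);
        if (cols == null) return null;
        NavigableMap<Long, String> versions = cols.get(column);
        if (versions == null) return null;
        Map.Entry<Long, String> e = versions.floorEntry(asOf);
        return e == null ? null : e.getValue();
    }

    // Latest version regardless of timestamp.
    String getLatest(String row, String column) {
        return get(row, column, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        VersionedTable t = new VersionedTable();
        t.put("row1", "col:a", 100L, "v1");
        t.put("row1", "col:a", 200L, "v2");
        System.out.println(t.getLatest("row1", "col:a")); // v2
        System.out.println(t.get("row1", "col:a", 150L)); // v1
        System.out.println(t.get("row1", "col:a", 50L));  // null
    }
}
```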

>  I see http://wiki.apache.org/hadoop/Matrix

Thanks for your review.
I hope we can talk more soon.

On 2/7/08, Grant Ingersoll <gsingers@apache.org> wrote:
> How do you think these various libraries fit into Hadoop?  Does it
> make sense to just build what we need using HBase?  I see http://wiki.apache.org/hadoop/Matrix
>  does some matrix things, but then it has a Groovy overlay, so it
> isn't quite what we want, I don't think.
>
> Perhaps, we should just think about, and push up to Hadoop if we can,
> our own set of Hadoop based matrix libraries.  Starting off, we need a
> decent way to create a matrix and populate it, then also basic matrix
> things like addition, multiplication, etc.  Then we can add other
> things as we need them?  For instance, I am interested in TextRank
> (search for Mihalcea and TextRank) and it essentially comes down to
> doing an iterative algorithm over a matrix.  I was thinking I might,
> as a way to get deeper into the latest Hadoop, use it as a sample,
> useful algorithm.  It's not specifically ML, but it does have
> interesting results and it is fairly easy to implement.
>
> Should we just lay out a page on the Wiki where we can start thinking
> about matrix needs?  Using other libraries is definitely an option,
> but I am not sure if they will be optimal in the Hadoop environment.
>
> -Grant
>
> On Feb 6, 2008, at 12:18 PM, Ted Dunning wrote:
>
> >
> > There are unfortunately many choices for linear algebra in JVMs, none
> > particularly satisfactory.
> >
> > Colt is the one I use.  It has a very odd syntax, but gives good
> > performance.  The structure is such that it is very hard to extend
> > to, say, sparse matrices.  The licensing on Colt isn't particularly
> > easy, either, and I have been unable to contact the author to see
> > about liberalizing it.
> >
> > Jama is now essentially defunct, but it had a very simple API and
> > not very high performance.  Extending to additional matrix types is
> > also not feasible due to the design exposing the matrix's internal
> > structure as a double-indexed matrix.  The licensing on Jama is very
> > open.
> >
> > MTJ is high performance and has a less strange API than Colt, but I
> > haven't used it, so I can't say much about performance.  I get the
> > impression it would be difficult to extend, but I could well be wrong
> > about that.
> >
> > Commons Math uses an extension of Jama, I think.  I haven't used it.
> > The last time I looked seriously at Commons Math, the committers had
> > some very odd agendas going on, so I dropped it from consideration.
> > It looks like it has come quite a ways since then, but I haven't dug
> > into it deeply since my first evaluation.
> >
> >
> > On 2/6/08 12:45 AM, "Paul Elschot" <paul.elschot@xs4all.nl> wrote:
> >
> >> On Wednesday 06 February 2008 05:23:31 Markus Weimer wrote:
> >>> Hi,
> >>> One of my contributions to Elefant is an adapter to the Java version
> >>> of UIMA, which allows you to pipe Python strings through a UIMA
> >>> annotation engine and get feature vectors back to work with. This was
> >>> done using JPype <http://jpype.sourceforge.net/>, a tool which links
> >>> the JVM to the CPython VM.
> >>>
> >>> I chose this non-obvious approach because we use native-code Python
> >>> extensions for the matrix operations, an area where Java regrettably
> >>> lags far behind native code. So Jython was out of the question, as I
> >>> don't know any way to access a CPython extension from Jython. I found
> >>> JPype to do the job and to do it well (the overhead per cross-VM call
> >>> was around 1ms on my laptop). So for those craving a state-of-the-art
> >>> Python with decent extensions and access to Java code, this might be
> >>> an option.
> >>
> >> Well, one of my favourite Java libraries made it into the email
> >> address of this list, and I must say, I was hoping to get some good
> >> solutions to the problem of linear algebra in a JVM here. Has this
> >> problem been discussed beforehand?
> >>
> >> I have only used linear algebra packages well before there was Java,
> >> so I wonder how to go about it now.
> >>
> >> Regards,
> >> Paul Elschot
> >>
> >
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
> http://www.lucenebootcamp.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
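
Grant asks for a way to create and populate a matrix, plus basic operations such as addition and multiplication. Ted notes that Jama cannot be extended to new matrix types because its design exposes the internal storage. The sketch below writes the operations against a small interface, so a sparse or table-backed implementation could be swapped in later. The names Matrix, DenseMatrix and MatrixOps are hypothetical, and the code is single-machine only, not a distributed Hadoop job.

```java
// Hypothetical operations written only against the Matrix interface below.
final class MatrixOps {
    private MatrixOps() {}

    // Element-wise sum; both operands must have the same shape.
    static Matrix add(Matrix a, Matrix b) {
        if (a.rows() != b.rows() || a.cols() != b.cols()) {
            throw new IllegalArgumentException("shape mismatch");
        }
        Matrix c = new DenseMatrix(a.rows(), a.cols());
        for (int i = 0; i < a.rows(); i++)
            for (int j = 0; j < a.cols(); j++)
                c.set(i, j, a.get(i, j) + b.get(i, j));
        return c;
    }

    // Naive product: c[i][j] = sum over k of a[i][k] * b[k][j].
    static Matrix multiply(Matrix a, Matrix b) {
        if (a.cols() != b.rows()) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        Matrix c = new DenseMatrix(a.rows(), b.cols());
        for (int i = 0; i < a.rows(); i++)
            for (int j = 0; j < b.cols(); j++) {
                double sum = 0.0;
                for (int k = 0; k < a.cols(); k++)
                    sum += a.get(i, k) * b.get(k, j);
                c.set(i, j, sum);
            }
        return c;
    }

    public static void main(String[] args) {
        Matrix a = new DenseMatrix(2, 2);
        a.set(0, 0, 1); a.set(0, 1, 2);
        a.set(1, 0, 3); a.set(1, 1, 4);
        Matrix p = multiply(a, a);
        System.out.println(p.get(0, 0) + " " + p.get(0, 1)); // 7.0 10.0
        System.out.println(add(a, a).get(1, 1));             // 8.0
    }
}

// Callers never see the storage, so new matrix types can be added later.
interface Matrix {
    int rows();
    int cols();
    double get(int i, int j);
    void set(int i, int j, double v);
}

// Simple dense implementation backed by a row-major array.
class DenseMatrix implements Matrix {
    private final int rows, cols;
    private final double[] data;

    DenseMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    public int rows() { return rows; }
    public int cols() { return cols; }
    public double get(int i, int j) { return data[i * cols + j]; }
    public void set(int i, int j, double v) { data[i * cols + j] = v; }
}
```

MatrixOps is declared first so that single-file launching (`java File.java`) picks up its `main`.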

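Grant describes TextRank as essentially an iterative algorithm over a matrix. The sketch below shows the unweighted, PageRank-style update that TextRank uses: S(i) = (1 - d) + d * sum over vertices j linking to i of S(j) / outDegree(j). The update is repeated until no score changes by more than a tolerance. The damping factor d = 0.85 and a small tolerance are commonly used values. The class TextRankSketch, the boolean adjacency-matrix input and the parameter names are illustrative assumptions, not an existing Hadoop or HBase API.

```java
import java.util.Arrays;

// Unweighted TextRank/PageRank-style scoring by fixed-point iteration:
// S(i) = (1 - d) + d * sum over j linking to i of S(j) / outDegree(j).
final class TextRankSketch {
    private TextRankSketch() {}

    // adj[j][i] == true means an edge from vertex j to vertex i.
    static double[] scores(boolean[][] adj, double d, double tol, int maxIter) {
        int n = adj.length;
        int[] outDeg = new int[n];
        for (int j = 0; j < n; j++)
            for (int i = 0; i < n; i++)
                if (adj[j][i]) outDeg[j]++;

        double[] s = new double[n];
        Arrays.fill(s, 1.0); // initial scores are arbitrary
        for (int iter = 0; iter < maxIter; iter++) {
            double[] next = new double[n];
            double maxDelta = 0.0;
            for (int i = 0; i < n; i++) {
                double sum = 0.0;
                for (int j = 0; j < n; j++)
                    if (adj[j][i]) sum += s[j] / outDeg[j]; // outDeg[j] >= 1 here
                next[i] = (1 - d) + d * sum;
                maxDelta = Math.max(maxDelta, Math.abs(next[i] - s[i]));
            }
            s = next;
            if (maxDelta < tol) break; // converged
        }
        return s;
    }

    public static void main(String[] args) {
        // Undirected path 0 - 1 - 2, stored as edges in both directions.
        boolean[][] adj = {
            {false, true,  false},
            {true,  false, true },
            {false, true,  false},
        };
        System.out.println(Arrays.toString(scores(adj, 0.85, 1e-4, 1000)));
    }
}
```

On a large graph the inner loop is effectively a sparse matrix-vector product. That product is the step one would hand to MapReduce or store row by row in HBase.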

-- 
B. Regards,
Edward yoon @ NHN, corp.
