Relationship between JDMP and WEKA?

This sub-folder is intended for questions and comments about machine learning algorithms, data sets, parallel processing, etc.

Postby DrGary » Wed Mar 18, 2009 2:05 am

How do JDMP and WEKA relate to each other? Does JDMP duplicate any WEKA functionality, or does it pass its data structures through to WEKA for the ML algorithms? I see a partial example in the forums that shows WEKA being used to create a classifier, but the next step, 'use JDMP' to read a file, isn't documented. Does JDMP translate or copy its own data structures into WEKA data structures? Or does WEKA use interfaces that JDMP supports?

Thanks!
DrGary
 
Posts: 1
Joined: Wed Mar 18, 2009 1:58 am

Re: Relationship between JDMP and WEKA?

Postby arndt » Wed Mar 18, 2009 11:45 pm

JDMP duplicates some of Weka's functionality. For example, we have our own implementations of a Naive Bayes classifier and neural networks, and other algorithms will follow. However, our intention is not to rebuild the machine learning packages out there, but rather to provide a framework to combine them efficiently.
Data structures are converted from JDMP into a format that another package such as Weka can handle. If possible, wrappers are used and data is not copied. This works well for Weka's algorithms but not for its data structures, since Weka does not provide a clear interface/class division for that.
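
To illustrate the difference between wrapping and copying described above, here is a minimal sketch in plain Java. It does not use the JDMP or Weka APIs: ForeignDataset, NativeMatrix, MatrixView and ForeignCopy are hypothetical names made up for this example. The wrapper only works because the "foreign" side accepts an interface; when a package only accepts its own concrete class, a copy is the fallback.

```java
public class WrapperVsCopy {

    /** Hypothetical read-only interface that an external library accepts. */
    interface ForeignDataset {
        int rows();
        int cols();
        double value(int row, int col);
    }

    /** Hypothetical in-house structure that owns the values. */
    static final class NativeMatrix {
        final double[][] data;
        NativeMatrix(double[][] data) { this.data = data; }
    }

    /** Wrapper: reads through to the original storage, no values are copied. */
    static final class MatrixView implements ForeignDataset {
        private final NativeMatrix source;
        MatrixView(NativeMatrix source) { this.source = source; }
        @Override public int rows() { return source.data.length; }
        @Override public int cols() { return source.data.length == 0 ? 0 : source.data[0].length; }
        @Override public double value(int row, int col) { return source.data[row][col]; }
    }

    /** Copy: a snapshot of the values taken at construction time. */
    static final class ForeignCopy implements ForeignDataset {
        private final double[][] values;
        ForeignCopy(NativeMatrix source) {
            values = new double[source.data.length][];
            for (int r = 0; r < values.length; r++) {
                values[r] = source.data[r].clone();
            }
        }
        @Override public int rows() { return values.length; }
        @Override public int cols() { return values.length == 0 ? 0 : values[0].length; }
        @Override public double value(int row, int col) { return values[row][col]; }
    }

    public static void main(String[] args) {
        NativeMatrix m = new NativeMatrix(new double[][] {{1.0, 2.0}, {3.0, 4.0}});
        ForeignDataset view = new MatrixView(m);
        ForeignDataset copy = new ForeignCopy(m);
        m.data[0][0] = 42.0; // change the original after both were created
        System.out.println("view sees " + view.value(0, 0));
        System.out.println("copy sees " + copy.value(0, 0));
    }
}
```

The view reflects the later change to the original and needs no extra memory for the values, while the copy keeps the old value and doubles the memory footprint.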
Holger
arndt
Site Admin
 
Posts: 168
Joined: Mon Feb 02, 2009 7:02 pm
Location: Munich, Germany

Re: Relationship between JDMP and WEKA?

Postby jay » Sat Apr 25, 2009 6:44 pm

Hi,

We don't really need another Weka, RapidMiner, etc. The biggest gap in most open source and commercial products, in all but a select few (far too few) tools, is scalability around memory requirements. From your descriptions of JDMP and the underlying matrix library, it looks like this is the main gap, or one of the main gaps, you see this software filling. That is excellent! We don't need another Weka or another matrix library that is simply a substitute for existing offerings, aside from a few select features here and there.

The scalability hole is a large one. I am very excited to see this project progress. Having the underlying matrix toolkit (also scalable) should aid in community development.

At present, are the modeling methods in the package also capable of working with larger-than-memory datasets?

Best regards,

Jay
jay
 
Posts: 2
Joined: Sat Apr 25, 2009 5:11 pm

Re: Relationship between JDMP and WEKA?

Postby arndt » Mon Apr 27, 2009 10:44 pm

When the basis of a data set is a matrix, larger-than-memory data sets can be handled easily through UJMP.

In addition to that, it is possible to use a list of samples which can be swapped to disk, e.g. using the EhCache plugin.
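
As a rough illustration of the general idea of keeping matrix data on disk instead of in memory, here is a minimal sketch using only the Java standard library. It is not how UJMP or the EhCache plugin are implemented, and the class name DiskBackedMatrix and its methods are hypothetical. Values are stored row-major in a file and each access seeks to the cell's offset, so only the cell being read or written is held in memory.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/** Hypothetical dense matrix of doubles stored in a file, row-major. */
public class DiskBackedMatrix implements AutoCloseable {
    private final RandomAccessFile file;
    private final long rows;
    private final long cols;

    public DiskBackedMatrix(File path, long rows, long cols) {
        RandomAccessFile f;
        try {
            f = new RandomAccessFile(path, "rw");
            f.setLength(rows * cols * Double.BYTES); // reserve space for all cells
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.file = f;
        this.rows = rows;
        this.cols = cols;
    }

    /** Convenience factory that backs the matrix with a temporary file. */
    public static DiskBackedMatrix createTemp(long rows, long cols) {
        try {
            File tmp = File.createTempFile("matrix", ".bin");
            tmp.deleteOnExit();
            return new DiskBackedMatrix(tmp, rows, cols);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Byte offset of a cell in the file. */
    private long offset(long row, long col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException(row + "," + col);
        }
        return (row * cols + col) * Double.BYTES;
    }

    public double get(long row, long col) {
        try {
            file.seek(offset(row, col));
            return file.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void set(long row, long col, double value) {
        try {
            file.seek(offset(row, col));
            file.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (DiskBackedMatrix m = createTemp(1000, 1000)) {
            m.set(999, 999, 3.5);
            System.out.println(m.get(999, 999));
        }
    }
}
```

A real implementation would add caching or memory mapping, since one seek per cell is slow; the sketch only shows why the data set size is bounded by disk space rather than by heap size.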

More options will follow soon...
Holger
arndt
Site Admin
 
Posts: 168
Joined: Mon Feb 02, 2009 7:02 pm
Location: Munich, Germany

