Groovy for data science, a killer application for Groovy?

dracodoc
I've been using Groovy for several years for my side projects and I love it. However, Groovy is still not popular enough.

Recently I looked at the very hot Data Science/Big Data field and found that there could be a great opportunity for Groovy. Python is currently very popular in that field because it makes it easy to explore, prototype, and process data, and it now has good library support. I think Python does have some limitations: one is Python 3 compatibility, another is that people often need to move to another language to build larger-scale systems after the prototyping phase.

I believe Groovy could be a much better language for Data Science than Python, because:
1. Groovy has all the dynamic language features of Python, or more. It can be used interactively as an interpreted language too.
2. Web support and visualization could be built on a Grails base.
3. All the JVM languages can be integrated more easily, so the later transition to a larger-scale production system will be much easier. If done right, the transition effort could be minimized with static typing, performance tuning, etc.

The only disadvantage is library support, which is vital for the ecosystem and for language adoption. There are many Java libraries available, though, so the library support problem is not very critical.

I think Groovy developers could look at the possibility of exploring in this direction. The first thing I think would be very useful is a notebook-style environment for the Groovy interpreter. IPython is a great example, which embeds text, scripts, visualization, and language integration. RStudio has Shiny and other tools to support so-called Reproducible Research. Groovy has great potential in this direction. There is also an IPython-based product, Beaker, which supports multiple languages including Groovy.


Re: Groovy for data science, a killer application for Groovy?

Mark Fortner-3

I couldn't agree more. You have libraries like POI that can extract data from spreadsheets, Apache Commons Math for a lot of the typical mathematical analyses, and JavaFX's charting library or JFreeChart to render the results. The Groovy Console is a passable tool for some basic scripting tasks and might make a decent starting point. I saw a demo several years ago of embedding executable Groovy code in an OpenOffice presentation.

Anyway, the topic might make a good subject for some blog posts.

Mark

View this message in context: http://groovy.329449.n5.nabble.com/Groovy-for-data-science-a-killer-application-for-Groovy-tp5722061.html
Sent from the groovy - user mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe from this list, please visit:

    http://xircles.codehaus.org/manage_email



Re: Groovy for data science, a killer application for Groovy?

sbglasius
It could potentially also make a good topic for conference presentations. The GR8Conf CFP is still open: http://cfp.gr8conf.org



Best regards / Med venlig hilsen,
Søren Berg Glasius



Re: Groovy for data science, a killer application for Groovy?

Jim White
In reply to this post by dracodoc
I agree that Groovy is a great language for scientific programming and have been using it for that purpose for years. The reason I adopted Groovy back at 1.0 was to develop IFCX Wings, a literate scripting tool for any JVM-based language. More recently I've been developing Gondor, a Groovy DSL for HTCondor DAGMan workflows (compute-intensive batch systems, machine learning for NLP in my case). I also gave solutions in Groovy when I taught Computational Linguistics Fundamentals at the University of Washington, and I use it in talks at user groups about my research.

Others I know of who use Groovy for scientific work include Paolo Di Tommaso, whose Nextflow (http://www.nextflow.io/) is used primarily in bioinformatics research but (like Gondor) is a general-purpose tool for making batch processing on grids Groovy. Stergios Papadimitriou is developing GroovyLab (https://code.google.com/p/jlabgroovy/wiki/SciLabInGroovyLab), and the Groovy web site has an example using JScience. I'm sure there are more that I don't know about.

That said, I don't think there is an especially more or less compelling case for Groovy in science than in any other domain.  The challenges remain the same and I don't see anything that is going to shift its adoption rate in a big way.  For example, a lot of scientific computing is done in academic environments where Python is quite strong (and has displaced Java in many institutions) and the domain-specific language is often R or Matlab.

Having said *that*, I do agree that blogging and doing more presentations that feature Groovy at work in science and machine learning applications is the way to get the word out and raise Groovy's profile.  As I say, I've been doing that for my part (including starting a new blog: http://jimwhite.github.io/ which will be getting some Groovy content RSN) and may have won a few converts but no signs of any big waves building (yet).

Jim


Re: Groovy for data science, a killer application for Groovy?

dracodoc
I love many features of Groovy as a language that are not found in others. However, it takes much more than language features to be successful, especially a good marketing pitch and good positioning at the right time. After some time, library support and the ecosystem become the key factors.

I find Groovy the best option if I only need core language features and some Java libraries. For scientific programming, my limited experience with Java libraries for numerical computing was not very smooth. I ported a medium-sized Matlab script to Java/Groovy, so I needed to find Java equivalents for matrix computing, linear regression, charting, etc. I found that:
1. Using Groovy SwingBuilder and MigLayout to write a GUI of medium complexity is not hard, although there is very little documentation on this specific topic. I had to explore by myself and only succeeded after I had a deeper understanding of how SwingBuilder works. I guess this path will not be chosen by most newcomers, though, because they may want a GUI designer and be afraid of MigLayout's learning curve.
2. The Apache Commons Math library is not easy to use. The V3 documentation is not complete and V3 changed a lot from V2, so in the end I had to use V2. The API design often conflicts with intuition and changes inconsistently from place to place. I also needed to search for and use different libraries just to replicate one function in Matlab. In the end I used several math libraries.
3. JFreeChart is very mature and has good documentation. However, it is not actively updated; maybe it doesn't need major updates.
4. I tried GroovyLab, but it didn't really solve my problems.
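
For reference, the linear regression part of such a port can be sketched without any library at all. The following is only a minimal sketch in plain Java (which Groovy can call directly), assuming a single predictor and an ordinary least-squares fit. The class and method names are hypothetical, and this is not how the Matlab script or Commons Math does it.

```java
// Hypothetical helper for illustration only; not part of any library named in this thread.
final class SimpleLineFit {

    /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        int n = x.length;

        // Means of both variables.
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sum of cross-deviations (sxy) and of squared x-deviations (sxx).
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("all x values are identical");
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // Sample data lying exactly on the line y = 1 + 2x.
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        double[] coefficients = fit(x, y);
        System.out.println("intercept = " + coefficients[0] + ", slope = " + coefficients[1]);
    }
}
```

From a Groovy script the same method could be called as `SimpleLineFit.fit(xs as double[], ys as double[])`, assuming the class is on the classpath. Multiple regression or matrix work would still need a real numerical library.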

All of this may not look as shiny as the newer options, such as Python or JavaScript web visualization.

That being said, what I meant is actually a little different from scientific programming.

There is the new Data Science/Big Data hype, and there is classic numerical computing in science. They are pretty different in their tasks and requirements. However, I think more and more scientists need to do some Data Science tasks now. They have much more experimental data to import, clean, and process. They can use many advanced statistical and machine learning methods relatively easily thanks to all kinds of new libraries. There are numerous documents and tutorials available. So Data Science is not just for internet companies; everybody can run some data analysis quickly with free tools and open data.

I believe Groovy could have a much better position in this trend.



Re: Groovy for data science, a killer application for Groovy?

Jochen Theodorou
The problem in all this is knowing what is needed and liked.
... more inline:

On 05.01.2015 15:34, dracodoc wrote:
[...]
> 1. Using Groovy SwingBuilder and MigLayout to write a GUI of medium complexity
> is not hard, although there is very little documentation on this specific
> topic. I had to explore by myself and only succeeded after I had a deeper
> understanding of how SwingBuilder works. I guess this path will not be
> chosen by most newcomers, though, because they may want a GUI designer and
> be afraid of MigLayout's learning curve.

OK, point (1): a data science application needs something to easily
build GUIs with... what does Python provide here?

> 2. The Apache Commons Math library is not easy to use. The V3 documentation is
> not complete and V3 changed a lot from V2, so in the end I had to use V2. The
> API design often conflicts with intuition and changes inconsistently from
> place to place. I also needed to search for and use different libraries just
> to replicate one function in Matlab. In the end I used several math libraries.

Point (2): people coming from Matlab would need some kind of API that is
quite similar to what Matlab provides. Commons Math seems not to be
enough... You say that is mostly because the documentation is not so good.
Is http://commons.apache.org/proper/commons-math/userguide/index.html
not good?

> 3. JFreeChart is very mature and has good documentation. However, it is not
> actively updated; maybe it doesn't need major updates.

Haven't used it in the past either... when I needed charts of data it
was mostly done with gnuplot ;)

> 4. I tried GroovyLab, but it didn't really solve my problems.

Here it would be good if you could tell us why.

[...]

> That being said, what I meant is actually a little different from scientific
> programming.
>
> There is the new Data Science/Big Data hype, and there is classic numerical
> computing in science. They are pretty different in their tasks and
> requirements. However, I think more and more scientists need to do some Data
> Science tasks now. They have much more experimental data to import, clean,
> and process. They can use many advanced statistical and machine learning
> methods relatively easily thanks to all kinds of new libraries. There are
> numerous documents and tutorials available. So Data Science is not just for
> internet companies; everybody can run some data analysis quickly with free
> tools and open data.
>
> I believe Groovy could have a much better position in this trend.

Yeah, I would like to improve Groovy here to make it a good alternative
to Python. Too bad nobody will pay me for that ;)

bye blackdrag

--
Jochen "blackdrag" Theodorou - Groovy Project Tech Lead
blog: http://blackdragsview.blogspot.com/
german groovy discussion newsgroup: de.comp.lang.misc
For Groovy programming sources visit http://groovy-lang.org



Re: Groovy for data science, a killer application for Groovy?

dracodoc
Sorry if I didn't make myself clear. The point of my original post was that Groovy could have an opportunity in the new Data Science/Big Data hype. This is actually quite different from scientific computing.

My second post talked about my experience with scientific computing in Groovy, in response to an earlier post about it. However, my original point was not about classic scientific computing, so you can disregard my comments about porting the Matlab program, which are not very relevant to the original discussion.

I didn't say much about Data Science/Big Data in my original post because there are so many resources about them available. As I said before, it is used in all kinds of companies, and everybody, including scientists, can use some Data Science tools. I didn't intend to suggest that Groovy replace Python, which is impossible given the existing libraries and ecosystems.

My idea is simple: Python has enjoyed a lot of growth in interest from Data Science/Big Data, and Groovy could use this opportunity too.

I think Pivotal actually provides some Data Science services, so it's not impossible to find some support from the company. Of course, all of this is just the guess of an outsider; it could well be wishful thinking.

Re: Groovy for data science, a killer application for Groovy?

Russel Winder-3
In reply to this post by dracodoc
I hate to be a damper on this and all the other supportive posts, but
wishing things to be true will achieve nothing. Data science is full of
PhD statistics folks who like R or perhaps
Python/SciPy/Matplotlib/Pandas. Trust me, I run workshops for these folk.
Currently "data science" is an R, Python, Julia place, mostly because
the infrastructure is already there and everyone already uses R, Python,
Julia.

The core issue is that R, Python, Julia already have the infrastructure
for analysing and (more importantly) visualizing data and algorithms
over it. I am sure the JVM and Groovy can do this, but they don't have the
systems these folk use today.

The core issue is really that the frameworks require seriously fast
computation libraries, and these are available now in Fortran and C++
but not in Java, and the JVM is rubbish at making use of Fortran and C++
libraries.

For anecdotal evidence, the London PyData meeting this month has 200+
people turning up, there is no JVM-based equivalent even scheduled.

Data science is not big data. Big data isn't really doing anything
sophisticated. If the JVM and Groovy are to make inroads into data science
activity, they need to find a new milieu to bring to the community rather
than trying to compete with the extant situation.

I do not have an explicit suggestion even though I have "skin in the
game".

--
Russel.


Re: Groovy for data science, a killer application for Groovy?

Jochen Theodorou
On 06.01.2015 00:38, Russel Winder wrote:
> I hate to be a damper on this and all the other supportive posts, but
> wishing things to be true will achieve nothing. Data science is full of
> PhD statistics folks who like R or perhaps
> Python/SciPy/Matplotlib/Pandas. Trust me, I run workshops for these folk.
> Currently "data science" is an R, Python, Julia place, mostly because
> the infrastructure is already there and everyone already uses R, Python,
> Julia.

That's why I was trying to figure out what is missing. If nothing is
offered, nothing will happen. But if the key point here is Fortran and C++
computation frameworks, then things do indeed look bad. Funny thing is that
I talked with Cedric about native code binding just a few weeks ago,
and I was wondering if it is not possible to provide something
better than what the JVM has today... basically I arrived at ideas similar
to those that led to the creation of http://openjdk.java.net/jeps/191 quite a
while ago. But I'm not sure that is enough... I really liked the
integration of native code done by gcj back then... but even that, I think,
is not suitable here.

Anyway, I wanted to see what people come up with before I try to spread
some realism ;)

[...]
> Data science is not big data. Big data isn't really doing anything
> sophisticated.

Well... I guess this is a matter of dispute... data science is, for example,
not machine learning... but plenty of data scientists use machine learning.
Also, data science is not statistics... but frankly, what would data
science be without it? At the same time... what is big data without
data science? They might not be the same, but I see strong connections.

bye blackdrag


Re: Groovy for data science, a killer application for Groovy?

Dylan Cali
On Tue, Jan 6, 2015 at 1:57 AM, Jochen Theodorou <[hidden email]> wrote:
> On 06.01.2015 00:38, Russel Winder wrote:
> But if the key point here is Fortran and C++ computation
> frameworks, then things do indeed look bad. Funny thing is that I talked with
> Cedric about native code binding just a few weeks ago, and I was
> wondering if it is not possible to provide something better than what the
> JVM has today...

Is JNA not an option here? I've had good success using it to leverage
native libraries, and it is much, much easier to get going with than
JNI.
