Bug 193706 - [new port] devel/apache-spark: high performance distributed computing system
Status: Closed Feedback Timeout
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Martin Wilke
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-17 11:43 UTC by Radim Kolar
Modified: 2016-01-16 06:23 UTC
CC List: 3 users

See Also:


Attachments
port shar (12.36 KB, text/plain)
2014-09-17 11:43 UTC, Radim Kolar
spark shar (12.43 KB, text/plain)
2014-09-18 16:39 UTC, Radim Kolar
spark port shar (12.48 KB, text/plain)
2014-09-25 18:03 UTC, Radim Kolar
my version of spark port (21.89 KB, text/plain)
2014-10-01 11:56 UTC, Dmitry Sivachenko

Description Radim Kolar 2014-09-17 11:43:41 UTC
Created attachment 147395 [details]
port shar

Apache Spark is a fast and general engine for large-scale data processing. 

Spark runs programs up to 100x faster than Hadoop MapReduce in memory,
or 10x faster on disk.

Spark has an advanced DAG execution engine that supports cyclic data
flow and in-memory computing.

You can write applications quickly in Java, Scala or Python.

Spark powers a stack of high-level tools including Spark SQL, MLlib
for machine learning, GraphX, and Spark Streaming. You can combine these
frameworks seamlessly in the same application.

If you have a Hadoop 2 cluster, you can run Spark without any installation
needed. Otherwise, Spark is easy to run standalone or on EC2 or Mesos.
It can read from HDFS, HBase, Cassandra, and any Hadoop data source.

WWW: http://spark.apache.org/
Comment 1 Marcus von Appen freebsd_committer freebsd_triage 2014-09-17 16:45:02 UTC
If I see it correctly, the Spark engine requires Java to be installed, but the port only declares a build dependency on Maven. It would make sense to add OpenJDK 1.7 to both RUN_DEPENDS and BUILD_DEPENDS to have a fully functional port.
Comment 2 Radim Kolar 2014-09-18 16:39:34 UTC
Created attachment 147452 [details]
spark shar

Added NO_ARCH=yes and a RUN_DEPENDS on Java 1.7.
Comment 3 Radim Kolar 2014-09-25 18:03:22 UTC
Created attachment 147677 [details]
spark port shar

I discovered that Spark needs the Hadoop shared lib at runtime.
Comment 4 John Marino freebsd_committer freebsd_triage 2014-10-01 09:08:43 UTC
Can the port be renamed from spark to apache_spark?

I've been working on an unrelated Spark port, lang/spark (see http://www.spark-2014.org/), for a few months.

I've hit technical snags which caused the delay, but in any case devel/spark would definitely be confused with lang/spark (as spark-2014 could also legitimately be put in devel category).

Let's avoid the ambiguity, because it does occur!
Comment 5 Dmitry Sivachenko freebsd_committer freebsd_triage 2014-10-01 11:55:19 UTC
A few comments on the port:
1) I find the one-screen copyright notice in the startup scripts redundant; no other ports include them.
2) The JAVA_VENDOR and JAVA_VERSION variables in the startup scripts are unused and not needed.
3) "/usr/local/share/spark/sbin" is hardcoded in start_worker.in.
4) The extra dependency on sbt is not needed; there is a documented procedure for building Spark with Maven: https://spark.apache.org/docs/1.1.0/building-with-maven.html
5) Hadoop is a runtime dependency, so there is no need to list it in LIB_DEPENDS.
6) The daemons do not require root privileges to run, so it is better to start them as a separate pseudo-user.
7) It is wise to pre-build the Maven dependencies and fetch them as a tar file, so the build cluster does not download 250MB on each build.

I created the same port independently (did not notice your submission), so I attach my work here for reference.  I don't really care whose version gets committed; I just want the port to be in good shape before that happens.
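
For illustration only, items 4 to 6 above, together with the NO_ARCH and Java dependency changes discussed earlier in this PR, might translate into a port Makefile fragment roughly like the one below. This is a sketch, not the content of either attached shar: the dependency origins (devel/maven3, devel/hadoop2), the files they are detected by (mvn, yarn), and the "spark" user and group names are all assumptions, and the exact dependency syntax depends on the age of the ports tree.

```make
# Hypothetical fragment; every value marked "assumed" is illustrative only.
PORTNAME=	spark

# Item 4: build with Maven rather than sbt.
# Assumed: Maven 3 lives at devel/maven3 and installs this mvn path.
BUILD_DEPENDS=	${LOCALBASE}/share/java/maven3/bin/mvn:devel/maven3

# Item 5: Hadoop is only needed at run time, so RUN_DEPENDS, not LIB_DEPENDS.
# Assumed: origin devel/hadoop2, detected via its yarn executable.
RUN_DEPENDS=	yarn:devel/hadoop2

# Java 1.7 or newer through the ports framework's Java support.
USE_JAVA=	yes
JAVA_VERSION=	1.7+

# The port installs only architecture-independent files.
NO_ARCH=	yes

# Item 6: run the daemons as an unprivileged pseudo-user.
# Assumed names; they would also need entries in the ports UIDs and GIDs files.
USERS=		spark
GROUPS=		spark
```

With USERS and GROUPS set, the rc.d scripts could then start the daemons under that account instead of root, and the hardcoded path from item 3 could be replaced by a substituted prefix in the script templates.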
Comment 6 Dmitry Sivachenko freebsd_committer freebsd_triage 2014-10-01 11:56:02 UTC
Created attachment 147883 [details]
my version of spark port
Comment 7 John Marino freebsd_committer freebsd_triage 2014-10-01 12:04:43 UTC
(In reply to Dmitry Sivachenko from comment #6)
> Created attachment 147883 [details]
> my version of spark port


I know the shar existed before - that said, the same request applies to set PKGNAMEPREFIX to "apache-".  

"Apache Spark" is even a trademark, so there is precedent (http://spark.apache.org/)
Comment 8 Dmitry Sivachenko freebsd_committer freebsd_triage 2014-10-01 12:05:34 UTC
(In reply to John Marino from comment #7)
> (In reply to Dmitry Sivachenko from comment #6)
> > Created attachment 147883 [details]
> > my version of spark port
> 
> 
> I know the shar existed before - that said, the same request applies to set
> PKGNAMEPREFIX to "apache-".  
> 
> "Apache Spark" is even a trademark, so there is precedent
> (http://spark.apache.org/)


No objection from my side.
Comment 9 Radim Kolar 2014-10-03 09:07:43 UTC
You violated my copyright by removing the copyright statements from my rc.d scripts.

Write your own.
Comment 10 John Marino freebsd_committer freebsd_triage 2014-10-03 09:12:04 UTC
(In reply to Radim Kolar from comment #9)
> You violated my copyright by removing the copyright statements from my rc.d
> scripts.
> 
> Write your own


Radim, this is his quote: "I created the same port independently (did not notice your submission), so I attach my work here for reference."

That means he didn't use your rc.d, he wrote his own.
I think an apology is in order.
Comment 11 Radim Kolar 2014-10-29 22:59:02 UTC
He is lying. Check his scripts and mine.

He asked me by email for permission to remove my copyright notices from my scripts and suggested some minor changes. I didn't give him my permission, and he removed them anyway.
Comment 12 John Marino freebsd_committer freebsd_triage 2014-10-31 17:16:41 UTC
It's like 25 lines long.

Is it even standard practice to add copyrights (and licenses) to RC scripts in ports?  This legal spat is going to ensure it's never committed.  Who wants to deal with this stuff?  I don't.

If you want credit, I'd think the permanent "# Created by:" line would be enough.
Comment 13 John Marino freebsd_committer freebsd_triage 2014-11-24 14:28:36 UTC
As long as the title is getting tweaked, we might as well use the suggested port name.  The PR is still stuck, though...
Comment 14 John Marino freebsd_committer freebsd_triage 2015-01-26 07:29:20 UTC
I'm moving this out of triage.  Nobody has made any effort to resolve the tiff and nobody else wants to step in, thus a stalemate.