Apache Spark is unable to find the native snappy library, even when the snappy and snappyjava ports or packages are installed.

>>> input=sc.textFile('/some/folder/*.bz2')
15/09/28 22:58:25 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=278302556
15/09/28 22:58:25 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 265.3 MB)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/share/spark/python/pyspark/context.py", line 370, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
  File "/usr/local/share/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/share/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=FreeBSD and os.arch=x86_64
    at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:299)
    at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:163)
    at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:145)
    at org.xerial.snappy.Snappy.<clinit>(Snappy.java:47)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:90)
    at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:83)
    at org.apache.spark.io.SnappyCompressionCodec.compressedOutputStream(CompressionCodec.scala:125)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:199)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:199)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:199)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:101)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:980)
    at org.apache.spark.SparkContext.hadoopFile(SparkContext.scala:717)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:557)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)

Setting "spark.broadcast.default false" or "spark.io.compression.codec lzf" in /usr/local/share/spark/conf/spark-defaults.conf appears to allow the spark-shell to load successfully. (Workaround copied from https://issues.apache.org/jira/browse/SPARK-3532.)
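Spelled out, the workaround is one of the following lines in /usr/local/share/spark/conf/spark-defaults.conf. The lzf line switches Spark's internal compression codec away from snappy so the native library is never loaded; the other setting is quoted as-is from the same JIRA workaround:

# /usr/local/share/spark/conf/spark-defaults.conf -- use one line or the other
spark.io.compression.codec lzf
spark.broadcast.default false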
Hello, can you please verify whether this is still an issue with the recently committed spark-1.5.1? Thanks!
(In reply to Dmitry Sivachenko from comment #1)
Alas, same issue with Spark 1.5.1 from ports.
Should be fixed in the 1.5.1 port, so I'm surprised to hear it's not. On my system the output of this command includes the FreeBSD native library:

jar tvf /usr/local/share/spark/lib/spark-assembly-1.5.1-hadoop2.7.1.jar | grep libsnappy
 25832 Thu Nov 05 19:27:40 GMT 2015 org/xerial/snappy/native/FreeBSD/x86_64/libsnappyjava.so

Do you not have that packaged?
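The jar listing only proves the .so was packaged. As a complementary runtime probe, here is a minimal sketch from the PySpark shell (assuming an active SparkContext sc): any call into org.xerial.snappy forces the native loader to run, so a broken install raises the same FAILED_TO_LOAD_NATIVE_LIBRARY error shown above.

>>> # Reach the snappy-java class through the py4j gateway; on a healthy
>>> # install this prints a version string instead of raising SnappyError.
>>> print(sc._jvm.org.xerial.snappy.Snappy.getNativeLibraryVersion())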
(In reply to Mark Dixon from comment #3)
No: I see entries for AIX (ppc64), Linux (arm, armhf, ppc64, ppc64le, x86, x86_64), Mac (x86, x86_64), and SunOS (sparc, x86, x86_64), but nothing for FreeBSD.

I'm using a VM for testing in this case. I did:

pkg install bash
pkg install sudo

(chsh for my user and root to bash, set up sudoers), then cd'ed into /usr/ports/devel/spark and ran, as root because of the pkg requirements:

BATCH=yes make
make install

I just tried make config-recursive, and while I saw that snappy was checked by default, it did not build properly. I'm going to roll back to my snapshot from just before the make and run through all the config options to see whether I missed anything.
Huh. I just tried to build 1.5.1 again in my VM and it kept failing on snappy-java. I tried installing the snappyjava package and then rebuilding spark and it worked fine. I'm going to try a fresh VM and see if I can replicate my issues.
I raised this bug against snappy-java; I guess this is what you're seeing: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204666
I updated the snappy bug with a patch; let's hope it gets merged.
The fix from bug 204666 has been committed; can you please retest and report back?
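For the retest, repeating the original repro from the PySpark shell should be enough; the textFile() call is what broadcasts the Hadoop configuration through the snappy codec and failed before (the path is just the placeholder from the report):

>>> # Previously raised SnappyError from the broadcast machinery at this point;
>>> # with the fixed snappy-java port it should return an RDD without complaint.
>>> rdd = sc.textFile('/some/folder/*.bz2')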
I did a test build on 11a, and it worked fine.
Seems to be working now! Thank you!