Created attachment 219115
py-tensorflow 1.15.4 svn diff - Fix and update science/py-tensorflow and science/py-tensorflow-estimator

Tensorflow is marked broken and is at 1.14. This is my initial attempt at updating it to 1.15.4 so that I could use GPT-2 on FreeBSD. I'm no expert on porting or tensorflow, so I'm sure I made some mistakes.

This also re-adds science/py-tensorflow-estimator, which was marked broken as well. I didn't have any problems building it, so it should be the same as before.

This patch sets --jobs to 1 to try to guarantee that it will build. There was one kernel that required a huge amount of memory to compile, which is why only one core could be used while building. For everything else I could set it to --jobs 5+ and it would run fine. If you build this yourself, try building with as many jobs as you can until it gets to that one file.

The only portlint errors I saw were complaints about not using make makepatch, which is really weird because that is what I used. I fixed all other warnings.

I've launched some testport runs in poudriere, but they are going to take a long time to finish, so I'm going ahead and posting the bug to get feedback.

Changes made:
- Using the host jsoncpp was causing compatibility issues, so I think I have it marked to grab it from git instead.
- Changed the do-install to copy all of the correct folders into /usr/local/lib/python*
- Changed lots of patch files, mostly to add -lexecinfo
- Lots of places complained about errors similar to the following:
  this rule is missing dependency declarations for the following files ...:
    'tensorflow/contrib/makefile/downloads/absl/absl/strings/string_view.h'
    'tensorflow/contrib/makefile/downloads/absl/absl/types/optional.h'
- I had to add the following lines to a lot of bazel deps in the patches to fix this:
    "@com_google_absl//absl/strings",
    "@com_google_absl//absl/base:core_headers",

I've been using the version I built with this port to run some GPT-2 text generation stuff for a few days and haven't had any problems at all. Hopefully this works for others and helps the official port get fixed.
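Mapping each header in such an error back to the Abseil package that ships it is mechanical, so a small Python helper can shortlist candidates. This is a hypothetical aid, not part of the port: the function name and the assumption that the vendored Abseil tree sits under a path containing "/absl/absl/" both come from the error quoted above.

```python
import re
from typing import List

# Assumption: Abseil is vendored under a path containing ".../absl/absl/",
# as in the "missing dependency declarations" error quoted above.
ABSL_MARKER = "/absl/absl/"


def candidate_absl_packages(error_text: str) -> List[str]:
    """Map quoted .h paths in a bazel error to @com_google_absl package paths."""
    packages = set()
    for path in re.findall(r"'([^']+\.h)'", error_text):
        idx = path.find(ABSL_MARKER)
        if idx == -1:
            continue  # not an Abseil header, ignore it
        rel = path[idx + len("/absl/"):]  # e.g. "absl/strings/string_view.h"
        package = rel.rsplit("/", 1)[0]   # e.g. "absl/strings"
        packages.add("@com_google_absl//" + package)
    return sorted(packages)
```

For the two headers in the error above this yields @com_google_absl//absl/strings and @com_google_absl//absl/types. It only points at the package directory holding each header; the exact target (for example :core_headers in absl/base, which the patches also needed) still has to be picked from Abseil's BUILD files by hand.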
Thanks for the patch, Austin.
(In reply to Austin Shafer from comment #0)

Austin,

I am trying to build TensorFlow with the patch, but it fails to build in poudriere:

> WARNING: Download from https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Unknown host: github.com
> ERROR: An error occurred during the fetch of repository 'io_bazel_rules_closure':
> java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/io_bazel_rules_closure/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz: Unknown host: github.com
> ERROR: no such package '@io_bazel_rules_closure//closure': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/io_bazel_rules_closure/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz: Unknown host: github.com
> ERROR: no such package '@io_bazel_rules_closure//closure': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/io_bazel_rules_closure/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz: Unknown host: github.com

This is because downloads aren't allowed during the build.

The local build also fails:

> ERROR: /usr/ports/science/py-tensorflow/work-py37/bazel_out/2acfe593813f4c06cfb8cb015b65a7a2/external/jsoncpp_git/BUILD.bazel:5:1: C++ compilation of rule '@jsoncpp_git//:jsoncpp' failed (Exit 1)
> In file included from external/jsoncpp_git/src/lib_json/json_value.cpp:7:
> In file included from /usr/local/include/json/assertions.h:13:
> /usr/local/include/json/config.h:125:9: warning: 'JSON_HAS_INT64' macro redefined [-Wmacro-redefined]
> #define JSON_HAS_INT64
>         ^
> <command line>:5:9: note: previous definition is here
> #define JSON_HAS_INT64 1
>         ^
> external/jsoncpp_git/src/lib_json/json_value.cpp:1161:13: error: out-of-line definition of 'insert' does not match any declaration in 'Json::Value'
> bool Value::insert(ArrayIndex index, Value newValue) {
>             ^~~~~~

Yuri
Created attachment 219800
tensorflow svn diff v2

This patch should correct the downloading issues. Poudriere built about half of it, but then ran into the following:

SUBCOMMAND: # @swig//:lnswiglink [action 'Executing genrule @swig//:lnswiglink [for host]']
(cd /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/execroot/org_tensorflow && \
  exec env - \
    PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/nonexistent/bin \
  /usr/local/bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; ln -s $(which swig3.0) bazel-out/host/bin/external/swig/swiglink')
ERROR: /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/swig/BUILD.bazel:13:1: declared output 'external/swig/swiglink' was not created by genrule. This is probably because the genrule actually didn't create this output, or because the output was a directory and the genrule was run remotely (note that only the contents of declared file outputs are copied from genrules run remotely)
ERROR: /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/swig/BUILD.bazel:13:1: not all outputs were created or valid

I'm building again outside of poudriere and so far it's fine, so I'm not sure what's going on here.

Regarding the jsoncpp problem, I saw that when I was first fixing the port, and changed it to use jsoncpp from git instead of the one installed from pkg. (I removed the jsoncpp dependency in this v2 patch.) Originally I had to create yet another, older jsoncpp port at version 1.9.2 to be used as the dependency instead of building jsoncpp_git locally.

All in all, tensorflow's build system is absolutely horrible and not fun to work on. Although the v2 patch still isn't perfect in poudriere, I'm posting it so that anyone else who tries will get further than those download problems.
(In reply to Austin Shafer from comment #3) > All in all, tensorflow's build system is absolutely horrible and not fun to work on. I can't agree more. And bazel is the core of the problem.
BTW, any news on this? There is a port for PhotoPrism in the works, which right now compiles TensorFlow directly, and it would be nice if it could use this port (once updated). =) https://github.com/huo-ju/photoprism-freebsd-port/
(In reply to Lapo Luchini from comment #5) I'm working on other things now and don't have plans to return atm. Did this patch work for you? Your photoprism port looks really cool; does it reliably build tensorflow? If so, it's in better shape than my patch. It would be interesting to compare the two; assuming it works, maybe your tensorflow build could be incorporated into this port or replace it.
(In reply to Lapo Luchini from comment #5) If it builds, including TensorFlow, it would make sense to commit it and then see if the TensorFlow port can be built using some hints from it. I will do that once I have time. Yuri
It does compile (and work), but it's not yet a proper Port, as it asks interactive questions and downloads dependencies at build time. ("bazel being bazel", I guess?)
(In reply to Lapo Luchini from comment #8) See https://github.com/huo-ju/photoprism-freebsd-port/issues/13#issuecomment-855335447
Just finished Uni for the year, so I'm going to work on updating the tensorflow port to the latest version. Hopefully it should only take a couple of days once my system has finished rebuilding everything. I had some hiccups with Ryzen and FreeBSD causing issues.
I've made a successful port of 1.15.5. However, there are some issues that need to be addressed. Because it is the last of the 1.x series, it depends on specific versions of packages that are no longer available in the ports tree, and without those versions it errors out due to API changes. The packages are:
- google-cloud-cpp == 1.17.0
- grpc == 1.22.0,1
Will need to address this.
Created attachment 226298
Upgrade to 1.15.5 and fix poudriere building

A patch for the work so far. It should build in poudriere; the only possible error may come from GRPC, which is a simple fix once a suitable course of action has been decided: either patch it to work with the latest GRPC, or add a grpc120 port.
Created attachment 226299
Fix GRPC build issues

This fixes everything now and builds with the latest grpc. I'm not sure what to do with my patch to create the port devel/google-cloud-cpp117, which is needed for building successfully.
(In reply to Anthony Donnelly from comment #13) > Not sure what to do with my patch to create the port devel/google-cloud-cpp117 which is needed for building successfully Can you attach it here?
I opened a new bug report for the google-cloud-cpp117 port, so as not to pollute this bug report. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=257053
patch fails:

===> Patching for py38-tensorflow-1.14.0_18
===> Applying FreeBSD patches for py38-tensorflow-1.14.0_18 from /disk-samsung/freebsd-ports/science/py-tensorflow/files
1 out of 2 hunks failed--saving rejects to WORKSPACE.rej
===> FAILED Applying FreeBSD patch-WORKSPACE
===> FAILED to apply cleanly FreeBSD patch(es) patch-WORKSPACE
*** Error code 1
Could you try with the tensorflow from my git? I messed up the diff file I created, and it had duplicate patches in it.
(In reply to Anthony Donnelly from comment #17) Could you just attach a shar instead?
Created attachment 226310
shar file

Hopefully this will do.
Committed, thanks!
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=3dccfaa0cdd58e4ba5cde9d68ddf1351b55db1ef

commit 3dccfaa0cdd58e4ba5cde9d68ddf1351b55db1ef
Author:     Anthony Donnelly <amzo1337@gmail.com>
AuthorDate: 2021-07-09 00:27:30 +0000
Commit:     Yuri Victorovich <yuri@FreeBSD.org>
CommitDate: 2021-07-09 00:31:37 +0000

    science/py-tensorflow: Update 1.14.0 -> 1.15.5

    PR: 250646

science/py-tensorflow/Makefile | 40 ++++++++------
science/py-tensorflow/Makefile.MASTER_SITES | 59 +++++++++++----------
science/py-tensorflow/distinfo | 56 +++++++++++---------
science/py-tensorflow/files/bazelrc | 2 +
science/py-tensorflow/files/patch-WORKSPACE (new) | 61 ++++++++++++++++++++++
...patch-tensorflow_compiler_mlir_lite_BUILD (new) | 10 ++++
...low_compiler_mlir_lite_quantization_BUILD (new) | 8 +++
...tensorflow_compiler_mlir_tensorflow_BUILD (new) | 10 ++++
.../patch-tensorflow_contrib_bigtable_BUILD (new) | 20 +++++++
... patch-tensorflow_contrib_boosted__trees_BUILD} | 8 +--
...h-tensorflow_contrib_ffmpeg_default_BUILD (new) | 10 ++++
...e_kernels_client_ignite__plain__client__unix.cc | 6 +--
... => patch-tensorflow_contrib_makefile_Makefile} | 16 +++---
.../files/patch-tensorflow_core_BUILD | 14 ++---
...rflow_core_distributed__runtime_rpc_BUILD (new) | 10 ++++
...ributed__runtime_rpc_grpc__server__lib.cc (new) | 15 ++++++
...atch-tensorflow_core_platform_cloud_BUILD (new) | 10 ++++
...nsorflow_core_platform_cloud_gcs__dns__cache.cc | 6 +--
..._core_platform_default_build__config.bzl (gone) | 13 -----
...ch-tensorflow_core_platform_posix_env.cc (gone) | 31 -----------
...h-tensorflow_core_profiler_internal_BUILD (new) | 10 ++++
...tensorflow_core_profiler_rpc_client_BUILD (new) | 19 +++++++
...atch-tensorflow_core_protobuf_autotuning.proto} | 6 +--
..._micro_tools_make_targets_freebsd__makefile.inc | 4 +-
...ch-tensorflow_lite_kernels_internal_BUILD (new) | 11 ++++
...h-tensorflow_lite_kernels_internal_build (gone) | 11 ----
...ow_lite_python_interpreter__wrapper_BUILD (new) | 10 ++++
.../files/patch-tensorflow_lite_tools_BUILD (new) | 10 ++++
...e => patch-tensorflow_lite_tools_make_Makefile} | 6 +--
...w_lite_tools_make_targets_freebsd__makefile.inc | 4 +-
...atch-tensorflow_lite_tools_optimize_BUILD (new) | 20 +++++++
...low_lite_tools_optimize_calibration_BUILD (new) | 34 ++++++++++++
...nsorflow_python_eager_pywrap__tfe__src.cc (new) | 20 +++++++
...ch-tensorflow_python_lib_core_bfloat16.cc (new) | 11 ++++
...ython_lib_core_ndarray__tensor__bridge.cc (new) | 11 ++++
...tream__executor_stream__executor__pimpl.h (new) | 10 ++++
.../files/patch-tensorflow_tensorflow.bzl | 22 ++++----
...d => patch-tensorflow_tools_lib__package_BUILD} | 8 +--
...atch-tensorflow_tools_pip__package_build (gone) | 10 ----
.../files/patch-tensorflow_workspace.bzl | 20 +++----
...ld.bazel => patch-third__party_aws_BUILD.bazel} | 8 +--
...atch-third__party_com__google__absl.BUILD (new) | 13 +++++
...tch-third__party_flatbuffers_BUILD.system (new) | 18 +++++++
.../files/patch-third__party_mlir_BUILD (new) | 10 ++++
...tch-third__party_systemlibs_enum34.build (gone) | 17 ------
...third__party_systemlibs_functools32.BUILD (new) | 18 +++++++
.../patch-third__party_systemlibs_grpc.BUILD (new) | 11 ++++
...patch-third__party_systemlibs_grpc.build (gone) | 11 ----
...=> patch-third__party_systemlibs_jsoncpp.BUILD} | 16 +++---
.../patch-third__party_systemlibs_protobuf.bzl | 6 +--
...patch-third__party_systemlibs_swig.build (gone) | 11 ----
...-third__party_systemlibs_syslibs__configure.bzl | 12 ++---
...atch-third_party_gpus_rocm_configure.bzl (gone) | 11 ----
53 files changed, 558 insertions(+), 266 deletions(-)
Created attachment 226324
New option to set jobs, will default to 1 or hw.ncpu

I have attached a patch here which will stop bazel from automatically using all cores and eating up all of the system's resources. This is why it was marked as broken before: when poudriere was building other ports at the same time, bazel would use all of the jail's resources and the build would be killed.
(In reply to Anthony Donnelly from comment #22) Anthony, with this patch Tensorflow took 20.5 hours to build on an otherwise idle machine. Before the patch it took <5 hours on the same machine. So the patch slows it down more than just limiting to a given number of CPUs.
Yeah, but without the patch, bazel will automatically use all of the CPUs on the system, even if the build system is trying to force make -j1. So when the FreeBSD package build machine is building tensorflow alongside x other packages, tensorflow with bazel will try to consume all the CPU and memory resources, which will eventually cause it to be killed. The option allows people to set the -j flag whenever they want to, but defaults to -j1. It was just a safety measure: force -j1 by default, but give people the option to select a parallel build with make config.
Anthony, 'import tensorflow' fails when the py38-tensorflow-1.15.5_1 package is installed. No 'tensorflow' module is installed.

> Traceback (most recent call last):
>   File "test.py", line 2, in <module>
>     import tensorflow as tf
> ModuleNotFoundError: No module named 'tensorflow'

Yuri
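A quick way to check whether an installed package really provides its module, without importing it or reading a traceback, is a short standard-library sketch like the one below. It uses importlib.util.find_spec, which only locates a module on the interpreter's search path; the module list (tensorflow plus tensorflow_estimator, the import name of the science/py-tensorflow-estimator package) is an assumption to adjust as needed.

```python
import importlib.util
import sys


def module_installed(name: str) -> bool:
    """Return True if top-level module `name` can be found by this interpreter."""
    # find_spec() locates the module without executing it.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Modules to look for; adjust to the packages under test.
    for mod in ("tensorflow", "tensorflow_estimator"):
        status = "found" if module_installed(mod) else "missing"
        print(f"{sys.executable}: {mod} {status}")
```

Run it with the same interpreter the package was built for (python3.8 for a py38- package), since each Python version has its own site-packages directory.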
Created attachment 226491
Fix installation of the pip package

This will fix the above issue. I haven't submitted the fix yet, as I have an open issue on the tensorflow GitHub regarding the incomplete packaging of tensorflow on FreeBSD. There are still parts of the tensorflow package which are failing to be bundled, such as keras. https://github.com/tensorflow/tensorflow/issues/50766

Though, I am unsure if getting the package to work is worth it. The highest tensorflow version I have tested that currently builds is 2.2.0. Anything newer requires creating a new toolchain, due to the way bazel fetches toolchains remotely, which isn't something that is feasible for one person to create and maintain.
Created attachment 226539
Upgrade and fix install

Could you give this new port a try? It upgrades tensorflow to 2.1.0 for the v2 API. It also installs the tensorflow libraries and headers for C and C++, as well as the .pc files, and fixes 'import tensorflow'.
(In reply to Anthony Donnelly from comment #27) Ok, I will try it, thank you.
(In reply to Anthony Donnelly from comment #27) The build failed: > ERROR: /wrkdirs/usr/ports/science/py-tensorflow/work-py38/tensorflow-2.1.0/tensorflow/core/kernels/BUILD:4321:1: C++ compilation of rule '//tensorflow/core/kernels:conv_ops' failed (Killed): clang failed: error executing command
That's bazel being killed by the system for eating all the memory. I mentioned this issue above. Could force jobs to 1.
(In reply to Anthony Donnelly from comment #30) Does it start over ${MAKE_JOBS_NUMBER} by default?
It defaults to the number of CPUs. In the above shar I left in --jobs 10 by accident, as I used 16 cores, 32GB of memory and a 10GB swap file for building, and bazel will still consume all of that on my system. I'm not sure if bazel will respect the environment variables. We could just add --jobs=1, or use the above patch I submitted, where it defaults to 1 but adds an option to build in parallel based on the number of CPUs if the option is selected with make config. I won't have access to my FreeBSD machine until tomorrow though.

https://github.com/tensorflow/tensorflow/issues/7723

Their suggestion is to add the --local_resources flag, but every machine is different, so it doesn't seem viable for the ports. The same is true for limiting bazel memory usage.
Created attachment 226550
py-tensorflow.shar with 3-way PARALLEL_JOBS option

Anthony,

I reworked the PARALLEL_JOBS option that you submitted earlier to run with half the jobs by default. I will see if the build succeeds with it.

Yuri
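A minimal Python sketch of how a three-way job setting could be mapped to Bazel's --jobs value. The mode names "one", "half" and "all" and the function name are hypothetical illustrations, not the contents of the attached shar (the real option lives in the port's Makefile); os.cpu_count() stands in for sysctl hw.ncpu.

```python
import os
from typing import Optional


def bazel_jobs_flag(mode: str = "half", ncpu: Optional[int] = None) -> str:
    """Map a hypothetical three-way job setting to a bazel --jobs argument.

    mode: "one", "half" or "all" (illustrative names, not the port's).
    ncpu: CPU count; defaults to os.cpu_count(), standing in for hw.ncpu.
    """
    if ncpu is None:
        ncpu = os.cpu_count() or 1  # cpu_count() may return None
    if mode == "one":
        jobs = 1  # one action at a time: lowest peak memory, slowest build
    elif mode == "half":
        jobs = max(1, ncpu // 2)  # leave headroom for other poudriere builders
    elif mode == "all":
        jobs = ncpu  # fastest, but most likely to exhaust memory
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return f"--jobs={jobs}"


if __name__ == "__main__":
    print(bazel_jobs_flag("half", 16))  # --jobs=8
```

Halving is a middle ground between the single-job default and letting bazel take every CPU; the max(1, ...) guard keeps single-CPU builders from ending up with --jobs=0.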
^Triage: Seems like this is in progress (bug 257311 comment 1)
^Triage: Pending QA
Given some software requires tensorflow 1.x (e.g. Photoprism) and others 2.x, could we please have science/py-tensorflow1 and science/py-tensorflow2?
Yeah, that was the original plan, which is why I packaged the latest 1.x version first and then moved on to 2.x. Though, I'm not sure if it would be best to just have them conflict, or to install into separate prefixes such as ${LOCALBASE}/lib/tensorflowX and just ship the .pc file with them. I'm thinking that having them conflict would be the easiest way; otherwise everything depending on it would need to be patched to find the correct version.
I agree that having them conflict is probably the best way forward right now. Setting up non-conflicting versions seems like a lot of work for what currently looks like an unlikely use case.
Maybe let this get committed first, then I can start a new bug to get the old 1.x port added. At the minute I'm testing and adding options: XLA, MPI and OpenCL, since CUDA isn't supported, and I am unsure of the status of ROCm support on FreeBSD, as I have no AMD card to try it with. But I don't want to throw more and more patches into this bug report and delay it getting committed.
Updated in commit 3dccfaa0cdd58e4ba5cde9d68ddf1351b55db1ef
...to 1.15.5
(In reply to Daniel Engberg from comment #41) I was trying to compile TensorFlow 2.8/2.9 (both, as I had 2.8 earlier) on FreeBSD-13 and it fails because it is not able to identify the toolchain. I spent 4 hours yesterday trying to figure it out with no success. I'm more inclined to use this port, but why are we at 1.15.5 while TF has released 2.9? Can someone explain? I'm new and not familiar with this, hence the question. Also, are there any plans to provide a binary package in the near future? Compiling takes a sweet 7-10 hours. I'm looking forward to suggestions and guidance, as I'm not savvy with ports or the bazel compiler.
(In reply to Huskers from comment #42) There is also science/py-tensorflow2, but it doesn't look like it builds. TensorFlow uses Google's build tool, Bazel. Bazel-based projects are extremely fragile and only build reliably inside of Google. Both ports worked at some point, and broke later. They are very difficult to fix. And Google won't provide cmake scripts for TensorFlow.
(In reply to Yuri Victorovich from comment #43) Sorry, my bad. science/py-tensorflow2 only exists on my disk, not in the ports tree. See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=258621
Tensorflow 2.1 and 2.2 are already building for me; I have a port on my system, and it's on my GitHub. However, bazel isn't worth the headache just to get TensorFlow to build when it would also lack GPU support, so I gave up on it, as it took far too much time away from my uni work. Currently, it's just easier to use cloud-based systems such as Google Colab, Azure or AWS for all my machine learning needs. However, my final year project, which I have been working on, is a drop-in replacement for TensorFlow with the same syntax, but I can't release it until it has all been marked and checked.
(In reply to Anthony Donnelly from comment #45) I was able to compile 2.14 on FreeBSD-12, but since upgrading to FreeBSD-13 I'm not able to: Bazel fails with a toolchain error.
I'm currently in the process of resolving the toolchain error for 2.9.1. However, it might be best to close this bug and open a new one to track the upgrade to 2.9.1, as there are a lot of issues to address. I'm currently making use of bazel's built-in patches support to try to patch some external sources during the build.
Created one. Let me know of any additional info I can provide to assist anyone searching the bug list: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266141