Bug 250646 - science/py-tensorflow: Update to 1.15.4
Summary: science/py-tensorflow: Update to 1.15.4
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Yuri Victorovich
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-10-26 18:39 UTC by Austin Shafer
Modified: 2020-11-19 00:18 UTC
CC List: 2 users

See Also:
bugzilla: maintainer-feedback? (amzo1337)


Attachments
py-tensorflow 1.15.4 svn diff (51.80 KB, patch)
2020-10-26 18:39 UTC, Austin Shafer
tensorflow svn diff v2 (63.33 KB, patch)
2020-11-18 22:51 UTC, Austin Shafer

Description Austin Shafer 2020-10-26 18:39:32 UTC
Created attachment 219115
py-tensorflow 1.15.4 svn diff

- Fix and update science/py-tensorflow and science/py-tensorflow-estimator

TensorFlow is marked broken and is at 1.14. This is my initial attempt at updating it to 1.15.4 so that I can use GPT-2 on FreeBSD. I'm no expert on porting or TensorFlow, so I'm sure I made some mistakes.

This also re-adds science/py-tensorflow-estimator which was marked broken as well. I didn't have any problems building it so it should be the same as before.

This patch sets --jobs to 1 to try to guarantee that it will build. One kernel required a huge amount of memory to compile, which is why only one core could be used while building it. For everything else I could set --jobs to 5 or more and it would run fine. If you build this yourself, try building with as many jobs as you can until it gets to that one file.
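The "many jobs first, then one job" approach described above could be scripted roughly as follows. This is only a sketch for a manual build outside the ports framework: `build_with_fallback` is a hypothetical helper, the Bazel target is left as a parameter because the report does not name it, and it assumes Bazel reuses the actions that already succeeded when the build is rerun with `--jobs=1`.

```python
import subprocess


def build_with_fallback(target, fast_jobs, run=subprocess.call):
    """Try a parallel build first; if it fails, retry with a single job.

    Hypothetical helper, not part of the port. `run` takes an argv list and
    returns an exit status; it is a parameter so the logic can be exercised
    without Bazel installed.
    """
    attempts = [fast_jobs] if fast_jobs == 1 else [fast_jobs, 1]
    for jobs in attempts:
        status = run(["bazel", "build", "--jobs={}".format(jobs), target])
        if status == 0:
            # Report which job count finally got the build through.
            return jobs
    raise RuntimeError("build failed even with --jobs=1")
```

The return value says which setting worked, so a caller can tell whether the memory-hungry file forced the single-job fallback.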

The only portlint errors I saw were complaints about not using make makepatch, which is really weird because that is what I used. I fixed all other warnings. I've launched some testport poudriere stuff but it is going to take a long time for that to finish, so I'm going ahead and posting the bug to get feedback.

Changes made:
- Using the host jsoncpp was causing compatibility issues, so I think I have it marked to grab it from git instead.
- Changed the do-install to copy all of the correct folders into /usr/local/lib/python*
- Changed lots of patch files, mostly to add -lexecinfo
- Lots of places complained about errors similar to the following:
this rule is missing dependency declarations for the following files ...:
  'tensorflow/contrib/makefile/downloads/absl/absl/strings/string_view.h'
  'tensorflow/contrib/makefile/downloads/absl/absl/types/optional.h'

- I had to add the following lines in a lot of bazel deps in the patches to fix this:
"@com_google_absl//absl/strings",
"@com_google_absl//absl/base:core_headers",
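
For illustration, this is roughly what such a change looks like inside a rule. Bazel BUILD files use Starlark, which is close enough to Python that the sketch below runs as plain Python with a stand-in `cc_library`. The target name, source file, and `:existing_dep` are placeholders, not taken from the actual patches; only the two absl labels come from the report.

```python
# Stand-in for Bazel's cc_library so this snippet runs as plain Python.
# In a real BUILD file the rule is provided by Bazel itself.
def cc_library(name, srcs=(), hdrs=(), deps=()):
    return {"name": name, "srcs": list(srcs), "hdrs": list(hdrs), "deps": list(deps)}


# Hypothetical target; the name and source file are placeholders.
example = cc_library(
    name = "example_kernel",
    srcs = ["example_kernel.cc"],
    deps = [
        ":existing_dep",  # placeholder for whatever the rule already listed
        # The two labels that had to be added to silence the
        # "missing dependency declarations" error for the absl headers:
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/base:core_headers",
    ],
)
```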

I've been using the version I built with this port to run some GPT-2 text generation stuff for a few days and haven't had any problems at all. Hopefully this works for others and helps the official port get fixed.
Comment 1 Yuri Victorovich freebsd_committer 2020-10-26 19:02:02 UTC
Thanks for the patch, Austin.
Comment 2 Yuri Victorovich freebsd_committer 2020-11-14 01:40:01 UTC
(In reply to Austin Shafer from comment #0)

Austin,

I am trying to build TensorFlow with the patch but it fails to build in poudriere:
> WARNING: Download from https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz failed: class com.google.devtools.build.lib.bazel.repository.downloader.UnrecoverableHttpException Unknown host: github.com
> ERROR: An error occurred during the fetch of repository 'io_bazel_rules_closure':
>    java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/io_bazel_rules_closure/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz: Unknown host: github.com
> ERROR: no such package '@io_bazel_rules_closure//closure': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/io_bazel_rules_closure/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz: Unknown host: github.com
> ERROR: no such package '@io_bazel_rules_closure//closure': java.io.IOException: Error downloading [https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, https://github.com/bazelbuild/rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/io_bazel_rules_closure/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz: Unknown host: github.com

This is because downloads aren't allowed during build.
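
Since every archive Bazel tries to fetch has to be available before the build starts, it can help to collect the URLs from the failed log first. The following is a minimal sketch, assuming the error lines keep the "Error downloading [url, url] to path" shape shown above; `urls_from_bazel_error` is a hypothetical helper name, and how the port then supplies the files is not covered here.

```python
import re


def urls_from_bazel_error(line):
    """Return the URLs listed in an 'Error downloading [...]' message.

    Assumes the log format quoted in this comment; returns an empty list
    for lines that do not match.
    """
    match = re.search(r"Error downloading \[([^\]]*)\]", line)
    if not match:
        return []
    return [url.strip() for url in match.group(1).split(",") if url.strip()]


# Sample built from the two URLs in the log above (destination path omitted).
sample = (
    "java.io.IOException: Error downloading ["
    "https://storage.googleapis.com/mirror.tensorflow.org/github.com/bazelbuild/"
    "rules_closure/archive/308b05b2419edb5c8ee0471b67a40403df940149.tar.gz, "
    "https://github.com/bazelbuild/rules_closure/archive/"
    "308b05b2419edb5c8ee0471b67a40403df940149.tar.gz] to ...: Unknown host: github.com"
)
found = urls_from_bazel_error(sample)
```

Running this over the whole build log and de-duplicating the results gives the list of archives that would need to be fetched ahead of time.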


The local build also fails:
> ERROR: /usr/ports/science/py-tensorflow/work-py37/bazel_out/2acfe593813f4c06cfb8cb015b65a7a2/external/jsoncpp_git/BUILD.bazel:5:1: C++ compilation of rule '@jsoncpp_git//:jsoncpp' failed (Exit 1)
> In file included from external/jsoncpp_git/src/lib_json/json_value.cpp:7:
> In file included from /usr/local/include/json/assertions.h:13:
> /usr/local/include/json/config.h:125:9: warning: 'JSON_HAS_INT64' macro redefined [-Wmacro-redefined]
> #define JSON_HAS_INT64
>         ^
> <command line>:5:9: note: previous definition is here
> #define JSON_HAS_INT64 1
>         ^
> external/jsoncpp_git/src/lib_json/json_value.cpp:1161:13: error: out-of-line definition of 'insert' does not match any declaration in 'Json::Value'
> bool Value::insert(ArrayIndex index, Value newValue) {
>             ^~~~~~


Yuri
Comment 3 Austin Shafer 2020-11-18 22:51:27 UTC
Created attachment 219800
tensorflow svn diff v2

This patch should correct the downloading issues. Poudriere built about half of it, but then ran into the following:


SUBCOMMAND: # @swig//:lnswiglink [action 'Executing genrule @swig//:lnswiglink [for host]']
(cd /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/execroot/org_tensorflow && \
  exec env - \
    PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/nonexistent/bin \
  /usr/local/bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; ln -s $(which swig3.0) bazel-out/host/bin/external/swig/swiglink')
ERROR: /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/swig/BUILD.bazel:13:1: declared output 'external/swig/swiglink' was not created by genrule. This is probably because the genrule actually didn't create this output, or because the output was a directory and the genrule was run remotely (note that only the contents of declared file outputs are copied from genrules run remotely)
ERROR: /wrkdirs/usr/ports/science/py-tensorflow/work-py37/bazel_out/79f818d2f8c81bc5a548094dc218cfbb/external/swig/BUILD.bazel:13:1: not all outputs were created or valid


I'm building again outside of poudriere and so far it's fine, so I'm not sure what's going on here. Regarding the jsoncpp problem, I saw that when I was first fixing the port, and changed it to use jsoncpp from git instead of the one installed from pkg. (I removed the jsoncpp dep in this v2 patch.) Originally I had to create yet another older jsoncpp port at version 1.9.2 to be used as the dependency instead of building jsoncpp_git locally.
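
For the swig genrule failure above, one thing that may be worth checking is whether swig3.0 can be found on the PATH printed in the SUBCOMMAND output. This is a guess, not a cause confirmed in this thread: the genrule links to whatever `which swig3.0` prints, so if nothing is found, `$(which swig3.0)` expands to nothing and the `ln -s` call no longer has the intended arguments. A minimal check, with `swig_on_path` as a hypothetical helper, to be run inside the build environment:

```python
import shutil

# PATH copied from the SUBCOMMAND output above.
GENRULE_PATH = (
    "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/nonexistent/bin"
)


def swig_on_path(path=GENRULE_PATH, name="swig3.0"):
    """Return the resolved location of `name` on `path`, or None if absent."""
    return shutil.which(name, path=path)


if __name__ == "__main__":
    print(swig_on_path() or "swig3.0 not found on the genrule PATH")
```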

All in all, tensorflow's build system is absolutely horrible and not fun to work on. Although the v2 patch still isn't perfect in poudriere, I'm posting it just so anyone else who tries will get farther than having those download problems.
Comment 4 Yuri Victorovich freebsd_committer 2020-11-19 00:18:01 UTC
(In reply to Austin Shafer from comment #3)

> All in all, tensorflow's build system is absolutely horrible and not fun to work on. 

I can't agree more. And bazel is the core of the problem.