Bug 234285 - graphics/tesseract: Update to 4.0.0
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Tobias C. Berner
URL:
Keywords: needs-qa
Duplicates: 234287 236810
Depends on:
Blocks: 236680
 
Reported: 2018-12-23 02:02 UTC by Piotr Kubaj
Modified: 2019-04-02 11:44 UTC
CC List: 9 users

Attachments
patch (3.96 KB, patch)
2018-12-23 02:02 UTC, Piotr Kubaj
koobs: maintainer-approval+
patch (3.96 KB, patch)
2019-03-27 14:26 UTC, Piotr Kubaj
no flags
screenlog_tesseract (26.27 KB, application/gzip)
2019-03-28 03:19 UTC, Walter Schwarzenfeld
no flags
tesseract_dir (8.50 KB, application/x-tar)
2019-03-28 03:23 UTC, Walter Schwarzenfeld
no flags
Patch to make sure man pages build and are always built (1.58 KB, patch)
2019-03-28 04:03 UTC, Michael Gmelin
no flags
Patch to make sure man pages build AND aren't XML (2.96 KB, patch)
2019-03-28 05:39 UTC, Michael Gmelin
no flags
Patch to make sure man pages build AND aren't XML - refined (2.98 KB, patch)
2019-03-28 05:54 UTC, Michael Gmelin
grembo: maintainer-approval? (pkubaj)

Description Piotr Kubaj freebsd_committer 2018-12-23 02:02:06 UTC
Created attachment 200381
patch

Update tesseract to 4.0.0.

Tested on 11.2-RELEASE.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2018-12-30 12:24:54 UTC
*** Bug 234287 has been marked as a duplicate of this bug. ***
Comment 2 Tobias C. Berner freebsd_committer 2019-02-24 13:40:53 UTC
Moin moin

Arch seems to have a fix for the man pages:
https://git.archlinux.org/svntogit/community.git/commit/trunk?h=packages/tesseract&id=f877c8bf2315833dcde205fc724ccf1f07572751

[there are also upstream commits for that]


Also, I think it needs USES=gmake -- otherwise it fails to build for me.


mfg Tobias
Comment 3 Piotr Kubaj freebsd_committer 2019-02-26 13:23:10 UTC
There's already 4.1-rc1, so I think we could upgrade soon to 4.1 anyway.
Comment 4 Michael Gmelin freebsd_committer 2019-03-19 12:24:16 UTC
It would be great if you could finish upgrading this soon or alternatively downgrade tesseract-data back to 3.04.00. The current combination of tesseract 3.x with tesseract-data from 4.x breaks certain languages, at the very least deu (= German). As a result, a ton of "ParamsModel::Unknown Parameter" and "ParamsModel::Incomplete line" errors are emitted when loading training data.

Thanks!

See also:
https://stackoverflow.com/questions/41160539/tesseract-error-params-modelincomplete-line-error
https://github.com/tesseract-ocr/tessdata/issues/96
Comment 5 Tobias C. Berner freebsd_committer 2019-03-22 19:20:32 UTC
(In reply to Piotr Kubaj from comment #3)
Was 4.1 released already?
Comment 6 Piotr Kubaj freebsd_committer 2019-03-22 22:17:08 UTC
(In reply to Tobias C. Berner from comment #5)
No, I think I'll go with 4.0 for now. I'm currently at AsiaBSDCon though, so I'll be able to look at it probably on Tuesday.
Comment 7 Charlie Li 2019-03-24 03:53:32 UTC
Both the current version in the tree and 4.0.0 fail to build after base r345349 due to under-linking libpthread. This should block bug 236141 as a result.

Seeing about upstreaming the fix, as they are probably unaware of this one, plus they had to fix a similar under-linking for their MinGW support.
Comment 8 Charlie Li 2019-03-24 04:09:39 UTC
(In reply to Charlie Li from comment #7)
Build bug reported upstream: https://github.com/tesseract-ocr/tesseract/issues/2344
Comment 9 Charlie Li 2019-03-25 16:14:39 UTC
This bug needs to block bug 236734 as well. Since this build process uses libtool, the C++ libraries are being linked with -nostdlib, which will leave out libpthread.
Comment 10 Piotr Kubaj freebsd_committer 2019-03-27 11:50:39 UTC
(In reply to Tobias C. Berner from comment #2)
I tried tesseract with Arch's patch and the result is still the same (missing manpages).

Also, I don't need gmake, it builds without it (Poudriere on 12-STABLE/amd64).

Can we just commit the patch with 4.0?
Comment 11 Tobias C. Berner freebsd_committer 2019-03-27 12:46:02 UTC
Yeah sure. I'll look at it tonight -- can you rebase the patch after the pthread fix from earlier today?

mfg Tobias
Comment 12 Piotr Kubaj freebsd_committer 2019-03-27 14:26:50 UTC
Created attachment 203193
patch

(In reply to Tobias C. Berner from comment #11)
OK, attached.
Comment 13 commit-hook freebsd_committer 2019-03-27 20:28:43 UTC
A commit references this bug:

Author: tcberner
Date: Wed Mar 27 20:28:08 UTC 2019
New revision: 496977
URL: https://svnweb.freebsd.org/changeset/ports/496977

Log:
  graphics/tesseract: Update to 4.0.0

  Changelog:
  	https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400

  - due to an issue with the build system the man pages are missing -- this will be corrected at a later stage.
  - bump revisions of dependencies due to shlib-version change.

  PR:		234285
  Submitted by:	Piotr Kubaj <pkubaj@anongoth.pl> (maintainer)

Changes:
  head/graphics/opencv/Makefile
  head/graphics/opencv-java/Makefile
  head/graphics/p5-Image-OCR-Tesseract/Makefile
  head/graphics/py-opencv/Makefile
  head/graphics/tesseract/Makefile
  head/graphics/tesseract/distinfo
  head/graphics/tesseract/pkg-plist
  head/multimedia/vapoursynth/Makefile
  head/net/tucan/Makefile
Comment 14 Tobias C. Berner freebsd_committer 2019-03-27 21:20:49 UTC
Committed. Thanks!
Comment 15 Walter Schwarzenfeld freebsd_triage 2019-03-27 22:43:34 UTC
Re-open
fails with:

Making all in doc
make[4]: make[4]: don't know how to make combine_lang_model.1. Stop

make[4]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0/doc
*** [all-recursive] Error code 1

make[3]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
1 error

make[3]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
*** [all] Error code 2

make[2]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
1 error

make[2]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
===> Compilation failed unexpectedly.
Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
the maintainer.
*** Error code 1

Stop.
make[1]: stopped in /usr/ports/graphics/tesseract
*** Error code 1
Comment 16 Walter Schwarzenfeld freebsd_triage 2019-03-27 23:16:40 UTC
Found it:

USES is missing gmake.
Comment 17 Walter Schwarzenfeld freebsd_triage 2019-03-28 00:18:16 UTC
The DOCS option is empty and useless.
Comment 18 VVD 2019-03-28 00:21:26 UTC
(In reply to w.schwarzenfeld from comment #16)
Same here, fixed by adding gmake to USES.
Comment 19 Michael Gmelin freebsd_committer 2019-03-28 01:34:37 UTC
(In reply to VVD from comment #18)

How are you both building it to see the error? I'm building on 11.2/amd64 with poudriere as well as manually from ports, and it builds just fine (bulk and testport) without specifying gmake.
Comment 20 Walter Schwarzenfeld freebsd_triage 2019-03-28 01:53:43 UTC
Ask the Makefile in the doc directory.
I build it normally, with portmaster or from the port directory; same error either way.
Comment 21 Michael Gmelin freebsd_committer 2019-03-28 01:57:23 UTC
(In reply to w.schwarzenfeld from comment #20)

By "ask", do you mean manually changing into the doc directory and calling "make"?
Comment 22 Walter Schwarzenfeld freebsd_triage 2019-03-28 02:19:02 UTC
I tried this but "make" does not work, gmake works.
Comment 23 Michael Gmelin freebsd_committer 2019-03-28 02:28:00 UTC
(In reply to w.schwarzenfeld from comment #22)

To make sure I understand:
It builds and installs correctly for you, but changing manually into the doc directory within work and running make doesn't work. Running gmake there works. Given that it was committed with the caveat "no man pages", I assume this is to be expected.

Assuming that this is correct: Here it builds with make and gmake just fine. I tried adding "USES=gmake", but this made no difference. Still no man pages.

Please correct me if my understanding is not accurate.
Comment 24 Walter Schwarzenfeld freebsd_triage 2019-03-28 02:38:21 UTC
Yes, correct.
For man-pages:
===> Checking for items in STAGEDIR missing from pkg-plist
Error: Orphaned: man/man1/ambiguous_words.1.gz
Error: Orphaned: man/man1/classifier_tester.1.gz
Error: Orphaned: man/man1/cntraining.1.gz
Error: Orphaned: man/man1/combine_lang_model.1.gz
Error: Orphaned: man/man1/combine_tessdata.1.gz
Error: Orphaned: man/man1/dawg2wordlist.1.gz
Error: Orphaned: man/man1/lstmeval.1.gz
Error: Orphaned: man/man1/lstmtraining.1.gz
Error: Orphaned: man/man1/merge_unicharsets.1.gz
Error: Orphaned: man/man1/mftraining.1.gz
Error: Orphaned: man/man1/set_unicharset_properties.1.gz
Error: Orphaned: man/man1/shapeclustering.1.gz
Error: Orphaned: man/man1/tesseract.1.gz
Error: Orphaned: man/man1/text2image.1.gz
Error: Orphaned: man/man1/unicharset_extractor.1.gz
Error: Orphaned: man/man1/wordlist2dawg.1.gz
Error: Orphaned: man/man5/unicharambigs.5.gz
Error: Orphaned: man/man5/unicharset.5.gz
===> Checking for items in pkg-plist which are not in STAGEDIR
===> Error: Plist issues found.
*** Error code 1

The DOCS option does nothing, it does not matter if it is on or off.
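
The orphaned entries in a log like the one above are the paths that are staged but missing from pkg-plist, so they can be extracted mechanically. A minimal sketch in sh, assuming the build log is fed on stdin; the function name orphans_to_plist is made up for illustration and is not part of the ports framework:

```shell
#!/bin/sh
# Print the paths named in "Error: Orphaned:" lines read from stdin,
# sorted, one per line, ready to be reviewed and added to pkg-plist.
orphans_to_plist() {
    sed -n 's/^Error: Orphaned: //p' | sort
}
```

With the log saved to a file (build.log is a hypothetical name), something like `orphans_to_plist < build.log` would list the candidate plist lines.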
Comment 25 Michael Gmelin freebsd_committer 2019-03-28 02:52:07 UTC
(In reply to w.schwarzenfeld from comment #24)

I don't see those plist issues - are you maybe using a version from the patch and not what was committed (the committed version doesn't include man pages in the plist)?

Could you maybe send me a complete build log + the port skeleton you're using (just tarring up the graphics/tesseract without the work directory) via email (grembo@)?
Comment 26 Walter Schwarzenfeld freebsd_triage 2019-03-28 03:19:06 UTC
Created attachment 203204
screenlog_tesseract

Build screenlog.
Comment 27 Walter Schwarzenfeld freebsd_triage 2019-03-28 03:23:14 UTC
Created attachment 203205
tesseract_dir

Tar of tesseract directory.
Comment 28 Michael Gmelin freebsd_committer 2019-03-28 03:40:27 UTC
(In reply to w.schwarzenfeld from comment #26)

That was helpful, thank you.

The issue is that man pages depend on asciidoc. This isn't specified in the Makefile, hence it wasn't installed in a clean room build and the problem didn't trigger.

I'll create a patch for testing and attach it to this PR.
Comment 29 Michael Gmelin freebsd_committer 2019-03-28 04:03:08 UTC
Created attachment 203206
Patch to make sure man pages build and are always built

Note that the patch is against r497002 (4.0.0_1).

- Make the build depend on asciidoc (so man pages are also built in a clean room environment which doesn't happen to have asciidoc installed by chance).

- Use gmake, as building man pages fails otherwise.

- Add man pages to pkg-plist.

- Bump revision.

Note that the DOCS option does work as intended - man pages are always built, but additional documentation located in /usr/local/share/doc/tesseract is only installed if DOCS is enabled. This is a common way of interpreting this toggle (so that there are always man pages).

@w.schwarzenfeld This worked here okay, but maybe you could test too.
Comment 30 Walter Schwarzenfeld freebsd_triage 2019-03-28 04:19:52 UTC
Thanks, this works, but the man pages are malformed:

man tesseract
()                                                                          ()



<?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE html PUBLIC "-//W3C//DTD
XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <meta http-
equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" /> <meta
name="generator" content="AsciiDoc 8.6.10" /> <title>TESSERACT(1)</title>
<style type="text/css"> /* Shared CSS for AsciiDoc xhtml11 and html5 backends
*/

/* Default font. */ body {
  font-family: Georgia,serif; }

/* Title font. */ h1, h2, h3, h4, h5, h6, div.title, caption.title, thead,
p.table.header, #toctitle, #author, #revnumber, #revdate, #revremark, #footer
{
  font-family: Arial,Helvetica,sans-serif; }

body {
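
This kind of breakage can be spotted without opening each page: a troff man page normally starts with a macro or comment line, whereas the output above starts with an XML declaration. A rough sketch in sh, assuming the uncompressed page source is piped in on stdin; looks_like_xml is a hypothetical helper, and the check is only a heuristic on the first non-blank line:

```shell
#!/bin/sh
# Return 0 if the text on stdin looks like XML/XHTML rather than troff.
# Heuristic: inspect only the first non-blank line.
looks_like_xml() {
    # Print the first non-blank line with leading whitespace removed, then stop.
    first=$(sed -n '/[^[:space:]]/{s/^[[:space:]]*//;p;q;}')
    case "$first" in
        '<?xml'*|'<!DOCTYPE'*|'<html'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For a staged page this could be used as `gzip -dc man/man1/tesseract.1.gz | looks_like_xml && echo "still XML"`, with the path taken relative to wherever the man pages were staged or installed.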
Comment 31 Michael Gmelin freebsd_committer 2019-03-28 05:39:52 UTC
Created attachment 203207
Patch to make sure man pages build AND aren't XML

Note that the patch replaces the previous one and is against r497002 (4.0.0_1).

- This adds the necessary dependencies and patches to convert asciidoc xml output to man pages

Like before it also does:

- Make the build depend on asciidoc (so man pages are also built in a clean room environment which doesn't happen to have asciidoc installed by chance).
- Use gmake, as building man pages fails otherwise.

- Add man pages to pkg-plist.

- Bump revision.

@w.schwarzenfeld Thanks for noticing the previous issues, maybe you could test once more.
Comment 32 Michael Gmelin freebsd_committer 2019-03-28 05:54:32 UTC
Created attachment 203208
Patch to make sure man pages build AND aren't XML - refined

Note that the patch replaces the previous one and is against r497002 (4.0.0_1).

- This adds the necessary dependencies and patches to convert asciidoc xml output to man pages

- This is refined from the previous one - using localbase in locating xsltproc

Like before it also does:

- Make the build depend on asciidoc (so man pages are also built in a clean room environment which doesn't happen to have asciidoc installed by chance).
- Use gmake, as building man pages fails otherwise.

- Add man pages to pkg-plist.

- Bump revision.

@w.schwarzenfeld Thanks for noticing the previous issues, maybe you could test once more.
Comment 33 Walter Schwarzenfeld freebsd_triage 2019-03-28 06:18:44 UTC
Sorry, does not work. Man pages are still XML.

I made a patch:

patch-doc-generate_manpages.sh 
--- doc/generate_manpages.sh.orig       2019-03-28 04:57:12 UTC
+++ doc/generate_manpages.sh
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/local/bin/bash
 #
 # File:         generate_manpages.sh
 # Description:  Converts .asc files into man pages, etc. for Tesseract.
@@ -16,7 +16,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-man_xslt=/usr/share/xml/docbook/stylesheet/docbook-xsl/manpages/docbook.xsl
+#man_xslt=/usr/local/share/xml/docbook/stylesheet/docbook-xsl/manpages/docbook.xsl
+man_xslt=/usr/local/share/xsl/docbook/manpages/docbook.xsl
 asciidoc=$(which asciidoc)
 xsltproc=$(which xsltproc)
 if [[ -z "${asciidoc}" ]] || [[ -z "${xsltproc}" ]]; then

and added to the Makefile:

do-build:
	cd ${WRKSRC}/doc && ./generate_manpages.sh
Comment 34 Walter Schwarzenfeld freebsd_triage 2019-03-28 06:38:19 UTC
Sorry, it is OK. The first try went wrong; the second try was successful.

Patch works, thanks!
Comment 35 Michael Gmelin freebsd_committer 2019-03-28 19:43:04 UTC
*** Bug 236810 has been marked as a duplicate of this bug. ***
Comment 36 Michael Gmelin freebsd_committer 2019-03-28 19:48:18 UTC
@Piotr Sorry for creating a bit of noise on this bug - the final version of the patch[0] seems to work okay, would be cool if you could take a look at it and - assuming you're happy with it - give positive maintainer feedback, so that tcberner@ or I can commit it.

[0]https://bugs.freebsd.org/bugzilla/attachment.cgi?id=203208
Comment 37 mikael.urankar 2019-03-28 20:02:08 UTC
(In reply to Michael Gmelin from comment #36)
Can someone just give a ports bit to Piotr so that he can commit the patch himself?
Comment 38 Tobias C. Berner freebsd_committer 2019-03-28 20:04:37 UTC
(In reply to mikael.urankar from comment #37)
Not the worst idea... and you should hide too :D
Comment 39 O. Hartmann 2019-03-31 05:55:15 UTC
When will the patch be committed? The bug is bugging people.
Comment 40 O. Hartmann 2019-03-31 06:06:48 UTC
The latest offered patch works here, but since we also have an ffmpeg update, the ffmpeg update/compilation now dies with an error at the configure stage:

[...]

===>   ffmpeg-4.1.2_1,1 depends on shared library: libGL.so - found (/usr/local/lib/libGL.so)
===>  Configuring for ffmpeg-4.1.2_1,1
ERROR: tesseract not found using pkg-config

[...]
Comment 41 Tobias C. Berner freebsd_committer 2019-03-31 08:40:02 UTC
(In reply to O. Hartmann from comment #40)

The ffmpeg issue seems to stem from missing inclusion of openmp when linking:

cc -DLIBICONV_PLUG -isystem /usr/local/include -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DPIC -O2 -pipe -DLIBICONV_PLUG -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -std=c11 -fomit-frame-pointer -fPIC -pthread -I/usr/local/include -I/usr/local/include/p11-kit-1 -I/usr/local/include -I/usr/local/include -I/usr/local/include/freetype2 -I/usr/local/include/freetype2 -I/usr/local/include/opencv -I/usr/local/include -I/usr/local/include -I/usr/local/include/leptonica -L/usr/local/lib -c -o /tmp/ffconf.JO3vBiaq/test.o /tmp/ffconf.JO3vBiaq/test.c
cc: warning: argument unused during compilation: '-L/usr/local/lib' [-Wunused-command-line-argument]
cc -fstack-protector -L/usr/local/lib -Wl,--as-needed -Wl,-z,noexecstack -I/usr/local/include -I/usr/local/include/leptonica -L/usr/local/lib -o /tmp/ffconf.JO3vBiaq/test /tmp/ffconf.JO3vBiaq/test.o -ltesseract
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `omp_get_thread_num'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_serialized_parallel'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_push_num_threads'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_for_static_init_4'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_end_serialized_parallel'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_global_thread_num'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_fork_call'
/usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_for_static_fini'
cc: error: linker command failed with exit code 1 (use -v to see invocation)
ERROR: tesseract not found using pkg-config
Comment 42 Michael Gmelin freebsd_committer 2019-03-31 12:02:31 UTC
(In reply to O. Hartmann from comment #40)

Which OS version are you building ffmpeg on? I tried on 11.2 RELEASE using poudriere as well as in-tree build, both worked just fine (multimedia/ffmpeg with tesseract support enabled), so this might be a -CURRENT issue (see also bug #236907). Did you try a poudriere build? I might find some time later to investigate.

Regardless of this issue, I'll commit a patch today under a "build fix" blanket.
Comment 43 O. Hartmann 2019-03-31 18:09:26 UTC
For both graphics/tesseract and multimedia/ffmpeg, the failing build environment is FreeBSD 13.0-CURRENT #79 r345739: Sat Mar 30 19:56:50 CET 2019 amd64 and its poudriere jail equivalent (13.0-CURRENT 1300017 amd64 2019-03-27 18:28:25).

I haven't checked on 12-STABLE so far; next week is the earliest I can do that. 11.2-RELENG is built with poudriere only, and I can check that next week at the earliest as well.
Comment 44 Tobias C. Berner freebsd_committer 2019-03-31 18:20:05 UTC
(In reply to Michael Gmelin from comment #42)
Sounds good to me.


Do you have the string omp_get_thread_num in your tesseract shlib?
Comment 45 Michael Gmelin freebsd_committer 2019-04-01 02:56:16 UTC
(In reply to Tobias C. Berner from comment #44)

I could reproduce the problem on 13-CURRENT - as base LLVM includes OpenMP now and tesseract defaults to using it, it's picked up in the build, but tesseract doesn't link the library against libomp (it links all other binaries correctly).

The same issue can be seen on 11.2 if devel/openmp is installed on the machine building from the ports tree (which won't ever happen in cleanroom builds, that's why it didn't show up in the poudriere builds).

I'll commit a patch shortly. Besides the man page fixes, it includes a configuration option to use OpenMP on amd64/i386. This way --enable-openmp/--disable-openmp is always set explicitly. It also patches configure.ac to look for libomp, so it's linked against it (-fopenmp, as set by AC_OPENMP in OPENMP_CXXFLAGS didn't do the trick). It feels a bit dirty, but it seems to be a viable workaround. On top, it makes sure that libomp is a dependency on systems without OpenMP in base (necessary until 11.2 is EoL).

I enabled OpenMP builds by default, as this is the default configuration upstream and is supposed to improve performance (even though reports on that differ). I did a few quick checks and could confirm speed gains with OpenMP enabled locally.

I'll leave the bug "In Progress" until there's sufficient feedback that the problem is solved for everyone.
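
The under-linking described above can be checked on an installed library: it references OpenMP runtime symbols (omp_*, __kmpc_*, as in the ffmpeg configure log in comment 41) but has no NEEDED entry for libomp. A hedged sketch in sh using nm and readelf; omp_underlinked is a made-up helper name, and matching on those two symbol prefixes is an assumption based on the undefined references quoted earlier:

```shell
#!/bin/sh
# Exit status 0 if the given shared library references OpenMP runtime
# symbols but does not list libomp among its NEEDED entries; 1 otherwise.
omp_underlinked() {
    lib=$1
    # Undefined dynamic symbols that look like OpenMP runtime entry points.
    if ! nm -D "$lib" | grep -Eq ' U (omp_|__kmpc_)'; then
        return 1    # no OpenMP references at all
    fi
    # If libomp is recorded as a dependency, the library is linked properly.
    if readelf -d "$lib" | grep -q 'NEEDED.*libomp'; then
        return 1
    fi
    return 0
}
```

For example, `omp_underlinked /usr/local/lib/libtesseract.so && echo "under-linked"` would flag the library from the linker errors above.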
Comment 46 commit-hook freebsd_committer 2019-04-01 03:09:01 UTC
A commit references this bug:

Author: grembo
Date: Mon Apr  1 03:08:21 UTC 2019
New revision: 497458
URL: https://svnweb.freebsd.org/changeset/ports/497458

Log:
  Various build fixes:

  - Make sure man pages are always built correctly. They were
    only built if asciidoc happened to be installed and
    the output was XML instead of troff.
  - Fix build on 13-CURRENT - the build picked up
    OpenMP implicitly and didn't link the library to libomp
    properly, which broke dependent ports (like graphics/ffmpeg).
  - Fix build on releases if OpenMP was already installed.
  - Enable building with OpenMP support by default on i386
    and amd64.
  - Bump revision

  PR:		234285
  Approved by:	portmgr (build fix blanket)

Changes:
  head/graphics/tesseract/Makefile
  head/graphics/tesseract/files/patch-configure.ac
  head/graphics/tesseract/files/patch-doc_Makefile.am
  head/graphics/tesseract/pkg-plist
Comment 47 Piotr Kubaj freebsd_committer 2019-04-02 11:44:58 UTC
I guess I can close it now. Thanks all for help!