Created attachment 200381 [details] patch Update tesseract to 4.0.0. Tested on 11.2-RELEASE.
*** Bug 234287 has been marked as a duplicate of this bug. ***
Hello, Arch seems to have a fix for the man pages: https://git.archlinux.org/svntogit/community.git/commit/trunk?h=packages/tesseract&id=f877c8bf2315833dcde205fc724ccf1f07572751 [there are also upstream commits for that]. Also, I think it needs USES=gmake -- otherwise it fails to build for me. Regards, Tobias
There's already 4.1-rc1, so I think we could upgrade soon to 4.1 anyway.
It would be great if you could finish upgrading this soon or alternatively downgrade tesseract-data back to 3.04.00. The current combination of tesseract 3.x combined with tesseract-data from 4.x breaks certain languages, at the very least deu (= German). As a result, a ton of "ParamsModel::Unknown Parameter" and "ParamsModel::Incomplete line" errors are emitted when loading training data. Thanks! See also: https://stackoverflow.com/questions/41160539/tesseract-error-params-modelincomplete-line-error https://github.com/tesseract-ocr/tessdata/issues/96
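The mismatch described above (a 3.x engine loading 4.x data) can be caught with a simple major-version comparison. The following is only a sketch: the helper names `major_of` and `same_major` are made up for illustration, and the version strings in the example call are placeholders rather than values read from an installed package.

```shell
# major_of VERSION: print the part before the first dot, e.g. "4.0.0" -> "4".
major_of() {
    printf '%s\n' "${1%%.*}"
}

# same_major ENGINE_VERSION DATA_VERSION: succeed only if both majors agree.
same_major() {
    [ "$(major_of "$1")" = "$(major_of "$2")" ]
}

# Illustrative call: a 3.x engine against 4.x data (placeholder versions).
if same_major "3.04.00" "4.0.0"; then
    echo "engine and data majors match"
else
    echo "engine and data majors differ; expect ParamsModel errors"
fi
```

In practice the two arguments would come from the installed package versions; how to query them is left out here on purpose.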
(In reply to Piotr Kubaj from comment #3) Was 4.1 released already?
(In reply to Tobias C. Berner from comment #5) No, I think I'll go with 4.0 for now. I'm currently at AsiaBSDCon though, so I'll be able to look at it probably on Tuesday.
Both the current version in the tree and 4.0.0 fail to build after base r345349 due to under-linking libpthread. This should block bug 236141 as a result. Seeing about upstreaming the fix, as they are probably unaware of this one, plus they had to fix a similar under-linking for their MinGW support.
(In reply to Charlie Li from comment #7) Build bug reported upstream: https://github.com/tesseract-ocr/tesseract/issues/2344
This bug needs to block bug 236734 as well. Since this build process uses libtool, the C++ libraries are being linked with -nostdlib, which will leave out libpthread.
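One way to confirm this kind of under-linking is to list the NEEDED entries of the built library and look for the threading library. The sketch below parses `readelf -d` output from standard input; the function names are hypothetical, and the library name to search for is an assumption (on FreeBSD the threading library may show up as libthr rather than libpthread).

```shell
# needed_libs: read `readelf -d` output on stdin and print each NEEDED entry.
# Accepts both "(NEEDED)" and "NEEDED" spellings of the tag column.
needed_libs() {
    sed -n 's/.*NEEDED.*Shared library: \[\(.*\)\].*/\1/p'
}

# links_against NAME: succeed if any NEEDED entry on stdin starts with NAME.
# NAME is used as a regular expression prefix, so keep it simple.
links_against() {
    needed_libs | grep -q "^$1"
}

# Possible use (path is an assumption, adjust to the staged library):
#   readelf -d /usr/local/lib/libtesseract.so | links_against libthr \
#       || echo "threading library is not linked"
```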
(In reply to Tobias C. Berner from comment #2) I tried tesseract with Arch's patch and the result is still the same (missing manpages). Also, I don't need gmake, it builds without it (Poudriere on 12-STABLE/amd64). Can we just commit the patch with 4.0?
Yeah sure. I'll look at it tonight -- can you rebase the patch after the pthread fix from earlier today? Regards, Tobias
Created attachment 203193 [details] patch (In reply to Tobias C. Berner from comment #11) OK, attached.
A commit references this bug:
    Author: tcberner
    Date: Wed Mar 27 20:28:08 UTC 2019
    New revision: 496977
    URL: https://svnweb.freebsd.org/changeset/ports/496977
    Log:
      graphics/tesseract: Update to 4.0.0
      Changelog: https://github.com/tesseract-ocr/tesseract/wiki/ReleaseNotes#tesseract-release-notes-oct-29-2018---v400
      - due to an issue with the build system the man pages are missing -- this will be corrected at a later stage.
      - bump revisions of dependencies due to shlib-version change.
      PR: 234285
      Submitted by: Piotr Kubaj <pkubaj@anongoth.pl> (maintainer)
    Changes:
      head/graphics/opencv/Makefile
      head/graphics/opencv-java/Makefile
      head/graphics/p5-Image-OCR-Tesseract/Makefile
      head/graphics/py-opencv/Makefile
      head/graphics/tesseract/Makefile
      head/graphics/tesseract/distinfo
      head/graphics/tesseract/pkg-plist
      head/multimedia/vapoursynth/Makefile
      head/net/tucan/Makefile
Committed. Thanks!
Re-opening. The build fails with:
    Making all in doc
    make[4]: make[4]: don't know how to make combine_lang_model.1. Stop
    make[4]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0/doc
    *** [all-recursive] Error code 1
    make[3]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
    1 error
    make[3]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
    *** [all] Error code 2
    make[2]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
    1 error
    make[2]: stopped in /ram/usr/ports/graphics/tesseract/work/tesseract-4.0.0
    ===> Compilation failed unexpectedly.
    Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to the maintainer.
    *** Error code 1
    Stop.
    make[1]: stopped in /usr/ports/graphics/tesseract
    *** Error code 1
Found it: gmake is missing from USES.
The DOCS option is empty and useless.
(In reply to w.schwarzenfeld from comment #16) Same here, fixed by adding gmake to USES.
(In reply to VVD from comment #18) How do you both build it to see the error? I'm building on 11.2/amd64 with poudriere as well as manually from ports, and it builds just fine (bulk and testport) without specifying gmake.
Ask the Makefile in the doc directory. Just normal with portmaster or in the port, same error.
(In reply to w.schwarzenfeld from comment #20) By "ask", do you mean manually changing into the doc directory and calling "make"?
I tried this but "make" does not work, gmake works.
(In reply to w.schwarzenfeld from comment #22) To make sure I understand: It builds and installs correctly for you, but changing manually into the doc directory within work and running make doesn't work. Running gmake there works. Given that it was committed with the caveat "no man pages", I assume this is to be expected. Assuming that this is correct: Here it builds with make and gmake just fine. I tried adding "USES=gmake", but this made no difference. Still no man pages. Please correct me if my understanding is not accurate.
Yes, correct. For man-pages:
    ===> Checking for items in STAGEDIR missing from pkg-plist
    Error: Orphaned: man/man1/ambiguous_words.1.gz
    Error: Orphaned: man/man1/classifier_tester.1.gz
    Error: Orphaned: man/man1/cntraining.1.gz
    Error: Orphaned: man/man1/combine_lang_model.1.gz
    Error: Orphaned: man/man1/combine_tessdata.1.gz
    Error: Orphaned: man/man1/dawg2wordlist.1.gz
    Error: Orphaned: man/man1/lstmeval.1.gz
    Error: Orphaned: man/man1/lstmtraining.1.gz
    Error: Orphaned: man/man1/merge_unicharsets.1.gz
    Error: Orphaned: man/man1/mftraining.1.gz
    Error: Orphaned: man/man1/set_unicharset_properties.1.gz
    Error: Orphaned: man/man1/shapeclustering.1.gz
    Error: Orphaned: man/man1/tesseract.1.gz
    Error: Orphaned: man/man1/text2image.1.gz
    Error: Orphaned: man/man1/unicharset_extractor.1.gz
    Error: Orphaned: man/man1/wordlist2dawg.1.gz
    Error: Orphaned: man/man5/unicharambigs.5.gz
    Error: Orphaned: man/man5/unicharset.5.gz
    ===> Checking for items in pkg-plist which are not in STAGEDIR
    ===> Error: Plist issues found.
    *** Error code 1
    The DOCS option does nothing, it does not matter if it is on or off.
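The orphan report above already contains everything needed for the missing pkg-plist lines. As a rough sketch, a small filter can turn such a log into plist entries; the function name `orphans_to_plist` is hypothetical, and the output would still need review before going into the port.

```shell
# orphans_to_plist: read a build log on stdin and print one pkg-plist line
# per "Error: Orphaned:" entry, sorted and de-duplicated.
orphans_to_plist() {
    sed -n 's/^Error: Orphaned: //p' | sort -u
}

# Possible use, assuming the failing log was saved to build.log:
#   orphans_to_plist < build.log
```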
(In reply to w.schwarzenfeld from comment #24) I don't see those plist issues - are you maybe using a version from the patch and not what was committed (the committed version doesn't include man pages in the plist)? Could you maybe send me a complete build log + the port skeleton you're using (just tarring up the graphics/tesseract without the work directory) via email (grembo@)?
Created attachment 203204 [details] screenlog_tesseract Build screenlog.
Created attachment 203205 [details] tesseract_dir Tar of tesseract directory.
(In reply to w.schwarzenfeld from comment #26) That was helpful, thank you. The issue is that man pages depend on asciidoc. This isn't specified in the Makefile, hence it wasn't installed in a clean room build and the problem didn't trigger. I'll create a patch for testing and attach it to this PR.
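The difference between the two environments comes down to whether the documentation tools happen to be on PATH. A minimal sketch of that check follows; `missing_tools` is a made-up helper, and the tool names asciidoc and xsltproc are taken from the upstream generate_manpages.sh quoted later in this thread.

```shell
# missing_tools TOOL...: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report which documentation tools this host lacks, if any.
missing=$(missing_tools asciidoc xsltproc)
if [ -n "$missing" ]; then
    echo "man pages would be skipped here; missing:" $missing
fi
```

A clean-room build would list both tools as missing unless the port declares them as build dependencies, which is what the patch proposed here is meant to address.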
Created attachment 203206 [details] Patch to make sure man pages build and are always built
    Note that the patch is against r497002 (4.0.0_1).
    - Make the build depend on asciidoc (so man pages are also built in a clean room environment which doesn't happen to have asciidoc installed by chance).
    - Use gmake, as building man pages fails otherwise.
    - Add man pages to pkg-plist.
    - Bump revision.
    Note that the DOCS option does work as intended - man pages are always built, but additional documentation located in /usr/local/share/doc/tesseract is only installed if DOCS is enabled. This is a common way of interpreting this toggle (so that there are always man pages).
    @w.schwarzenfeld This worked here okay, but maybe you could test too.
Thanks, this works, but the man pages are malformed:
    man tesseract () ()
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
    <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
    <meta name="generator" content="AsciiDoc 8.6.10" />
    <title>TESSERACT(1)</title>
    <style type="text/css">
    /* Shared CSS for AsciiDoc xhtml11 and html5 backends */
    /* Default font. */
    body { font-family: Georgia,serif; }
    /* Title font. */
    h1, h2, h3, h4, h5, h6, div.title, caption.title, thead, p.table.header, #toctitle, #author, #revnumber, #revdate, #revremark, #footer { font-family: Arial,Helvetica,sans-serif; }
    body {
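A quick way to spot this failure mode before installing is to look at the first non-empty line of each generated page: a troff man page should not begin with an angle bracket. The check below is a sketch with a hypothetical function name; it reads the page from standard input, so compressed pages need to be decompressed first.

```shell
# looks_like_markup: read a man page on stdin and succeed if its first
# non-empty line starts with "<", i.e. it is XML/XHTML rather than troff.
looks_like_markup() {
    first=$(sed -n '/[^[:space:]]/{p;q;}')
    case $first in
        '<'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Possible use (path is an assumption):
#   gzip -dc /usr/local/man/man1/tesseract.1.gz | looks_like_markup \
#       && echo "tesseract.1 is markup, not troff"
```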
Created attachment 203207 [details] Patch to make sure man pages build AND aren't XML
    Note that the patch replaces the previous one and is against r497002 (4.0.0_1).
    - This adds the necessary dependencies and patches to convert asciidoc xml output to man pages
    Like before it also does:
    - Make the build depend on asciidoc (so man pages are also built in a clean room environment which doesn't happen to have asciidoc installed by chance).
    - Use gmake, as building man pages fails otherwise.
    - Add man pages to pkg-plist.
    - Bump revision.
    @w.schwarzenfeld Thanks for noticing the previous issues, maybe you could test once more.
Created attachment 203208 [details] Patch to make sure man pages build AND aren't XML - refined
    Note that the patch replaces the previous one and is against r497002 (4.0.0_1).
    - This adds the necessary dependencies and patches to convert asciidoc xml output to man pages
    - This is refined from the previous one - using localbase in locating xsltproc
    Like before it also does:
    - Make the build depend on asciidoc (so man pages are also built in a clean room environment which doesn't happen to have asciidoc installed by chance).
    - Use gmake, as building man pages fails otherwise.
    - Add man pages to pkg-plist.
    - Bump revision.
    @w.schwarzenfeld Thanks for noticing the previous issues, maybe you could test once more.
Sorry, does not work. Man pages are still XML. I made a patch, patch-doc-generate_manpages.sh:
    --- doc/generate_manpages.sh.orig 2019-03-28 04:57:12 UTC
    +++ doc/generate_manpages.sh
    @@ -1,4 +1,4 @@
    -#!/bin/bash
    +#!/usr/local/bin/bash
     #
     # File: generate_manpages.sh
     # Description: Converts .asc files into man pages, etc. for Tesseract.
    @@ -16,7 +16,8 @@
     # See the License for the specific language governing permissions and
     # limitations under the License.
     
    -man_xslt=/usr/share/xml/docbook/stylesheet/docbook-xsl/manpages/docbook.xsl
    +#man_xslt=/usr/local/share/xml/docbook/stylesheet/docbook-xsl/manpages/docbook.xsl
    +man_xslt=/usr/local/share/xsl/docbook/manpages/docbook.xsl
     asciidoc=$(which asciidoc)
     xsltproc=$(which xsltproc)
     if [[ -z "${asciidoc}" ]] || [[ -z "${xsltproc}" ]]; then
    and added to the Makefile:
    do-build:
    	cd ${WRKSRC}/doc && ./generate_manpages.sh
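The hard-coded stylesheet path is the fragile part of the script patched above, since it differs between systems. One hedged alternative is to probe a list of candidate locations and use the first that exists. The helper name `first_existing` is made up, and the two candidates are simply the paths quoted in this thread, not an exhaustive list.

```shell
# first_existing PATH...: print the first argument that names a regular file.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Candidate locations taken from the thread; others may apply elsewhere.
man_xslt=$(first_existing \
    /usr/local/share/xsl/docbook/manpages/docbook.xsl \
    /usr/share/xml/docbook/stylesheet/docbook-xsl/manpages/docbook.xsl) \
    || echo "no DocBook manpages stylesheet found" >&2
```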
Sorry, it is OK. The first try went wrong; the second try was successful. The patch works, thanks!
*** Bug 236810 has been marked as a duplicate of this bug. ***
@Piotr Sorry for creating a bit of noise on this bug - the final version of the patch[0] seems to work okay. It would be cool if you could take a look at it and - assuming you're happy with it - give positive maintainer feedback, so that tcberner@ or I can commit it. [0] https://bugs.freebsd.org/bugzilla/attachment.cgi?id=203208
(In reply to Michael Gmelin from comment #36) Can someone just give Piotr a ports commit bit so that he can commit the patch himself?
(In reply to mikael.urankar from comment #37) Not the worst idea... and you should hide too :D
When will the patch be committed? The bug is bugging people.
The latest offered patch works here, but as we also have an ffmpeg update, the ffmpeg build now dies at the configure stage with this error:
    [...]
    ===> ffmpeg-4.1.2_1,1 depends on shared library: libGL.so - found (/usr/local/lib/libGL.so)
    ===> Configuring for ffmpeg-4.1.2_1,1
    ERROR: tesseract not found using pkg-config
    [...]
(In reply to O. Hartmann from comment #40) The ffmpeg issue seems to stem from missing inclusion of openmp when linking:
    cc -DLIBICONV_PLUG -isystem /usr/local/include -D_ISOC99_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -DPIC -O2 -pipe -DLIBICONV_PLUG -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -std=c11 -fomit-frame-pointer -fPIC -pthread -I/usr/local/include -I/usr/local/include/p11-kit-1 -I/usr/local/include -I/usr/local/include -I/usr/local/include/freetype2 -I/usr/local/include/freetype2 -I/usr/local/include/opencv -I/usr/local/include -I/usr/local/include -I/usr/local/include/leptonica -L/usr/local/lib -c -o /tmp/ffconf.JO3vBiaq/test.o /tmp/ffconf.JO3vBiaq/test.c
    cc: warning: argument unused during compilation: '-L/usr/local/lib' [-Wunused-command-line-argument]
    cc -fstack-protector -L/usr/local/lib -Wl,--as-needed -Wl,-z,noexecstack -I/usr/local/include -I/usr/local/include/leptonica -L/usr/local/lib -o /tmp/ffconf.JO3vBiaq/test /tmp/ffconf.JO3vBiaq/test.o -ltesseract
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `omp_get_thread_num'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_serialized_parallel'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_push_num_threads'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_for_static_init_4'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_end_serialized_parallel'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_global_thread_num'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_fork_call'
    /usr/local/bin/ld: /usr/local/lib/libtesseract.so: undefined reference to `__kmpc_for_static_fini'
    cc: error: linker command failed with exit code 1 (use -v to see invocation)
    ERROR: tesseract not found using pkg-config
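The linker errors above suggest the library refers to OpenMP runtime symbols without recording a dependency on the OpenMP runtime. Undefined symbols in a shared library are normal on their own; they become a problem when no NEEDED entry provides them. As a sketch, the OpenMP-related undefined symbols can be listed from `nm -D` output. The function name is hypothetical, and the omp_ and __kmpc_ prefixes are taken from the log above.

```shell
# undefined_openmp_symbols: read `nm -D` output on stdin and print undefined
# symbols (type "U") whose names start with omp_ or __kmpc_.
undefined_openmp_symbols() {
    awk '$1 == "U" && $2 ~ /^(omp_|__kmpc_)/ { print $2 }'
}

# Possible use (path is an assumption):
#   nm -D /usr/local/lib/libtesseract.so | undefined_openmp_symbols
# If this prints anything, the library's NEEDED entries should include an
# OpenMP runtime; otherwise consumers such as ffmpeg's configure test
# will fail to link.
```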
(In reply to O. Hartmann from comment #40) Which OS version are you building ffmpeg on? I tried on 11.2 RELEASE using poudriere as well as in-tree build, both worked just fine (multimedia/ffmpeg with tesseract support enabled), so this might be a -CURRENT issue (see also bug #236907). Did you try a poudriere build? I might find some time later to investigate. Regardless of this issue, I'll commit a patch today under a "build fix" blanket.
For both graphics/tesseract and multimedia/ffmpeg, the failing build environment is FreeBSD 13.0-CURRENT #79 r345739 (Sat Mar 30 19:56:50 CET 2019, amd64) and its poudriere jail equivalent (13.0-CURRENT 1300017 amd64 2019-03-27 18:28:25). I haven't checked on 12-STABLE so far; next week is the earliest I can do that. 11.2-RELENG is built with poudriere only, and I can check that next week at the earliest as well.
(In reply to Michael Gmelin from comment #42) Sounds good to me. Do you have the string omp_get_thread_num in your tesseract shlib?
(In reply to Tobias C. Berner from comment #44) I could reproduce the problem on 13-CURRENT - as base LLVM includes OpenMP now and tesseract defaults to using it, it's picked up in the build, but tesseract doesn't link the library against libomp (it links all other binaries correctly). The same issue can be seen on 11.2 if devel/openmp is installed on the machine building from the ports tree (which won't ever happen in cleanroom builds, that's why it didn't show up in the poudriere builds). I'll commit a patch shortly. Besides the man page fixes, it includes a configuration option to use OpenMP on amd64/i386. This way --enable-openmp/--disable-openmp is always set explicitly. It also patches configure.ac to look for libomp, so it's linked against it (-fopenmp, as set by AC_OPENMP in OPENMP_CXXFLAGS didn't do the trick). It feels a bit dirty, but it seems to be a viable workaround. On top, it makes sure that libomp is a dependency on systems without OpenMP in base (necessary until 11.2 is EoL). I enabled OpenMP builds by default, as this is the default configuration upstream and is supposed to improve performance (even though reports on that differ). I did a few quick checks and could confirm speed gains with OpenMP enabled locally. I'll leave the bug "In Progress" until there's sufficient feedback that the problem is solved for everyone.
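The point of always passing the switch explicitly is that the build no longer depends on what the configure script happens to detect. A rough sketch of that decision follows. The function name and its "on"/"off" argument are made up for illustration, and treating every architecture other than amd64 and i386 as disabled is an assumption based on the description above, not a copy of the committed Makefile logic.

```shell
# openmp_configure_arg ARCH WANT: print the explicit configure switch.
# WANT is "on" or "off". In this sketch, architectures other than amd64
# and i386 always get --disable-openmp (an assumption).
openmp_configure_arg() {
    case "$1:$2" in
        amd64:on|i386:on) printf '%s\n' '--enable-openmp' ;;
        *)                printf '%s\n' '--disable-openmp' ;;
    esac
}

# Example: the default described above on amd64.
openmp_configure_arg amd64 on
```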
A commit references this bug:
    Author: grembo
    Date: Mon Apr 1 03:08:21 UTC 2019
    New revision: 497458
    URL: https://svnweb.freebsd.org/changeset/ports/497458
    Log:
      Various build fixes:
      - Make sure man pages are always built correctly. They were only built if asciidoc happened to be installed and the output was XML instead of troff.
      - Fix build on 13-CURRENT - the build picked up OpenMP implicitly and didn't link the library to libomp properly, which broke dependent ports (like graphics/ffmpeg).
      - Fix build on releases if OpenMP was already installed.
      - Enable building with OpenMP support by default on i386 and amd64.
      - Bump revision
      PR: 234285
      Approved by: portmgr (build fix blanket)
    Changes:
      head/graphics/tesseract/Makefile
      head/graphics/tesseract/files/patch-configure.ac
      head/graphics/tesseract/files/patch-doc_Makefile.am
      head/graphics/tesseract/pkg-plist
I guess I can close it now. Thanks, everyone, for the help!