Bug 193694

Summary: non-ASCII characters lost on the way from commit-hook into Bugzilla
Product: Services Reporter: Matthias Andree <mandree>
Component: Core Infrastructure Assignee: Philip Paeps <philip>
Status: Closed FIXED    
Severity: Affects Some People CC: bugmeister, clusteradm, koobs, lwhsu, mat, mva, peter, philip, yuri
Priority: --- Keywords: easy, needs-patch, needs-qa
Version: unspecified   
Hardware: Any   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193682

Description Matthias Andree freebsd_committer freebsd_triage 2014-09-16 21:24:57 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193682#c4 contains a comment added by the ports commit hook.  The changelog itself contained one non-ASCII UTF-8 character (the "ö" in "Siebörger" on the Submitted by: line), which is present and properly encoded in the commit log mailed out to the lists, but has been replaced by a question mark in the bug's comment.  The Bugzilla-rendered bug page also claims to be encoded as UTF-8, so it's not clear why the umlaut got lost.

Please change the bug tracker such that UTF-8 characters get properly recorded and rendered.
Comment 1 Matthias Andree freebsd_committer freebsd_triage 2014-09-16 21:25:22 UTC
That should have been https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193682#c3
Comment 2 Marcus von Appen freebsd_committer freebsd_triage 2014-09-18 09:21:48 UTC
This might be caused by several issues.
The notifier script, for example, does not declare a content encoding in the mail it generates, so the mail might simply be re-encoded conservatively when it is picked up.
I'm adding portmgr@ to keep them informed about the test outcome.
Comment 3 Marcus von Appen freebsd_committer freebsd_triage 2014-09-22 10:00:50 UTC
Local tests show that it is related to the (missing) content encoding.

@portmgr: Can we assume commit messages to be UTF-8 encoded and add a 

  echo 'Content-Type: text/plain; charset="UTF-8"'

into hooks/scripts/notify_bz.sh? Otherwise the content encoding is guessed randomly, with a fallback to the executing user's locale.
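
A minimal sketch of where such a header would sit in a generated mail, assuming commit messages are UTF-8. The function name build_mail and the extra MIME-Version and Content-Transfer-Encoding lines are illustrative assumptions; none of this is taken from the real notify_bz.sh:

```shell
#!/bin/sh
# Hypothetical sketch, not the actual notify_bz.sh.
# Emits MIME headers that declare the body as UTF-8, then the body.
build_mail() {
    # $1: commit log text, assumed to already be UTF-8 encoded
    printf 'MIME-Version: 1.0\n'
    printf 'Content-Type: text/plain; charset="UTF-8"\n'
    printf 'Content-Transfer-Encoding: 8bit\n'
    # An empty line separates the headers from the body.
    printf '\n'
    printf '%s\n' "$1"
}

build_mail 'Submitted by: Siebörger'
```

With the charset declared, the receiving side no longer has to guess how to interpret the bytes of the body.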
Comment 4 Matthias Andree freebsd_committer freebsd_triage 2014-09-22 18:33:16 UTC
Internally, Subversion stores everything as UTF-8 encoded Unicode, or so the SVN Book (svnbook.red-bean.com) claims:

<http://svnbook.red-bean.com/en/1.7/svn.tour.importing.html#svn.tour.importing.naming>

Now the client (svn) will re-encode according to the locale setting; I'm not sure what svnlook does.  So in order to play it safe, you'll probably want to add

  export LANG=en_US.UTF-8

or, more radically and to the point,

  export LC_ALL=en_US.UTF-8

to notify_bz.sh to enforce the declared encoding (C.UTF-8 or POSIX.UTF-8 makes svn complain that it cannot set LC_CTYPE).

Please test what svnlook renders; I don't currently have a server-side repo at hand.
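
A hedged sketch of the locale approach described above. The function name force_utf8_locale is hypothetical, and it is an assumption that the en_US.UTF-8 locale is installed on the machine running the hook:

```shell
#!/bin/sh
# Hypothetical sketch, not taken from notify_bz.sh.
# LC_ALL overrides LANG and every individual LC_* category,
# which is why it is the more radical of the two settings.
force_utf8_locale() {
    LC_ALL=en_US.UTF-8
    export LC_ALL
}

force_utf8_locale

# Exported variables are inherited by child processes, so any tool the
# hook starts afterwards (svnlook in this thread) sees the same locale.
sh -c 'printf "%s\n" "$LC_ALL"'
```

Setting the locale only controls how the tools encode their output; the generated mail still has to declare that encoding for the receiver.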
Comment 5 Marcus von Appen freebsd_committer freebsd_triage 2014-10-15 06:25:42 UTC
What's the status of this? Did anyone from portmgr@ look into the necessary adjustments for notify_bz.sh?
Comment 6 Marcus von Appen freebsd_committer freebsd_triage 2014-11-13 06:52:16 UTC
portmgr@: is there any progress on this issue?
Comment 7 Matthias Andree freebsd_committer freebsd_triage 2014-11-18 22:14:53 UTC
If it's script adjustments, we need to get bugmeister on the hook.  Not sure if we can expect much help from portmgr@, so let's just try bugmeister.
Comment 8 Matthias Andree freebsd_committer freebsd_triage 2014-11-18 22:16:40 UTC
Sorry - I see that bugmeister reassigned to portmgr in September already. Reverting my changes.
Comment 9 Marcus von Appen freebsd_committer freebsd_triage 2014-12-14 09:04:51 UTC
Any news on this?
Comment 10 Matthias Andree freebsd_committer freebsd_triage 2014-12-14 12:45:53 UTC
Is this an area where non-maintainer commits get reverted? 

Else it's time for someone else to invoke maintainer timeout and take action.
Comment 11 Antoine Brodin freebsd_committer freebsd_triage 2014-12-14 12:48:25 UTC
I don't see any patch provided in this bug report, so why would a timeout be invoked?
Comment 12 Antoine Brodin freebsd_committer freebsd_triage 2014-12-14 13:09:22 UTC
Also, this probably affects base and docs, it's not specific to ports so I'm not sure portmgr is the right contact (maybe svnadm@ / peter@)
Comment 13 Marcus von Appen freebsd_committer freebsd_triage 2014-12-14 20:01:32 UTC
(In reply to Antoine Brodin from comment #12)
> Also,  this probably affects base and docs,  it's not specific to ports so
> I'm not sure portmgr is the right contact (maybe svnadm@ / peter@)

Yes, it will affect all commits that also write something into Bugzilla. So the problem will be the same for all source trees, be it ports, doc or base. The issue is (in my opinion) simple to fix (see comment #3) and would "only" have an impact on Bugzilla comments. Should we give a fix a go?
Comment 14 Mathieu Arnold freebsd_committer freebsd_triage 2015-01-21 13:39:55 UTC
Got bitten by it in https://bugs.freebsd.org/196964 so adding myself to the CC.
Comment 15 Kubilay Kocak freebsd_committer freebsd_triage 2015-11-02 07:24:00 UTC
@Marcus, is your code suggestion in comment 3 still valid? If so, I can add a patch here. If not, it would be great if someone else could provide one.

Who is the maintainer/owner of the SVN hook scripts? We should assign the Product/Component/Assignee accordingly.
Comment 16 Kubilay Kocak freebsd_committer freebsd_triage 2015-11-02 07:28:44 UTC
Spoke to Peter on IRC, over to him (and clusteradm). Thanks Pete!
Comment 17 Peter Wemm freebsd_committer freebsd_triage 2015-11-02 07:30:03 UTC
I was planning to make some adjustments to the way Bugzilla receives email, so I'll take care of this as well.
Comment 18 Yuri Victorovich freebsd_committer freebsd_triage 2019-06-15 03:08:32 UTC
This is still a problem.
Comment 19 Mark Linimon freebsd_committer freebsd_triage 2021-06-12 00:01:59 UTC
^Triage: reassign to bugmeister@.
Comment 20 Philip Paeps freebsd_committer freebsd_triage 2021-06-12 00:24:45 UTC
Is this still the case following the Git migration?  The Git commit hooks go to a lot more trouble to encode UTF-8 strings.
Comment 21 Yuri Victorovich freebsd_committer freebsd_triage 2021-06-12 02:14:40 UTC
(In reply to Philip Paeps from comment #20)

This is still the case after the Git migration.
Comment 22 Philip Paeps freebsd_committer freebsd_triage 2021-06-12 02:42:49 UTC
Okay.  I see the problem.  We encode in the notify-mailinglist hook but not in the notify-bugzilla hook.  I'll take care of this today.  This is simple enough to fix.
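
The thread does not spell out what "encode" involves in the hooks. One common step when mailing non-ASCII text is wrapping header values such as an author name as RFC 2047 encoded-words. The sketch below illustrates only that general technique; the helper name encode_header_word is hypothetical, a base64 utility is assumed to be installed, and nothing here is taken from the FreeBSD hooks:

```shell
#!/bin/sh
# Hypothetical helper, not taken from the FreeBSD Git hooks.
encode_header_word() {
    # $1: header text, assumed to be UTF-8.
    # Base64-encode the raw bytes and wrap them as an encoded-word.
    # tr removes the line breaks that some base64 tools insert.
    printf '=?UTF-8?B?%s?=\n' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

encode_header_word '麻煩'
```

RFC 2047 limits each encoded-word to 75 characters, so longer strings would have to be split into several words; this sketch does not do that.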
Comment 23 commit-hook freebsd_committer freebsd_triage 2021-06-12 07:05:50 UTC
This is a fake test commit.  Sorry for the noise.

commit 4338e4d340a02ac85d5ff0e7dfc1320b753c314f
Author:     麻煩 <philip@trouble.is>
AuthorDate: 2021-06-12 06:51:45 +0000
Commit:     Philip Paeps <philip@FreeBSD.org>
CommitDate: 2021-06-12 07:00:03 +0000

    hooks: test notify-bugzilla with UTF-8 strings

    This commit message was written in 香港.

 test (new) | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
Comment 24 Li-Wen Hsu freebsd_committer freebsd_triage 2021-06-12 11:03:07 UTC
Deployed to the production git server.