It is perfectly legitimate for cron-jobs to rely on cron doing the e-mailing for them. And it works perfectly fine for the default content-type of text/plain. However, if a cron-job generated HTML, PNG, or anything other than text/plain, it currently has to do the mailing itself. The proposed patch, which I've been using for the last 9 years here myself modifies the mail-sending part of cron to use libmagic(3) to determine the output's type. If determined successfully, the Content-Type header is set to advise the recipient's e-mail program on how to treat the message: http://aldan.algebra.com/~mi/cron-mime.diff
Created attachment 172959 [details] cron-mime patch (downloaded JIC it disappears some day)
Drop patch-ready since the patch doesn't apply cleanly.
(In reply to Pedro F. Giffuni from comment #2) > the patch doesn't apply cleanly. It does to 10.x :-)
Created attachment 184654 [details] cron-mime patch updated for the latest 10.3 This version merges in the recent addition of the ``-m mailfrom'' to 10.3.
While the intention is good and proposed feature would be useful, the solution has are some rough edges. First, it's not very flexible to implement this as global option. You could easily make it per-crontab depending on environment variable like CONTENT_TYPE. Cron already looks at per-crontab MAILTO environment variable, why not look at CONTENT_TYPE? Then, cutting contents at first 1024 bytes embedded in the code is error-prone. Note well: libmagic has MAGIC_PARAM_REGEX_MAX==8192 currently. I'd suggest "CONTENT_TYPE=libmagic:1024" format when ":1024" part would be optional to redefine default of MAGIC_PARAM_REGEX_MAX. Also, I would not be so sure about "Content-Transfer-Encoding: 8bit" header. 8bit does not mean "any data". According to RFC 2045 https://www.ietf.org/rfc/rfc2045.txt , it means short (998 octets or less) text strings without null bytes and CR/LF octets in CRLF sequences only. If the body contains binary data, the "binary" Content-Transfer-Encoding token must be used. Mentioned PNG would violate "8bit" for sure.
Also, file/libmagic is contributed software with CDDL license and we have src.conf(5) knob WITHOUT_FILE to build system without this dependency. It is important for many reasons including embedded friendyness to minimize size of compiled code, so please make sure you respect WITHOUT_FILE and make your changes compile-time conditional, f.e: .if ${MK_FILE} != "no" CFLAGS+=-DLIBMAGIC .endif
(In reply to Eugene Grosbein from comment #5) I don't think, it should be a per-user option, because the added behavior does not merely add a feature -- it fixes a bug. A non-textual e-mail without a MIME-type is invalid and a program generating it is buggy... I'm hoping, once merged and tested "in the wild", the change will become permanent . Which means, I need to reverse the meaning of the new `-M' option. (In reply to Eugene Grosbein from comment #6) > make sure you respect WITHOUT_FILE and make your changes compile-time > conditional Ah, didn't know about it. Yes, will do -- a few #ifdefs will be necessary... Thanks for the review. I'll get back to it...
(In reply to Mikhail T. from comment #7) Cron jobs are not supposed to spit binary data to standard output. I don't think cron daemon should be responsible for fixing such bugs of user jobs. I also don't think that libmagic overhead should be made default as most jobs don't need auto-detection of Content-Type but fixed value like CONTENT_TYPE=fixed:"text/plain; charset=utf-8"
(In reply to Eugene Grosbein from comment #8) > Cron jobs are not supposed to spit binary data to standard output. One can just as well argue, cron-jobs aren't supposed to spit /anything/ to standard output. But they do and that output has been emailed by cron for decades. If my job fetches a text-file from a web-server every day, I can rely on cron to e-mail it to me -- why should the logic be any different, if I switched to fetching a JPG? And it is not necessarily binary -- what if the output is a CSV or HTML? There is no good reason not to mark it correctly, indeed, it is a bug not to... > also don't think that libmagic overhead should be made default What overhead? Compared to mailing something in the first place -- which involves multiple fork-ing, exec-ing, and network traffic, consulting a few local files in-process is nothing -- and cron-jobs aren't happening millions of times per minute... Maybe, it should be possible for the cronjob-owner to explicitly set the mime-type per job. But, if they don't, cron should do the guessing for them. Unless compiled without the functionality at all, which should, as you suggested, be an option (WITHOUT_MAGIC). > fixed value like CONTENT_TYPE=fixed:"text/plain; charset=utf-8" "text/plain" is the default for messages with Mime-Type set and thus does not need to be explicitly set at all. Except for when it is wrong. BTW, my default for text is KOI8-U so right there some other logic would be necessary to set the header correctly. Better leave it to libmagic and be consistent with other software (like Apache's mod_magic).
(In reply to Mikhail T. from comment #9) > cron-jobs aren't supposed to spit /anything/ to standard output They are, as any other shell command. > why should the logic be any different, if I switched to fetching a JPG? Because SMTP does not support unencoded arbitrary binary data. That is, "Content-Transfer-Encoding: binary" is not supported in the Internet e-mail: no MTA is supposed to to receive and process such data, and many do not. And I don't think we should make cron to perform MIME encoding. > What overhead? libmagic processing may be expensive in terms of I/O, memory and CPU times. Have you tried to evaluate this overhead using some slow CPU like Intel Atom or ARM or MIPS32? You may be surprised. It evolves reading significant amount of libmagic own data and I/O may be slow due to slow media like CompactFlash. Then it may run plenty regexp's against input data. The whole process can easily long for seconds of wall time making measurable system load. Such processing should be opt-in, not opt-out.
> They are, as any other shell command. A shell command can output data of any type... > Because SMTP does not support unencoded arbitrary binary data. ? Could you cite the RFC? At any rate, this PR is not about binary necessarily. Be it text/html, or text/xml, or CSV, or HTML -- the MIME-type should be set properly. > Such processing should be opt-in, not opt-out. If that's your condition for accepting this, I'll paint this bikeshed however you wish.
(In reply to Mikhail T. from comment #11) >> Because SMTP does not support unencoded arbitrary binary data. > Could you cite the RFC? It is not possible to cite non-existent requirement for a MTA to accept unencoded binary data. And there are MTAs that reject unencoded binary data containing NUL bytes, for example.
(In reply to Eugene Grosbein from comment #12) > It is not possible to cite non-existent requirement for a MTA to > accept unencoded binary data. In comment #10 you said: "SMTP does not support unencoded arbitrary binary data." A citation supporting this statement should be possible... But, to keep this discussion from becoming even more confrontational, here is the RFC from 2000, that provided for binary-clean SMTP: https://tools.ietf.org/html/rfc3030 > And there are MTAs that reject unencoded binary data containing NUL bytes Sendmail -- the MTA bundled with FreeBSD -- is fine with it... Whoever replaces it on their system with something lacking this feature, can make sure, their cron-jobs don't output binaries...
(In reply to Mikhail T. from comment #13) No MTA is obliged to support such extensions. And in most cases local MTA (sendmail or not) would relay locally generated mail to some external MTA that it allowed to reject binary data just for its "binaryness". It is not expected that unencoded binary mail should pass through relays. In fact, most real-world e-mail processing software involved (mail filters etc.) do not tolerate NUL bytes in mail.
Created attachment 188784 [details] set mime-type and employ base64 encoding, if binary is detected (In reply to Eugene Grosbein from comment #14) Ok, how about this version? When "encoding=binary" is found in the the mimetype, it encodes the entire body as base64 -- using OpenSSL's implementation found in -lcrypto.
(In reply to Mikhail T. from comment #15) It still does not respect WITHOUT_FILE. And if such code uses openssl, it should depend on WITHOUT_OPENSSL too. However, our libc actually has b64_ntop() function for base64 encoding provided with <resolv.h> header and you can get example of its usage from src/usr.bin/uuencode/uuencode.c "usage" message mistakenly shows "-m" instead of "-M". Added #include's should be sorted by name of header. sizeof(char) is 1 by definition of C programming language (see http://chimera.roma1.infn.it/SP/COMMON/iso-iec-9899-1990.pdf section 6.5.3.4 paragraph 4), so you need not use bufsize*sizeof(char). With MAGIC_MIME flag to magic_open() you request both of MIME type and encoding but then skip MIME type with strstr() looking for encoding only. Wouldn't it better to use MAGIC_MIME_ENCODING instead of MAGIC_MIME and directly compare result with "charset=binary"? Otherwise, looks good, thanks!
(In reply to Eugene Grosbein from comment #16) > our libc actually has b64_ntop() function for base64 encoding provided with <resolv.h> I saw that, but it asks for a buffer to fill, whereas -lcrypto's implementation can output directly into a FILE *. I tried using -larchive, but that library's base64-code unconditionally wraps the base64-output with `begin-base64 ... ====` lines, which confuses e-mail programs (https://github.com/libarchive/libarchive/issues/976). > sizeof(char) is 1 by definition of C programming language Ironically, the very uuencode code, which you referred me to has lines like: rv = b64_ntop(buf, n, buf2, (sizeof(buf2) / sizeof(buf2[0]))); :-) But, yes, I'll be glad to clean out things like sizeof(char).
(In reply to Eugene Grosbein from comment #16) > Wouldn't it better to use MAGIC_MIME_ENCODING instead of MAGIC_MIME I still need the mime type to populate the Content-Type header. I do not need the charset in it, but it does not seem to harm anything. I could split it into two calls -- one for MAGIC_MIME_TYPE and another for MAGIC_MIME_ENCODING, but I thought it'd be more to your liking to reduce the amount of conversations with libmagic.
(In reply to Mikhail T. from comment #18) Fine. One more: https://tools.ietf.org/html/rfc2045#section-4 says: Messages composed in accordance with this document MUST include such a header field, with the following verbatim text: MIME-Version: 1.0 There are no reasons to change case to Mime-Version, please fix this too.
(In reply to Mikhail T. from comment #17) > Ironically, the very uuencode code, which you referred me to has lines like: > > rv = b64_ntop(buf, n, buf2, (sizeof(buf2) / sizeof(buf2[0]))); No, it has not. Perhaps, you were reading output of pre-processor (cc -E) because original source has "b64_ntop(buf, n, buf2, nitems(buf2))" - it uses generic macro nitems() to count.
For bugs matching the following conditions: - Status == In Progress - Assignee == "bugs@FreeBSD.org" - Last Modified Year <= 2017 Do - Set Status to "Open"
Mikhail, do you still have any interest in reviving this patch?
I certainly do. However, I have a feeling, the work would complete faster, if an actual src-committer took it upon himself to finish it to HIS satisfaction. The changes Eugene requested could _easily_ have been done by him, for example.
(In reply to Mikhail Teterin from comment #23) While I do not object against the feature and understand if could be useful for some part of user base, I still do not think it is cron's job to check and/or correct MIME format of job's output. The feature could be implemented as external filter processing an output using a pipe, if needed. And if it is embedded into the cron anyway, it should impose no regressions on correctness and performance for general case. I'm not going to develop that myself but I would commit the patch in form of correct and low-overhead optional feature.
Created attachment 205286 [details] Refresh This version applies to the current sources in 11 (after the mailfrom) change. It also respects the WITHOUT_OPENSSL knob -- continuing to use the 8bit encoding even if the program's output appears binary if OpenSSL was disabled at build time. A few consts added and a few more warnings fixed to continue to comply with WARNS=5.
(In reply to Mikhail Teterin from comment #25) The patch still has multiple issues: - cron.8 part is corrupted, please check it out carefully; - the change for usage() function still has the bug I already noted: it adds -m instead of right -M; - the change still does not respect WITHOUT_FILE build option that is required for CDDL-clean build of the system. Also, while increasing WARNS is good intention, such style changes should not be mixed with functional additions like yours. Please split the change in two, one for your own code and second for a patch touching other parts of code just fixing warnings. Among other things, it will make code review easier.
Eugene, I've concluded long ago, you have no real interest in this work -- the points you're raising (with the possible exception of respecting WITHOUT_FILE) amount to bikeshed-painting. You could've done the little fixups you deemed necessary two years ago yourself... You now demand, I split the warning-fixes into a separate patch -- but you were perfectly fine requesting, I add encoding to base64 earlier. That fix -- to a problem I didn't introduce -- didn't make it any harder for you to review, indeed, you insisted it be mixed with my main new feature. Please, stop trying to develop my character with this condescending master-pupil discipline and simply unsubscribe yourself from this ticket. Maybe, some day, someone with better faith and more interest will come along... Thank you.