Bug 219132 - [NEW PORT] textproc/scws Simple Chinese word segmentation program and lib
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Torsten Zuehlsdorff
URL:
Keywords:
Depends on:
Blocks: 219649
Reported: 2017-05-08 06:15 UTC by Jov
Modified: 2017-07-17 10:16 UTC

See Also:


Attachments
new port shar file (3.39 KB, text/plain)
2017-05-08 06:15 UTC, Jov
scws.shar (2.41 KB, text/plain)
2017-07-14 10:43 UTC, Jov
test.sh (742 bytes, text/plain)
2017-07-14 10:48 UTC, Jov

Description Jov 2017-05-08 06:15:08 UTC
Created attachment 182388
new port shar file

SCWS (Simple Chinese Word Segmentation) is a Chinese word segmentation engine
based on a word-frequency dictionary. It splits a passage of Chinese text into
words. The word is the smallest independent unit of meaning in Chinese, but
Chinese text does not separate words with spaces, so word segmentation is an
important step in Chinese language processing. SCWS is written in C with no
external dependencies and accepts GBK and UTF-8 input for both Simplified
Chinese (zh_CN) and Traditional Chinese (such as zh_TW).

WWW: http://www.xunsearch.com/scws/index.php
Comment 1 Martin Wilke freebsd_committer 2017-05-13 05:04:34 UTC
Comment on attachment 182388
new port shar file

># This is a shell archive.  Save it in a file, remove anything before
># this line, and then unpack it by entering "sh file".  Note, it may
># create directories; files and directories will be owned by you and
># have default permissions.
>#
># This archive contains:
>#
>#	scws
>#	scws/distinfo
>#	scws/tmp
>#	scws/Makefile
>#	scws/pkg-descr
>#	scws/pkg-plist
>#
>echo c - scws
>mkdir -p scws > /dev/null 2>&1
>echo x - scws/distinfo
>sed 's/^X//' >scws/distinfo << '9894864824fc6e3b607eae66f59e52c9'
>XTIMESTAMP = 1494223276
>XSHA256 (scws-1.2.3.tar.bz2) = 60d50ac3dc42cff3c0b16cb1cfee47d8cb8c8baa142a58bc62854477b81f1af5
>XSIZE (scws-1.2.3.tar.bz2) = 485903
>9894864824fc6e3b607eae66f59e52c9
>echo x - scws/tmp
>sed 's/^X//' >scws/tmp << 'c21a431cd890495b6974063b9bc51dc2'
>X/you/have/to/check/what/makeplist/gives/you
>Xbin/scws
>Xbin/scws-gen-dict
>X%%ETCDIR%%/rules.ini.sample
>X%%ETCDIR%%/rules.utf8.ini.sample
>X%%ETCDIR%%/rules_cht.utf8.ini.sample
>Xinclude/scws/charset.h
>Xinclude/scws/crc32.h
>Xinclude/scws/darray.h
>Xinclude/scws/pool.h
>Xinclude/scws/rule.h
>Xinclude/scws/scws.h
>Xinclude/scws/version.h
>Xinclude/scws/xdb.h
>Xinclude/scws/xdict.h
>Xinclude/scws/xtree.h
>Xlib/libscws.la
>Xlib/libscws.so
>Xlib/libscws.so.1
>Xlib/libscws.so.1.1.0
>c21a431cd890495b6974063b9bc51dc2
>echo x - scws/Makefile
>sed 's/^X//' >scws/Makefile << '1605fec3e0a421cd44a8dc7da17f49ed'
>X# Created by: Jov <amutu@amutu.com>
>X# $FreeBSD$
>X
>XPORTNAME=	scws
>XPORTVERSION=	1.2.3
>XCATEGORIES=	textproc
>XMASTER_SITES=	http://www.xunsearch.com/scws/down/
>X
>XMAINTAINER=	amutu@amutu.com
>XCOMMENT=	Simple Chinese word segmentation program and lib
>X
>XLICENSE=	BSD2CLAUSE
>X
>XGNU_CONFIGURE=	yes
>XUSES=		gmake libtool:keepla tar:bzip2
>XUSE_LDCONFIG=	yes
>X
>XCONFIGURE_ARGS=	--sysconfdir=${PREFIX}/etc/scws \
>X		--with-pic
>X
>XINSTALL_TARGET=install-strip
>X
>Xpost-install:
>X	${MV} ${STAGEDIR}${PREFIX}/etc/scws/rules.ini \
>X	                ${STAGEDIR}${PREFIX}/etc/scws/rules.ini.sample
>X	${MV} ${STAGEDIR}${PREFIX}/etc/scws/rules.utf8.ini \
>X	                ${STAGEDIR}${PREFIX}/etc/scws/rules.utf8.ini.sample
>X	${MV} ${STAGEDIR}${PREFIX}/etc/scws/rules_cht.utf8.ini \
>X	                ${STAGEDIR}${PREFIX}/etc/scws/rules_cht.utf8.ini.sample
>X
>X.include <bsd.port.mk>
>1605fec3e0a421cd44a8dc7da17f49ed
>echo x - scws/pkg-descr
>sed 's/^X//' >scws/pkg-descr << '31f5a13c77d8f428f238ab0d8084dfb9'
>XSCWS (Simple Chinese Word Segmentation) is a frequency dictionary based Chinese
>Xword segmentation engine, it can cut a whole section of the Chinese text into
>Xwords. Word is the smallest unit of morpheme in Chinese, but in Chinese words
>Xare not separated by spaces,so word segmentation is an important step for
>XChinese language process.SCWS is written in C without other dependencies and
>Xaccept GBK and UTF-8 encoding for both the Simple Chinese (zh_CN) and the
>XTraditional Chinese (such as zh_TW).
>X
>XWWW: http://www.xunsearch.com/scws/index.php
>31f5a13c77d8f428f238ab0d8084dfb9
>echo x - scws/pkg-plist
>sed 's/^X//' >scws/pkg-plist << '8e14e730a1dd29627c91dc2cf0df2327'
>Xbin/scws
>Xbin/scws-gen-dict
>X%%ETCDIR%%/rules.ini.sample
>X%%ETCDIR%%/rules.utf8.ini.sample
>X%%ETCDIR%%/rules_cht.utf8.ini.sample
>Xinclude/scws/charset.h
>Xinclude/scws/crc32.h
>Xinclude/scws/darray.h
>Xinclude/scws/pool.h
>Xinclude/scws/rule.h
>Xinclude/scws/scws.h
>Xinclude/scws/version.h
>Xinclude/scws/xdb.h
>Xinclude/scws/xdict.h
>Xinclude/scws/xtree.h
>Xlib/libscws.la
>Xlib/libscws.so
>Xlib/libscws.so.1
>Xlib/libscws.so.1.1.0
>8e14e730a1dd29627c91dc2cf0df2327
>exit
>
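For readers unfamiliar with the format, the extraction technique the archive above relies on can be sketched as follows. This is a minimal illustration and not part of the attachment: the temporary directory, the `demo` subdirectory and the `END_OF_FILE` terminator are made-up names, and the comment on why lines carry an X prefix reflects the usual explanation of the convention rather than anything stated in this bug.

```sh
#!/bin/sh
# Sketch of the shar extraction step (hypothetical names throughout).
# Every payload line is stored with a leading X, so a payload line can
# never be mistaken for the here-document terminator; sed removes that
# prefix while the file is written out.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo"

sed 's/^X//' > "$workdir/demo/pkg-plist" << 'END_OF_FILE'
Xbin/scws
Xbin/scws-gen-dict
END_OF_FILE

# Show the extracted file: the X prefixes are gone.
cat "$workdir/demo/pkg-plist"
```

Quoting the terminator (`'END_OF_FILE'`) keeps the shell from expanding `$` or backticks inside the payload, which is why entries such as `%%ETCDIR%%` and `${PREFIX}` pass through the archive unchanged.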
Comment 2 Jov 2017-05-26 01:35:30 UTC
Hi Martin, I am working on another new port that depends on this one. What can I do to speed up the acceptance of this port?
Comment 3 Jov 2017-06-20 01:06:20 UTC
ping
Comment 4 Mathieu Arnold freebsd_committer 2017-07-06 13:52:43 UTC
Assignee timeout. Give back to the pool.
Comment 5 Torsten Zuehlsdorff freebsd_committer 2017-07-13 15:24:07 UTC
1) I suspect the file "tmp" can be removed?

2) I see some *.sample files in %%ETCDIR%%. Are they usable as they are? If so, we should mark them with @sample.

We should always aim to make the port as easy to use as possible. If there is a reasonable default config, we should just use it.

How can I test the port? My kanji understanding is far too poor to follow the instructions.
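On point 2, the behaviour that marking a file with @sample is meant to provide can be approximated in plain sh. This is only a sketch under stated assumptions: `install_sample` and `remove_sample` are hypothetical helper names, not part of pkg or the ports framework, and the temporary directory stands in for the real %%ETCDIR%%.

```sh
#!/bin/sh
# Hypothetical helpers that mimic the @sample idea:
# install: copy foo.sample to foo only if foo does not exist yet;
# remove:  delete foo only if it is still identical to foo.sample.

install_sample() {
    sample=$1
    target=${sample%.sample}          # strip the .sample suffix
    [ -e "$target" ] || cp -p "$sample" "$target"
}

remove_sample() {
    sample=$1
    target=${sample%.sample}
    if [ -e "$target" ] && cmp -s "$sample" "$target"; then
        rm -f "$target"
    fi
}

# Demonstration in a scratch directory (stand-in for the real ETCDIR).
etcdir=$(mktemp -d)
printf 'default\n' > "$etcdir/rules.ini.sample"

install_sample "$etcdir/rules.ini.sample"   # creates rules.ini
printf 'edited\n' > "$etcdir/rules.ini"     # the admin changes the config
install_sample "$etcdir/rules.ini.sample"   # an upgrade keeps the edit
remove_sample "$etcdir/rules.ini.sample"    # deinstall leaves the edited file
```

The point of the convention is that a usable default config is in place right after installation, while local edits survive upgrades and removal.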
Comment 6 Jov 2017-07-14 10:43:50 UTC
Created attachment 184341
scws.shar

This version does not change the rule file names.
Comment 7 Jov 2017-07-14 10:48:11 UTC
Created attachment 184342
test.sh

To test this PR, set your environment (assuming csh) and run the script:

setenv LANG zh_CN.UTF-8
./test.sh

It should print:
env ok
test the lib: 
test ok!
scws-dict-chs-utf8.tar.bz2                    100% of 3994 kB 1054 kBps 00m03s
x dict.utf8.xdb
test the scws cmd: 
FreeBSD/en 是/v 一个/m 伟大/a 的/uj 操作系统/l 
+--[scws(scws-cli/1.2.3)]----------+
| TextLen:   37                  |
| Prepare:   0.0012    (sec)     |
| Segment:   0.0002    (sec)     |
+--------------------------------+
Comment 8 Torsten Zuehlsdorff freebsd_committer 2017-07-17 10:16:27 UTC
Thanks! Committed! :)
Comment 9 commit-hook freebsd_committer 2017-07-17 10:16:58 UTC
A commit references this bug:

Author: tz
Date: Mon Jul 17 10:16:05 UTC 2017
New revision: 446058
URL: https://svnweb.freebsd.org/changeset/ports/446058

Log:
  New port: textproc/scws

  SCWS (Simple Chinese Word Segmentation) is a frequency dictionary based Chinese
  word segmentation engine, it can cut a whole section of the Chinese text into
  words. Word is the smallest unit of morpheme in Chinese, but in Chinese words
  are not separated by spaces,so word segmentation is an important step for
  Chinese language process.SCWS is written in C without other dependencies and
  accept GBK and UTF-8 encoding for both the Simple Chinese (zh_CN) and the
  Traditional Chinese (such as zh_TW).

  WWW: http://www.xunsearch.com/scws/index.php

  PR:           219132
  Submitted by: Jov <amutu@amutu.com>

Changes:
  head/textproc/Makefile
  head/textproc/scws/
  head/textproc/scws/Makefile
  head/textproc/scws/distinfo
  head/textproc/scws/pkg-descr
  head/textproc/scws/pkg-plist