egrep -i is terribly slow when the locale is set to UTF-8. In fact, it is 77 times slower than a case-sensitive search.

How to repeat:

First, we create a 100 MB text file:

for i in $(seq 1 20); do man tcsh; done > /tmp/tcsh20
for i in $(seq 1 20); do cat /tmp/tcsh20; done > /tmp/tcsh400

$ du -hs /tmp/tcsh400
99M /tmp/tcsh400

# case-sensitive search with UTF-8
LANG=en_CA.UTF-8 time egrep -c foobar /tmp/tcsh400
0
0.11 real 0.06 user 0.04 sys

# case-insensitive search with UTF-8, terribly slow
LANG=en_CA.UTF-8 time egrep -ic foobar /tmp/tcsh400
0
8.47 real 8.42 user 0.04 sys

# case-sensitive search with ASCII
LANG=C time egrep -c foobar /tmp/tcsh400
0
0.10 real 0.06 user 0.03 sys

# case-insensitive search with ASCII
LANG=C time egrep -ic foobar /tmp/tcsh400
0
0.10 real 0.07 user 0.03 sys

Is this gnugrep or bsdgrep? (`egrep -V`)

I'm getting something along these lines with your reproduction steps:

root@www2:/usr/bin # env LANG=en_CA.UTF-8 time egrep -c foobar /tmp/tcsh400
0
1.13 real 0.73 user 0.08 sys

root@www2:/usr/bin # env LANG=en_CA.UTF-8 time egrep -ic foobar /tmp/tcsh400
0
3.73 real 3.26 user 0.08 sys

root@www2:/usr/bin # env LANG=C time egrep -c foobar /tmp/tcsh400
0
1.08 real 0.72 user 0.06 sys

root@www2:/usr/bin # env LANG=C time egrep -ic foobar /tmp/tcsh400
0
1.11 real 0.74 user 0.07 sys

With:

root@www2:/usr/bin # grep -V
grep (BSD grep, GNU compatible) 2.6.0-FreeBSD

So egrep -i is still slower with bsdgrep, but it's only 3x slower than the equivalent case-sensitive search here.

I'm using the standard egrep from the base system:

$ /usr/bin/egrep -V
egrep (GNU grep) 2.5.1-FreeBSD

For bsdgrep I get:

$ /usr/bin/bsdgrep -V
bsdgrep (BSD grep, GNU compatible) 2.6.0-FreeBSD

LANG=en_CA.UTF-8 time bsdgrep -c foobar /tmp/tcsh400
0
0.50 real 0.47 user 0.02 sys

LANG=en_CA.UTF-8 time bsdgrep -ci foobar /tmp/tcsh400
0
2.09 real 2.04 user 0.04 sys
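
To compare the three sets of UTF-8 measurements in this thread, the slowdown of -i relative to the case-sensitive run can be computed from the reported "real" times. This is only a minimal sketch: `slowdown` is a hypothetical helper name, not something from grep or the base system, and the figures are copied from the comments above rather than measured again.

```shell
#!/bin/sh
# slowdown INSENSITIVE_SECS SENSITIVE_SECS
# Hypothetical helper: prints how many times slower the -i run was,
# rounded to one decimal place.
slowdown() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# "real" times for en_CA.UTF-8, as reported in this thread
slowdown 8.47 0.11   # GNU grep 2.5.1-FreeBSD (original report)
slowdown 3.73 1.13   # BSD grep 2.6.0-FreeBSD (www2)
slowdown 2.09 0.50   # BSD grep 2.6.0-FreeBSD (reporter's machine)
```

On these figures, GNU grep 2.5.1 is roughly 77x slower with -i under UTF-8, while bsdgrep is roughly 3x to 4x slower on both machines.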