2012-03-31

Announcing mmshget: mmsh:// (MMS-over-HTTP) video stream downloader and reference implementation

This blog post announces mmshget, a command-line Python script for downloading video streams served over the mmsh:// (MMS-over-HTTP) protocol and saving them in .wmv (or .asf) format. mmshget can also serve as a simple, easy-to-understand, partial client-side reference implementation of the mmsh:// protocol.

Download the Python script or see the source tree.

mmshget is inspired by and similar to mimms, but it is smaller, easier to understand and has fewer features. mimms supports both seekable and live streams (mmshget supports seekable streams only), and mimms additionally supports non-HTTP versions of the MMS protocol (mmshget supports only mmsh://, the HTTP version). mimms depends on the C library libmms, whereas mmshget is implemented in pure Python (it needs only Python 2.4 or later).

2012-03-27

UTF-8 issue: find doesn't find all your files

Public bug announcement: Beware that GNU find in findutils 4.4.2 (as shipped on Ubuntu Lucid) may not find all your files when it is run in a UTF-8 locale: even if a file is there, a name test such as -name can silently skip it. Workaround: if you have non-ASCII characters in your file names, use LC_CTYPE=C find instead of find.

Example:

$ echo $LC_CTYPE
en_US.UTF-8
$ ls foo*
ls: cannot access foo*: No such file or directory
$ perl -e 'die if !open F, ">", "foo\x80bar"'
$ ls foo*
foo?bar
$ find -type f
...
./foo?bar
...
$ find -name 'foo*'
$ LC_CTYPE=C find -name 'foo*'
./foo?bar

Possible explanation: The file name matcher won't match a file if its name cannot be parsed properly in the current locale (LC_CTYPE). That is, since foo\x80bar is not valid UTF-8, GNU find 4.4.2 will not find it.
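
The explanation above can be illustrated, and the problem sidestepped, in Python 3 by treating file names as raw bytes. This is only a minimal sketch: the helper names is_valid_utf8 and find_by_name are made up for this post, and nothing here reflects how GNU find is actually implemented.

```python
import fnmatch
import os


def is_valid_utf8(name):
    """Return True if the byte string `name` decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_by_name(top, pattern):
    """Yield paths of files under `top` whose base name matches `pattern`.

    Both arguments are byte strings, so names are never decoded and the
    current locale (LC_CTYPE) plays no role in the matching.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # fnmatchcase is case-sensitive, like find -name.
            if fnmatch.fnmatchcase(filename, pattern):
                yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # The name from the example above is not valid UTF-8 ...
    print(is_valid_utf8(b"foo\x80bar"))
    # ... but a byte-level glob still matches it.
    for path in find_by_name(b".", b"foo*"):
        print(repr(path))
```

Run in the directory from the example, the loop should list b'./foo\x80bar' whatever LC_CTYPE is set to, since no decoding takes place.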

This strange behavior can be very surprising and possibly dangerous, especially in automated shell scripts.