A problem with CTLESC, CTLQUOTEMARK and UTF-8.

Good day (or night)!

I am an Ubuntu user; as of Jaunty (9.04), dash 0.5.4 is installed as the
default sh interpreter. Ubuntu uses multibyte UTF-8 to represent local
characters, and these characters are often found in file names.

I found this bug while trying to find out why a Python script that does a
little work converting music files would fail on songs with Cyrillic
names containing the letters с, ш or Ё. The cause was the sh invoked by a
system(...) call: with a "> $file_name" redirection, it created files
with garbage in their names whenever $file_name contained one of these
three letters.

For example, the sequence рсшЁъ, which is byte by byte
{d1 80 d1 81 d1 88 d0 81 d1 8a},
is turned into
{d1 80 d1 d1 d0 d1 8a}. The bytes 0x81 and 0x88 disappear from the file
name.

The cause is in expand.c, lines 239-240, in dash 0.5.4. The code and the
bug look similar in dash 0.5.5.1, where the corresponding lines are
expand.c:216-217.

The piece of code:
########################################################################
		if (flag & EXP_REDIR) /*XXX - for now, just remove escapes */
			rmescapes(p);
########################################################################
strips the bytes 0x81 and 0x88, which are the values dash uses internally
for CTLESC and CTLQUOTEMARK. This behaviour seems to be always unwanted,
because according to the UTF-8 specification 0x81 and 0x88 cannot
represent a character on their own. Indeed, 0x81 = binary 10000001 and
0x88 = binary 10001000; their upper two bits are 10, which means each is
a continuation byte that must always follow a lead byte (see
http://en.wikipedia.org/wiki/UTF-8#Description).

The problem probably does not occur with the single-byte KOI8-R encoding
for Cyrillic, which is the default on Debian.

I have also filed a Launchpad bug for Ubuntu:
https://bugs.launchpad.net/ubuntu/+source/dash/+bug/422298.

That's it; thanks for your attention.
--
To unsubscribe from this list: send the line "unsubscribe dash" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html