[Urwid] Problem related to UTF-8 processing of Latin strings (Fatal)
Ian Ward
ian at excess.org
Sun Apr 16 20:18:32 EDT 2006
Neil Tallim wrote:
> I think I've found a problem, and it isn't just a complaint about
> Latin strings being mangled if they're evaluated as UTF-8.
>
> Options/modules used for testing:
> raw_display
> UTF-8 processing
>
> Details:
>
> In urwid/utable.py, lines 88, 98, and 111 ("b2 = ord(text[pos+1])")
> will throw an IndexError if one of a few high-ord Latin characters is
> at the end of a string. (I think Urwid is expecting another character
> to make a UTF-8 pair, but the string being evaluated is plain Latin)
It looks like your system's encoding is set to UTF-8, and you're passing
Urwid plain strings that are encoded in latin-1. In general, Urwid
assumes that plain strings are in the system's default encoding, so when
the system's encoding is UTF-8 you can't use plain strings in any other
encoding.
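To illustrate why a high-ord latin-1 character at the end of a string is a problem for a UTF-8 decoder, here is a small sketch. It uses only Python's built-in codecs and says nothing about how utable.py is actually written; the sample string is made up, and the bytes syntax is the modern spelling of a plain string.

```python
# Latin-1 encodes "é" as the single byte 0xE9.
latin1_bytes = b"caf\xe9"

# In UTF-8, bytes in the range 0xE0-0xEF announce a three-byte sequence,
# so a decoder that sees 0xE9 expects two continuation bytes after it.
last = bytearray(latin1_bytes)[-1]
is_three_byte_lead = 0xE0 <= last <= 0xEF

# Decoding as UTF-8 fails because the sequence is cut off by the end
# of the string.
try:
    latin1_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the bytes were really written in works.
decoded = latin1_bytes.decode("latin-1")
```

A decoder that reads the next byte without first checking the string length would run off the end here, which would match the IndexError you are seeing.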
If your application is designed to handle strings in the latin-1
encoding, convert them to unicode strings before displaying them, e.g.:

    textwidget = Text( unicode( mystring, "latin-1" ) )
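If the application gets latin-1 strings from several places, one option is to decode them in a single helper before they reach any widget. This is only a sketch: to_text is a hypothetical name, not part of Urwid, and it uses the decode() method, which is equivalent to the unicode() call above.

```python
def to_text(raw, encoding="latin-1"):
    """Return raw as a unicode string.

    Byte strings are decoded with the given encoding; anything that is
    already a unicode string is returned unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw

# Hypothetical sample data: "naïve" encoded in latin-1.
label = to_text(b"na\xefve")

# label can then be handed to a widget, e.g. Text(label)
```

Decoding once at the boundary means the rest of the application only ever deals with unicode strings, whatever the system's encoding happens to be.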
You can force Urwid to disable its UTF-8 processing by calling
urwid.set_encoding("latin-1"), but if your terminal really is in UTF-8
mode then the characters won't be displayed properly.
If your system's encoding is not UTF-8 then Urwid shouldn't be trying to
decode your strings with the utable module. What is the output when
you run the "locale" command on your system?
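You can also check what Python itself thinks the encoding is. The sketch below uses only the standard locale module; it is not necessarily the same check Urwid performs internally.

```python
import locale

# Adopt the locale from the environment (LANG, LC_ALL, LC_CTYPE).
# An invalid or missing locale raises locale.Error, in which case the
# default "C" locale stays in effect.
try:
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass

# Name of the encoding Python expects text to be in on this system.
encoding = locale.getpreferredencoding()
print(encoding)
```

If this prints UTF-8, plain strings in latin-1 will be misread in the way described above.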
Ian