[Urwid] Problem related to UTF-8 processing of Latin strings (Fatal)
Ian Ward
ian at excess.org
Sun Apr 16 23:12:13 EDT 2006
Neil Tallim wrote:
> In the event that a Latin-1 string is evaluated as UTF-8 (my
> application's fault for not validating encoding when accepting input;
> I'm not blaming this part on anyone but myself), if the *final*
> character is, say, 'ä', then 'text[pos+1]' will cause an IndexError to
> be thrown. I believe this is because that character marks the start of
> a new UTF-8 double-byte character, so Urwid incorrectly assumes there
> will be another byte.
>
> Discarding, question-marking, or returning the ordinal value of this
> trailing character, if it exists, would prevent an error from being
> thrown, which would prevent the possibility of an unexpected crash.
> Related lines: 88, 98, 111 in urwid/utable.py
Ah, of course you're right... the bug was staring me in the face. I've
attached a patch to utable.py that should fix the problem. I'll include
it in the next release.
Thank you for the bug report!
Ian
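
For readers without the attachment, here is a minimal sketch of the kind
of bounds check described above. It is not the actual patch to utable.py:
the function name decode_one and its return convention are hypothetical,
and it follows the "question-marking" option from the report. It assumes
Python 3 bytes input and skips overlong-sequence validation for brevity.
Note that 'ä' in Latin-1 is the byte 0xE4, which UTF-8 treats as the lead
byte of a three-byte sequence, so a decoder that trusts it will read past
the end of the string.

```python
def decode_one(text: bytes, pos: int):
    """Hypothetical helper: decode one code point from UTF-8 bytes at pos.

    Returns (ordinal, next_pos). A truncated or malformed sequence yields
    ord("?") and advances one byte instead of raising IndexError.
    """
    error = (ord("?"), pos + 1)
    b1 = text[pos]
    if b1 < 0x80:
        # Plain ASCII: a single byte.
        return b1, pos + 1
    # Work out how many continuation bytes the lead byte promises.
    if b1 & 0xE0 == 0xC0:
        need, value = 1, b1 & 0x1F
    elif b1 & 0xF0 == 0xE0:
        need, value = 2, b1 & 0x0F
    elif b1 & 0xF8 == 0xF0:
        need, value = 3, b1 & 0x07
    else:
        # Stray continuation byte or invalid lead byte.
        return error
    # The bounds check: never index past the end of the string.
    if pos + need >= len(text):
        return error
    for b in text[pos + 1 : pos + 1 + need]:
        if b & 0xC0 != 0x80:
            # Not a continuation byte, so the sequence is malformed.
            return error
        value = (value << 6) | (b & 0x3F)
    return value, pos + 1 + need


# A trailing Latin-1 'ä' (0xE4) no longer reads past the end.
print(decode_one("abc\u00e4".encode("latin-1"), 3))
# The same character correctly encoded as UTF-8 decodes normally.
print(decode_one("\u00e4".encode("utf-8"), 0))
```

Returning a placeholder and advancing by one byte keeps the caller's loop
moving, so a single bad byte at the end of the input cannot crash the
display code.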
-------------- next part --------------
A non-text attachment was scrubbed...
Name: urwid_utable_bounds.patch
Type: text/x-patch
Size: 449 bytes
Desc: not available
Url : http://lists.excess.org/pipermail/urwid/attachments/20060416/5dd90eec/urwid_utable_bounds.bin