Need help with encoding issue!

Jun 22, 2014 at 1:23 PM
I am Portuguese, so I deal with characters such as ç, á, à, and ã, among others.
When I do this:
import sys
print(sys.getdefaultencoding())
I get: utf-8
So I guess I have to change it to latin1 or something, but I haven't found a way to do it.
I can't even create a variable such as:
without getting an error: "SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xe1 in position 0: unexpected end of data"
This wouldn't be much of an issue in itself, but when I query a database the results are also unsatisfactory. I have to use a workaround to keep it from failing immediately, and even that workaround doesn't encode/decode the text from the database correctly. Example:

for row in cur.fetchall():
    for field in row:
        print(field.encode(encoding='latin'), end=" ")  # Using this I get weird conversions of strings such as ç or à
        print(field, end=" ")  # Using this I get a "fatal" error of unable to decode
Can someone please help me? I'm sure there's an easy fix.
Thanks in advance!
Jun 22, 2014 at 7:44 PM
Judging by your getdefaultencoding() output and the print() syntax, you're using Python 3.x. If so, the recommended approach to the first issue is to save the file itself as UTF-8. It is simply the default source encoding that Python 3 standardized on, and it's generally much easier to follow that than to fight it. You can control the encoding of your file in VS by going to File -> Advanced Save Options.
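
To illustrate where that SyntaxError comes from, here is a minimal sketch (the variable names are made up for this example). It assumes the file was saved in a Latin-1-style codepage, where á is stored as the single byte 0xe1. That byte is not a complete UTF-8 sequence, so decoding it as UTF-8 fails with the same message you saw:

```python
# Illustrative only: how "á" is stored under two different encodings.
a_acute = "\u00e1"  # "á", written as an escape so this snippet is pure ASCII

latin1_bytes = a_acute.encode("latin-1")  # one byte: 0xe1
utf8_bytes = a_acute.encode("utf-8")      # two bytes: 0xc3 0xa1

# Reading the Latin-1 byte as UTF-8 fails, because 0xe1 announces a
# multi-byte sequence whose remaining bytes never arrive.
try:
    latin1_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(latin1_bytes, utf8_bytes)
print(decode_error)
```

Saving the file as UTF-8 makes the editor write the two-byte form, which is what Python 3 expects by default.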

Alternatively, you can put a "# coding:" comment at the top of your source file to declare its encoding, or use Unicode escapes with \u or \U.
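
A rough sketch of both alternatives, in case it helps. The word and the names used here are just examples, and the second part simulates a Latin-1 file in memory with compile() rather than reading a real file from disk:

```python
# Option 1: Unicode escapes keep the source file pure ASCII,
# so the encoding the editor saves it in no longer matters.
word = "a\u00e7\u00e3o"  # the Portuguese word "ação"

# Option 2: a coding declaration on the first (or second) line tells
# Python how to decode the file. Here the bytes of a hypothetical
# Latin-1 source file are built in memory and compiled directly.
latin1_source = (
    "# -*- coding: latin-1 -*-\n"
    "word = 'a\u00e7\u00e3o'\n"
).encode("latin-1")

namespace = {}
exec(compile(latin1_source, "<latin1_demo>", "exec"), namespace)

# Both routes produce the same str object contents.
print(ascii(word), ascii(namespace["word"]))
print(namespace["word"] == word)
```

The declaration has to match what the editor actually writes; if the two disagree, the text is either rejected or silently decoded into the wrong characters.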

For the database problem, what is the type of field that you get from the query? It looks like it might be bytes rather than str. If so, just decode it with field.decode('utf-8').
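
Something along these lines, as a sketch only: to_text is a hypothetical helper, the sample value stands in for whatever your driver returns, and UTF-8 is an assumption about how the database stores its text.

```python
def to_text(field, encoding="utf-8"):
    """Return field as str, decoding it first if the driver handed back bytes."""
    if isinstance(field, bytes):
        return field.decode(encoding)
    return field

# Stand-in for a value fetched from the database (assumed UTF-8 bytes).
raw_field = "cora\u00e7\u00e3o".encode("utf-8")

text = to_text(raw_field)                # decoded with the right codec
garbled = raw_field.decode("latin-1")    # wrong codec: no error, but mojibake

print(ascii(text))
print(ascii(garbled))
```

Decoding with the wrong codec often does not raise at all; it just produces odd character pairs in place of ç or ã, which may be the kind of "weird conversion" you are seeing.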

See these for more details about Unicode handling in Python:
Jun 23, 2014 at 10:51 AM

Thanks a lot for your help.
First off, yes, I'm using Python 3.4.
As for the 1st issue: it's solved thanks to the "Advanced Save Options" solution :)
As for the db issue: the type of field is actually str, and now it's returning the text exactly as it is in the db :) (probably because of the solution to the 1st issue)

However, there is still a minor issue. The query results are fine, and so are the strings in the list I built from them. But when I print them out, the list gets printed and then execution stops with the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\xcd' in position 225: character maps to <undefined>

Do you know why this might be?

Thanks again!
Jun 23, 2014 at 5:48 PM
It's probably because the output encoding of sys.stdout (which, I believe, defaults to the current ANSI codepage on Windows, aka the "Language for non-Unicode programs" setting in Control Panel) is set to something that can't encode this particular string. Have a look at sys.stdout.encoding.

If changing that is not an option (it usually isn't, since this is a user-configurable setting), but you want to avoid errors when printing unencodable characters, you'll have to encode explicitly and decode the result back, like so: s.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding) - this will print out any such characters as question marks.
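
As a small sketch of that workaround (printable is a made-up helper name, and falling back to UTF-8 when sys.stdout.encoding is unset is my own assumption):

```python
import sys

def printable(text, encoding=None):
    """Replace characters the target encoding cannot represent with '?'."""
    # Assumption: fall back to UTF-8 if stdout reports no encoding.
    encoding = encoding or sys.stdout.encoding or "utf-8"
    # Encode with errors='replace', then decode back so print() gets a str
    # rather than a bytes object.
    return text.encode(encoding, errors="replace").decode(encoding)

# Forcing ASCII here just to show the replacement happening.
print(printable("\u00cdndice", "ascii"))

# With no explicit encoding, whatever stdout supports is used.
print(printable("\u00cdndice"))
```

The round trip through bytes is what keeps print() from raising: anything the console codepage cannot represent has already been swapped for a question mark.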
Jun 23, 2014 at 7:10 PM
Once again it worked! I went to the region settings and saw that the language for non-Unicode programs was English. Once I changed it to Portuguese, it worked!

You're the best! Thanks!