There is no console encoding setting

description

Hi,

Kudos for the great work. I have had one problem with PTVS since I started using it: although the interactive prompt uses UTF-8 encoding, the debugging terminal does not.

Since most of my scripts deal with CJK processing, I cannot use any of the PTVS debugging features.

Steps to reproduce:
  1. Create a new utf-8 encoded python script (cjk.py)
  2. Add this line: "print('打印')"
  3. Run the script in the interactive prompt: you see "打印"
  4. Open a command prompt (cmd.exe), type "chcp 65001", then "python cjk.py": you see "打印"
  5. Launch the PTVS debugger: a black console window (python.exe) opens and python crashes:
Traceback (most recent call last):
  File "cjk.py", line 1, in <module>
    \ufeffprint('\u6253\u5370')
  File "C:\Python33\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
Additional notes:
The problem is the same if you start python manually from a command line, but in that case "chcp 65001" fixes it.

I've tried the following to fix the problem:
  • Set the default console encoding in Windows by adding chcp 65001 to the console autorun registry setting. No effect.
  • Override python's default encoding by setting the environment variable PYTHONIOENCODING=utf-8 . This effectively suppresses the error, but you see: "µëôÕì░".
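
Both symptoms above can be reproduced without a console at all, using plain codec calls. The sketch below is an illustration only: it assumes the debug console is on code page 850 (as the cp850.py frame in the traceback suggests), and the variable names are made up for the example.

```python
# Illustration only: reproduces both symptoms with plain codec calls,
# assuming the debug console is on code page 850 (per the traceback).
text = "\u6253\u5370"  # the two characters printed by cjk.py

# Symptom 1: cp850 has no mapping for these characters, so encoding fails
# with the same UnicodeEncodeError as in the traceback.
try:
    text.encode("cp850")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Symptom 2: with PYTHONIOENCODING=utf-8 Python emits UTF-8 bytes, but a
# console still on cp850 shows each byte as a separate character.
utf8_bytes = text.encode("utf-8")
as_seen_on_cp850 = utf8_bytes.decode("cp850")

print(encode_failed)
print(as_seen_on_cp850 == "µëôÕì░")
```

This suggests why the environment variable alone only hides the error: the bytes Python writes are correct UTF-8, but the console keeps decoding them with its own code page.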
I can see two solutions to this:
  1. Let users set the console encoding (as in Notepad++ NppExec plugin, for which setting the console encoding to utf-8 and forcing the encoding using PYTHONIOENCODING does work)
  2. Let users debug in the python interactive (that would also be great because it would prevent the annoying black window from flashing while you debug step by step).
Implementing both would be nice as well; at the moment I cannot find any workaround, which makes the debugging feature much less useful than it could be...

Thanks for the great work on PTVS.

comments

Zooba wrote Sep 3, 2013 at 5:26 PM

You're correct that there's no suitable workaround. Generic consoles are separate from cmd.exe, which is why the environment variable is about your best option. (We don't even own the console with the issue - it belongs to python.exe.)

I've marked this as a feature for now. We certainly have some work to do in supporting non-ASCII Python, but it's not yet clear how we can best handle this.

pminaev wrote Sep 3, 2013 at 6:24 PM

While we do not control the console, when launching with debugging (F5), we have an opportunity to run some code before handing control over to the main program. We could inject os.system('chcp 65001') there somewhere to switch encoding (so far as I can see, there's no Win32 API for that).
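
For illustration, a hook that runs before the main program and switches the code page might look like the sketch below. This is not PTVS code: switch_console_to_utf8 is a hypothetical helper name, it shells out to chcp as suggested above, and it does nothing on platforms other than Windows. Because sys.stdout picks its encoding when the interpreter starts, the standard streams would probably need to be re-created afterwards as well; that part is not shown.

```python
import subprocess
import sys


def switch_console_to_utf8():
    """Hypothetical helper: ask the attached console to use code page 65001.

    Returns True if chcp reported success, False otherwise (including on
    platforms other than Windows, where there is nothing to do).
    """
    if sys.platform != "win32":
        return False
    # Run through the shell so chcp is resolved the same way as at a prompt;
    # its "Active code page" message is discarded.
    exit_code = subprocess.call("chcp 65001 > NUL", shell=True)
    return exit_code == 0
```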

giotte wrote Dec 16, 2013 at 11:33 PM

I found a workaround that works great, so I thought I'd pass it along in case others find it helpful. I've tested this on Win 7 x64, Python 3.3.2, and PTVS 2.0 for VS 2010.

Short version:
  1. Download win_unicode_console.zip
  2. Unzip and place contents in the site-packages directory
  3. Set PYTHONSTARTUP environment variable to "C:\Python33\Lib\site-packages\run.py" (or the equivalent python install location on your machine)
Long version:
The win_unicode_console package contains Python code that replaces sys.stdin/out/err with new classes that use WriteConsoleW and ReadConsoleW along with UTF-16-LE encoding. This approach bypasses the current code page, so there's no need for calls to 'chcp'. The critical part (for PTVS debugging) is that the code also contains a modified REPL, so interactive debugging uses the new classes as well.

The 3-step process listed above is the most global approach: it affects not only PTVS debug sessions but also any Python script run from a command window (so you don't need to do "chcp 65001" before running python). You could also do this with virtual environments; just put the files in your env's site-packages directory and set the PYTHONSTARTUP variable accordingly.

Alternatively, you could take a more localized approach by adding the win_unicode_console package to your local project and doing: import stream; stream.enable(). This fixes text in the PTVS debug output window (i.e. the command window that pops up) but doesn't give you full interactive debugging in the "Python Debug Interactive" window.

None of these approaches makes any system changes; deleting the win_unicode_console files and removing the PYTHONSTARTUP variable sets everything back to the default setup.
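
To give a rough idea of the stream replacement described above, here is a minimal sketch of an output stream that hands text to WriteConsoleW. It is not the win_unicode_console code: ConsoleWriter, utf16_length and make_win32_write_console are hypothetical names, only stdout is covered, and partial writes and the ReadConsoleW side are ignored.

```python
import sys


class ConsoleWriter:
    """Hypothetical text stream that passes str data to a console-write callable."""

    def __init__(self, write_console):
        self._write_console = write_console

    def write(self, text):
        if not isinstance(text, str):
            raise TypeError("ConsoleWriter accepts str, not bytes")
        if text:
            self._write_console(text)
        return len(text)

    def flush(self):
        pass  # every write goes straight to the console; nothing is buffered


def utf16_length(text):
    """Number of UTF-16 code units in text, the unit WriteConsoleW counts in."""
    return len(text.encode("utf-16-le")) // 2


def make_win32_write_console():
    """Build a callable around kernel32.WriteConsoleW (Windows only)."""
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    std_output_handle = -11 & 0xFFFFFFFF  # STD_OUTPUT_HANDLE as a DWORD
    handle = kernel32.GetStdHandle(std_output_handle)

    def write_console(text):
        written = wintypes.DWORD(0)
        # Fails if the handle is not a real console (e.g. redirected output).
        ok = kernel32.WriteConsoleW(
            handle, text, utf16_length(text), ctypes.byref(written), None
        )
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())

    return write_console


# Only swap the stream when attached to an actual Windows console.
if sys.platform == "win32" and sys.stdout is not None and sys.stdout.isatty():
    sys.stdout = ConsoleWriter(make_win32_write_console())
```

Because the text is passed to the console as wide characters, the active code page never comes into play, which is the property the workaround relies on.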

All credit goes to the python gurus at bugs.python.org who tackled these gnarly Windows/Unicode issues (see here and here for details).