Problems profiling

Apr 11, 2012 at 12:38 AM

I'm running into trouble using the Python Profiling tool (via Analyze > Launch Python Profiling), profiling a basic helloworld.py

Environment: VS2010 SP1, PVTS 1.1, Python 2.7.2.1 (previously running 2.7.1)

Machine: x64

Interpreter: x86 ipy.exe (failed), x64 ipy64.exe (failed)

System.AccessViolationException was unhandled 

Message=Attempted to read or write protected memory.

This is often an indication that other memory is corrupt. 

Source=IronPython.Modules.DynamicAssembly 
StackTrace:       
at InteropInvoker(IntPtr , Object , Object[] )       
at CallSite.Target(Closure , CallSite , Object , Object )       
at System.Dynamic.UpdateDelegates.UpdateAndExecute2[T0,T1,TRet](CallSite site, T0 arg0, T1 arg1)       
at CallSite.Target(Closure , CallSite , Object , Object )       
at Microsoft.Scripting.Interpreter.DynamicInstruction`3.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.LightLambda.Run4[T0,T1,T2,T3,TRet](T0 arg0, T1 arg1, T2 arg2, T3 arg3)       
at IronPython.Compiler.Ast.CallExpression.Invoke1Instruction.Run(InterpretedFrame frame)     
 at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.LightLambda.Run1[T0,TRet](T0 arg0)       
at IronPython.Compiler.PythonCallTargets.OriginalCallTarget0(PythonFunction function)     
 at IronPython.Runtime.PythonFunction.FunctionCaller.Call0(CallSite site, CodeContext context, Object func)     
 at System.Dynamic.UpdateDelegates.UpdateAndExecute2[T0,T1,TRet](CallSite site, T0 arg0, T1 arg1)       
at Microsoft.Scripting.Interpreter.FuncCallInstruction`5.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.LightLambda.Run3[T0,T1,T2,TRet](T0 arg0, T1 arg1, T2 arg2)       
at IronPython.Compiler.Ast.CallExpression.Invoke0Instruction.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.LightLambda.Run5[T0,T1,T2,T3,T4,TRet](T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4)     
 at IronPython.Compiler.PythonCallTargets.OriginalCallTarget4(PythonFunction function, Object arg0, Object arg1, Object arg2, Object arg3)       
at IronPython.Runtime.PythonFunction.FunctionCaller`4.Call4(CallSite site, CodeContext context, Object func, T0 arg0, T1 arg1, T2 arg2, T3 arg3)       
at System.Dynamic.UpdateDelegates.UpdateAndExecute6[T0,T1,T2,T3,T4,T5,TRet](CallSite site, T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5)     
 at Microsoft.Scripting.Interpreter.FuncCallInstruction`9.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.LightLambda.Run7[T0,T1,T2,T3,T4,T5,T6,TRet](T0 arg0, T1 arg1, T2 arg2, T3 arg3, T4 arg4, T5 arg5, T6 arg6)       
at IronPython.Compiler.Ast.CallExpression.Invoke4Instruction.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.Interpreter.Run(InterpretedFrame frame)       
at Microsoft.Scripting.Interpreter.LightLambda.Run1[T0,TRet](T0 arg0)       
at IronPython.Compiler.RuntimeScriptCode.InvokeTarget(Scope scope)       
at IronPython.Compiler.RuntimeScriptCode.Run(Scope scope)       
at IronPython.Hosting.PythonCommandLine.RunFileWorker(String fileName)     
 at IronPython.Hosting.PythonCommandLine.RunFile(String fileName)       
at Microsoft.Scripting.Hosting.Shell.CommandLine.Run()       
at IronPython.Hosting.PythonCommandLine.Run()       
at Microsoft.Scripting.Hosting.Shell.CommandLine.Run(ScriptEngine engine, IConsole console, ConsoleOptions options)     
 at Microsoft.Scripting.Hosting.Shell.ConsoleHost.RunCommandLine()     
 at Microsoft.Scripting.Hosting.Shell.ConsoleHost.ExecuteInternal()       
at PythonConsoleHost.ExecuteInternal()       
at Microsoft.Scripting.Hosting.Shell.ConsoleHost.Execute()     
 at Microsoft.Scripting.Hosting.Shell.ConsoleHost.Run(String[] args)       
at PythonConsoleHost.Main(String[] args)  InnerException: 

Can any one else who have successfully done this share their setup?

Coordinator
Apr 11, 2012 at 12:43 AM

Unfortunate the profiling feature actually only works w/ CPython and not IronPython.  I'm actually a little surprised it made it this far, and we should probably try and block it earlier and give a better error message.  The reason is that the profiler is written in C for performance reasons and talks directly to the CPython C embedding API.

Profiling of IronPython is a little more difficult but can be done with the normal Visual Studio profiler.  But it'll require pre-compiling the code in your app using pyc.py (in <IPYINSTALLDIR>Tools\Scripts\) and then launching the resulting EXE under the normal profiler.  That'll also let you use sampling instead of instrumentation which will give you much less runtime overhead.

Apr 11, 2012 at 2:27 AM

Thank you.

Is there any way to identify what is the hot code inside the Python code itself? The profiler reports UpdateDelegates.UpdateAndExecute4() and IronPython.Modules.Builtin.getattr() being very busy. I guess we are using a bit too much dynamism in our existing code.

Would getting rid of "def __getattr__()" help the DLR to perform better?

Are there any guides to writing performant IronPython code? I noticed that PyStone benchmarks take more advantage of the DLR (e.g. DLRCachedCode.Proc0$96 is hot)

 

 

Coordinator
Apr 11, 2012 at 2:45 AM

I usually use the inclusive times when I'm not getting anything too useful from the exclusive times which usually indicate a hot method.  I just start drilling down through the call graph until I start getting to the middle of it and start seeing functions which are doing interesting work, but not necessarily doing a lot of work themselves. 

If you're seeing a lot of time in getattr() it's self I'd recommend seeing if you can get rid of calls to getattr - in general IronPython will be able to do a much better job of optimizing attribute access if it can do so statically via a call site doing "a.b" instead of doing it dynamically at runtime. If you have a handful of attributes which are being accessed frequently in your getattr calls even doing something like:

 

if name == 'a':

   res = foo.a

elif name == 'b':

   res = foo.b

else:

   res = getattr(foo, name)

 

Will likely help.

If you are only accessing each attribute once or twice then it might be better to avoid doing it through attributes - there's a bunch of overhead involved in figuring out how to access the attribute the 1st time, and from there it gets much faster. 

It is possible that __getattr__ is making the overhead of figuring out to do significantly higher.  Certainly it'll have some fixed cost on the attribute access but I wouldn't expect it to be too crazy, I'd expect it to be similar to the overhead of a Python property. 

Also, if you're generating classes dynamically (by calling type(...) directly, or generating a type in a loop/closure) that can really make attribute access expensive (at least for new-style classes).

You could also try switching between new-style and old-style classes if you can as one may perform much better for you.  PyStone uses old-style classes for example.

The goal of almost all of these would be to get into a situation where you're hitting a cached rule more often or avoid generating code for new rules - spending time in UpdateAndExecute4 indicates that you're either missing a hit on a rule or are generating a new rule.

Hopefully that'll help you get started!  I'm not aware of a good guide...  I don't even think it's covered in any of the available books :(