How does the PTVS debug engine work?

Jun 29 at 6:59 PM
Hi,
I am wondering how the PTVS debugger works. From the source code, I can see that there are two debug engines being used: AD7 and DKM (Concord?).

So, since Python is a scripting language, it seems that AD7 would be enough; why is DKM needed? Is it used for mixed-mode debugging?

My second question is how the PTVS debugger works (same as the subject title). I checked the source code and it seems that PTVS is actually doing remote debugging, so the real debugger is inside the debuggee process. Is that correct?

The last question is how address translation works. In Python code, a code address is defined by a file name (or index?) plus a line number, whereas in C++ it is defined by the PC address (EIP value). How can these be unified in the debug engine when running in mixed mode?

Thank you very much!
Coordinator
Jun 29 at 8:32 PM
So, since Python is a scripting language, it seems that AD7 would be enough; why is DKM needed? Is it used for mixed-mode debugging?
Yes, DKM (Concord) is for mixed-mode debugging. It is the only way to reliably implement that on VS today - on the other hand, if done right, you can mix everything with everything (e.g. we did it with C++ in mind, but got .NET for free, and I think we can also mix with JS in Store apps, though I haven't tried it yet).
My second question is how the PTVS debugger works (same as the subject title). I checked the source code and it seems that PTVS is actually doing remote debugging, so the real debugger is inside the debuggee process. Is that correct?
For the non-Concord case, yes, the visualstudio_py_debugger.py script is the heart of the debugger, and it runs inside the debuggee. The implementations of the AD7 interfaces, as well as the VS-independent layer underneath them (PythonProcess etc.), mainly concern themselves with establishing the connection, sending commands to that script, receiving responses, and mapping all of that to AD7 method calls and events.
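To make the shape of that concrete, here is a minimal, self-contained sketch of the pattern; the command text, framing, and port are invented for illustration and are not PTVS's actual wire protocol:

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

// Sketch of the "VS side talks to a script inside the debuggee" pattern.
// Command text, framing, and port are invented for illustration only.
class DebuggerChannelSketch
{
    static void Main()
    {
        using var client = new TcpClient("127.0.0.1", 5678);   // assumed port for the debuggee-side script
        using var stream = client.GetStream();
        using var writer = new StreamWriter(stream, Encoding.UTF8) { AutoFlush = true };
        using var reader = new StreamReader(stream, Encoding.UTF8);

        // 1. The AD7 layer sends a command to the script running inside the debuggee...
        writer.WriteLine("SET_BREAKPOINT spam.py 42");

        // 2. ...waits for the script's response...
        string response = reader.ReadLine();

        // 3. ...and maps it onto an AD7 event for VS (here just printed).
        if (response == "BREAKPOINT_BOUND")
            Console.WriteLine("raise IDebugBreakpointBoundEvent2 via IDebugEventCallback2");
    }
}
```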
The last question is how address translation works. In Python code, a code address is defined by a file name (or index?) plus a line number, whereas in C++ it is defined by the PC address (EIP value). How can these be unified in the debug engine when running in mixed mode?
Concord has a notion of custom addresses (DkmCustomInstructionAddress), which are basically arbitrarily encoded byte streams. The encoding is handled by SourceLocation class in PTVS code. It's not really necessary to do any mapping there, but the debugger component must provide means to extract source information from those addresses (which in our case is easy, since the address is defined by filename + line number, as reported by Python).
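As a rough illustration (this is a hedged sketch, not the exact byte layout SourceLocation produces), round-tripping a filename + line number through an opaque byte stream looks like this:

```csharp
using System.IO;
using System.Text;

// Illustrative only: encodes a (file, line) pair into the kind of opaque byte stream a
// DkmCustomInstructionAddress carries. The actual layout used by PTVS's SourceLocation
// class may differ; the point is that the debug engine itself is the only consumer.
static class PySourceAddress
{
    public static byte[] Encode(string fileName, int lineNumber)
    {
        using var ms = new MemoryStream();
        using var w = new BinaryWriter(ms, Encoding.UTF8);
        w.Write(fileName);      // length-prefixed UTF-8 string
        w.Write(lineNumber);    // 32-bit line number
        w.Flush();
        return ms.ToArray();
    }

    public static (string FileName, int LineNumber) Decode(byte[] data)
    {
        using var r = new BinaryReader(new MemoryStream(data), Encoding.UTF8);
        return (r.ReadString(), r.ReadInt32());
    }
}
```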

The tricky part is actually stack frames, when doing a stack walk, since that's the only case where you have frames from different languages mixed together in a single place. Concord does not have a notion of "custom frames" - every frame (or a sequence of frames) must be associated with some actual address range on the physical stack, defined by frame base (basically, EBP) and size. Every frame also has an associated current instruction address, and that can be a custom address as described above (it is responsible for displaying function & file names and line numbers in the call stack window, and for navigating to the source document when the entry is double-clicked), but for the stack walk, only frame base & size matter.

Now, for Python stack walk, we actually "cheat" - we don't do our own stack walk for Python frames. Instead, we let the native debugger do the complete walk (any frame that is not claimed by some other runtime is going to be reported as a native frame), but install a call stack filter that runs after the walk, and replaces any PyEval_EvalFrameEx native frames with the Python frames that they represent (reading the pointer to PyFrameObject from the parameters of EvalFrameEx, directly from the stack). Because there is a 1:1 mapping there, we can simply take the frame base & size, as well as all the other physical artifacts (like registers), from the corresponding native frame. We also completely hide all frames that originate from python??.dll, but are not PyEval_EvalFrameEx (though there's an undocumented switch to disable that behavior, which lets you see internal interpreter call stacks). The result is what you see in practice when you run the code. Take a look at CallStackFilter to see how that works.
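As an illustration of the gist of that filtering pass, here is a sketch built on hypothetical stand-in types; the real CallStackFilter operates on Concord's stack-walk frames and reads the PyFrameObject pointer out of debuggee memory rather than from a local helper:

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for Concord stack-walk types, for illustration only.
record NativeFrame(string Module, string Function, ulong FrameBase, uint FrameSize);
record PythonFrame(string FileName, int Line, ulong FrameBase, uint FrameSize);

static class CallStackFilterSketch
{
    // Hypothetical helper: in the real component this reads the PyFrameObject pointer
    // passed to PyEval_EvalFrameEx from the native frame's parameters on the stack.
    static PythonFrame ReadPythonFrame(NativeFrame f) =>
        new PythonFrame("spam.py", 42, f.FrameBase, f.FrameSize);

    public static IEnumerable<object> Filter(IEnumerable<NativeFrame> walk)
    {
        foreach (var frame in walk)
        {
            if (frame.Function == "PyEval_EvalFrameEx")
                // Replace the interpreter frame with the Python frame it represents,
                // reusing the native frame base & size 1:1.
                yield return ReadPythonFrame(frame);
            else if (!frame.Module.StartsWith("python"))
                // Keep ordinary native frames; frames from python??.dll that are not
                // PyEval_EvalFrameEx are hidden entirely (no yield).
                yield return frame;
        }
    }
}
```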

If you're trying to adapt this code to some other language/interpreter, there are different ways to go about it, which depend on what exactly that interpreter looks like from the inside (in Lua, for example, the stack filtering scheme will not work, because Lua frames don't map to native frames somewhere inside the interpreter 1:1 - so you'll need to do your own stack walk). I'll be happy to help you with that, if you can share more details.
Jun 30 at 6:41 AM
Edited Jun 30 at 6:53 AM
Hi Pavel,
Thank you so much for your reply! Now I am clear on the first two questions.

For the stack-walking issue, I am actually not (so far) too worried, since the runtime is a JIT for a native program, where all user frames are native. So the most challenging issue in front of me is how to present to the debugger or IDE that the user is running code natively, i.e. to suppress the effect of the JIT. Currently, in the Visual Studio IDE, I see that the program is actually running inside the code cache (JIT code), which means the debugger won't be able to find the corresponding source code.

Of course, remote debugging could easily solve the issue, since all the debugger knows is the information coming from the remote side, which could fake it. But I would prefer native debugging, and the technical question is how to convince the IDE that the program is actually at the original program address.

I am probably not familiar enough with the API yet, so I am not sure how to give the IDE such a view. Could you tell me about it? I think Python has the same issue here: you also need to feed the IDE the right instruction address, which is filename + line number. What I don't understand is how this information gets to the IDE and is finally presented correctly to the user.

Derived from this question, the other one is: if I need to implement the custom instruction representation, where should it be implemented? Should I implement it in AD7 (building my own AD7 engine) or inside Dkm? Implementing it in AD7 feels like overkill for my problem, since I am not creating a new language. (I am probably wrong here, please correct me.)

So, the issue in my mind is still at the high level. I am not sure which strategy is correct: 1) the IDE has the original program view, while the debugger has the real view (meaning it is aware of the JIT); 2) both the IDE and the debugger have the original view.

Following strategy 1, one idea is this: since the Visual Studio higher level uses boundBreakpoint + PendingBreakpoint, while the Dkm level uses RuntimeBreakpoint, could I do the following: when receiving a runtime breakpoint, create another runtime breakpoint with a custom instruction address and throw that event to the higher level, so that the higher-level debug monitor receives it and matches it with a boundBreakpoint correctly?

Strategy 2 seems more like a remote debugging strategy, so it could be a fallback if strategy 1 does not work.

Thank you.
Coordinator
Jun 30 at 8:51 AM
First of all, to clarify: AD7 and Concord are basically mutually exclusive. Either your engine is AD7-based, or it is Concord-based (Concord is actually layered on top of AD7). If you're targeting Concord, you don't need to worry about AD7 artifacts, except possibly for the attach code to have your language show up as another code type in Attach to Process.

Generally speaking, if you're tweaking some aspects of the native debugger, then you should probably be using Concord.

It is not entirely clear to me what you mean by "suppress the effect of JIT" - or, for that matter, what "JIT for native program" should imply. If you have a JIT, this implies the existence of some kind of a VM, in which case you really are introducing a custom runtime and possibly a custom language into the picture. It would help to know more details.

The general flow of custom instructions & the associated data for breakpoints in PTVS Concord debugger is as follows:
  • Python source files are reported as custom modules that are associated with Python language & runtime (see ModuleManager and CreateModuleRequest classes)
  • a component is registered for various IDkmSymbol* interfaces, filtered by LanguageId (see .vsdconfigxml and ModuleManager) - that handles FindDocuments requests to enumerate documents corresponding to a given module, and FindSymbols to map a text span in a given document to a custom instruction address (which is done by simply encoding the span data for Python); that instruction is associated with the Python runtime
  • when a user breakpoint is created, VS will use FindDocuments+FindSymbols to obtain the instruction address, and will then pass it to the component implementing IDkmRuntimeMonitorBreakpointHandler for the Python runtime (see RemoteComponent and TraceManager)
Regarding the implementation of breakpoints, Concord has two kinds of them - high-level ones, which are normally created by the user and show up as distinct entities in VS (Breakpoints window, text editor margin etc.); and low-level ones, which are not visible to the user, but which pause debugging and raise the corresponding event when hit. Normally, high-level breakpoints for other languages are implemented in terms of low-level native breakpoints. There are two different event interfaces for these - the one you want for handling low-level breakpoints is IDkmRuntimeBreakpointReceived. Then, when you get that notification, you signal OnHit on the corresponding high-level breakpoint that you own.
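Here is a rough sketch of that hand-off, using hypothetical stand-in classes rather than the real Dkm* types, just to show the shape of "low-level breakpoint hit, then signal OnHit on the owned high-level breakpoint":

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins, for illustration only: LowLevelBreakpoint plays the role of a
// Concord runtime (low-level) breakpoint, HighLevelBreakpoint the user-visible one.
class LowLevelBreakpoint { public Guid Id = Guid.NewGuid(); }

class HighLevelBreakpoint
{
    public void OnHit() => Console.WriteLine("VS pauses and highlights the breakpoint");
}

class BreakpointHandlerSketch
{
    // Mapping from the low-level breakpoints we planted to the user breakpoints
    // they implement (one per Python source location).
    readonly Dictionary<Guid, HighLevelBreakpoint> _owned = new();

    public void Add(LowLevelBreakpoint low, HighLevelBreakpoint high) => _owned[low.Id] = high;

    // Conceptual equivalent of handling the runtime-breakpoint-received notification:
    // when a low-level breakpoint we own is hit, signal OnHit on the high-level one.
    public void OnRuntimeBreakpointReceived(LowLevelBreakpoint hit)
    {
        if (_owned.TryGetValue(hit.Id, out var high))
            high.OnHit();
    }
}
```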
Jul 7 at 9:37 PM
Hi Pavel,
I just came up with one more question. There are lots of interfaces in Concord that I could implement for my own DE extension, but in some cases I would like to ask the default Concord component to do the job. How can I implement the interface but pass control to the lower (or upper) component in the DE? (This implementation would look like a wrapper.)

I can see in PythonTools that there are several sendLower and sendHigher functions that implement message passing between components. But in my case I need to pass the information to an unknown component, and I am not sure which message type it accepts. So I am wondering how I might do that.

Thanks a lot.
Coordinator
Jul 8 at 6:47 PM
I don't think there is a generic mechanism for this. In many cases, returning E_NOTIMPL (or throwing NotImplementedException if you're using managed code) seems to achieve the desired effect. In others, you might have to manually call the corresponding client API on the native runtime.
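For example (a hedged sketch; the interface below is a made-up stand-in, not a real Concord interface):

```csharp
using System;

// Hypothetical component sketch: for a query we don't want to handle ourselves,
// throwing NotImplementedException (the managed equivalent of returning E_NOTIMPL)
// typically lets the debugger fall back to whatever the default handling would do.
interface IDkmSomethingProvider
{
    string GetSomething(string context);
}

class MyComponent : IDkmSomethingProvider
{
    public string GetSomething(string context)
    {
        if (context == "the one case I care about")
            return "my result";

        // Decline: let default handling (if any) take over.
        throw new NotImplementedException();
    }
}
```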