Bug: Locals Window does not expand or show a dictionary item correctly.


[Edited:] I'm rewriting the entire issue, see my third Comment for January 1 below.
In the Locals window (or Watch or other tree-like displays), if I have a dict and a key is an object that I defined, and the value is some expandable object, then the value will not expand. When clicking on the + symbol, the + symbol disappears.

I've put some print statements into the debugger script to see what is happening. The debugger is being asked to report on the children of the dictionary item's value. In an example, I type
{inst.sched: inst} into the Watch window. Expanding the dictionary shows
+ [<Sched.MySched object at 0x0000000002778940>] Instance 1
When I click on the + symbol, the debugger is called to expand the object
{inst.sched: inst}[<Sched.MySched object at 0x0000000002778940>]
Python cannot understand this expression (I tried typing this string into the Watch window and I got parsing errors).

I don't know anything about the interface with VS. But I see from the debugger script that the child of the dictionary is being specified by a child index (the key), the child repr(), and the type name. All of these are strings that appear in the Watch window. It is also given a flag for being expandable (I assume this determines the existence of the + symbol), and two flags called indices_are_enumerate and indices_are_index. These last two flags apply to the entire set of indices.

Perhaps if the indices_are_index is set, then the call to expand the child would look like {inst.sched: inst}[0] instead, and then the debug script would just enumerate the dictionary items and get the desired item. However, what would happen if the dictionary changed after showing the dictionary expansion and before requesting the item expansion? Is there a mechanism for updating the expansion of the dictionary every time the program breaks?


mrolle wrote Jan 4 at 8:31 AM

On further investigation into the interaction between the IDE and the Python process, I see that the IDE really knows nothing about any debug property other than (1) name / value / type strings, (2) whether expandable or not, and (3) an interface to get the property's children, among other things.

This interface resides in the PythonProcess.cs module, and can be implemented any way you like.

So I suggest that PythonProcess can record a dict item as numeric index into the dict's items() value, while at the same time telling the IDE to show the name as [<object repr>].

By the way, how does the IDE discover changes to displayed values after, say, stepping through the current frame? Does it request all the values again? I assume this is the case, but then I wonder about values whose evaluation has side effects. If PythonProcess can report that expanding children had a (possible) side effect, does this tell the IDE to avoid expanding it again, and show the 'refresh' icon for that value?

I don't suppose you could ever be sure that evaluating any expression, such as expanding children, is free of side effects. Especially if there is non-Python code involved. I guess you take the position that displaying a local variable or an attribute of an object is usually safe.

mrolle wrote Jan 4 at 8:19 PM

Here's a related issue.
If I do Add Watch on a dict item from the context menu, the new Watch item also has the name like {inst.sched: inst}[<Sched.MySched object at 0x0000000002778940>]
This displays, of course, as an Error.
I assume that the IDE gets this name from the Interface that represents the property that I click on. So this has to be fixed as well.
I want this item to show up in the Watch window with a name that is a valid Python expression. Since the Python side knows this item to be the first item in the dict, the expression {inst.sched: inst}.values()[0] works fine.
Now if the key is known to be reproducible by its repr(), such as an int or str, then the current method of appending '[key]' works just fine, and should be used as it's easier to understand in a Watch window.
I had thought of inventing some new syntax for the expression, one that would be parsed by the debugger script, but that seems like a bad idea, since it would have to be added to the PTVS documentation and would not make sense to the user who would look at it in the Watch window.

mrolle wrote Jan 4 at 10:49 PM

Replacement for all of the above...
There are various problems in the Locals, Watch, etc. windows for a child of a dict, when the key is not a self-describing value such as an int or str.
For example, I type {inst.sched: inst} into Watch. When I expand this item, it shows as
+       [<Sched.MySched object at 0x0000000002778940>]  Instance 1
Problem 1: If I click on the + sign, it does not expand, and the + disappears.
Problem 2: If I use the context menu 'Add Watch', it adds a line to the Watch window as
{inst.sched: inst}[<Sched.MySched object at 0x0000000002791940>]    Error: unexpected token '<'
unexpected token 'Sched'
unexpected token 'Sched'
unexpected token '.'
In looking at the debugger source code and the VS debugging extensions docs, I see that these values that show up in Locals, Watch, and other windows are presented to the IDE as an IDebugProperty2 object. This interface tells the IDE how to enumerate the children, and information such as name, full name, type, and value.
I gather that the full name would be used by the IDE to copy the item to the Watch window. I also gather that the full name is passed to the Python process in requests to evaluate, or enumerate children.
Proposed solution:
Given the above example, the property name will remain as "[<Sched.MySched object at 0x0000000002778940>]". However, the property full name will now be {inst.sched: inst}.items()[#], where # is replaced by the index of the item in items().
This change should be made only when the key is not self-describing, such as an int or str.
The interface between write_object() in the debugger .py script and ReadPythonObject() in PythonProcess.cs needs to be changed, so that the debugger can indicate three options for computing the full name from the parent's full name: (1) append '.' + child name, (2) append child name (which already contains the []'s), or (3) append something else (such as '.items()[#]').

Also, I think that the indices_are_enumerate parameter works properly. The effect is that when the children are given as a numeric index, and one of these children, say C, is later asked for its children, C's full name is stripped of the ending [#], this name is evaluated to some iterable, and then the iterable is called the given number of times to get the [#] value. The problem is that this trick is only used at the first level of child enumeration for C. The [#] in the name of C will not be magically fixed when copying C to a Watch window, or in enumerating children of one of C's children.

pminaev wrote Jan 8 at 10:17 PM

The catch here is that we don't know which key types are not self-describing (i.e. for which ones, their repr() is not a valid expression).

Also, while indexing by position will work for immediate expansion, it will be very fragile for Watch when stepping - since there's no guarantee on the ordering of items in a dict, merely adding or removing a single item can completely change the order in the dict, making the watch point to a completely different item.

pminaev wrote Jan 10 at 1:08 AM

One thing that we might want to consider is devising some generic way of referring to arbitrary objects. C# debugger has the feature called "Object ID", where, once you have some object as a result of evaluation in Locals, Watch etc, you can request a numeric ID for it, and refer to it by that ID in any future expressions, so long as it is still alive. The feature is actually a generic debugger feature that can be implemented by any debugger; we just don't support it yet. But if we did, then we could generate such IDs for keys, and then use them in full names etc.

How to do so is another question. The standard id() function gives what is effectively the address of an object on CPython, and ctypes can be used to get the actual object back. But it's dangerous because it's not a strong reference, and the object can be deleted before the next evaluation of an expression containing an object ID. I don't see any reliable way to determine whether a given ID is dangling or not.

We could also maintain our own registry of objects for which IDs have been issued in the debugger script. The catch there is that we can't use weakrefs because not everything is weakly referenceable, and strong refs would prevent those objects from being deleted when actual code using them expects them to.

pminaev wrote Jan 15 at 1:29 AM

So after some experimenting, I decided to use indexing by position after all. Reasoning being that: 1) we already use that for sets and other things that are enumerable but not indexable, 2) anything that might cause the value of dictionary to be recomputed will refresh Locals anyway, 3) stale Watch entries are better than not expanding dicts at all, and 4) I don't see any better options.

To avoid unnecessary copies (which are going to be very expensive for large dicts), I'm going to use viewitems() rather than items() where available - i.e. 2.7+. Also, the code that generates the expression for such an index is the generic one that's shared by all non-indexable iterables. The expression then looks like so (given dictionary dd, and the second item in it):
next((x for i, x in enumerate(dd.viewitems()) if i == 2))
Rather than keep piling on flags for various ways of computing the child expression, I'm going to make the debugger script report both child name and child expression to VS, instead of the latter computing it from parent expression + name. This way we can have the debugger return all kinds of fancy expressions as children, such as e.g. len(). In particular, I'm using it to insert a node representing items() / viewitems() on all dict instances. This is handy as it allows to expand keys in Locals, and not just values.

mrolle wrote Jan 17 at 8:08 AM

There's a similar situation in C++ when viewing various types of containers. If I expand a map<,> instance, I get what looks like an array with [index] names. Each value further expands to a [0] and [1] which are the key and the value. The expression for [index], as you would see if adding it to the Watch window, gets very long for a large index; however, it does work correctly. This is analogous to what a Python items() node would look like.

I like your next(...) expression, because it doesn't actually take up a lot of storage. I had in mind an expression such as list(dd)[2] for your example above, but that would make a list of the entire dict.

The next(...) solution should also work for other kinds of iterables.

mrolle wrote Jan 17 at 8:22 AM

Regarding keeping an ID of dictionary keys, which are not self-describing...
  1. The debugger script could determine if the key's __repr__() evaluates to the same object, simply by doing the evaluation and see if gives a different value or an exception. So having an alternative expression for the dictionary value only has to be done in the (less frequent) case that the key is not a number or a string or some such.
  2. I think the debugger could possibly keep a registry of objects currently on display that you identify by their ID. I'm assuming that the script gets notifications about what is visible in the IDE. Certainly, items visible in the Locals window could be flushed whenever switching to a different frame. I'm not sure about expressions in the Watch windows, however. It may be that the debug script doesn't have any way of knowing what's in the Watch windows, and all it gets is requests to evaluate expressions. It seems that anything that is currently visible in Locals or Watch, including expansions as currently chosen by the IDE user, will be evaluated every time the program breaks. So you could drop an ID from the registry if you don't see it requested the next time around.

mrolle wrote Jan 17 at 8:33 AM

pminaev, your proposed solution also fixes another problem, that I haven't reported yet...
Let's say I have a set s, which expands to [2], etc. I can expand item [2] just fine. However, if, say [2] is some object o with an attribute o.a, then I cannot expand o.a. This is because the debug script gets a request to enumerate the children of s[2].a, which is not the right expression. An expression like next((x for i, x in enumerate(s) if i == 2)).a will work just fine.
I'll be glad to see this fixed in the future. My workaround for debugging the set s is to make it a list instead of a set (temporarily) or watch list(s).

pminaev wrote Jan 17 at 8:43 AM

The way it figures out if it can rely on repr() is not is pretty similar, except that rather than trying to compare the object to its eval(repr()), it uses the latter to index into the collection, and then compares the result of that indexing with the original object (using == rather than is). This way we also use the next() workaround for collections which are indexable but not with usual stable semantics.

For perf - since eval is rather expensive - we also hardcode a list of types which are known to round-trip - int, bool, str etc. We also have special code for known collection types like tuple and frozenset where we know that they round-trip if their elements round-trip; so this avoids eval for tuple-indexed dictionaries, as well.

pminaev wrote Jan 17 at 8:47 AM

And yes, we could maintain a registry of objects that are referenced in one way or another by entries in Locals, Watch etc. We don't actually get notifications for when things are visible or not, but we can find out when they are gone - every such value is represented by an IDebugProperty object, and debugger releases its reference when it wants to get rid of one; we could clean up the referenced objects then. The only problem is that there's no way to easily hook up "last COM reference to this managed object is gone" in .NET/COM Interop. We'll need to play tricks with COM aggregation to make this work.

On the other hand, it's actually useful for more than just object IDs. If we avoid re-evaling parent expression as part of the child every time we expand, and instead save the resulting object in the registry and get it from there to read its children, we can actually support things like expanding children of children of a one-off non-repeatable function call or an iterator.

pminaev wrote Jan 17 at 8:49 AM

C++ gets this all a bit easier because they are always walking the physical structure of the objects in the end. When you look at a map or a vector, the fancy element-by-element visualization is actually done on VS side (search for "natvis" for the gory details), but what they get from the debuggee is the raw object. So they always have a reliable way to identify and name something in memory, if it can be accessed by some means. The ultimate fallback for them is just using the raw pointer value.

mrolle wrote Jan 18 at 1:57 AM

next((x for i, x in enumerate(dd.viewitems()) if i == 2))
It seems to me that dd.iteritems() is just as efficient as dd.viewitems(). They both produce an object (the types are different) which is iterable. The difference is that iteritems() is an iterator itself, rather than an object that has an iterator, but since you are only using it once, it shouldn't matter.

mrolle wrote Jan 18 at 2:29 AM

I wanted to amend previous comment -- only the first two paragraphs are a quote, the rest is my comment. But CodePlex only lets me edit the original issue or add a new comment. Can you make the change for me, and delete this comment, if you have a way of doing so; that would be nice.

pminaev wrote Jan 18 at 3:04 AM

Yes, you're right - iteritems() is better, and will work the same on all supported Python 2.x versions. I'll make that change.
It sounds like since an IDebugProperty can be anything you want it to be, it doesn't have to correspond to a valid Python expression at all. An operation like Add Watch will put a reference to the same IDebugProperty object in the Watch window (is this true??), so that case is handled. But I guess you still want to have an equivalent Python expression if possible if the user wants to copy the text out of the Watch window and paste it back in later, or edit the expression.
IDebugProperty is basically what we return to VS in response to 1) enumerating locals, 2) enumerating children of another property, and 3) evaluating an expression (user input in Watch & Immediate windows, but also e.g. data tips - when you hover over some identifier in the editor). The lifetime of the property object is tied to its use by VS - when it's gone, we know that VS has no more need for it, e.g. because the watch expression is deleted, the data tip is hidden etc.

Adding a watch does not reference the same IDebugProperty. Rather, it asks it for its "full name", and then asks the debugger to create a new property instance for the same full name. Normally, that full name is the expression that is used to compute the value of the property, but it doesn't have to be. We can even have a property with a blank full name - the only catch with such things is that you won't be able to use Add Watch on them (well you will, but it'll just add a blank string).

What I was proposing above, however, would not require us to have expressionless properties. It just means that once the property is created, we don't re-evaluate its backing expression to retrieve its children, but rather just re-read that object from the registry. It still wouldn't let you do things like creating a watch for an element produced by an exhausted iterator, because that watch would be a separate property and would have to re-eval. But it would let you expand that element to see its children, and expand those children etc. Right now, when an expression is a one-off, you only get its repr(), but not its children, because the latter already require a re-eval.