Pyvot – A Python to/from Excel Connector

Introduction

Pyvot connects familiar data-exploration and visualization tools in Excel with the powerful data analysis and transformation capabilities of Python, with an emphasis on tabular data. It provides a minimal and Pythonic interface to Excel, smoothing over the pain points in using the existing Excel object model as exposed via COM.

Examples

We’ll now present a quick tour of Pyvot’s key features and workflow. For a more thorough treatment, make sure to check out the full Pyvot tutorial.

Python to Excel and back again

Pyvot enables easy transit of data between Python and Excel.

For example, we can move a list of Python values to Excel, view and manipulate the data, and retrieve the new version. In a symmetric and equally common usage, we can grab and process Excel values in Python, and display the result to Excel.

>>> xl.Workbook() # Create empty Excel workbook, stored as the 'active' workbook
>>> a = range(1,10)
>>> a
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xl.view(a) # show Python enumerable in Excel (active workbook)

The Python array is copied into Excel. By default, Pyvot picks an open range on the active worksheet.

Retrieving data from Excel requires an xl.Range object, which represents the cells on which to operate. The xl.view function returns the range that it selected to store the data:

>>> r = xl.view(range(1,10))
>>> r
<ColumnVector range object for $A$2:$A$1048576 (visible only)>
>>> r.get()
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> sum(r) # ranges are iterable
45.0

Note that view chose the entirety of column A (sans header) to store our list, but Range.get assumed we only wanted its useful data.

Now we can manipulate the Excel range, and fetch the new results back. For example, suppose we delete some rows using the Excel UI and change the values like so:

>>> r.get() # get updated values
[1.0, 20.0, 8.0, 9.0]

The current selection is also available as a Range. This allows Excel to act as an interface for selecting interesting data:

>>> xl.selected_range().get()
8.0

Mapping Excel columns

You can also run map operations on Excel data. This can be useful for writing Python functions to clean the data and return new computed columns.

For example, suppose we want to combine the city and state columns in a single column:

>>> def Merge(city, state):
...     return city + ", " + state
...
>>> wCities = xl.Workbook("cities.xlsx")
>>> xl.map(Merge, wCities.get("City"), wCities.get("State"))
<ColumnVector range object for $N$2:$N$51 (visible only)>

Note that we retrieved Range objects for the table columns by name. Pyvot can obtain ranges for table columns, named ranges, and plain A1:B2-style references.

xl.map works just like the built-in map function, but inserts the results as a new column in Excel. It will infer a column name from the function name, and it will pick a column related to its arguments. Note below that “Merge” becomes a new column in the source table:

Table joins

Pyvot supports join operations on Excel tables. xl.join(rangeDest, rangeSrc) takes in 2 Ranges. It infers the containing tables for each range. For each value in rangeDest, it will do the lookup in rangeSrc and merge the row into the table in rangeDest.

For example, suppose we want to city crime data from in “cities.xlsx” with population data from “population.xlsx”.

First we load (or connect to) the workbooks containing tables we want to join:

>>> wCities = xl.Workbook(“cities.xlsx”)
>>> wPop = xl.Workbook("Population.xlsx")

Then we compute a primary key column, if necessary. In this case, we’re joining on multiple columns (both city and state), so we’ll merge them into a single column for the join:

>>> xl.map(Merge, wPop.get("City"), wPop.get("State"))
<ColumnVector range object for $D$2:$D$51 (visible only)>

The same operation is then applied on wCities.

The two mapped columns act as keys for the join:

>>> xl.join(wCities.get("Merge"), wPop.get("Merge"))

Destination table

Source table

Destination table after join

Installation

Pyvot requires CPython 2.6 or 2.7 with the Python for Windows extensions (pywin32) installed, and at the minimum, Office 2010.

If a clean Python session can import the win32com module, Pyvot is ready to be installed:

PS C:\> python
Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
>>> import win32com
>>>

Installing from PTVS

Python Tools for Visual Studio now ships an add-on for Pyvot integration. This add-on includes a Pyvot project template as well as support for installing Pyvot into multiple interpreters.

First, install Pyvot into the desired interpreters from the Tools menu (keep in mind that Pyvot requires CPython):

Then, create a new Pyvot project:

The new project will need to be configured to use an interpreter for which Pyvot has been installed.

Installing with setuptools

If you have setuptools installed, you can install a source or binary distribution of Pyvot with easy_install. easy_install is usually in <Python directory, ex. C:\Python27>\Scripts

easy_install Pyvot

easy_install path\to\file

Last edited Apr 27, 2015 at 11:31 PM by crwilcox, version 15

roofrins Feb 15, 2014 at 4:46 AM
I was interested in creating a function in Excel that would allow you to right-click a cell, and choose an option to create a pop-up "window-of-cells" that would have a specifiable number of rows and columns, and which could return, say, a summation of some of those values back into the reference cell. In plain English, the point is, if I'm recording a roofing job, and I have one cell for the "material cost," I'd like to be able to "open that cell up" to see all the purchases that go toward that total. Then, I'd also like, within that "pop-up," to see all the items purchased on one single purchase, showing up as a secondary "pop-up." So basically, it would be a cascading, 3D-sort-of-effect, allowing you a lot more power in Excel.

For a few hand-drawn sketches of what I'm talking about, you can go to the following pictures:

Anyway, I'm having all the trouble in the world trying to just find the code behind the actual Excel program (not the cheesy VBA thing that lets you "code" for some cheesy math formula, but the actual code that dictates that the rows and columns and header and buttons all show up on your computer). Without knowing how to get to this, how can I change it??

Any suggestions appreciated! caldwbr@gmail.com brad@roofrinserun.com www.RoofRinseRun.com

amarcuvitz Nov 4, 2013 at 8:39 PM
I’m doing analytics on a live large production db. I can’t create tables or place much additional load on the db. For simple things I have used MS Query into Excel which works fairly well and creates dynamically sized tables that drive charts directly the way one would want (no edits required of the spreadsheet). Python does a great job of doing analysis that would make you cross-eyed if tried it in SQL. Pyvot is perfect for updating data to Excel to exploit all its display sophistication. You define the charts you like with “fake” data in Excel and then use them as display templates to be updated by Pyvot. This is very a very attractive way to isolate the presentation task from the number crunching task.

What is missing is a way to for Pyvot to write a Tables of varying size to Excel. Pyvot permits writes to Tables as a named range but the Table size can’t be changed based on, say, the number of rows that came back from the db. This leads you to size your Table to fit the largest number of rows you expect thus breaking the nicely designed link between Table size and chart data range that normally charts dynamically sized Tables perfectly. Something is needed like xl.Workbook(“foo.xlsx”).range(“ExcelTable”).set(PythonTableData, autosize=True) where the optional second argument of .set specifies that ExcelTable should be expanded/contracted to fit the dimensions of PythonTableData.

I am trying a workaround using a formula in a name definition ( chart_data_range = OFFSET(‘DBdata’!$A$1, 0, 0, RowsFormula, Columns) ) and using that in the chart_data_range box. It’s a hack and seems to reset data series labels on the chart. I will probably find a way to get it to work but it has been much more effort than anyone would care to go through. You asked for feedback on Pyvot which I think is great. Adding the autosizing of tables or some way to set Table size would really make Pyvot and Excel quite valuable for displaying variable size data.

arulprakash Mar 16, 2013 at 3:50 AM
A worthier altrnative to xlutils. :)

caoimhinp May 11, 2012 at 10:29 PM
I figured out that I can do any of the below:

# sets the active worksheet by name - Activate() or Select() works.
wb.xlWorkbook.Sheets('Julia').Activate()

# sets the active worksheet by cardinal - Activate() or Select() works.
wb.xlWorkbook.Sheets[1].Select()

The below, however, doesn’t work. The active sheet trumps it. No error.
wb.get(r"'Julia'!A1:A5").get()

caoimhinp May 10, 2012 at 5:44 PM
Questions...

It doesn't look like the API supports changing the active sheet except by actually changing the active sheet in a running Excel instance. Am I missing something? I see how to get the name of the active sheet. I can also see how to get the tables in a sheet but not how to use them: xl.sheet.Worksheet(xlSheet)

Also, I don't understand the intention behind the namespace, xl.sheet. It seems backwards i.e. xl.sheet.Workbook() is reversed intuitively. It is also strange since I can't specify the sheet I am using. What is the intention here?

Thanks for this. Overall the concept is awesome and I'm looking forward to continued development. I'd offer to help but I am not a COM guy... therefore my extreme pleasure at finding this project.