Pyvot – A Python to/from Excel Connector

 

pyvot.screenshot

Click this to view: http://www.youtube.com/watch?v=Oi3QKuFugWk&hd=1 

Introduction

Pyvot connects familiar data-exploration and visualization tools in Excel with the powerful data analysis and transformation capabilities of Python, with an emphasis on tabular data. It provides a minimal and Pythonic interface to Excel, smoothing over the pain points in using the existing Excel object model as exposed via COM.

Examples

We’ll now present a quick tour of Pyvot’s key features and workflow. For a more thorough treatment, make sure to check out the full Pyvot tutorial.

Python to Excel and back again

Pyvot enables easy transit of data between Python and Excel.

For example, we can move a list of Python values to Excel, view and manipulate the data, and retrieve the new version. In a symmetric and equally common usage, we can grab and process Excel values in Python, and display the result to Excel.

>>> xl.Workbook() # Create empty Excel workbook, stored as the 'active' workbook
>>> a = range(1,10) 
>>> a
[1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> xl.view(a) # show Python enumerable in Excel (active workbook)

The Python array is copied into Excel. By default, Pyvot picks an open range on the active worksheet.

view 

Retrieving data from Excel requires an xl.Range object, which represents the cells on which to operate. The xl.view function returns the range that it selected to store the data:

>>> r = xl.view(range(1,10))
>>> r
<ColumnVector range object for $A$2:$A$1048576 (visible only)>
>>> r.get()
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>> sum(r) # ranges are iterable
45.0

Note that view chose the entirety of column A (sans header) to store our list, but Range.get assumed we only wanted its useful data.

Now we can manipulate the Excel range, and fetch the new results back. For example, suppose we delete some rows using the Excel UI and change the values like so:

view_post_delete

>>> r.get() # get updated values
[1.0, 20.0, 8.0, 9.0]

The current selection is also available as a Range. This allows Excel to act as an interface for selecting interesting data:

>>> xl.selected_range().get()
8.0

Mapping Excel columns

You can also run map operations on Excel data. This can be useful for writing Python functions to clean the data and return new computed columns.

For example, suppose we want to combine the city and state columns in a single column:

>>> def Merge(city, state):
...     return city + ", " + state
...
>>> wCities = xl.Workbook("cities.xlsx")
>>> xl.map(Merge, wCities.get("City"), wCities.get("State"))
<ColumnVector range object for $N$2:$N$51 (visible only)>

Note that we retrieved Range objects for the table columns by name. Pyvot can obtain ranges for table columns, named ranges, and plain A1:B2-style references.

xl.map works just like the built-in map function, but inserts the results as a new column in Excel. It will infer a column name from the function name, and it will pick a column related to its arguments. Note below that “Merge” becomes a new column in the source table:

alaska_map

 

Table joins

Pyvot supports join operations on Excel tables. xl.join(rangeDest, rangeSrc) takes in 2 Ranges. It infers the containing tables for each range. For each value in rangeDest, it will do the lookup in rangeSrc and merge the row into the table in rangeDest.

For example, suppose we want to city crime data from in “cities.xlsx” with population data from “population.xlsx”.

First we load (or connect to) the workbooks containing tables we want to join:

>>> wCities = xl.Workbook(“cities.xlsx”)
>>> wPop = xl.Workbook("Population.xlsx")

Then we compute a primary key column, if necessary. In this case, we’re joining on multiple columns (both city and state), so we’ll merge them into a single column for the join:

>>> xl.map(Merge, wPop.get("City"), wPop.get("State"))
<ColumnVector range object for $D$2:$D$51 (visible only)>

The same operation is then applied on wCities.

The two mapped columns act as keys for the join:

>>> xl.join(wCities.get("Merge"), wPop.get("Merge"))

alaska_join_city

Destination table

alaska_join_pop

Source table

alaska_join_final

Destination table after join

Installation

Pyvot requires CPython 2.6 or 2.7 with the Python for Windows extensions (pywin32) installed, and Office 2010.

If a clean Python session can import the win32com module, Pyvot is ready to be installed:

PS C:\> python
Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import win32com
>>>

Installing from PTVS

Python Tools for Visual Studio now ships an add-on for Pyvot integration. This add-on includes a Pyvot project template as well as support for installing Pyvot into multiple interpreters.

First, install Pyvot into the desired interpreters from the Tools menu (keep in mind that Pyvot requires CPython):

image

Then, create a new Pyvot project:

image

The new project will need to be configured to use an interpreter for which Pyvot has been installed.

 

Installing with setuptools

If you have setuptools installed, you can install a source or binary distribution of Pyvot with easy_install. easy_install is usually in <Python directory, ex. C:\Python27>\Scripts

  • To install the latest version from PyPI:

    easy_install Pyvot
  • To install an already-downloaded source (.zip) or binary (.egg) distribution:

    easy_install path\to\file

Last edited Nov 3, 2011 at 4:50 PM by ptools, version 10

Comments

arulprakash Mar 16 at 3:50 AM 
A worthier altrnative to xlutils. :)

caoimhinp May 11, 2012 at 10:29 PM 
I figured out that I can do any of the below:

# sets the active worksheet by name - Activate() or Select() works.
wb.xlWorkbook.Sheets('Julia').Activate()

# sets the active worksheet by cardinal - Activate() or Select() works.
wb.xlWorkbook.Sheets[1].Select()


The below, however, doesn’t work. The active sheet trumps it. No error.
wb.get(r"'Julia'!A1:A5").get()

caoimhinp May 10, 2012 at 5:44 PM 
Questions...

It doesn't look like the API supports changing the active sheet except by actually changing the active sheet in a running Excel instance. Am I missing something? I see how to get the name of the active sheet. I can also see how to get the tables in a sheet but not how to use them: xl.sheet.Worksheet(xlSheet)

Also, I don't understand the intention behind the namespace, xl.sheet. It seems backwards i.e. xl.sheet.Workbook() is reversed intuitively. It is also strange since I can't specify the sheet I am using. What is the intention here?

Thanks for this. Overall the concept is awesome and I'm looking forward to continued development. I'd offer to help but I am not a COM guy... therefore my extreme pleasure at finding this project.