Friday, December 11, 2009

Processing ESRI Shapefiles with Python

For the last few days, I've been working on translating data between an ESRI ArcSDE instance and KML for display within Google Earth. For 'point' and 'line' feature classes this was a fairly easy task, but translating 'polygon' feature classes turned out to be more hassle than I was expecting, so I thought I'd share my experiences. I was working with an ArcSDE 9.2 instance, but I am quite confident that very little is different, in terms of the geometry schema, in ArcGIS 9.3 software.

The ArcGIS geometry storage schema is quite complex, so I will try to restrict this post to the fundamental concepts. Basically, accessing geometry proceeds as follows:

Each feature class is made up of many rows.
Each row has an associated geometry.
A geometry may consist of several parts.
Each part is composed of either a single point or an array of points
Each point has X, Y and Z values.

Additionally, the geometry type can be identified by the 'Type' attribute of the geometry class.

The full geoprocessor model for ArcGIS 9.3 can be found here

Now, single point feature classes are easily processed as follows:


rows = gp.SearchCursor(myFC)
thisRow = rows.next()

while (thisRow):
rowGeom=thisRow.getvalue("SHAPE")
for part in range(rowGeom.partcount):
geomPart=rowGeom.getPart(part-1)
thisPoint=geomPart.get(0)
thisRow=rows.next()


Extracting single lines is a bit trickier, as you must process the array of points:


rows = gp.SearchCursor(myFC)
thisRow = rows.next()

while (thisRow):
rowGeom=thisRow.getvalue("SHAPE")
for part in range(rowGeom.partcount):
geomPart=rowGeom.getPart(part-1)
pointArray=geomPart.get(0)
thisPoint=pointArray.next()
while (thisPoint):
#Do some processing on thisPoint.
thisPoint=pointArray.next()
thisRow=rows.next()


Extracting polygons is where things get complicated, due to the way in which 'inner rings' are stored. Inner rings are linear rings that contain an area which is not to be included in the original polygon. Visually, these appear as 'holes' inside the polygon. An example of this is the polygon representing the political region of South Africa:



The empty area in the middle of South Africa is Lesotho. In this example the external boundary of South Africa forms the 'outer ring' and the border of Lesotho forms an 'inner ring'. The region bounded by the outer ring minus the region bounded by the inner ring(s) defines the polygon.

The ArcGIS schema stores polygon information in the geometry's array of points. As described above, the point array is made up of a list of points which can accessed using the iterator function 'next()'. For polygons, the list stores an initial 'outer ring' and optionally, several 'inner rings'. The outer ring is always the first contiguous list of points in the array. An inner ring can then be identified by a null point followed by the list of points forming a second linear ring. Additional inner rings may be added in the same way. This data structure dictates that processing polygons with inner rings can be done using the following logic:


rows = gp.SearchCursor(myFC)
thisRow = rows.next()

while (thisRow):
rowGeom=thisRow.getvalue("SHAPE")
for part in range(rowGeom.partcount):
geomPart=rowGeom.getPart(part-1)
pointArray=geomPart.get(0)
thisPoint=pointArray.next()
while (thisPoint):
#Do some processing on thisPoint.
thisPoint=pointArray.next()

#Checking if this polygon contains an inner ring.
if not thisPoint:
thisPoint=pointArray.next()
if thisPoint:
innerRing=True
#Begin point list for inner ring
thisRow=rows.next()



I will extend this logic code to create KML in my next post.

Cheers,
JamesO