Trail Mapping with Python

Introduction
In the world of Data Science, I'm most strongly drawn to geographically linked data. Choropleth maps, for example, are about the most powerful way to convey statistical information and get a point across, if you will forgive the intolerable pun. Attaching data to geography somehow makes information more relatable, more personal, and more easily absorbed. It gives people a reference point: a "you are here", or maybe a "There, but for the grace of God...".
I am fortunate to live in New Mexico, where I can take beautiful and varied walks, hikes, and bike rides nearly every day, which I do. I record most of my excursions with an app on my phone, and I realized recently that I must have lots of data to play with for mapping and analysis. I was pleased to discover that the simple app I use can easily export the data to GPX format. I had no idea what GPX was at the time, but I assumed it was some standard, so things looked promising, and off I went.
The data I use comes from an app called SportActive. Apps like AllTrails and Strava are wonderful, especially when exploring new places, but they are overkill IMO for simple tracking of daily, and largely repetitive, activities. SportActive simply records my walks and rides, without asking if I want to share my walk with my "friends" like a meal on Facebook. (Disclaimer: I have no financial relationship with SportActive, although if you could arrange such a thing I'd happily change this disclaimer.)
In this and two following articles, I will show how the data can be used with Python to map and analyze the GPS information. I will show how to do analyses such as profiling elevations, calculating speeds and durations, identifying pauses, and segmenting paths based on various criteria. This type of analysis, which I'm doing for fun, is the same kind that could be used, for example, to study bird migrations or the movements of container ships.
Python provides many libraries built around the pandas ecosystem that make working with geospatial data easy. GeoPandas extends pandas to incorporate geometries and coordinate reference systems. GPS data is a series of geolocated points, easily handled by GeoPandas. MovingPandas, another library, turns the point geometries into "trajectories", allowing calculations of speed, duration, and direction.
This article will cover parsing the raw GPS data to a csv file, which I then import into a GeoPandas GeoDataFrame. From that I will create maps and generate some basic statistical information about the trek, such as distance and elevation profiles. These articles assume basic familiarity with Python; experience with pandas is helpful. An expanded version of the code in the articles can be obtained from my GitHub repository at https://github.com/bisotty666/GPX.
XML to CSV
The GPX file turns out to be an XML file with standardized tags. Most importantly, it contains a list of tags called trkpt which, as the name implies, are points along the trek, each recording the geographic location in latitude and longitude, together with the elevation and the time of the recording. An example is:
<trkpt lat="41.043858" lon="-74.063712">
  <ele>79</ele>
  <time>2024-09-05T15:25:13.858Z</time>
</trkpt>
This format cannot be directly imported into a GeoDataFrame, but csv files can be. Fortunately, there is a Python library called BeautifulSoup that can not only efficiently parse HTML but also process XML files (provided the lxml parser is installed), which I can then write to a csv file. So I'll start by importing it, along with the csv and os libraries, and "make the soup":
from bs4 import BeautifulSoup
import csv
import os

gpx_file = 'walk.gpx'  # path to one exported GPX file

with open(gpx_file, 'r') as f:
    contents = f.read()
soup = BeautifulSoup(contents, 'xml')
I'll grab the elements of interest, as well as some metadata, with:
tracks = soup.find_all('trkpt')           # the track points
elevations = soup.find_all('ele')         # one elevation per point
times = soup.find_all('time')             # one timestamp per point
temp = soup.find('s2t:temperature').text  # app-specific metadata tags
weather = soup.find('s2t:weather').text
name = soup.find('name').text
I expect to concatenate a bunch of these files, so I'll also add the filename itself as a column:
sf_name = os.path.splitext(gpx_file)[0]  # path without the .gpx extension
id = os.path.split(sf_name)[1]           # the bare filename serves as the trek id
csv_file = sf_name + '.csv'
Then I make a list of the points, pairing each of these elements with its geographic coordinates:
data = []
for track, elevation, time in zip(tracks, elevations, times):
    latitude = track['lat']
    longitude = track['lon']
    elevation_value = elevation.text
    time = time.text
    data.append([
        id, latitude, longitude, elevation_value,
        time, temp, weather
    ])
And now I can write the CSV file:
with open(csv_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Id', 'Lat', 'Lon', 'Elev', 'Time',
                     'Temp', 'Weather'])
    writer.writerows(data)
Since I want to process a lot of these files, as well as concatenate them, I put this together in a script for batch processing, which I'll add at the end.
Importing into GeoPandas
With the data in a csv file, I am ready to make the GeoDataFrame. First I'll import the libraries I'll be using in this section:
import pandas as pd
import geopandas as gpd
import contextily as ctx
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.nonparametric.smoothers_lowess import lowess
from shapely import LineString
I'll use pandas to import the csv file to a pandas DataFrame:
trek_df = pd.read_csv('data.csv')
trek_df.iloc[0]
Id wo-2024-09-05
Lat 41.043858
Lon -74.063712
Elev 79
Time 2024-09-05T15:25:13.004Z
Temp 20.51
Weather 0
Name: 0, dtype: object
Then I'll create a GeoPandas GeoDataFrame using the lat/lon coordinates and the appropriate coordinate reference system (GPS points are recorded in WGS84, i.e. EPSG:4326):
trek_gdf = gpd.GeoDataFrame(
    trek_df,
    geometry=gpd.points_from_xy(x=trek_df.Lon,
                                y=trek_df.Lat)
).set_crs(4326)  # WGS84, the CRS used by GPS
trek_gdf.info()
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 650 entries, 0 to 649
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Id        650 non-null    object
 1   Lat       650 non-null    float64
 2   Lon       650 non-null    float64
 3   Elev      650 non-null    int64
 4   Time      650 non-null    object
 5   Temp      650 non-null    float64
 6   Weather   650 non-null    int64
 7   geometry  650 non-null    geometry
dtypes: float64(3), geometry(1), int64(2), object(2)
memory usage: 40.8+ KB
That looks good, with no null values. The Time column should be a datetime type, so I'll go ahead and change that:
trek_gdf['Time'] = pd.to_datetime(trek_gdf.Time)
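As a quick check that the conversion worked, the total elapsed time of the walk is now a simple subtraction (a minimal example using the columns defined above):
elapsed = trek_gdf.Time.iloc[-1] - trek_gdf.Time.iloc[0]  # last minus first timestamp
print(f'Total time: {elapsed}')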
OK, let's make the first map. I'll use matplotlib together with contextily, a library that provides a wide variety of basemaps with ease:
f, ax = plt.subplots(figsize=(10, 10))
trek_gdf.plot(ax=ax, alpha=0.3, markersize=4, c='b')
ctx.add_basemap(ax, crs=trek_gdf.crs,
                source=ctx.providers.OpenStreetMap.Mapnik);

Sweet. In order to do calculations, however, I'll need to project the data to a CRS appropriate for the location, which is New Jersey. A quick visit to epsg.io suggests EPSG:32111, so I'll use that. It's in meters, not feet, which I personally prefer anyway:
trek_proj = trek_gdf.to_crs(32111)
f, ax = plt.subplots(figsize=(10,10))
trek_proj.plot(ax=ax, alpha=0.3, markersize=4, c='b')
ctx.add_basemap(ax, crs=trek_proj.crs,
source=ctx.providers.CartoDB.Voyager);

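With the data in a metric CRS, distances are now straightforward. As a quick sanity check before profiling anything, here's a minimal sketch of the spacing between consecutive track points (shift pairs each point with its predecessor, so the first value is NaN):
# distance in meters between consecutive points
steps = trek_proj.geometry.distance(trek_proj.geometry.shift())
print(f'Median step: {steps.median():.1f} m, '
      f'total: {steps.sum() / 1000:.1f} km')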
Elevations
Let's profile the elevations for the walk. First, some basic statistics:
line = LineString(trek_proj.geometry)
print(f'''
The walk was {line.length / 1000:.1f} kilometers
Elevations:
Initial: {trek_proj.iloc[0].Elev} meters
Final: {trek_proj.iloc[-1].Elev} meters
Highest: {trek_proj.Elev.max()} meters
Lowest: {trek_proj.Elev.min()} meters
''')
The walk was 7.0 kilometers
Elevations:
Initial: 79 meters
Final: 22 meters
Highest: 81 meters
Lowest: 11 meters
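Those numbers hide the ups and downs in between. A rough estimate of total ascent and descent comes from summing the positive and negative point-to-point elevation changes; rough, because GPS elevations are noisy:
dele = trek_proj.Elev.diff()  # elevation change between consecutive points
print(f'Ascent: {dele[dele > 0].sum():.0f} m, '
      f'descent: {-dele[dele < 0].sum():.0f} m')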
And a basic plot:
f, ax = plt.subplots(figsize=(9, 5))
trek_proj.Elev.plot(ax=ax)
ax.set_title(f'''
Elevation profile for {trek_gdf.Id[0]} on {trek_gdf.Time[0].strftime('%x')}
''')
ax.set_ylabel('meters');

Using Seaborn, I can use regression to smooth this out as much or as little as I want. lmplot needs a numeric x column, so I'll first add the point number as a column:
trek_proj['Point'] = trek_proj.index  # sequential point number for the x-axis
sns.set_theme()
plot = sns.lmplot(data=trek_proj, x='Point', y='Elev',
                  order=4, truncate=False, height=8, aspect=1)
plot.figure.subplots_adjust(top=0.9, left=.11)
plot.figure.suptitle(f'''
Smoothed profile for {trek_gdf.Id[0]} on {trek_gdf.Time[0].strftime('%x')}
''', fontsize=18);

And here's smoothing with lowess:
sns.set_theme(rc={'figure.figsize': (6, 6)})
plot = sns.lmplot(data=trek_proj, x='Point', y='Elev',
                  lowess=True, truncate=False,
                  height=8, aspect=1)
plot.figure.subplots_adjust(top=0.9, left=.11)
plot.figure.suptitle(f'''
Lowess smoothed profile for {trek_gdf.Id[0]} on {trek_gdf.Time[0].strftime('%x')}
''', fontsize=18);

Next steps
I naturally want to be able to calculate speeds and distances, identify pauses, and do other exploration. Starting from discrete points, the steps to do so manually would be simple but exceedingly tedious. Fortunately, there is a wonderful library called movingpandas which makes these things all very simple. I'll explore it properly in the next articles; the sketch below gives a taste.
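Here's a minimal sketch, assuming movingpandas is installed (the trajectory id is an arbitrary label of my choosing):
import movingpandas as mpd

# build a trajectory from the point GeoDataFrame, ordered by the Time column
traj = mpd.Trajectory(trek_gdf, traj_id=1, t='Time')
traj.add_speed(overwrite=True)  # adds a 'speed' column
print(f'Trajectory length: {traj.get_length() / 1000:.1f} km')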
I'll go ahead and save the GeoDataFrames for future use:
trek_gdf.to_file('data/trek_gdf.gpkg')
trek_proj.to_file('data/trek_projected.gpkg')
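When the next article picks the data back up, reloading is a one-liner each:
trek_gdf = gpd.read_file('data/trek_gdf.gpkg')
trek_proj = gpd.read_file('data/trek_projected.gpkg')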
Code
The code for this article is available in my GitHub repository in the form of a Jupyter notebook. Here is the final script for batch processing the files:
'''
Convert a directory of gpx files to csv.
The script captures trek points with coordinates,
elevation and time stamp as well as trek name and
weather conditions.
The script requires an input directory path.
'''
from bs4 import BeautifulSoup
import csv
import sys
import os

if len(sys.argv) < 2:
    sys.exit("Must supply an input directory")

inPath = sys.argv[1]
files = [os.path.join(inPath, f)
         for f in os.listdir(inPath)
         if f.endswith(".gpx")]

# start the combined file with a header row
cfile = os.path.join(inPath, 'combined.csv')
with open(cfile, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Id', 'Lat', 'Lon', 'Elev', 'Time',
                     'Temp', 'Weather'])

for gpx_file in files:
    with open(gpx_file, 'r') as f:
        contents = f.read()
    soup = BeautifulSoup(contents, 'xml')

    tracks = soup.find_all('trkpt')
    elevations = soup.find_all('ele')
    times = soup.find_all('time')
    temp = soup.find('s2t:temperature').text
    weather = soup.find('s2t:weather').text
    name = soup.find('name').text

    sf_name = os.path.splitext(gpx_file)[0]
    id = os.path.split(sf_name)[1]
    csv_file = sf_name + '.csv'

    data = []
    for track, elevation, time in zip(tracks,
                                      elevations,
                                      times):
        latitude = track['lat']
        longitude = track['lon']
        elevation_value = elevation.text
        time = time.text
        data.append([
            id, latitude, longitude, elevation_value,
            time, temp, weather
        ])

    # one csv per gpx file...
    with open(csv_file, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Id', 'Lat', 'Lon', 'Elev',
                         'Time', 'Temp', 'Weather'])
        writer.writerows(data)

    # ...plus everything appended to the combined file
    with open(cfile, 'a', newline='') as f:
        writer = csv.writer(f)
        writer.writerows(data)
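To run it, pass the directory of GPX exports on the command line, for example python gpx2csv.py ~/exports (the script name here is my own choice). It writes one csv per gpx file, plus combined.csv with everything, in the same directory.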