Analysis
The analysis module provides a variety of functions to analyze the mobility datasets computed by trackintel.
Labelling
- trackintel.analysis.labelling.create_activity_flag(staypoints, method='time_threshold', time_threshold=15.0, activity_column_name='is_activity')[source]
Add a flag indicating whether a staypoint is considered an activity.
- Parameters:
staypoints (GeoDataFrame (as trackintel staypoints)) – The original input staypoints
method ({'time_threshold'}, default = 'time_threshold') –
‘time_threshold’ : All staypoints with a duration greater than the time_threshold are considered an activity.
time_threshold (float, default = 15 (minutes)) – The time threshold (in minutes) above which a staypoint is considered an activity. Used by method ‘time_threshold’.
activity_column_name (str, default = 'is_activity') – The name of the newly created column that holds the activity flag.
- Returns:
staypoints – Original staypoints with the additional activity column
- Return type:
GeoDataFrame (as trackintel staypoints)
Examples
>>> sp = sp.as_staypoints.create_activity_flag(method='time_threshold', time_threshold=15)
>>> print(sp['is_activity'])
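The ‘time_threshold’ heuristic can be sketched with plain pandas. This is a hypothetical re-implementation for illustration only (function name and all logic are assumptions, not trackintel’s actual code); it assumes the staypoints carry the trackintel columns ‘started_at’ and ‘finished_at’:

```python
import pandas as pd

def activity_flag_sketch(staypoints, time_threshold=15.0, column="is_activity"):
    """Flag staypoints whose duration exceeds time_threshold minutes.

    Illustrative sketch only; the real create_activity_flag operates on a
    trackintel GeoDataFrame and validates its data model.
    """
    sp = staypoints.copy()
    duration_min = (sp["finished_at"] - sp["started_at"]).dt.total_seconds() / 60.0
    sp[column] = duration_min > time_threshold
    return sp

sp = pd.DataFrame({
    "started_at": pd.to_datetime(["2023-01-01 08:00", "2023-01-01 09:00"]),
    "finished_at": pd.to_datetime(["2023-01-01 08:10", "2023-01-01 10:00"]),
})
sp = activity_flag_sketch(sp)
print(sp["is_activity"].tolist())  # [False, True]
```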
- trackintel.analysis.labelling.predict_transport_mode(triplegs, method='simple-coarse', **kwargs)[source]
Predict the transport mode of triplegs.
Predict/impute the transport mode that was likely chosen to cover the given tripleg, e.g., car, bicycle, or walk.
- Parameters:
triplegs (GeoDataFrame (as trackintel triplegs)) – The original input triplegs.
method ({'simple-coarse'}) –
The following methods are available for transport mode inference/prediction:
’simple-coarse’ : Uses simple heuristics to predict coarse transport classes.
- Returns:
triplegs – The triplegs with added column mode, containing the predicted transport modes.
- Return type:
GeoDataFrame (as trackintel triplegs)
Notes
The ‘simple-coarse’ method classifies each tripleg into one of {'slow_mobility', 'motorized_mobility', 'fast_mobility'}. In the default classification, slow_mobility (<15 km/h) includes transport modes such as walking or cycling, motorized_mobility (<100 km/h) modes such as car or train, and fast_mobility (>100 km/h) modes such as high-speed rail or airplanes. These categories are default values and can be overwritten using the keyword argument categories.
Examples
>>> tpls = tpls.as_triplegs.predict_transport_mode()
>>> print(tpls["mode"])
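The coarse classification described in the Notes amounts to binning a tripleg’s average speed. The 15 and 100 km/h cut-offs are the documented defaults; the function name and dict-based categories argument below are illustrative assumptions, not trackintel’s API:

```python
# Sketch of a speed-based coarse classification. The 15 and 100 km/h
# cut-offs are the documented defaults; everything else is illustrative.
def classify_speed(speed_kmh, categories=None):
    categories = categories or {
        15.0: "slow_mobility",         # e.g., walking or cycling
        100.0: "motorized_mobility",   # e.g., car or train
        float("inf"): "fast_mobility", # e.g., high-speed rail or airplane
    }
    # Check bins in increasing order of their upper bound.
    for upper_bound, label in sorted(categories.items()):
        if speed_kmh < upper_bound:
            return label

print([classify_speed(v) for v in (5, 50, 250)])
# ['slow_mobility', 'motorized_mobility', 'fast_mobility']
```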
Tracking Quality
- trackintel.analysis.tracking_quality.temporal_tracking_quality(source, granularity='all')[source]
Calculate per-user temporal tracking quality (temporal coverage).
- Parameters:
source (GeoDataFrame (as trackintel datamodels)) – The source dataframe from which to calculate the temporal tracking quality.
granularity ({"all", "day", "week", "weekday", "hour"}) – The level at which the tracking quality is calculated. The default “all” returns the overall tracking quality; “day” the tracking quality by day; “week” the quality by week; “weekday” the quality by day of the week (e.g., Mondays, Tuesdays, etc.); and “hour” the quality by hour.
- Returns:
quality – A per-user per-granularity temporal tracking quality dataframe.
- Return type:
DataFrame
Notes
Requires at least the following columns:
['user_id', 'started_at', 'finished_at']
which means the function supports the trackintel staypoints, triplegs, trips, and tours datamodels and their combinations (e.g., a staypoints and triplegs sequence).
The temporal tracking quality is the ratio of tracking time to the total time extent. It is calculated and returned per user in the defined granularity. The time extents and the columns of the returned quality DataFrame for the different granularity values are:
- ‘all’:
time extent: between the latest “finished_at” and the earliest “started_at” for each user.
columns: ['user_id', 'quality'].
- ‘week’:
time extent: the whole week (604800 sec) for each user.
columns: ['user_id', 'week_monday', 'quality'].
- ‘day’:
time extent: the whole day (86400 sec) for each user.
columns: ['user_id', 'day', 'quality'].
- ‘weekday’:
time extent: the whole day (86400 sec) * number of tracked weeks for each user.
columns: ['user_id', 'weekday', 'quality'].
- ‘hour’:
time extent: the whole hour (3600 sec) * number of tracked days for each user.
columns: ['user_id', 'hour', 'quality'].
Examples
>>> # calculate overall tracking quality of staypoints
>>> temporal_tracking_quality(sp, granularity="all")
>>> # calculate per-day tracking quality of sp and tpls sequence
>>> temporal_tracking_quality(sp_tpls, granularity="day")
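For granularity="all", the quality reduces to tracked time divided by the per-user time extent. A minimal sketch (assuming non-overlapping records with the trackintel 'user_id', 'started_at', 'finished_at' columns; the real function also handles the other granularities):

```python
import pandas as pd

def overall_quality(df):
    """Per-user quality for granularity='all': tracked time / time extent.

    Sketch assuming non-overlapping tracking records; illustrative only.
    """
    def per_user(g):
        tracked = (g["finished_at"] - g["started_at"]).sum()
        extent = g["finished_at"].max() - g["started_at"].min()
        return tracked / extent

    return df.groupby("user_id").apply(per_user).rename("quality").reset_index()

records = pd.DataFrame({
    "user_id": [0, 0],
    "started_at": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 06:00"]),
    "finished_at": pd.to_datetime(["2023-01-01 03:00", "2023-01-01 12:00"]),
})
print(overall_quality(records))  # user 0: 9h tracked over a 12h extent -> 0.75
```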
Modal Split
- trackintel.analysis.modal_split.calculate_modal_split(tpls, freq=None, metric='count', per_user=False, norm=False)[source]
Calculate the modal split of triplegs.
- Parameters:
tpls (GeoDataFrame (as trackintel triplegs)) – triplegs require the column mode.
freq (str) – Frequency string passed on as the freq keyword to the pandas.Grouper class. If freq=None the modal split is calculated on all data. A list of possible values can be found in the pandas documentation.
metric ({'count', 'distance', 'duration'}) – Aggregation used to represent the modal split. ‘distance’ returns in the same unit as the crs. ‘duration’ returns values in seconds.
per_user (bool, default: False) – If True the modal split is calculated per user
norm (bool, default: False) – If True every row of the modal split is normalized to 1
- Returns:
modal_split – The modal split represented as a pandas DataFrame with (optionally) a MultiIndex. The index can have the levels (‘user_id’, ‘timestamp’), and every mode is a column.
- Return type:
DataFrame
Notes
freq=’W-MON’ is used for a weekly aggregation that starts on Mondays.
If freq=None and per_user=False are passed the modal split collapses to a single column.
The modal split can be visualized using trackintel.visualization.plot_modal_split().
Examples
>>> triplegs.calculate_modal_split()
>>> triplegs.calculate_modal_split(freq='W-MON', metric='distance')
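With freq=None, per_user=False, and metric='count', the modal split reduces to a value count per mode. A hypothetical pandas sketch (function name and simplifications are assumptions):

```python
import pandas as pd

def modal_split_sketch(tpls, norm=False):
    """Count-based modal split for freq=None and per_user=False.

    Illustrative only; the real function also supports the 'distance' and
    'duration' metrics, temporal grouping via freq, and per-user splits.
    """
    split = tpls.groupby("mode").size()
    if norm:
        # Normalize so the shares sum to 1.
        split = split / split.sum()
    return split

tpls = pd.DataFrame({"mode": ["walk", "walk", "car", "train"]})
print(modal_split_sketch(tpls, norm=True).to_dict())
# {'car': 0.25, 'train': 0.25, 'walk': 0.5}
```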
Location Identification
- trackintel.analysis.location_identification.location_identifier(staypoints, method='FREQ', pre_filter=True, **pre_filter_kwargs)[source]
Assign “home” and “work” activity label for each user with different methods.
- Parameters:
staypoints (GeoDataFrame (as trackintel staypoints)) – Staypoints with the column “location_id”.
method ({'FREQ', 'OSNA'}, default "FREQ") – ‘FREQ’: Generate an activity label per user by assigning the most visited location the label “home” and the second most visited location the label “work”. The remaining locations get no label. ‘OSNA’: Use weekdays data divided in three time frames [“rest”, “work”, “leisure”]. Finds most popular home location for timeframes “rest” and “leisure” and most popular “work” location for “work” timeframe.
pre_filter (bool, default True) – Prefiltering the staypoints to exclude locations with not enough data. The filter function can also be accessed via pre_filter_locations.
pre_filter_kwargs (dict) – Keyword arguments passed to pre_filter_locations if used. See that function for more information.
- Returns:
sp – With additional column purpose assigning one of three activity labels {‘home’, ‘work’, None}.
- Return type:
GeoDataFrame (as trackintel staypoints)
Note
The methods are adapted from [1]. The original algorithms count the distinct hours spent at a location because the home location is derived from geo-tagged tweets; since our data model includes durations, we directly sum the time spent at a location.
References
[1] Chen, Qingqing, and Ate Poorthuis. 2021. ‘Identifying Home Locations in Human Mobility Data: An Open-Source R Package for Comparison and Reproducibility’. International Journal of Geographical Information Science 0 (0): 1–24. https://doi.org/10.1080/13658816.2021.1887489.
Examples
>>> from ti.analysis.location_identification import location_identifier
>>> location_identifier(staypoints, pre_filter=True, method="FREQ")
- trackintel.analysis.location_identification.pre_filter_locations(staypoints, agg_level='user', thresh_sp=10, thresh_loc=10, thresh_sp_at_loc=10, thresh_loc_time='1h', thresh_loc_period='5h')[source]
Filter out locations and users that do not have enough data for a proper analysis.
To disable a specific filter parameter, set it to zero.
- Parameters:
staypoints (GeoDataFrame (as trackintel staypoints)) – Staypoints with the column “location_id”.
agg_level ({"user", "dataset"}, default "user") – The level of aggregation when filtering locations. ‘user’ : locations are filtered per-user; ‘dataset’ : locations are filtered over the whole dataset.
thresh_sp (int, default 10) – Minimum staypoints a user must have to be included.
thresh_loc (int, default 10) – Minimum locations a user must have to be included.
thresh_sp_at_loc (int, default 10) – Minimum number of staypoints a location must have to be included.
thresh_loc_time (str or pd.Timedelta, default "1h") – Minimum sum of durations that was spent at location to be included. If str must be parsable by pd.to_timedelta.
thresh_loc_period (str or pd.Timedelta, default "5h") – Minimum timespan of first to last visit at a location to be included. If str must be parsable by pd.to_timedelta.
- Returns:
total_filter – Boolean series containing the filter as a mask.
- Return type:
pd.Series
Examples
>>> from ti.analysis.location_identification import pre_filter_locations
>>> mask = pre_filter_locations(staypoints)
>>> staypoints = staypoints[mask]
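Two of the per-user thresholds (thresh_sp and thresh_loc) can be sketched as follows. This is an illustrative simplification that ignores the location-level and duration-based filters, and the function name is an assumption:

```python
import pandas as pd

def user_filter_sketch(staypoints, thresh_sp=10, thresh_loc=10):
    """Boolean mask keeping users with enough staypoints and locations.

    Covers only the thresh_sp and thresh_loc filters as a sketch; the real
    function also filters on staypoints per location and visit durations.
    """
    # size = staypoints per user, nunique = distinct locations per user.
    counts = staypoints.groupby("user_id")["location_id"].agg(["size", "nunique"])
    ok = counts[(counts["size"] >= thresh_sp) & (counts["nunique"] >= thresh_loc)].index
    return staypoints["user_id"].isin(ok)

sp = pd.DataFrame({"user_id": [0, 0, 0, 1, 1], "location_id": [1, 2, 3, 1, 1]})
print(user_filter_sketch(sp, thresh_sp=3, thresh_loc=3).tolist())
# [True, True, True, False, False]
```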
- trackintel.analysis.location_identification.freq_method(staypoints, *labels)[source]
Generate an activity label per user.
The most visited location is assigned the label “home” and the second most visited location the label “work”; the remaining locations get no label.
Custom labels can also be given as arguments.
- Parameters:
staypoints (GeoDataFrame (as trackintel staypoints)) – Staypoints with the column “location_id”.
labels (collection of str, default ("home", "work")) – Labels in decreasing time of activity.
- Returns:
sp – The input staypoints with additional column “purpose”.
- Return type:
GeoDataFrame (as trackintel staypoints)
Examples
>>> from ti.analysis.location_identification import freq_method
>>> staypoints = freq_method(staypoints, "home", "work")
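The underlying idea, ranking each user's locations by total time spent and assigning labels in decreasing order, can be sketched with pandas. Column names follow the trackintel data model; the function name and implementation are illustrative assumptions:

```python
import pandas as pd

def freq_sketch(staypoints, labels=("home", "work")):
    """Rank each user's locations by total duration and label the top ones.

    Illustrative sketch of the FREQ idea, not trackintel's implementation.
    """
    sp = staypoints.copy()
    sp["duration"] = sp["finished_at"] - sp["started_at"]
    totals = (sp.groupby(["user_id", "location_id"])["duration"]
                .sum().sort_values(ascending=False))
    purpose = {}
    for _, group in totals.groupby(level="user_id"):
        # Within each user, pair the longest-duration locations with labels.
        for label, (user, loc) in zip(labels, group.index):
            purpose[(user, loc)] = label
    # Locations beyond the given labels map to None.
    sp["purpose"] = [purpose.get(key) for key in zip(sp["user_id"], sp["location_id"])]
    return sp

sp = pd.DataFrame({
    "user_id": [0, 0, 0],
    "location_id": [1, 2, 3],
    "started_at": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 06:00", "2023-01-01 09:00"]),
    "finished_at": pd.to_datetime(["2023-01-01 05:00", "2023-01-01 08:00", "2023-01-01 10:00"]),
})
print(freq_sketch(sp)["purpose"].tolist())  # ['home', 'work', None]
```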
- trackintel.analysis.location_identification.osna_method(staypoints)[source]
Find “home” location for timeframes “rest” and “leisure” and “work” location for “work” timeframe.
Use weekday data divided into three time frames [“rest”, “work”, “leisure”] to generate location labels. The “rest” and “leisure” frames are weighted together, and the location with the longest duration is assigned the “home” label. The longest-duration “work” location is assigned the “work” label.
- Parameters:
staypoints (GeoDataFrame (as trackintel staypoints)) – Staypoints with the column “location_id”.
- Returns:
The input staypoints with additional column “purpose”.
- Return type:
GeoDataFrame (as trackintel staypoints)
Note
The method is adapted from [1]. When the “home” and “work” labels overlap, the method selects the “work” location with the second-highest score. The original algorithm counts the distinct hours at a location because the home location is derived from geo-tagged tweets; we directly sum the time spent at a location.
References
[1] Efstathiades, Hariton, Demetris Antoniades, George Pallis, and Marios Dikaiakos. 2015. ‘Identification of Key Locations Based on Online Social Network Activity’. In https://doi.org/10.1145/2808797.2808877.
Examples
>>> from ti.analysis.location_identification import osna_method
>>> staypoints = osna_method(staypoints)
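The time-frame logic can be sketched as below for a single user. Note that the hour boundaries here are illustrative assumptions, not the exact frames used by trackintel or [1], and both function names are hypothetical:

```python
import pandas as pd

def time_frame(hour):
    # Illustrative hour boundaries only; the exact frames in [1] may differ.
    if 2 <= hour < 8:
        return "rest"
    if 8 <= hour < 19:
        return "work"
    return "leisure"

def osna_sketch(staypoints):
    """Return (home_location, work_location) for a single user's staypoints."""
    sp = staypoints.copy()
    sp["duration"] = sp["finished_at"] - sp["started_at"]
    sp["frame"] = sp["started_at"].dt.hour.map(time_frame)
    totals = sp.groupby(["frame", "location_id"])["duration"].sum()
    # "rest" and "leisure" are weighted together for the home label.
    home_frames = totals[totals.index.get_level_values("frame") != "work"]
    home = home_frames.groupby("location_id").sum().idxmax()
    work = totals.loc["work"].idxmax()
    return home, work

sp = pd.DataFrame({  # 2023-01-02 is a Monday, i.e., weekday data
    "location_id": [1, 1, 2],
    "started_at": pd.to_datetime(["2023-01-02 03:00", "2023-01-02 22:00", "2023-01-02 10:00"]),
    "finished_at": pd.to_datetime(["2023-01-02 07:00", "2023-01-02 23:00", "2023-01-02 18:00"]),
})
print(osna_sketch(sp))  # (1, 2)
```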