Analysis

The analysis module provides a variety of functions to analyze the mobility datasets computed by trackintel.

Labelling

trackintel.analysis.create_activity_flag(staypoints, method='time_threshold', time_threshold=15.0, activity_column_name='is_activity')

Add a flag indicating whether a staypoint is considered an activity, based on a time threshold.

Parameters:
  • staypoints (Staypoints) –

  • method ({'time_threshold'}, default = 'time_threshold') –

    • ‘time_threshold’ : All staypoints with a duration greater than the time_threshold are considered an activity.

  • time_threshold (float, default 15.0) – The duration in minutes above which a staypoint is considered an activity. Used by method ‘time_threshold’.

  • activity_column_name (str, default 'is_activity') – The name of the newly created column that holds the activity flag.

Returns:

staypoints – Original staypoints with the additional activity column

Return type:

Staypoints

Examples

>>> sp = sp.create_activity_flag(method='time_threshold', time_threshold=15)
>>> print(sp['is_activity'])
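
The time-threshold rule can be illustrated without trackintel. The following is a minimal sketch, not the library's implementation: it assumes staypoints are plain dicts with started_at and finished_at datetimes, and flag_activities is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def flag_activities(staypoints, time_threshold=15.0):
    """Return one boolean per staypoint: True if its duration exceeds time_threshold (minutes)."""
    threshold = timedelta(minutes=time_threshold)
    return [(sp["finished_at"] - sp["started_at"]) > threshold for sp in staypoints]


# Hypothetical sample data: a 10-minute stop and a 45-minute stop.
staypoints = [
    {"started_at": datetime(2024, 1, 1, 8, 0), "finished_at": datetime(2024, 1, 1, 8, 10)},
    {"started_at": datetime(2024, 1, 1, 9, 0), "finished_at": datetime(2024, 1, 1, 9, 45)},
]
flags = flag_activities(staypoints)
print(flags)
```

With the default threshold of 15 minutes, only the second staypoint in this sample is flagged as an activity.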

trackintel.analysis.predict_transport_mode(triplegs, method='simple-coarse', **kwargs)

Predict the transport mode of triplegs.

Predict/impute the transport mode that was likely chosen to cover the given tripleg, e.g., car, bicycle, or walk.

Parameters:
  • triplegs (Triplegs) –

  • method ({'simple-coarse'}, default 'simple-coarse') –

    The following methods are available for transport mode inference/prediction:

    • ‘simple-coarse’ : Uses simple heuristics to predict coarse transport classes.

Returns:

triplegs – The triplegs with added column mode, containing the predicted transport modes.

Return type:

Triplegs

Notes

The simple-coarse method distinguishes the classes {'slow_mobility', 'motorized_mobility', 'fast_mobility'}. In the default classification, slow_mobility (<15 km/h) covers transport modes such as walking or cycling, motorized_mobility (<100 km/h) covers modes such as car or train, and fast_mobility (>100 km/h) covers modes such as high-speed rail or airplanes. These categories are default values and can be overridden using the keyword argument categories.

Examples

>>> tpls = tpls.predict_transport_mode()
>>> print(tpls["mode"])
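
The default speed classes described in the notes can be sketched as a simple threshold lookup. This is an illustration only: it assumes the speed of a tripleg is its length divided by its duration, it treats each threshold as an exclusive upper bound, and the helper names and the list-of-tuples format are hypothetical (they are not the format expected by the categories keyword argument).

```python
# (upper bound in km/h, label) pairs in ascending order; values taken from the notes above.
DEFAULT_CATEGORIES = [
    (15.0, "slow_mobility"),
    (100.0, "motorized_mobility"),
    (float("inf"), "fast_mobility"),
]


def mean_speed_kmh(length_m, duration_s):
    """Mean speed of a tripleg in km/h from its length (m) and duration (s)."""
    return length_m / duration_s * 3.6


def classify_speed(speed_kmh, categories=DEFAULT_CATEGORIES):
    """Return the label of the first category whose upper bound exceeds the speed."""
    for upper, label in categories:
        if speed_kmh < upper:
            return label
    return categories[-1][1]


walk = classify_speed(mean_speed_kmh(1_000, 600))      # 6 km/h
drive = classify_speed(mean_speed_kmh(30_000, 1_800))  # 60 km/h
fly = classify_speed(mean_speed_kmh(500_000, 3_600))   # 500 km/h
print(walk, drive, fly)
```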

Tracking Quality

trackintel.analysis.temporal_tracking_quality(source, granularity='all')

Calculate per-user temporal tracking quality (temporal coverage).

Parameters:
  • source (Trackintel class) – The source dataframe for which to calculate the temporal tracking quality.

  • granularity ({"all", "day", "week", "weekday", "hour"}) – The level at which the tracking quality is calculated. The default “all” returns the overall tracking quality; “day” the tracking quality by day; “week” the quality by week; “weekday” the quality by day of the week (e.g., Mondays, Tuesdays, etc.) and “hour” the quality by hour.

Returns:

quality – A per-user per-granularity temporal tracking quality dataframe.

Return type:

DataFrame

Notes

Requires at least the following columns: ['user_id', 'started_at', 'finished_at']. The function therefore supports the trackintel staypoints, triplegs, trips and tours data models and their combinations (e.g., a sequence of staypoints and triplegs).

The temporal tracking quality is the ratio of the tracked time to the total time extent. It is calculated and returned per user at the chosen granularity. The time extents and the columns of the returned quality dataframe for each granularity are:

  • all:
    • time extent: between the latest “finished_at” and the earliest “started_at” for each user.

    • columns: ['user_id', 'quality'].

  • week:
    • time extent: the whole week (604800 sec) for each user.

    • columns: ['user_id', 'week_monday', 'quality'].

  • day:
    • time extent: the whole day (86400 sec) for each user

    • columns: ['user_id', 'day', 'quality']

  • weekday:
    • time extent: the whole day (86400 sec) * number of tracked weeks for each user

    • columns: ['user_id', 'weekday', 'quality']

  • hour:
    • time extent: the whole hour (3600 sec) * number of tracked days for each user

    • columns: ['user_id', 'hour', 'quality']

Examples

>>> # calculate overall tracking quality of staypoints
>>> temporal_tracking_quality(sp, granularity="all")
>>> # calculate per-day tracking quality of sp and tpls sequence
>>> temporal_tracking_quality(sp_tpls, granularity="day")
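
For the “all” granularity, the ratio described in the notes can be written out by hand. The sketch below is not the library code: it assumes the records of a single user are plain dicts with started_at and finished_at datetimes, that the records do not overlap in time, and overall_tracking_quality is a hypothetical helper name.

```python
from datetime import datetime


def overall_tracking_quality(records):
    """Tracked time divided by the extent from the earliest start to the latest finish."""
    tracked = sum((r["finished_at"] - r["started_at"]).total_seconds() for r in records)
    earliest = min(r["started_at"] for r in records)
    latest = max(r["finished_at"] for r in records)
    return tracked / (latest - earliest).total_seconds()


# Hypothetical records of one user: 3 tracked hours within a 4-hour extent.
records = [
    {"started_at": datetime(2024, 1, 1, 8), "finished_at": datetime(2024, 1, 1, 10)},
    {"started_at": datetime(2024, 1, 1, 11), "finished_at": datetime(2024, 1, 1, 12)},
]
quality = overall_tracking_quality(records)
print(quality)
```

The one-hour gap between the two records is what lowers the quality below 1 in this sample.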

Location Identification

trackintel.analysis.location_identifier(staypoints, method='FREQ', pre_filter=True, **pre_filter_kwargs)

Assign “home” and “work” activity labels for each user, using one of several methods.

Parameters:
  • staypoints (Staypoints) – Staypoints with column “location_id”.

  • method ({'FREQ', 'OSNA'}, default "FREQ") –

    • ‘FREQ’ : Generate an activity label per user by assigning the label “home” to the most visited location and the label “work” to the second most visited location. The remaining locations get no label.

    • ‘OSNA’ : Use weekday data divided into three time frames [“rest”, “work”, “leisure”]. Finds the most popular “home” location for the time frames “rest” and “leisure” and the most popular “work” location for the “work” time frame.

  • pre_filter (bool, default True) – Whether to prefilter the staypoints to exclude locations without enough data. The filter function can also be accessed via pre_filter_locations.

  • pre_filter_kwargs (dict) – Kwargs to hand to pre_filter_locations if used. See that function for more information.

Returns:

sp – Staypoints with an additional column “purpose” that holds one of three activity labels {‘home’, ‘work’, None}.

Return type:

Staypoints

Note

The methods are adapted from [1]. The original algorithms count the distinct hours at a location because they derive the home location from geo-tagged tweets. We instead sum the time spent at a location directly, as our data model includes that information.

References

[1] Chen, Qingqing, and Ate Poorthuis. 2021. ‘Identifying Home Locations in Human Mobility Data: An Open-Source R Package for Comparison and Reproducibility’. International Journal of Geographical Information Science 0 (0): 1–24. https://doi.org/10.1080/13658816.2021.1887489.

Examples

>>> from trackintel.analysis import location_identifier
>>> location_identifier(staypoints, pre_filter=True, method="FREQ")

trackintel.analysis.pre_filter_locations(staypoints, agg_level='user', thresh_sp=10, thresh_loc=10, thresh_sp_at_loc=10, thresh_loc_time='1h', thresh_loc_period='5h')

Filter out locations and users that do not have enough data for a proper analysis.

To disable a specific filter parameter, set it to zero.

Parameters:
  • staypoints (Staypoints) – Staypoints with the column “location_id”.

  • agg_level ({"user", "dataset"}, default "user") – The level of aggregation when filtering locations. ‘user’ : locations are filtered per-user; ‘dataset’ : locations are filtered over the whole dataset.

  • thresh_sp (int, default 10) – Minimum number of staypoints a user must have to be included.

  • thresh_loc (int, default 10) – Minimum number of locations a user must have to be included.

  • thresh_sp_at_loc (int, default 10) – Minimum number of staypoints a location must have to be included.

  • thresh_loc_time (str or pd.Timedelta, default "1h") – Minimum sum of durations spent at a location for it to be included. If a str, it must be parsable by pd.to_timedelta.

  • thresh_loc_period (str or pd.Timedelta, default "5h") – Minimum timespan from the first to the last visit at a location for it to be included. If a str, it must be parsable by pd.to_timedelta.

Returns:

total_filter – Boolean series containing the filter as a mask.

Return type:

pd.Series

Examples

>>> from trackintel.analysis import pre_filter_locations
>>> mask = pre_filter_locations(staypoints)
>>> staypoints = staypoints[mask]
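
The two time-based thresholds can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: it covers only thresh_loc_time and thresh_loc_period, ignores the per-user aggregation level, treats both thresholds as inclusive minimums, and uses plain dicts and a hypothetical helper name, location_time_mask.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def location_time_mask(staypoints, thresh_loc_time=timedelta(hours=1),
                       thresh_loc_period=timedelta(hours=5)):
    """Return one boolean per staypoint: True if its location passes both time thresholds."""
    total = defaultdict(timedelta)  # summed duration per location
    first, last = {}, {}            # first start and last finish per location
    for sp in staypoints:
        loc = sp["location_id"]
        total[loc] += sp["finished_at"] - sp["started_at"]
        first[loc] = min(first.get(loc, sp["started_at"]), sp["started_at"])
        last[loc] = max(last.get(loc, sp["finished_at"]), sp["finished_at"])
    keep = {
        loc for loc in total
        if total[loc] >= thresh_loc_time and last[loc] - first[loc] >= thresh_loc_period
    }
    return [sp["location_id"] in keep for sp in staypoints]


# Hypothetical data: location 1 has 2 h over an 8 h period, location 2 only 30 min.
day = datetime(2024, 1, 1)
staypoints = [
    {"location_id": 1, "started_at": day.replace(hour=8), "finished_at": day.replace(hour=9)},
    {"location_id": 2, "started_at": day.replace(hour=10),
     "finished_at": day.replace(hour=10, minute=30)},
    {"location_id": 1, "started_at": day.replace(hour=15), "finished_at": day.replace(hour=16)},
]
mask = location_time_mask(staypoints)
print(mask)
```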

trackintel.analysis.freq_method(staypoints, *labels)

Generate an activity label per user.

Assigns the label “home” to the most visited location and the label “work” to the second most visited location. The remaining locations get no label.

Labels can also be given as arguments.

Parameters:
  • staypoints (Staypoints) – Staypoints with the column “location_id”.

  • labels (collection of str, default ("home", "work")) – Labels to assign, in order of decreasing time spent at the location.

Returns:

sp – The input staypoints with additional column “purpose”.

Return type:

Staypoints

Examples

>>> from trackintel.analysis import freq_method
>>> staypoints = freq_method(staypoints, "home", "work")
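
The ranking idea behind this method can be sketched for a single user. The code below is an illustration, not the library's implementation: it assumes staypoints are plain dicts with a location_id and a duration in hours (duration_h is a hypothetical field), ranks locations by summed duration, and uses a hypothetical helper name, label_by_duration.

```python
from collections import defaultdict


def label_by_duration(staypoints, labels=("home", "work")):
    """Label each staypoint by the rank of its location in total time spent."""
    total = defaultdict(float)
    for sp in staypoints:
        total[sp["location_id"]] += sp["duration_h"]
    # Locations sorted by decreasing total duration; zip stops after the last label.
    ranked = sorted(total, key=total.get, reverse=True)
    purpose = dict(zip(ranked, labels))
    return [purpose.get(sp["location_id"]) for sp in staypoints]


# Hypothetical staypoints of one user.
staypoints = [
    {"location_id": "a", "duration_h": 8.0},
    {"location_id": "b", "duration_h": 7.0},
    {"location_id": "a", "duration_h": 9.0},
    {"location_id": "c", "duration_h": 1.0},
]
purposes = label_by_duration(staypoints)
print(purposes)
```

Locations beyond the number of supplied labels receive None, which mirrors the "remaining locations get no label" behaviour described above.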

trackintel.analysis.osna_method(staypoints)

Find the “home” location for the time frames “rest” and “leisure” and the “work” location for the “work” time frame.

Uses weekday data divided into three time frames [“rest”, “work”, “leisure”] to generate location labels. “rest” and “leisure” locations are weighted together. The location with the longest duration is assigned the “home” label. The location with the longest duration in the “work” time frame is assigned the “work” label.

Parameters:

staypoints (Staypoints) – Staypoints with the column “location_id”.

Returns:

The input staypoints with additional column “purpose”.

Return type:

Staypoints

Note

The method is adapted from [1]. When the “home” and “work” labels overlap, the method selects the “work” location by the second-highest score. The original algorithm counts the distinct hours at a location because it derives the home location from geo-tagged tweets. We instead sum the time spent at a location directly.

References

[1] Efstathiades, Hariton, Demetris Antoniades, George Pallis, and Marios Dikaiakos. 2015. ‘Identification of Key Locations Based on Online Social Network Activity’. In https://doi.org/10.1145/2808797.2808877.

Examples

>>> from trackintel.analysis import osna_method
>>> staypoints = osna_method(staypoints)

Metrics

trackintel.analysis.radius_gyration(sp, method='count', print_progress=False)

Radius of gyration for individual users.

Parameters:
  • sp (Staypoints) –

  • method ({"count", "duration"}, default "count") –

    Weighting for center of mass and average distance calculation.

    • count: assigns each Point the same weight of 1.

    • duration: assigns each Point a weight based on duration.

  • print_progress (bool, default False) – Show per-user progress if set to True.

Returns:

Radius of gyration for individual users.

Return type:

Series

References

[1] Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779-782.
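
As a rough illustration of the two weighting methods, the radius of gyration can be computed as the weighted root mean squared distance of the points from their weighted center of mass. The sketch below makes simplifying assumptions that are not taken from the library: coordinates are planar (x, y) in metres, distances are Euclidean, and radius_of_gyration is a hypothetical helper name.

```python
import math


def radius_of_gyration(points, weights=None):
    """Weighted RMS distance of planar points from their weighted center of mass."""
    if weights is None:
        weights = [1.0] * len(points)  # corresponds to method "count"
    total = sum(weights)
    cx = sum(w * x for w, (x, _) in zip(weights, points)) / total
    cy = sum(w * y for w, (_, y) in zip(weights, points)) / total
    mean_sq = sum(
        w * ((x - cx) ** 2 + (y - cy) ** 2) for w, (x, y) in zip(weights, points)
    ) / total
    return math.sqrt(mean_sq)


points = [(0.0, 0.0), (2.0, 0.0)]
rg_count = radius_of_gyration(points)                         # equal weights
rg_duration = radius_of_gyration(points, weights=[3.0, 1.0])  # e.g. hours spent
print(rg_count, rg_duration)
```

Weighting by duration pulls the center of mass towards the point where more time was spent, which reduces the radius in this sample.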

trackintel.analysis.jump_length(staypoints)

Jump length between consecutive staypoints per user.

Parameters:

staypoints (Staypoints) –

Returns:

Distance between consecutive staypoints. The last entry of each user is NaN. Shares its index with staypoints.

Return type:

pd.Series

References

[1] Brockmann, D., Hufnagel, L., & Geisel, T. (2006). The scaling laws of human travel. Nature, 439(7075), 462-465.
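
The shape of the result can be illustrated for a single user. The sketch below is an assumption-laden example rather than the library code: it takes (longitude, latitude) pairs in degrees, uses the haversine great-circle distance with a mean Earth radius of 6371 km, and the helper names haversine_m and jump_lengths are hypothetical.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius=6_371_000.0):
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def jump_lengths(points):
    """Distance from each point to the next one; the last entry is NaN."""
    if not points:
        return []
    jumps = [haversine_m(*a, *b) for a, b in zip(points, points[1:])]
    return jumps + [math.nan]


# Hypothetical staypoint coordinates of one user, ordered by time.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
jumps = jump_lengths(points)
print(jumps)
```

The output has one entry per input point, so it can share an index with the input, and the trailing NaN marks the last staypoint, which has no successor.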