Input/Output

We primarily support three types of data persistence: CSV files, GeoDataFrames, and PostGIS databases.

Our primary focus is on PostGIS databases, but with minimal tweaking you can use the standard Pandas/Python tools to persist your data to any other database. You can also keep all data in memory while you run an analysis, e.g., in a Jupyter notebook.

All the read/write functions are made available in the top-level trackintel module, i.e., you can use them as trackintel.read_positionfixes_csv('data.csv'), etc. Note that these functions are wrappers around the (Geo)Pandas CSV, renaming and SQL functions. As such, all *args and **kwargs are forwarded to them.
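
The forwarding behaviour can be pictured with a minimal, stdlib-only sketch. `read_rows_csv` is a hypothetical stand-in (not a trackintel function) that wraps `csv.DictReader` in roughly the way the readers described here wrap `pd.read_csv()`: it consumes its own `columns` argument and passes every other argument through untouched.

```python
import csv
import io


def read_rows_csv(file_obj, *args, columns=None, **kwargs):
    """Hypothetical wrapper: rename columns, forward everything else to csv.DictReader."""
    columns = columns or {}
    # *args and **kwargs are passed through unchanged to the wrapped reader.
    reader = csv.DictReader(file_obj, *args, **kwargs)
    # Apply the {'old_name': 'standard_name'} mapping to every row.
    return [{columns.get(key, key): value for key, value in row.items()} for row in reader]


# 'delimiter' is unknown to the wrapper itself; it reaches csv.DictReader via **kwargs.
data = io.StringIO("User;time\n0;2008-10-23 02:53:04\n")
rows = read_rows_csv(data, columns={"User": "user_id", "time": "tracked_at"}, delimiter=";")
```

In the same spirit, any keyword that `pd.read_csv()` understands (for example `sep`) can be handed to the trackintel CSV readers.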

CSV File Import

trackintel.io.file.read_positionfixes_csv(*args, columns=None, tz=None, index_col=None, geom_col='geom', crs=None, **kwargs)[source]

Read positionfixes from csv file.

Wraps the pandas read_csv function, extracts longitude and latitude and builds a geopandas GeoDataFrame (POINT). This also validates that the ingested data conforms to the trackintel understanding of positionfixes (see Model).

Parameters:
  • args – Arguments as passed to pd.read_csv().

  • columns (dict, optional) – The column names to rename in the format {'old_name': 'trackintel_standard_name'}. The required columns for this function are “user_id”, “tracked_at”, “latitude” and “longitude”.

  • tz (str, optional) – pytz-compatible timezone string. If None, UTC is assumed.

  • index_col (str, optional) – Column name to be used as index. If None, the default index is assumed to be the unique identifier.

  • geom_col (str, default "geom") – Name of the column containing the geometry.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • kwargs – Additional keyword arguments passed to pd.read_csv().

Returns:

pfs – A GeoDataFrame containing the positionfixes.

Return type:

GeoDataFrame (as trackintel positionfixes)

Notes

Note that this function is primarily useful if the data is available in a longitude/latitude format. If your data already contains a WKT column, it might be easier to use the GeoPandas import function trackintel.io.from_geopandas.read_positionfixes_gpd().

Examples

>>> trackintel.read_positionfixes_csv('data.csv')
>>> trackintel.read_positionfixes_csv('data.csv', columns={'time':'tracked_at', 'User':'user_id'})
                     tracked_at  user_id                        geom
id
0     2008-10-23 02:53:04+00:00        0  POINT (116.31842 39.98470)
1     2008-10-23 02:53:10+00:00        0  POINT (116.31845 39.98468)
2     2008-10-23 02:53:15+00:00        0  POINT (116.31842 39.98469)
3     2008-10-23 02:53:20+00:00        0  POINT (116.31839 39.98469)
4     2008-10-23 02:53:25+00:00        0  POINT (116.31826 39.98465)
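
As a rough illustration of the steps described above (renaming, assuming UTC, and turning longitude/latitude into a point geometry), here is a stdlib-only sketch. It does not use trackintel or GeoPandas; `rows_to_positionfixes` and the sample data are hypothetical, and the geometry is represented as a WKT string rather than a real geometry object.

```python
import csv
import io
from datetime import datetime, timezone

RAW = """User,time,longitude,latitude
0,2008-10-23 02:53:04,116.31842,39.98470
0,2008-10-23 02:53:10,116.31845,39.98468
"""


def rows_to_positionfixes(text, columns=None):
    """Rename columns, attach UTC to naive timestamps, and build a WKT POINT per row."""
    columns = columns or {}
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        row = {columns.get(key, key): value for key, value in row.items()}
        tracked_at = datetime.strptime(row["tracked_at"], "%Y-%m-%d %H:%M:%S")
        records.append(
            {
                "user_id": int(row["user_id"]),
                # The file carries no timezone, so UTC is assumed (mirrors tz=None).
                "tracked_at": tracked_at.replace(tzinfo=timezone.utc),
                # WKT points are written as "x y", i.e. longitude first.
                "geom": f"POINT ({row['longitude']} {row['latitude']})",
            }
        )
    return records


pfs = rows_to_positionfixes(RAW, columns={"User": "user_id", "time": "tracked_at"})
```

Note the coordinate order: the resulting point is (longitude latitude), matching the output shown above.
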
trackintel.io.file.read_triplegs_csv(*args, columns=None, tz=None, index_col=None, geom_col='geom', crs=None, **kwargs)[source]

Read triplegs from csv file.

Wraps the pandas read_csv function, extracts a WKT for the tripleg geometry (LINESTRING) and builds a geopandas GeoDataFrame. This also validates that the ingested data conforms to the trackintel understanding of triplegs (see Model).

Parameters:
  • args – Arguments as passed to pd.read_csv().

  • columns (dict, optional) – The column names to rename in the format {'old_name': 'trackintel_standard_name'}. The required columns for this function are “user_id”, “started_at”, “finished_at” and “geom”.

  • tz (str, optional) – pytz-compatible timezone string. If None, UTC is assumed.

  • index_col (str, optional) – Column name to be used as index. If None, the default index is assumed to be the unique identifier.

  • geom_col (str, default "geom") – Name of the column containing the geometry as WKT.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • kwargs – Additional keyword arguments passed to pd.read_csv().

Returns:

tpls – A GeoDataFrame containing the triplegs.

Return type:

GeoDataFrame (as trackintel triplegs)

Examples

>>> trackintel.read_triplegs_csv('data.csv')
>>> trackintel.read_triplegs_csv('data.csv', columns={'start_time':'started_at', 'User':'user_id'})
    user_id                started_at               finished_at                                               geom
id
0         1 2015-11-27 08:00:00+00:00 2015-11-27 10:00:00+00:00  LINESTRING (8.54878 47.37652, 8.52770 47.39935...
1         1 2015-11-27 12:00:00+00:00 2015-11-27 14:00:00+00:00  LINESTRING (8.56340 47.95600, 8.64560 47.23345...
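
To make the expected content of the geometry column concrete, the following stdlib-only sketch parses a two-dimensional WKT LINESTRING into coordinate pairs. `parse_linestring` is a hypothetical helper written for illustration; it is not how trackintel reads the column (a geometry library would normally do this) and it ignores Z/M coordinates and empty geometries.

```python
import re

_LINESTRING = re.compile(r"^\s*LINESTRING\s*\((.*)\)\s*$", re.IGNORECASE)


def parse_linestring(wkt):
    """Return the (x, y) vertices of a 2D WKT LINESTRING as a list of float tuples."""
    match = _LINESTRING.match(wkt)
    if match is None:
        raise ValueError(f"not a LINESTRING: {wkt!r}")
    points = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # each vertex is "x y", separated by whitespace
        points.append((float(x), float(y)))
    if len(points) < 2:
        raise ValueError("a LINESTRING needs at least two vertices")
    return points


coords = parse_linestring("LINESTRING (8.54878 47.37652, 8.52770 47.39935)")
```

A value such as "POINT (8.5 47.3)" in the same column would be rejected by this sketch, which is the kind of mismatch the validation step is meant to catch.
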
trackintel.io.file.read_staypoints_csv(*args, columns=None, tz=None, index_col=None, geom_col='geom', crs=None, **kwargs)[source]

Read staypoints from csv file.

Wraps the pandas read_csv function, extracts a WKT for the staypoint geometry (POINT) and builds a geopandas GeoDataFrame. This also validates that the ingested data conforms to the trackintel understanding of staypoints (see Model).

Parameters:
  • args – Arguments as passed to pd.read_csv().

  • columns (dict, optional) – The column names to rename in the format {'old_name': 'trackintel_standard_name'}. The required columns for this function are “user_id”, “started_at”, “finished_at” and “geom”.

  • tz (str, optional) – pytz-compatible timezone string. If None, UTC is assumed.

  • index_col (str, optional) – Column name to be used as index. If None, the default index is assumed to be the unique identifier.

  • geom_col (str, default "geom") – Name of the column containing the geometry as WKT.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • kwargs – Additional keyword arguments passed to pd.read_csv().

Returns:

sp – A GeoDataFrame containing the staypoints.

Return type:

GeoDataFrame (as trackintel staypoints)

Examples

>>> trackintel.read_staypoints_csv('data.csv')
>>> trackintel.read_staypoints_csv('data.csv', columns={'start_time':'started_at', 'User':'user_id'})
    user_id                started_at               finished_at                      geom
id
0         1 2015-11-27 08:00:00+00:00 2015-11-27 10:00:00+00:00  POINT (8.52822 47.39519)
1         1 2015-11-27 12:00:00+00:00 2015-11-27 14:00:00+00:00  POINT (8.54340 47.95600)
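
Before calling the reader it can help to check whether a given `columns` mapping covers all required names. The sketch below is a hypothetical pre-check using only the standard library; the required names are taken from the parameter description above, and `missing_after_rename` is not part of trackintel.

```python
REQUIRED_STAYPOINT_COLUMNS = ("user_id", "started_at", "finished_at", "geom")


def missing_after_rename(header, columns=None, required=REQUIRED_STAYPOINT_COLUMNS):
    """Return the required column names that are still missing once `columns` is applied."""
    columns = columns or {}
    renamed = {columns.get(name, name) for name in header}
    return [name for name in required if name not in renamed]


header = ["User", "start_time", "finished_at", "geom"]

# Only 'User' is mapped, so 'started_at' is still missing.
incomplete = missing_after_rename(header, columns={"User": "user_id"})

# Mapping both non-standard names leaves nothing missing.
complete = missing_after_rename(header, columns={"User": "user_id", "start_time": "started_at"})
```
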
trackintel.io.file.read_locations_csv(*args, columns=None, index_col=None, crs=None, **kwargs)[source]

Read locations from csv file.

Wraps the pandas read_csv function, extracts a WKT for the location center (POINT) (and extent (POLYGON)) and builds a geopandas GeoDataFrame. This also validates that the ingested data conforms to the trackintel understanding of locations (see Model).

Parameters:
  • args – Arguments as passed to pd.read_csv().

  • columns (dict, optional) – The column names to rename in the format {'old_name': 'trackintel_standard_name'}. The required columns for this function are “user_id” and “center”.

  • index_col (str, optional) – Column name to be used as index. If None, the default index is assumed to be the unique identifier.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string.

  • kwargs – Additional keyword arguments passed to pd.read_csv().

Returns:

locs – A GeoDataFrame containing the locations.

Return type:

GeoDataFrame (as trackintel locations)

Examples

>>> trackintel.read_locations_csv('data.csv')
>>> trackintel.read_locations_csv('data.csv', columns={'User':'user_id'})
    user_id                    center                                             extent
id
0         1  POINT (8.54878 47.37652)  POLYGON ((8.548779487999999 47.37651505, 8.527...
1         1  POINT (8.56340 47.95600)  POLYGON ((8.5634 47.956, 8.6456 47.23345, 8.45...
trackintel.io.file.read_trips_csv(*args, columns=None, tz=None, index_col=None, geom_col=None, crs=None, **kwargs)[source]

Read trips from csv file.

Wraps the pandas read_csv function and extracts proper datetimes. This also validates that the ingested data conforms to the trackintel understanding of trips (see Model).

Parameters:
  • args – Arguments as passed to pd.read_csv().

  • columns (dict, optional) – The column names to rename in the format {'old_name': 'trackintel_standard_name'}. The required columns for this function are “user_id”, “started_at”, “finished_at”, “origin_staypoint_id” and “destination_staypoint_id”. An optional column is “geom” of type MultiPoint, containing the start and destination points of the trip.

  • tz (str, optional) – pytz-compatible timezone string. If None, UTC is assumed.

  • index_col (str, optional) – Column name to be used as index. If None, the default index is assumed to be the unique identifier.

  • geom_col (str, default None) – Name of the column containing the geometry as WKT. If None, no geometry is added.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (e.g. “EPSG:4326”) or a WKT string. Ignored if geom_col is None.

  • kwargs – Additional keyword arguments passed to pd.read_csv().

Returns:

trips – A DataFrame containing the trips. GeoDataFrame if geometry column exists.

Return type:

(Geo)DataFrame (as trackintel trips)

Notes

Geometry is not mandatory for trackintel trips.

Examples

>>> trackintel.read_trips_csv('data.csv')
>>> trackintel.read_trips_csv('data.csv', columns={'start_time':'started_at', 'User':'user_id'})
    user_id                started_at               finished_at  origin_staypoint_id  destination_staypoint_id
id
0         1 2015-11-27 08:00:00+00:00 2015-11-27 08:15:00+00:00                    2                         5
1         1 2015-11-27 08:20:22+00:00 2015-11-27 08:35:22+00:00                    5                         3
                            geom
id
0   MULTIPOINT (116.31842 39.98470, 116.29873 39.999729)
1   MULTIPOINT (116.29873 39.98402, 116.32480 40.009269)
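
Because geometry is optional for trips, a record is complete with or without the “geom” entry. The following stdlib-only sketch shows one way such a record could be assembled; `trip_record` and its argument names are hypothetical, and the geometry is kept as a WKT string purely for illustration.

```python
def trip_record(user_id, started_at, finished_at, origin_id, destination_id,
                origin_xy=None, destination_xy=None):
    """Build a trip record; add a MULTIPOINT geometry only if both end points are known."""
    record = {
        "user_id": user_id,
        "started_at": started_at,
        "finished_at": finished_at,
        "origin_staypoint_id": origin_id,
        "destination_staypoint_id": destination_id,
    }
    if origin_xy is not None and destination_xy is not None:
        (ox, oy), (dx, dy) = origin_xy, destination_xy
        # Start point first, destination second, both as "x y" (longitude latitude).
        record["geom"] = f"MULTIPOINT ({ox} {oy}, {dx} {dy})"
    return record


plain = trip_record(1, "2015-11-27 08:00:00", "2015-11-27 08:15:00", 2, 5)
with_geom = trip_record(1, "2015-11-27 08:00:00", "2015-11-27 08:15:00", 2, 5,
                        origin_xy=(116.31842, 39.9847), destination_xy=(116.29873, 39.999729))
```
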
trackintel.io.file.read_tours_csv(*args, columns=None, index_col=None, tz=None, **kwargs)[source]

Read tours from csv file.

Wraps the pandas read_csv function and extracts proper datetimes. This also validates that the ingested data conforms to the trackintel understanding of tours (see Model).

Parameters:
  • args – Arguments as passed to pd.read_csv().

  • columns (dict, optional) – The column names to rename in the format {'old_name': 'trackintel_standard_name'}.

  • index_col (str, optional) – Column name to be used as index. If None, the default index is assumed to be the unique identifier.

  • tz (str, optional) – pytz-compatible timezone string. If None, UTC is assumed.

  • kwargs – Additional keyword arguments passed to pd.read_csv().

Returns:

tours – A DataFrame containing the tours.

Return type:

DataFrame (as trackintel tours)

Examples

>>> trackintel.read_tours_csv('data.csv', columns={'uuid':'user_id'})

GeoDataFrame Import

trackintel.io.from_geopandas.read_positionfixes_gpd(gdf, tracked_at='tracked_at', user_id='user_id', geom_col=None, crs=None, tz=None, mapper=None)[source]

Read positionfixes from GeoDataFrames.

Wraps the pd.rename function to simplify the import of GeoDataFrames.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame with valid point geometry, containing the positionfixes to import

  • tracked_at (str, default 'tracked_at') – Name of the column storing the timestamps.

  • user_id (str, default 'user_id') – Name of the column storing the user_id.

  • geom_col (str, optional) – Name of the column storing the geometry. If None assumes geometry is already set.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • tz (str, optional) – pytz-compatible timezone string. If None, UTC is assumed.

  • mapper (dict, optional) – Further columns that should be renamed.

Returns:

pfs – A GeoDataFrame containing the positionfixes.

Return type:

GeoDataFrame (as trackintel positionfixes)

Examples

>>> trackintel.read_positionfixes_gpd(gdf, user_id='User', geom_col='geom', tz='utc')
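
One way to picture the renaming is as a single mapping assembled from the named parameters plus `mapper`. The sketch below is an assumption about the idea, not trackintel's actual implementation; `build_rename_map` and the “acc”/“accuracy” column are hypothetical.

```python
def build_rename_map(tracked_at="tracked_at", user_id="user_id", mapper=None):
    """Map source column names to standard names (hypothetical helper)."""
    rename = dict(mapper or {})          # further, user-defined renames
    rename[tracked_at] = "tracked_at"    # source timestamp column -> standard name
    rename[user_id] = "user_id"          # source user column -> standard name
    return rename


rename = build_rename_map(tracked_at="time", user_id="User", mapper={"acc": "accuracy"})

# Applying the mapping to one row, as a rename over column names would.
row = {"User": 0, "time": "2008-10-23 02:53:04", "acc": 5.0}
renamed_row = {rename.get(key, key): value for key, value in row.items()}
```
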
trackintel.io.from_geopandas.read_triplegs_gpd(gdf, started_at='started_at', finished_at='finished_at', user_id='user_id', geom_col=None, crs=None, tz=None, mapper=None)[source]

Read triplegs from GeoDataFrames.

Wraps the pd.rename function to simplify the import of GeoDataFrames.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame with valid line geometry, containing the triplegs to import.

  • started_at (str, default 'started_at') – Name of the column storing the starttime of the triplegs.

  • finished_at (str, default 'finished_at') – Name of the column storing the endtime of the triplegs.

  • user_id (str, default 'user_id') – Name of the column storing the user_id.

  • geom_col (str, optional) – Name of the column storing the geometry. If None assumes geometry is already set.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • tz (str, optional) – pytz compatible timezone string. If None UTC is assumed.

  • mapper (dict, optional) – Further columns that should be renamed.

Returns:

tpls – A GeoDataFrame containing the triplegs

Return type:

GeoDataFrame (as trackintel triplegs)

Examples

>>> trackintel.read_triplegs_gpd(gdf, user_id='User', geom_col='geom', tz='utc')
trackintel.io.from_geopandas.read_staypoints_gpd(gdf, started_at='started_at', finished_at='finished_at', user_id='user_id', geom_col=None, crs=None, tz=None, mapper=None)[source]

Read staypoints from GeoDataFrames.

Wraps the pd.rename function to simplify the import of GeoDataFrames.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame with valid point geometry, containing the staypoints to import

  • started_at (str, default 'started_at') – Name of the column storing the starttime of the staypoints.

  • finished_at (str, default 'finished_at') – Name of the column storing the endtime of the staypoints.

  • user_id (str, default 'user_id') – Name of the column storing the user_id.

  • geom_col (str, optional) – Name of the column storing the geometry. If None, the geometry is assumed to be set already.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • tz (str, optional) – pytz compatible timezone string. If None UTC is assumed.

  • mapper (dict, optional) – Further columns that should be renamed.

Returns:

sp – A GeoDataFrame containing the staypoints

Return type:

GeoDataFrame (as trackintel staypoints)

Examples

>>> trackintel.read_staypoints_gpd(gdf, started_at='start_time', finished_at='end_time', tz='utc')
trackintel.io.from_geopandas.read_locations_gpd(gdf, user_id='user_id', center='center', extent=None, crs=None, mapper=None)[source]

Read locations from GeoDataFrames.

Wraps the pd.rename function to simplify the import of GeoDataFrames.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame with valid point geometry, containing the locations to import.

  • user_id (str, default 'user_id') – Name of the column storing the user_id.

  • center (str, default 'center') – Name of the column storing the geometry (center of the location).

  • extent (str, optional) – Name of the column storing the additional geometry (extent of the location).

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string.

  • mapper (dict, optional) – Further columns that should be renamed.

Returns:

locs – A GeoDataFrame containing the locations.

Return type:

GeoDataFrame (as trackintel locations)

Examples

>>> trackintel.read_locations_gpd(df, user_id='User', center='geometry')
trackintel.io.from_geopandas.read_trips_gpd(gdf, started_at='started_at', finished_at='finished_at', user_id='user_id', origin_staypoint_id='origin_staypoint_id', destination_staypoint_id='destination_staypoint_id', geom_col=None, crs=None, tz=None, mapper=None)[source]

Read trips from GeoDataFrames/DataFrames.

Wraps the pd.rename function to simplify the import of GeoDataFrames (DataFrames).

Parameters:
  • gdf (GeoDataFrame or DataFrame) – (Geo)DataFrame containing the trips to import.

  • started_at (str, default 'started_at') – Name of the column storing the start time of the trips.

  • finished_at (str, default 'finished_at') – Name of the column storing the end time of the trips.

  • user_id (str, default 'user_id') – Name of the column storing the user_id.

  • origin_staypoint_id (str, default 'origin_staypoint_id') – Name of the column storing the staypoint_id of the start of the trip.

  • destination_staypoint_id (str, default 'destination_staypoint_id') – Name of the column storing the staypoint_id of the end of the trip.

  • geom_col (str, optional) – Name of the column storing the geometry. If None, the trips are assumed to have no geometry.

  • crs (pyproj.crs or str, optional) – Set coordinate reference system. The value can be anything accepted by pyproj.CRS.from_user_input(), such as an authority string (eg “EPSG:4326”) or a WKT string. Ignored if “geom_col” is None.

  • tz (str, optional) – pytz compatible timezone string. If None UTC is assumed.

  • mapper (dict, optional) – Further columns that should be renamed.

Returns:

trips – A (Geo)DataFrame containing the trips.

Return type:

(Geo)DataFrame (as trackintel trips)

Examples

>>> trackintel.read_trips_gpd(df, tz='utc')
trackintel.io.from_geopandas.read_tours_gpd(gdf, user_id='user_id', started_at='started_at', finished_at='finished_at', tz=None, mapper=None)[source]

Read tours from GeoDataFrames.

Wraps the pd.rename function to simplify the import of GeoDataFrames.

Parameters:
  • gdf (GeoDataFrame) – GeoDataFrame containing the tours to import.

  • user_id (str, default 'user_id') – Name of the column storing the user_id.

  • started_at (str, default 'started_at') – Name of the column storing the start time of the tours.

  • finished_at (str, default 'finished_at') – Name of the column storing the end time of the tours.

  • tz (str, optional) – pytz compatible timezone string. If None UTC is assumed.

  • mapper (dict, optional) – Further columns that should be renamed.

Returns:

tours – A GeoDataFrame containing the tours

Return type:

GeoDataFrame (as trackintel tours)

PostGIS Import

trackintel.io.postgis.read_positionfixes_postgis(sql, con, geom_col='geom', crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None, **kwargs)[source]

Reads positionfixes from a PostGIS database.

Parameters:
  • sql (str) – SQL query e.g. “SELECT * FROM positionfixes”

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • geom_col (str, default 'geom') – The geometry column of the table.

  • crs (optional) – Coordinate reference system to use for the returned GeoDataFrame

  • index_col (string or list of strings, optional, default: None) – Column(s) to set as index(MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or one of (D, s, ns, ms, us) when parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • **kwargs – Further keyword arguments as available in trackintel's trackintel.io.read_positionfixes_gpd(). Especially useful for renaming columns of the SQL table to trackintel-conforming column names; see the second example below.

Returns:

A GeoDataFrame containing the positionfixes.

Return type:

GeoDataFrame

Examples

>>> pfs = ti.io.read_positionfixes_postgis("SELECT * FROM positionfixes", con, geom_col="geom")
>>> pfs = ti.io.read_positionfixes_postgis("SELECT * FROM positionfixes", con, geom_col="geom",
...                                        index_col="id", user_id="USER", tracked_at="time")
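
The `params` and `chunksize` parameters follow the usual database-driver pattern of bound query parameters and batched fetching. The sketch below illustrates that pattern with the standard library's `sqlite3` module instead of PostGIS, so it runs without a database server; `iter_chunks` and the table contents are hypothetical, and the `?` placeholder is sqlite3's style (the placeholder syntax for a PostGIS connection depends on the driver in use).

```python
import sqlite3


def iter_chunks(con, sql, params=(), chunksize=2):
    """Yield lists of at most `chunksize` rows instead of loading everything at once."""
    cursor = con.execute(sql, params)
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield rows


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE positionfixes (id INTEGER, user_id INTEGER)")
con.executemany("INSERT INTO positionfixes VALUES (?, ?)", [(i, i % 2) for i in range(5)])

# The user id is bound by the driver rather than formatted into the SQL string.
chunks = list(
    iter_chunks(con, "SELECT id FROM positionfixes WHERE user_id = ? ORDER BY id", (0,), chunksize=2)
)
con.close()
```

Three rows match the query, so the iterator produces one full chunk of two rows followed by a final chunk with the remaining row.
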
trackintel.io.postgis.read_triplegs_postgis(sql, con, geom_col='geom', crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None, **kwargs)[source]

Reads triplegs from a PostGIS database.

Parameters:
  • sql (str) – SQL query e.g. “SELECT * FROM triplegs”

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • geom_col (str, default 'geom') – The geometry column of the table.

  • crs (optional) – Coordinate reference system to use for the returned GeoDataFrame

  • index_col (string or list of strings, optional, default: None) – Column(s) to set as index(MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or one of (D, s, ns, ms, us) when parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • **kwargs – Further keyword arguments as available in trackintel's trackintel.io.read_triplegs_gpd(). Especially useful for renaming columns of the SQL table to trackintel-conforming column names; see the second example below.

Returns:

A GeoDataFrame containing the triplegs.

Return type:

GeoDataFrame

Examples

>>> tpls = ti.io.read_triplegs_postgis("SELECT * FROM triplegs", con, geom_col="geom")
>>> tpls = ti.io.read_triplegs_postgis("SELECT * FROM triplegs", con, geom_col="geom", index_col="id",
...                                    started_at="start_time", finished_at="end_time", user_id="USER")
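
The two meanings of a `parse_dates` format entry (a strftime pattern for strings, a unit for integer timestamps) can be illustrated with a small stdlib-only sketch. `parse_date` is a hypothetical helper, not the pandas implementation; as a simplification it returns naive datetimes counted from the Unix epoch and truncates nanoseconds to microseconds.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
_UNIT_TO_TIMEDELTA = {"D": "days", "s": "seconds", "ms": "milliseconds", "us": "microseconds"}


def parse_date(value, spec):
    """Parse `value` using a strftime pattern (strings) or a time unit (integers)."""
    if isinstance(value, str):
        return datetime.strptime(value, spec)
    if spec == "ns":
        # timedelta has no nanosecond resolution, so truncate to microseconds.
        return EPOCH + timedelta(microseconds=value // 1000)
    return EPOCH + timedelta(**{_UNIT_TO_TIMEDELTA[spec]: value})


from_string = parse_date("27.11.2015 08:00", "%d.%m.%Y %H:%M")
from_seconds = parse_date(1448611200, "s")
from_millis = parse_date(1448611200000, "ms")
```

All three calls describe the same instant, 2015-11-27 08:00:00.
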
trackintel.io.postgis.read_staypoints_postgis(sql, con, geom_col='geom', crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None, **kwargs)[source]

Read staypoints from a PostGIS database.

Parameters:
  • sql (str) – SQL query e.g. “SELECT * FROM staypoints”

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • geom_col (str, default 'geom') – The geometry column of the table.

  • crs (optional) – Coordinate reference system to use for the returned GeoDataFrame

  • index_col (string or list of strings, optional, default: None) – Column(s) to set as index(MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or one of (D, s, ns, ms, us) when parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • **kwargs – Further keyword arguments as available in trackintel's trackintel.io.read_staypoints_gpd(). Especially useful for renaming columns of the SQL table to trackintel-conforming column names; see the second example below.

Returns:

A GeoDataFrame containing the staypoints.

Return type:

GeoDataFrame

Examples

>>> sp = ti.io.read_staypoints_postgis("SELECT * FROM staypoints", con, geom_col="geom")
>>> sp = ti.io.read_staypoints_postgis("SELECT * FROM staypoints", con, geom_col="geom", index_col="id",
...                                      started_at="start_time", finished_at="end_time", user_id="USER")
trackintel.io.postgis.read_locations_postgis(sql, con, center='center', crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None, **kwargs)[source]

Reads locations from a PostGIS database.

Parameters:
  • sql (str) – SQL query e.g. “SELECT * FROM locations”

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • center (str, default 'center') – The geometry column of the table that holds the center of the location.

  • crs (optional) – Coordinate reference system to use for the returned GeoDataFrame

  • index_col (string or list of strings, optional, default: None) – Column(s) to set as index(MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or one of (D, s, ns, ms, us) when parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • **kwargs – Further keyword arguments as available in trackintel's trackintel.io.read_locations_gpd(). Especially useful for renaming columns of the SQL table to trackintel-conforming column names; see the second example below.

Returns:

A GeoDataFrame containing the locations.

Return type:

GeoDataFrame

Examples

>>> locs = ti.io.read_locations_postgis("SELECT * FROM locations", con, center="center")
>>> locs = ti.io.read_locations_postgis("SELECT * FROM locations", con, center="geom", index_col="id",
...                                     user_id="USER", extent="extent")
trackintel.io.postgis.read_trips_postgis(sql, con, geom_col=None, crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None, **kwargs)[source]

Read trips from a PostGIS database.

Parameters:
  • sql (str) – SQL query e.g. “SELECT * FROM trips”

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • geom_col (str, optional) – The geometry column of the table (if it exists), holding the start and end point of the trip.

  • crs (optional) – Coordinate reference system if table has geometry.

  • index_col (string or list of strings, optional, default: None) – Column(s) to set as index(MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or one of (D, s, ns, ms, us) when parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • **kwargs – Further keyword arguments as available in trackintel's trackintel.io.read_trips_gpd(). Especially useful for renaming columns of the SQL table to trackintel-conforming column names; see the second example below.

Returns:

A GeoDataFrame containing the trips.

Return type:

GeoDataFrame

Examples

>>> trips = ti.io.read_trips_postgis("SELECT * FROM trips", con)
>>> trips = ti.io.read_trips_postgis("SELECT * FROM trips", con, geom_col="geom", index_col="id",
...                                  started_at="start_time", finished_at="end_time", user_id="USER",
...                                  origin_staypoint_id="ORIGIN", destination_staypoint_id="DEST")
trackintel.io.postgis.read_tours_postgis(sql, con, geom_col=None, crs=None, index_col=None, coerce_float=True, parse_dates=None, params=None, chunksize=None, **kwargs)[source]

Read tours from a PostGIS database.

Parameters:
  • sql (str) – SQL query e.g. “SELECT * FROM tours”

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – Active connection to PostGIS database.

  • geom_col (str, optional) – The geometry column of the table (if exists).

  • crs (optional) – Coordinate reference system if table has geometry.

  • index_col (string or list of strings, optional) – Column(s) to set as index(MultiIndex)

  • coerce_float (boolean, default True) – Attempt to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets

  • parse_dates (list or dict, default None) –

    • List of column names to parse as dates.

    • Dict of {column_name: format string}, where the format string is strftime-compatible when parsing string times, or one of (D, s, ns, ms, us) when parsing integer timestamps.

    • Dict of {column_name: arg dict}, where the arg dict corresponds to the keyword arguments of pandas.to_datetime(). Especially useful with databases without native Datetime support, such as SQLite.

  • params (list, tuple or dict, optional, default None) – List of parameters to pass to execute method.

  • chunksize (int, default None) – If specified, return an iterator where chunksize is the number of rows to include in each chunk.

  • **kwargs – Further keyword arguments as available in trackintel's trackintel.io.read_tours_gpd(). Especially useful for renaming columns of the SQL table to trackintel-conforming column names; see the second example below.

Returns:

A GeoDataFrame containing the tours.

Return type:

GeoDataFrame (as trackintel tours)

Examples

>>> tours = ti.io.read_tours_postgis("SELECT * FROM tours", con)
>>> tours = ti.io.read_tours_postgis("SELECT * FROM tours", con, index_col="id", started_at="start_time",
...                                  finished_at="end_time", user_id="USER")
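
The two dict forms of parse_dates described above distinguish between timestamps stored as text (parsed with a strftime-compatible format string) and timestamps stored as integers (interpreted in a unit such as s). The following is a minimal standard-library sketch of that distinction only; it does not call trackintel or pandas, and the sample values are invented for illustration.

```python
from datetime import datetime, timezone

# Case 1: a timestamp stored as text, parsed with a strftime-compatible format string.
started_at = datetime.strptime("2021-07-14 08:30:00", "%Y-%m-%d %H:%M:%S")

# Case 2: a timestamp stored as an integer, interpreted as seconds ("s")
# since the Unix epoch and returned as a timezone-aware UTC datetime.
finished_at = datetime.fromtimestamp(1626253200, tz=timezone.utc)

print(started_at.isoformat())
print(finished_at.isoformat())
```

Which form applies depends on how the column is stored in the database table, so it is worth inspecting the column type before choosing a parse_dates value.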

CSV File Export

trackintel.io.file.write_positionfixes_csv(positionfixes, filename, *args, **kwargs)[source]

Write positionfixes to csv file.

Wraps the pandas to_csv function, but strips the geometry column and stores the longitude and latitude in respective columns.

Parameters:
  • positionfixes (GeoDataFrame (as trackintel positionfixes)) – The positionfixes to store to the CSV file.

  • filename (str) – The file to write to.

  • args – Additional arguments passed to pd.DataFrame.to_csv().

  • kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Notes

“longitude” and “latitude” are extracted from the geometry column and the original geometry column is dropped.

Examples

>>> pfs.as_positionfixes.to_csv("export_pfs.csv")
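
To make the note on the dropped geometry column concrete, here is a rough standard-library sketch of the transformation. It is not trackintel's implementation: the helper flatten_point_rows and the representation of a point as an (x, y) tuple are assumptions made for illustration, and the sample rows are invented.

```python
import csv
import io

# Hypothetical in-memory positionfixes: the geometry is held as an
# (x, y) = (longitude, latitude) tuple instead of a shapely Point.
positionfixes = [
    {"user_id": 0, "tracked_at": "2021-07-14T08:30:00+00:00", "geom": (8.5417, 47.3769)},
    {"user_id": 0, "tracked_at": "2021-07-14T08:31:00+00:00", "geom": (8.5432, 47.3780)},
]


def flatten_point_rows(rows, geom_col="geom"):
    """Replace the geometry column with separate longitude and latitude columns."""
    flat = []
    for row in rows:
        out = {key: value for key, value in row.items() if key != geom_col}
        out["longitude"], out["latitude"] = row[geom_col]
        flat.append(out)
    return flat


flat_rows = flatten_point_rows(positionfixes)

# Write to an in-memory buffer so the sketch does not touch the file system.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat_rows[0]))
writer.writeheader()
writer.writerows(flat_rows)
csv_text = buffer.getvalue()
print(csv_text)
```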
trackintel.io.file.write_triplegs_csv(triplegs, filename, *args, **kwargs)[source]

Write triplegs to csv file.

Wraps the pandas to_csv function, but transforms the geometry into WKT before writing.

Parameters:
  • triplegs (GeoDataFrame (as trackintel triplegs)) – The triplegs to store to the CSV file.

  • filename (str) – The file to write to.

  • args – Additional arguments passed to pd.DataFrame.to_csv().

  • kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Examples

>>> tpls.as_triplegs.to_csv("export_tpls.csv")
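
WKT (well-known text) is a plain-text encoding of geometries, which is why a line geometry can be stored in a single CSV column. The sketch below formats a list of coordinates as a WKT LINESTRING by hand. The helper linestring_to_wkt is hypothetical and only illustrates the text format; the library itself relies on its geometry objects for the conversion.

```python
def linestring_to_wkt(coords):
    """Format a sequence of (x, y) pairs as a WKT LINESTRING string."""
    if len(coords) < 2:
        raise ValueError("a LINESTRING needs at least two coordinate pairs")
    body = ", ".join(f"{x} {y}" for x, y in coords)
    return f"LINESTRING ({body})"


# Invented coordinates in (longitude, latitude) order.
tripleg_coords = [(8.5417, 47.3769), (8.5432, 47.378), (8.5451, 47.3792)]
wkt = linestring_to_wkt(tripleg_coords)
print(wkt)
```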
trackintel.io.file.write_staypoints_csv(staypoints, filename, *args, **kwargs)[source]

Write staypoints to csv file.

Wraps the pandas to_csv function, but transforms the geometry into WKT before writing.

Parameters:
  • staypoints (GeoDataFrame (as trackintel staypoints)) – The staypoints to store to the CSV file.

  • filename (str) – The file to write to.

  • args – Additional arguments passed to pd.DataFrame.to_csv().

  • kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Examples

>>> sp.as_staypoints.to_csv("export_sp.csv")
trackintel.io.file.write_locations_csv(locations, filename, *args, **kwargs)[source]

Write locations to csv file.

Wraps the pandas to_csv function, but transforms the center (and extent) into WKT before writing.

Parameters:
  • locations (GeoDataFrame (as trackintel locations)) – The locations to store to the CSV file.

  • filename (str) – The file to write to.

  • args – Additional arguments passed to pd.DataFrame.to_csv().

  • kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Examples

>>> locs.as_locations.to_csv("export_locs.csv")
trackintel.io.file.write_trips_csv(trips, filename, *args, **kwargs)[source]

Write trips to csv file.

Wraps the pandas to_csv function. The geometry is transformed into WKT before writing.

Parameters:
  • trips ((Geo)DataFrame (as trackintel trips)) – The trips to store to the CSV file.

  • filename (str) – The file to write to.

  • args – Additional arguments passed to pd.DataFrame.to_csv().

  • kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Examples

>>> trips.as_trips.to_csv("export_trips.csv")
trackintel.io.file.write_tours_csv(tours, filename, *args, **kwargs)[source]

Write tours to csv file.

Wraps the pandas to_csv function.

Parameters:
  • tours (DataFrame (as trackintel tours)) – The tours to store to the CSV file.

  • filename (str) – The file to write to.

  • args – Additional arguments passed to pd.DataFrame.to_csv().

  • kwargs – Additional keyword arguments passed to pd.DataFrame.to_csv().

Examples

>>> tours.as_tours.to_csv("export_tours.csv")

PostGIS Export

trackintel.io.postgis.write_positionfixes_postgis(positionfixes, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)[source]

Stores positionfixes to PostGIS. Usually, this is directly called on a positionfixes DataFrame (see example below).

Parameters:
  • positionfixes (GeoDataFrame (as trackintel positionfixes)) – The positionfixes to store to the database.

  • name (str) – The name of the table to write to.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • schema (str, optional) – The schema (if the database supports this) where the table resides.

  • if_exists (str, {'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – How many entries should be written at the same time.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Examples

>>> pfs.as_positionfixes.to_postgis(table_name, con)
>>> ti.io.postgis.write_positionfixes_postgis(pfs, table_name, con)
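
The three if_exists modes can be tried out without a PostGIS server. The sketch below mimics them on an in-memory SQLite database from the standard library. The helpers write_rows and count_rows are hypothetical and are not part of trackintel; the table layout is reduced to two invented columns.

```python
import sqlite3


def write_rows(con, name, rows, if_exists="fail"):
    """Write (user_id, tracked_at) rows to table `name`, mimicking `if_exists`."""
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"'{if_exists}' is not valid for if_exists")
    exists = con.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone() is not None
    if exists and if_exists == "fail":
        raise ValueError(f"Table '{name}' already exists.")
    if exists and if_exists == "replace":
        # Drop the table before inserting new values.
        con.execute(f'DROP TABLE "{name}"')
        exists = False
    if not exists:
        con.execute(f'CREATE TABLE "{name}" (user_id INTEGER, tracked_at TEXT)')
    con.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', rows)
    con.commit()


def count_rows(con, name):
    """Return the number of rows currently stored in table `name`."""
    return con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]


con = sqlite3.connect(":memory:")
rows = [(0, "2021-07-14T08:30:00"), (0, "2021-07-14T08:31:00")]

write_rows(con, "positionfixes", rows)                       # table does not exist yet
write_rows(con, "positionfixes", rows, if_exists="append")   # rows are added
appended = count_rows(con, "positionfixes")
write_rows(con, "positionfixes", rows, if_exists="replace")  # table is recreated
replaced = count_rows(con, "positionfixes")

try:
    write_rows(con, "positionfixes", rows)                   # default is 'fail'
    failed = False
except ValueError:
    failed = True

print(appended, replaced, failed)
```

Because the default is 'fail', writing to an existing table a second time requires choosing 'replace' or 'append' explicitly.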
trackintel.io.postgis.write_triplegs_postgis(triplegs, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)[source]

Stores triplegs to PostGIS. Usually, this is directly called on a triplegs DataFrame (see example below).

Parameters:
  • triplegs (GeoDataFrame (as trackintel triplegs)) – The triplegs to store to the database.

  • name (str) – The name of the table to write to.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • schema (str, optional) – The schema (if the database supports this) where the table resides.

  • if_exists (str, {'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – How many entries should be written at the same time.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Examples

>>> tpls.as_triplegs.to_postgis(table_name, con)
>>> ti.io.postgis.write_triplegs_postgis(tpls, table_name, con)
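
The chunksize parameter limits how many entries are written at the same time. As a rough illustration of what batching means, the following standard-library sketch splits a sequence of rows into chunks. The helper iter_chunks is hypothetical and says nothing about how the database driver performs the actual writes.

```python
from itertools import islice


def iter_chunks(rows, chunksize):
    """Yield successive lists containing at most `chunksize` rows."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


# Ten placeholder rows written in batches of four.
rows = list(range(10))
chunk_sizes = [len(chunk) for chunk in iter_chunks(rows, chunksize=4)]
print(chunk_sizes)
```

A smaller chunksize lowers peak memory use per write at the cost of more round trips to the database.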
trackintel.io.postgis.write_staypoints_postgis(staypoints, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)[source]

Stores staypoints to PostGIS. Usually, this is directly called on a staypoints DataFrame (see example below).

Parameters:
  • staypoints (GeoDataFrame (as trackintel staypoints)) – The staypoints to store to the database.

  • name (str) – The name of the table to write to.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • schema (str, optional) – The schema (if the database supports this) where the table resides.

  • if_exists (str, {'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – How many entries should be written at the same time.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Examples

>>> sp.as_staypoints.to_postgis(table_name, con)
>>> ti.io.postgis.write_staypoints_postgis(sp, table_name, con)
trackintel.io.postgis.write_locations_postgis(locations, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)[source]

Stores locations to PostGIS. Usually, this is directly called on a locations DataFrame (see example below).

Parameters:
  • locations (GeoDataFrame (as trackintel locations)) – The locations to store to the database.

  • name (str) – The name of the table to write to.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • schema (str, optional) – The schema (if the database supports this) where the table resides.

  • if_exists (str, {'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – How many entries should be written at the same time.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Examples

>>> locs.as_locations.to_postgis(table_name, con)
>>> ti.io.postgis.write_locations_postgis(locs, table_name, con)
trackintel.io.postgis.write_trips_postgis(trips, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)[source]

Stores trips to PostGIS. Usually, this is directly called on a trips DataFrame (see example below).

Parameters:
  • trips (GeoDataFrame (as trackintel trips)) – The trips to store to the database.

  • name (str) – The name of the table to write to.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • schema (str, optional) – The schema (if the database supports this) where the table resides.

  • if_exists (str, {'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – How many entries should be written at the same time.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Examples

>>> trips.as_trips.to_postgis(table_name, con)
>>> ti.io.postgis.write_trips_postgis(trips, table_name, con)
trackintel.io.postgis.write_tours_postgis(tours, name, con, schema=None, if_exists='fail', index=True, index_label=None, chunksize=None, dtype=None)[source]

Stores tours to PostGIS. Usually, this is directly called on a tours DataFrame (see example below).

Parameters:
  • tours (GeoDataFrame (as trackintel tours)) – The tours to store to the database.

  • name (str) – The name of the table to write to.

  • con (sqlalchemy.engine.Connection or sqlalchemy.engine.Engine) – active connection to PostGIS database.

  • schema (str, optional) – The schema (if the database supports this) where the table resides.

  • if_exists (str, {'fail', 'replace', 'append'}, default 'fail') –

    How to behave if the table already exists.

    • fail: Raise a ValueError.

    • replace: Drop the table before inserting new values.

    • append: Insert new values to the existing table.

  • index (bool, default True) – Write DataFrame index as a column. Uses index_label as the column name in the table.

  • index_label (str or sequence, default None) – Column label for index column(s). If None is given (default) and index is True, then the index names are used.

  • chunksize (int, optional) – How many entries should be written at the same time.

  • dtype (dict of column name to SQL type, default None) – Specifying the datatype for columns. The keys should be the column names and the values should be the SQLAlchemy types.

Examples

>>> tours.as_tours.to_postgis(table_name, con)
>>> ti.io.postgis.write_tours_postgis(tours, table_name, con)

Predefined dataset readers

We also provide functionality to parse well-known datasets directly into the trackintel framework.

Geolife

We support easy parsing of the Geolife dataset including available mode labels.

trackintel.io.dataset_reader.read_geolife(geolife_path, print_progress=False)[source]

Read raw geolife data and return trackintel positionfixes.

This function parses all geolife data available in the directory geolife_path.

Parameters:
  • geolife_path (str) – Path to the directory with the geolife data

  • print_progress (bool, default False) – Show per-user progress if set to True.

Returns:

  • gdf (GeoDataFrame (as trackintel positionfixes)) – Contains all loaded geolife positionfixes

  • labels (dict) – Dictionary with the available mode labels. Keys are user ids of users that have a “labels.txt” in their folder.

Notes

The geopandas dataframe has the following columns and datatype: ‘elevation’: float64 (in meters); ‘tracked_at’: datetime64[ns]; ‘user_id’: int64; ‘geom’: shapely geometry; ‘accuracy’: None;

For some users, travel mode labels are provided as .txt file. These labels are read and returned as label dictionary. The label dictionary contains the user ids as keys and DataFrames with the available labels as values. Labels can be added to each user at the tripleg level, see trackintel.io.dataset_reader.geolife_add_modes_to_triplegs() for more details.

The folder structure within the geolife directory needs to be identical to the folder structure available from the official download. This means that the top level folder (provided with ‘geolife_path’) contains the folders for the different users:

geolife_path
├── 000
│ ├── Trajectory
│ │ ├── 20081023025304.plt
│ │ ├── 20081024020959.plt
│ │ ├── 20081026134407.plt
│ │ └── …
├── 001
│ ├── Trajectory
│ │ └── …
│ …
├── 010
│ ├── labels.txt
│ ├── Trajectory
│ │ └── …
└── …

The Geolife dataset can be downloaded from:

https://www.microsoft.com/en-us/research/publication/geolife-gps-trajectory-dataset-user-guide/

References

[1] Yu Zheng, Lizhu Zhang, Xing Xie, Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of International conference on World Wide Web (WWW 2009), Madrid, Spain. ACM Press: 791-800.

[2] Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, Wei-Ying Ma. Understanding Mobility Based on GPS Data. In Proceedings of ACM conference on Ubiquitous Computing (UbiComp 2008), Seoul, Korea. ACM Press: 312-321.

[3] Yu Zheng, Xing Xie, Wei-Ying Ma, GeoLife: A Collaborative Social Networking Service among User, location and trajectory. Invited paper, in IEEE Data Engineering Bulletin. 33, 2, 2010, pp. 32-40.

Example

>>> import os
>>> from trackintel.io.dataset_reader import read_geolife
>>> pfs, mode_labels = read_geolife(os.path.join('downloads', 'Geolife Trajectories 1.3'))
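
Before running read_geolife on a large download, it can help to check that the directory follows the folder layout described in the notes: one folder per user, each with a Trajectory directory and, for some users, a labels.txt file. The sketch below is a standard-library check under that assumption. The helper list_geolife_users is hypothetical, and the temporary directory only mocks the layout with empty files.

```python
import tempfile
from pathlib import Path


def list_geolife_users(geolife_path):
    """Map each user folder that has a Trajectory directory to whether it has labels.txt."""
    users = {}
    for user_dir in sorted(Path(geolife_path).iterdir()):
        if user_dir.is_dir() and (user_dir / "Trajectory").is_dir():
            users[user_dir.name] = (user_dir / "labels.txt").is_file()
    return users


# Build a tiny mock of the expected layout with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for user in ("000", "001", "010"):
        (root / user / "Trajectory").mkdir(parents=True)
    (root / "000" / "Trajectory" / "20081023025304.plt").touch()
    (root / "010" / "labels.txt").touch()
    users = list_geolife_users(root)

print(users)
```

Users mapped to True are the ones for which mode labels can be expected in the returned label dictionary.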
trackintel.io.dataset_reader.geolife_add_modes_to_triplegs(triplegs, labels, ratio_threshold=0.5, max_triplegs=20, max_duration_tripleg=604800)[source]

Add available mode labels to geolife data.

The Geolife dataset provides a set of tripleg labels that are defined by a duration but are not matched to the Geolife tracking data. This function matches the labels to triplegs based on their temporal overlap.

Parameters:
  • triplegs (GeoDataFrame (as trackintel triplegs)) – Geolife triplegs.

  • labels (dictionary) – Geolife labels as provided by the trackintel read_geolife function.

  • ratio_threshold (float, default 0.5) – How much a label needs to overlap a tripleg for the label to be assigned to this tripleg.

  • max_triplegs (int, default 20) – Number of neighbors that are considered in the search for matching triplegs.

  • max_duration_tripleg (float, default 7 * 24 * 60 * 60 (seconds)) – Used for a primary filter. All triplegs that are further away in time than ‘max_duration_tripleg’ from a label won’t be considered for matching.

Returns:

tpls – triplegs with mode labels.

Return type:

GeoDataFrame (as trackintel triplegs)

Notes

If several labels overlap with the same tripleg, the label with the highest overlap (relative to the tripleg) is chosen.

Example

>>> import os
>>> from trackintel.io.dataset_reader import read_geolife, geolife_add_modes_to_triplegs
>>> pfs, mode_labels = read_geolife(os.path.join('downloads', 'Geolife Trajectories 1.3'))
>>> pfs, sp = pfs.as_positionfixes.generate_staypoints()
>>> pfs, tpls = pfs.as_positionfixes.generate_triplegs(sp)
>>> tpls = geolife_add_modes_to_triplegs(tpls, mode_labels)
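
The matching idea behind ratio_threshold can be sketched with plain datetime arithmetic: compute how much of a tripleg each label covers and keep the label with the highest coverage, provided it reaches the threshold. This is a simplified illustration and not the library's implementation. It ignores the max_triplegs neighbor search and the max_duration_tripleg filter, the helpers overlap_ratio and best_label and the "mode" key are hypothetical, and treating a ratio equal to the threshold as a match is an assumption.

```python
from datetime import datetime

# The documented default of max_duration_tripleg is one week in seconds.
MAX_DURATION_TRIPLEG = 7 * 24 * 60 * 60


def overlap_ratio(tpl_start, tpl_end, label_start, label_end):
    """Return the fraction of the tripleg duration covered by the label interval."""
    duration = (tpl_end - tpl_start).total_seconds()
    if duration <= 0:
        return 0.0
    overlap = (min(tpl_end, label_end) - max(tpl_start, label_start)).total_seconds()
    return max(overlap, 0.0) / duration


def best_label(tripleg, labels, ratio_threshold=0.5):
    """Return the mode with the highest overlap ratio at or above the threshold, else None."""
    best_mode, best_ratio = None, 0.0
    for label in labels:
        ratio = overlap_ratio(
            tripleg["started_at"], tripleg["finished_at"],
            label["started_at"], label["finished_at"],
        )
        if ratio >= ratio_threshold and ratio > best_ratio:
            best_mode, best_ratio = label["mode"], ratio
    return best_mode


# Invented sample data: one 40-minute tripleg and two labels.
tripleg = {
    "started_at": datetime(2008, 10, 23, 8, 0),
    "finished_at": datetime(2008, 10, 23, 8, 40),
}
labels = [
    {"mode": "walk", "started_at": datetime(2008, 10, 23, 7, 50),
     "finished_at": datetime(2008, 10, 23, 8, 10)},
    {"mode": "bus", "started_at": datetime(2008, 10, 23, 8, 10),
     "finished_at": datetime(2008, 10, 23, 8, 45)},
]

mode = best_label(tripleg, labels)
print(mode)
```

In this sample the walk label covers 10 of the 40 minutes and falls below the threshold, while the bus label covers 30 of the 40 minutes and is selected.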