Dataframe
Helper functions for working with pandas.DataFrame
- gordo_client.dataframe.dataframe_from_dict(data: dict) -> DataFrame
The inverse of the procedure done by multi_lvl_column_dataframe_from_dict(). Reconstructs a pandas.MultiIndex column dataframe from a previously serialized one.
Expects data to be a nested dictionary where each top-level key has a value capable of being loaded by pandas.DataFrame.from_dict().
- Parameters:
data – Data to be loaded into a MultiIndex column dataframe
- Return type:
MultiIndex column dataframe.
Examples
>>> serialized = {
... 'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
...              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
... 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
...              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}
... }
>>> dataframe_from_dict(serialized)
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
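The reconstruction described above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the gordo_client implementation: the helper name `nested_dict_to_frame` is hypothetical, and it assumes every top-level value is a dict that `pandas.DataFrame.from_dict()` accepts.

```python
import pandas as pd


def nested_dict_to_frame(data: dict) -> pd.DataFrame:
    """Hypothetical helper: rebuild a two-level column dataframe from a nested dict."""
    if not data:
        return pd.DataFrame()
    # Load each top-level entry on its own; its inner keys become column names.
    frames = {top: pd.DataFrame.from_dict(sub) for top, sub in data.items()}
    # Concatenating a dict of frames along axis=1 uses the dict keys
    # as the outer level of the resulting column MultiIndex.
    return pd.concat(frames, axis=1)


serialized = {
    "feature0": {"sub-feature-0": {"2019-01-01": 0, "2019-02-01": 4},
                 "sub-feature-1": {"2019-01-01": 1, "2019-02-01": 5}},
    "feature1": {"sub-feature-0": {"2019-01-01": 2, "2019-02-01": 6},
                 "sub-feature-1": {"2019-01-01": 3, "2019-02-01": 7}},
}
frame = nested_dict_to_frame(serialized)
```

In this sketch the index holds the date strings as they appear in the dict; whether the library converts them to timestamps is not stated here.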
- gordo_client.dataframe.dataframe_from_parquet_bytes(buf: bytes) -> DataFrame
Convert bytes representing a parquet table into a pandas dataframe.
- Parameters:
buf – Bytes representing a parquet table. Can be the direct result from gordo.server.utils.dataframe_into_parquet_bytes
- Return type:
DataFrame
- gordo_client.dataframe.dataframe_into_parquet_bytes(df: DataFrame, compression: str = 'snappy') -> bytes
Convert a dataframe into bytes representing a parquet table.
- Parameters:
df – DataFrame to be compressed
compression – Compression to use, passed to pyarrow.parquet.write_table()
- Return type:
bytes
- gordo_client.dataframe.dataframe_to_dict(df: DataFrame) -> dict
Convert a dataframe that can have a pandas.MultiIndex as columns into a dict.
Each key is the top-level column name, and the value is the array of columns under that top-level name. If it’s a simple dataframe, pandas.core.DataFrame.to_dict() will be used.
This allows json.dumps() to be performed, where pandas.DataFrame.to_dict() would convert such a multi-level column dataframe into keys of tuple() objects, which are not JSON serializable. However, the result still works with pandas.DataFrame.from_dict().
- Parameters:
df – Dataframe expected to have columns of type pandas.MultiIndex, 2 levels deep.
- Return type:
List of records representing the dataframe in a ‘flattened’ form.
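The JSON limitation mentioned above can be reproduced with the standard library alone. Both dictionaries below are hand-written for illustration: one mimics the tuple-keyed shape that a plain `to_dict()` call gives for multi-level columns, the other mimics the nested shape this function returns.

```python
import json

# Tuple keys, as produced when multi-level columns are converted directly.
tuple_keyed = {("feature0", "sub-feature-0"): {"2019-01-01": 0}}

# Nested string keys: one dictionary level per column level.
nested = {"feature0": {"sub-feature-0": {"2019-01-01": 0}}}

try:
    json.dumps(tuple_keyed)
    tuple_keys_ok = True
except TypeError:
    # json only accepts str, int, float, bool or None as dictionary keys.
    tuple_keys_ok = False

# The nested form serializes and loads back unchanged.
encoded = json.dumps(nested)
decoded = json.loads(encoded)
```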
Examples
>>> import pprint
>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_tuples((f"feature{i}", f"sub-feature-{ii}") for i in range(2) for ii in range(2))
>>> index = pd.date_range('2019-01-01', '2019-02-01', periods=2)
>>> df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=columns, index=index)
>>> df
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
>>> serialized = dataframe_to_dict(df)
>>> pprint.pprint(serialized)
{'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}}