Dataframe

Helper functions for working with pandas.DataFrame.

gordo_client.dataframe.dataframe_from_dict(data: dict) → DataFrame

The inverse of dataframe_to_dict(). Reconstructs a pandas.MultiIndex column dataframe from a previously serialized one.

Expects data to be a nested dictionary in which the value under each top-level key can be loaded with pandas.DataFrame.from_dict().
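
The expected input shape can be illustrated with a minimal sketch. It assumes only pandas; rebuild_multiindex_frame is a hypothetical helper written for this illustration, not the library's implementation, and it leaves the index as plain strings rather than parsing dates.

```python
import pandas as pd


def rebuild_multiindex_frame(data: dict) -> pd.DataFrame:
    """Hypothetical helper: load each top-level value, then join side by side."""
    keys = list(data)
    # Each top-level value is itself loadable with DataFrame.from_dict().
    frames = [pd.DataFrame.from_dict(data[key]) for key in keys]
    # keys= becomes the outer level of the resulting MultiIndex columns.
    return pd.concat(frames, axis=1, keys=keys)


serialized = {
    "feature0": {"sub-feature-0": {"2019-01-01": 0, "2019-02-01": 4}},
    "feature1": {"sub-feature-0": {"2019-01-01": 2, "2019-02-01": 6}},
}
df = rebuild_multiindex_frame(serialized)
```

Here each top-level key becomes the outer column level and each nested key becomes the inner level.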

Parameters:

data – Data to be loaded into a MultiIndex column dataframe

Return type:

MultiIndex column dataframe.

Examples

>>> serialized = {
... 'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
...              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
... 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
...              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}
... }
>>> dataframe_from_dict(serialized)  
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7

gordo_client.dataframe.dataframe_from_parquet_bytes(buf: bytes) → DataFrame

Convert bytes representing a parquet table into a pandas dataframe.

Parameters:

buf – Bytes representing a parquet table. Can be the direct result of gordo.server.utils.dataframe_into_parquet_bytes.

Return type:

pandas.DataFrame

gordo_client.dataframe.dataframe_into_parquet_bytes(df: DataFrame, compression: str = 'snappy') → bytes

Convert a dataframe into bytes representing a parquet table.

Parameters:
  • df – DataFrame to be compressed

  • compression – Compression to use, passed to pyarrow.parquet.write_table()

Return type:

bytes

gordo_client.dataframe.dataframe_to_dict(df: DataFrame) → dict

Convert a dataframe that may have a pandas.MultiIndex as columns into a dict.

Each key is a top-level column name, and its value is a dict of the columns under that top-level name. If it is a simple dataframe, pandas.DataFrame.to_dict() is used.

This makes the result usable with json.dumps(). Calling pandas.DataFrame.to_dict() directly on such a multi-level column dataframe would produce tuple() keys, which are not JSON serializable. Each top-level value in the result can still be loaded with pandas.DataFrame.from_dict().
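
The difference between the two shapes can be seen with the standard library alone. In this small sketch both dictionaries are written by hand to mimic the two layouts; they are not produced by pandas or by this module.

```python
import json

# Hand-written stand-ins for the two layouts discussed above.
tuple_keyed = {("feature0", "sub-feature-0"): {"2019-01-01": 0, "2019-02-01": 4}}
nested = {"feature0": {"sub-feature-0": {"2019-01-01": 0, "2019-02-01": 4}}}

try:
    json.dumps(tuple_keyed)
    tuple_keys_ok = True
except TypeError:
    # json rejects tuple keys; only str, int, float, bool or None are allowed
    tuple_keys_ok = False

# The nested layout serializes and loads back unchanged.
payload = json.dumps(nested)
round_tripped = json.loads(payload)
```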

Parameters:

df – Dataframe expected to have columns of type pandas.MultiIndex 2 levels deep.

Return type:

Nested dict representing the dataframe in a ‘flattened’ form.

Examples

>>> import pprint
>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_tuples((f"feature{i}", f"sub-feature-{ii}") for i in range(2) for ii in range(2))
>>> index = pd.date_range('2019-01-01', '2019-02-01', periods=2)
>>> df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=columns, index=index)
>>> df  
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
>>> serialized = dataframe_to_dict(df)
>>> pprint.pprint(serialized)
{'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}}