Anomeda DataFrame API

Here you can find the documentation for the anomeda.DataFrame class.

DataFrame

DataFrame(*args, **kwargs)

Bases: DataFrame

Data to be processed by anomeda. The class inherits from pandas.DataFrame.

Please note that whenever the underlying pandas.DataFrame object is changed, you may need to apply the constructor again in order to keep some of the characteristics of the data consistent with the new object.

Parameters:

  • *args

    Positional arguments used to initialize a pandas.DataFrame object. All other parameters must be passed as keyword arguments only.

  • **kwargs

    Keyword arguments used to initialize a pandas.DataFrame object. The anomeda-specific parameters listed below must also be passed as keyword arguments.

  • measures_names ('list | tuple' = []) –

    A list of columns considered as measures. If None, the data is assumed to have no measures.

  • measures_types ('dict' = {}) –

    A dictionary with 'categorical' and/or 'continuous' keys and lists of measures as values. Continuous measures are discretized automatically unless they are present in the discretized_measures parameter. If your data has any measures, you must provide their types.

  • discretized_measures_mapping ('dict' = {}) –

    Custom dictionary mapping each discrete value of a measure to the corresponding continuous ranges. The lower bound of each range is inclusive and the upper bound is exclusive. It uses the following format:

    {
        'measure_name': {
            discrete_value_1: [[continuous_threshold_min_inc, continuous_threshold_max_excl], [...]],
            discrete_value_2: ...
        }
    }
    
  • discretized_measures ('dict' = {}) –

    A dictionary with measure names as keys and array-like objects containing custom discretized values of each measure as values. If not provided, continuous measures are discretized automatically.

  • index_name ('str | list | None' = None) –

    An index column containing integers or a pandas.DatetimeIndex. If None, the index is taken from the pandas.DataFrame.

  • metric_name (str) –

    A metric column.

  • agg_func

    How metric_name is aggregated by measures. Can be 'sum', 'avg', 'count' or a callable compatible with pandas.DataFrame.groupby.

Examples:

anmd_df = anomeda.DataFrame(
    df,
    measures_names=['dummy_measure_col', 'dummy_numeric_measure_col'],
    measures_types={
        'categorical': ['dummy_measure_col'], 
        'continuous': ['dummy_numeric_measure_col']
    },
    index_name='dt',
    metric_name='metric_col',
    agg_func='sum'
)
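The rule that every measure needs a type can be checked before calling the constructor. The helper below is a hypothetical, standalone sketch and is not part of anomeda. It assumes the type keys are spelled 'categorical' and 'continuous', as in the constructor parameters above.

```python
def check_measures_config(measures_names, measures_types):
    """Return a list of problems found in a measures configuration.

    Hypothetical helper for illustration only; not part of anomeda.
    """
    allowed_keys = ('categorical', 'continuous')
    problems = []

    # Any key other than the two documented ones is reported.
    unknown_keys = sorted(set(measures_types) - set(allowed_keys))
    if unknown_keys:
        problems.append(f"unknown type keys: {unknown_keys}")

    # Collect the measures that have been given a documented type.
    typed = [m for key in allowed_keys for m in measures_types.get(key, [])]
    missing = [m for m in measures_names if m not in typed]
    if missing:
        problems.append(f"measures without a type: {missing}")

    return problems


problems = check_measures_config(
    ['dummy_measure_col', 'dummy_numeric_measure_col'],
    {
        'categorical': ['dummy_measure_col'],
        'continuous': ['dummy_numeric_measure_col'],
    },
)
print(problems)  # an empty list means the configuration is consistent
```
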

copy_anomeda_df

copy_anomeda_df()

Return a copy of an anomeda.DataFrame object.

get_agg_func

get_agg_func()

Return the function used to aggregate the metric by measures.

get_discretization_mapping

get_discretization_mapping()

Return a dict with a mapping between discrete values and the actual ranges of continuous measures.

In some cases, there may be more than one interval for each discrete value.

Examples:

>>> anmd_df.get_discretization_mapping()

{
    'dummy_numeric_measure': {
        0: [[0.08506988648110014, 0.982366623262143]], # [inclusive, exclusive)
        1: [[0.9855150328648835, 2.458970726947438]] # [inclusive, exclusive)
    }
}
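To make the interval convention concrete, the sketch below looks up the discrete value for a continuous value in a mapping of this shape. The function find_discrete_value is a hypothetical helper written for this illustration and is not part of anomeda. It treats every range as inclusive at the lower bound and exclusive at the upper bound.

```python
def find_discrete_value(mapping, measure_name, value):
    """Return the discrete value whose range contains `value`, or None.

    Hypothetical helper for illustration only; not part of anomeda.
    """
    for discrete_value, intervals in mapping[measure_name].items():
        for lower, upper in intervals:
            # Lower bound is inclusive, upper bound is exclusive.
            if lower <= value < upper:
                return discrete_value
    return None


mapping = {
    'dummy_numeric_measure': {
        0: [[0.08506988648110014, 0.982366623262143]],
        1: [[0.9855150328648835, 2.458970726947438]],
    }
}

print(find_discrete_value(mapping, 'dummy_numeric_measure', 0.5))  # 0
print(find_discrete_value(mapping, 'dummy_numeric_measure', 1.2))  # 1
```

A value that falls outside every range, including one equal to an upper bound, yields None in this sketch.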

get_discretized_measures

get_discretized_measures()

Return discretized versions of continuous measures.

get_index_name

get_index_name()

Return the name of an index column.

get_measures_names

get_measures_names()

Return a list of columns considered as measures.

get_measures_types

get_measures_types()

Return the measures_types dict.

get_metric_name

get_metric_name()

Return the name of a metric column.

replace_df

replace_df(data: pandas.DataFrame, inplace=False, keep_clusters: bool = False, keep_trends: bool = False, keep_discretization: bool = False)

Replace the pandas.DataFrame content underlying the anomeda.DataFrame with a new one.

Parameters:

  • data (DataFrame) –

    A new data object.

  • inplace (bool, default: False ) –

    If True, no new object is returned. Otherwise, a new anomeda.DataFrame is created and returned.

set_agg_func

set_agg_func(agg_func)

Set a function to aggregate the metric by measures.

Parameters:

  • agg_func

    Can be "sum", "avg", "count" or a callable compatible with pandas.DataFrame.groupby.
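Conceptually, the aggregation function reduces all metric values that share the same measure value to a single number. The pure-Python sketch below illustrates that idea only. It does not reproduce how anomeda or pandas perform the aggregation, and the names NAMED_AGGREGATIONS, aggregate_metric, 'country' and 'revenue' are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed meaning of the string shortcuts, for illustration only.
NAMED_AGGREGATIONS = {'sum': sum, 'avg': mean, 'count': len}


def aggregate_metric(rows, measure, metric, agg_func):
    """Group rows by a measure and reduce the metric values of each group."""
    func = NAMED_AGGREGATIONS[agg_func] if isinstance(agg_func, str) else agg_func
    groups = defaultdict(list)
    for row in rows:
        groups[row[measure]].append(row[metric])
    return {key: func(values) for key, values in groups.items()}


rows = [
    {'country': 'A', 'revenue': 10},
    {'country': 'A', 'revenue': 20},
    {'country': 'B', 'revenue': 5},
]

print(aggregate_metric(rows, 'country', 'revenue', 'sum'))  # {'A': 30, 'B': 5}
print(aggregate_metric(rows, 'country', 'revenue', max))    # {'A': 20, 'B': 5}
```
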

set_discretization_mapping

set_discretization_mapping(discretized_measures_mapping, recalculate_measures=True)

Set custom thresholds for discretization.

Parameters:

  • discretized_measures_mapping (dict) –

    Dict mapping each discrete value of a measure to the corresponding ranges of continuous values. Several different ranges may be mapped to the same discrete value. The lower bound of each range is inclusive and the upper bound is exclusive. The mapping must have the following format:

    {
        'measure_name': {
            discrete_value: [[continuous_threshold_min_inc, continuous_threshold_max_excl], [..., ...], ...], 
            ...
            },
        ...
    }
    

Examples:

anmd_df.set_discretization_mapping({
    'dummy_numeric_measure': {
        0: [[0.00, 0.05001], [0.95, 1.001]], # may correspond to "extreme" values; 0.05001 and 1.001 are exclusive upper bounds
        1: [[0.5, 0.94999]] # may correspond to "normal" values; 0.94999 is an exclusive upper bound
    }
})
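Before passing a custom mapping, it can be useful to check that each range is non-empty and that ranges of the same measure do not overlap, since an overlap would make the mapping ambiguous. The sketch below is a hypothetical standalone helper and is not part of anomeda. Whether anomeda itself performs such checks is not stated here.

```python
def find_mapping_problems(discretized_measures_mapping):
    """Return a list of problems found in a discretization mapping.

    Hypothetical helper for illustration only; not part of anomeda.
    """
    problems = []
    for measure, value_map in discretized_measures_mapping.items():
        flat = []
        for discrete_value, intervals in value_map.items():
            for lower, upper in intervals:
                # A half-open range [lower, upper) is empty unless lower < upper.
                if not lower < upper:
                    problems.append(f"{measure}: empty range [{lower}, {upper})")
                flat.append((lower, upper, discrete_value))

        # After sorting by lower bound, any overlap shows up between neighbours.
        flat.sort(key=lambda item: (item[0], item[1]))
        for (_, upper_1, value_1), (lower_2, _, value_2) in zip(flat, flat[1:]):
            if lower_2 < upper_1:
                problems.append(
                    f"{measure}: ranges for {value_1} and {value_2} overlap"
                )
    return problems


mapping = {
    'dummy_numeric_measure': {
        0: [[0.00, 0.05001], [0.95, 1.001]],
        1: [[0.5, 0.94999]],
    }
}
print(find_mapping_problems(mapping))  # []
```
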

set_discretized_measures

set_discretized_measures(discretized_measures: dict)

Set custom discretization for continuous measures.

Parameters:

  • discretized_measures (dict) –

    Dict containing discrete values of each measure in the format {'measure_name': [0, 1, 1, ...]}. Each array of values must have the same shape as the original measure.
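One way to produce such a dict is to bucket the raw values with a list of thresholds. The sketch below uses only the standard library. The discretize function, the threshold 0.5 and the column name are hypothetical choices for illustration and are not anomeda's own discretization.

```python
from bisect import bisect_right


def discretize(values, thresholds):
    """Map each value to the index of the bucket it falls into.

    `thresholds` must be sorted in ascending order. A value equal to a
    threshold goes into the bucket that starts at that threshold, so the
    lower bound of every bucket is inclusive.
    """
    return [bisect_right(thresholds, value) for value in values]


raw_values = [0.1, 0.7, 0.4, 0.95]
discretized_measures = {
    'dummy_numeric_measure_col': discretize(raw_values, [0.5]),
}

print(discretized_measures)  # {'dummy_numeric_measure_col': [0, 1, 0, 1]}
```

The resulting list has one discrete value per original value, which satisfies the same-shape requirement above.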

set_measures_names

set_measures_names(measures_names)

Tell the anomeda.DataFrame object which columns are measures.

Columns are picked from an underlying pandas.DataFrame object, so they must be present there.

Parameters:

  • measures_names (list of str) –

    List of columns to be considered as measures.

set_measures_types

set_measures_types(measures_types: dict)

Set measures types.

A measure can be either 'categorical' or 'continuous'. Types are used to cluster the data properly.

Parameters:

  • measures_types (dict) –

    Dict with 'continuous' and/or 'categorical' keys and lists of measures as values.

Examples:

anmd_df.set_measures_types({
    'continuous': ['numeric_measure_1'],
    'categorical': ['measure_1']
})

set_metric_name

set_metric_name(metric_name)

Set the name of a metric to be analyzed.

Parameters:

  • metric_name (str) –

    Must be present among the columns of the underlying pandas.DataFrame. If the metric column is currently set as a measure, you need to change the list of measures first.