Anomeda DataFrame API
Here you can find the documentation for the anomeda.DataFrame class.
DataFrame
DataFrame(*args, **kwargs)
Bases: DataFrame
Data to be processed by anomeda. The class inherits from pandas.DataFrame.
Please note that whenever the underlying pandas.DataFrame object is changed, you may need to apply the constructor again in order to keep some of the characteristics of the data consistent with the new object.
Parameters:
-
*args –Positional parameters for initializing the pandas.DataFrame object. All other parameters must be passed as **kwargs only.
-
**kwargs –Keyword parameters for initializing the pandas.DataFrame object. The anomeda-specific parameters below must also be passed as keyword arguments.
-
measures_names('list | tuple' = []) –A list of columns to be treated as measures. If None, the data is assumed to have no measures.
-
measures_types('dict' = {}) –A dictionary with 'categorical' and/or 'continuous' keys and lists of measures as values. Continuous measures are discretized automatically unless they are present in the discretized_measures parameter. If your data has any measures, you must provide their types.
-
discretized_measures_mapping('dict' = {}) –Custom dictionary mapping each discrete value of a measure to the corresponding continuous ranges. The lower bound of each range is inclusive and the upper bound is exclusive. It uses the following format:
{ 'measure_name': { discrete_value_1: [[continuous_threshold_min_inc, continuous_threshold_max_excl], [...]], discrete_value_2: ... } }
-
discretized_measures('dict' = {}) –A dictionary with measure names as keys and array-like objects holding custom discretized values of each measure as values. If not provided, continuous measures will be discretized automatically.
-
index_name('str | list | None' = None) –The index column, containing integer or pandas.DatetimeIndex values. If None, the index is taken from the pandas.DataFrame.
-
metric_name(str) –The name of the metric column.
-
agg_func –How metric_name is aggregated by measures. Can be 'sum', 'avg', 'count' or a callable compatible with pandas.DataFrame.groupby.
Examples:
anmd_df = anomeda.DataFrame(
    df,
    measures_names=['dummy_measure_col', 'dummy_numeric_measure_col'],
    measures_types={
        'categorical': ['dummy_measure_col'],
        'continuous': ['dummy_numeric_measure_col']
    },
    index_name='dt',
    metric_name='metric_col',
    agg_func='sum'
)
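To make the agg_func options concrete, here is a minimal pandas-only sketch of what aggregating a metric by a measure can look like. It does not call anomeda. The helper aggregate_metric and the sample data are hypothetical, and mapping 'avg' to the pandas 'mean' aggregation is an assumption about how that string is interpreted, not confirmed library behaviour.

```python
import pandas as pd

# Hypothetical data reusing the column names from the constructor example.
df = pd.DataFrame({
    "dt": [1, 1, 2, 2],
    "dummy_measure_col": ["a", "b", "a", "b"],
    "metric_col": [10.0, 20.0, 30.0, 40.0],
})


def aggregate_metric(data, by, metric, agg_func):
    """Aggregate `metric` over the columns in `by` (illustrative helper, not anomeda API)."""
    # Assumption: 'avg' corresponds to pandas' 'mean'; other values pass through unchanged.
    pandas_func = "mean" if agg_func == "avg" else agg_func
    return data.groupby(by)[metric].agg(pandas_func)


totals = aggregate_metric(df, ["dummy_measure_col"], "metric_col", "sum")
averages = aggregate_metric(df, ["dummy_measure_col"], "metric_col", "avg")
# A callable receives the metric values of one group and returns a scalar.
spreads = aggregate_metric(
    df, ["dummy_measure_col"], "metric_col", lambda s: s.max() - s.min()
)
```

Each result is a pandas Series indexed by the measure values, which is the shape a groupby-compatible callable is expected to produce.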
copy_anomeda_df
copy_anomeda_df()
Return a copy of an anomeda.DataFrame object.
get_agg_func
get_agg_func()
Return the function used to aggregate the metric by measures.
get_discretization_mapping
get_discretization_mapping()
Return a dict mapping discrete values to the actual ranges of continuous measures.
In some cases, there may be more than one interval for a discrete value.
Examples:
>>> anmd_df.get_discretization_mapping()
{
    'dummy_numeric_measure': {
        0: [[0.08506988648110014, 0.982366623262143]],  # [inc, excl)
        1: [[0.9855150328648835, 2.458970726947438]]  # [inc, excl)
    }
}
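The half-open interval convention can be illustrated with a small lookup over a mapping of this shape. This is a sketch in plain Python, not part of anomeda: the function discretize and the rounded interval bounds are hypothetical and chosen only to show that the lower bound is included and the upper bound is not.

```python
def discretize(value, ranges_by_discrete_value):
    """Return the discrete value whose half-open range [low, high) contains `value`."""
    for discrete_value, ranges in ranges_by_discrete_value.items():
        for low, high in ranges:
            if low <= value < high:  # lower bound inclusive, upper bound exclusive
                return discrete_value
    return None  # the value falls outside every range


# Hypothetical mapping in the same format as the one returned above.
mapping = {
    "dummy_numeric_measure": {
        0: [[0.0, 1.0]],
        1: [[1.0, 2.5]],
    }
}
measure_ranges = mapping["dummy_numeric_measure"]

inside_first = discretize(0.5, measure_ranges)
on_shared_bound = discretize(1.0, measure_ranges)  # belongs to the range that starts at 1.0
on_upper_bound = discretize(2.5, measure_ranges)  # excluded, so no range matches
```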
get_discretized_measures
get_discretized_measures()
Return discretized versions of continuous measures.
get_index_name
get_index_name()
Return the name of an index column.
get_measures_names
get_measures_names()
Return a list of columns considered as measures.
get_measures_types
get_measures_types()
Return the measures_types dict.
get_metric_name
get_metric_name()
Return the name of a metric column.
replace_df
replace_df(data: pandas.DataFrame, inplace=False, keep_clusters: bool = False, keep_trends: bool = False, keep_discretization: bool = False)
Replace the pandas.DataFrame content underlying the anomeda.DataFrame with a new one.
Parameters:
-
data(DataFrame) –A new data object.
-
inplace(bool, default:False) –If True, the object is modified in place and nothing is returned. Otherwise, a new anomeda.DataFrame is created and returned.
set_agg_func
set_agg_func(agg_func)
Set a function to aggregate the metric by measures.
Parameters:
-
agg_func –Can be "sum", "avg", "count" or a callable compatible with pandas.DataFrame.groupby.
set_discretization_mapping
set_discretization_mapping(discretized_measures_mapping, recalculate_measures=True)
Set custom thresholds for discretization.
Parameters:
-
discretized_measures_mapping(dict) –Dict mapping each discrete value of a measure to the corresponding continuous ranges. Several different ranges of continuous values may be mapped to the same discrete value. The lower bound of each range is inclusive and the upper bound is exclusive. The mapping must have the following format:
{ 'measure_name': { discrete_value: [[continuous_threshold_min_inc, continuous_threshold_max_excl], [..., ...], ...], ... }, ... }
Examples:
anmd_df.set_discretization_mapping({
    'dummy_numeric_measure': {
        0: [[0.00, 0.05001], [0.95, 1.001]],  # may correspond to "extreme" values; the upper bounds 0.05001 and 1.001 are exclusive
        1: [[0.5, 0.94999]]  # may correspond to "normal" values; 0.94999 is an exclusive bound
    }
})
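Before passing a hand-written mapping, it can help to check it against the documented format. The following is a minimal sketch of such a check in plain Python. The helper validate_discretization_mapping is hypothetical, and the rule that ranges of different discrete values must not overlap is an assumption; the documentation above does not say whether anomeda enforces it.

```python
from itertools import combinations


def validate_discretization_mapping(mapping):
    """Raise ValueError if `mapping` breaks the documented format (hypothetical helper)."""
    for measure, ranges_by_value in mapping.items():
        labelled = []  # (low, high, discrete_value) triples for the overlap check
        for discrete_value, ranges in ranges_by_value.items():
            for bounds in ranges:
                if len(bounds) != 2:
                    raise ValueError(f"{measure}: {bounds!r} is not a [min, max] pair")
                low, high = bounds
                if not low < high:
                    raise ValueError(f"{measure}: empty range [{low}, {high})")
                labelled.append((low, high, discrete_value))
        # Assumption: ranges that belong to different discrete values must not overlap.
        for (low_a, high_a, val_a), (low_b, high_b, val_b) in combinations(labelled, 2):
            if val_a != val_b and low_a < high_b and low_b < high_a:
                raise ValueError(
                    f"{measure}: ranges for {val_a!r} and {val_b!r} overlap"
                )


example_mapping = {
    "dummy_numeric_measure": {
        0: [[0.00, 0.05001], [0.95, 1.001]],
        1: [[0.5, 0.94999]],
    }
}
validate_discretization_mapping(example_mapping)  # passes silently
```

Two ranges that only touch, such as [0.0, 1.0] and [1.0, 2.0], are accepted because the upper bound is exclusive.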
set_discretized_measures
set_discretized_measures(discretized_measures: dict)
Set custom discretization for continuous measures.
Parameters:
-
discretized_measures(dict) –Dict containing the discrete values of each measure in the format {'measure_name': [0, 1, 1, ...]}. Each array of values must have the same shape as the original measure.
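One way to produce a dict in this format is to bin the raw values against sorted thresholds. The sketch below uses the standard-library bisect module. The helper discretize_by_thresholds, the thresholds, and the sample values are hypothetical, and this is not how anomeda discretizes automatically.

```python
from bisect import bisect_right


def discretize_by_thresholds(values, thresholds):
    """Label each value with the index of its bin; `thresholds` must be sorted ascending."""
    # bisect_right puts a value equal to a threshold into the bin that starts there,
    # which matches the inclusive-lower, exclusive-upper convention.
    return [bisect_right(thresholds, value) for value in values]


raw_values = [0.1, 0.7, 1.3, 2.4, 0.99]  # hypothetical continuous measure
thresholds = [1.0, 2.0]  # bins: below 1.0, [1.0, 2.0), 2.0 and above

discretized_measures = {
    "dummy_numeric_measure": discretize_by_thresholds(raw_values, thresholds)
}
# The discretized array keeps one label per original value.
same_length = len(discretized_measures["dummy_numeric_measure"]) == len(raw_values)
```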
set_measures_names
set_measures_names(measures_names)
Tell the anomeda.DataFrame object which columns are measures.
Columns are picked from the underlying pandas.DataFrame object, so they must be present there.
Parameters:
-
measures_names(list of str) –List of columns to be treated as measures.
set_measures_types
set_measures_types(measures_types: dict)
Set measures types.
A measure can be either 'categorical' or 'continuous'. Types are used to cluster the data properly.
Parameters:
-
measures_types(dict) –Dict containing 'continuous' and/or 'categorical' keys and lists of measures as values.
Examples:
anmd_df.set_measures_types({
    'continuous': ['numeric_measure_1'],
    'categorical': ['measure_1']
})
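A measures_types dict can be checked against the list of measures before it is set. The following plain-Python sketch uses the 'continuous' spelling from the constructor documentation. The helper check_measures_types and its rules (every measure has exactly one type, and only the two documented keys are used) are assumptions for illustration, not anomeda's own validation.

```python
ALLOWED_TYPES = {"categorical", "continuous"}


def check_measures_types(measures_names, measures_types):
    """Return a list of problems found; an empty list means the dict looks consistent."""
    problems = []
    for key in measures_types:
        if key not in ALLOWED_TYPES:
            problems.append(f"unknown type key: {key!r}")
    # Flatten all typed measures to count how often each one is listed.
    typed = [name for names in measures_types.values() for name in names]
    for name in measures_names:
        count = typed.count(name)
        if count == 0:
            problems.append(f"measure without a type: {name!r}")
        elif count > 1:
            problems.append(f"measure listed more than once: {name!r}")
    for name in sorted(set(typed) - set(measures_names)):
        problems.append(f"typed column is not a measure: {name!r}")
    return problems


measures = ["numeric_measure_1", "measure_1"]
ok = check_measures_types(
    measures,
    {"continuous": ["numeric_measure_1"], "categorical": ["measure_1"]},
)
misspelled = check_measures_types(
    measures,
    {"continous": ["numeric_measure_1"], "categorical": ["measure_1"]},
)
```

A check like this catches a misspelled key early instead of leaving a measure silently untyped.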
set_metric_name
set_metric_name(metric_name)
Set the name of a metric to be analyzed.
Parameters:
-
metric_name(str) –Must be present among the columns of the underlying pandas.DataFrame. If the metric column is currently set as a measure, you need to change the list of measures first.