Anomeda methods API
Here you can find the documentation for the available functions of the anomeda Python package.
compare_clusters
compare_clusters(data: anomeda.DataFrame, period1: str, period2: str, breakdowns: 'no' | 'all' | list[str] | list[list[str]] = 'no')
Compare metric values for 2 periods.
The method generates a pandas.DataFrame object describing each cluster in both periods. You can use it to identify the cluster or set of clusters that caused the differences in the overall metric values between the two periods.
Parameters:
-
data(DataFrame) –Object containing data to be analyzed
-
period1(str) –Query to filter the first period. For example, 'dt < 10'.
-
period2(str) –Query to filter the second period. For example, 'dt >= 10'.
-
breakdowns('no' | 'all' | list[str] | list[list[str]], default:'no') –If 'no', the metric is grouped by date points only. If 'all', all combinations of measures are used to extract clusters. If list[str], only the clusters specified in the list are used. If list[list[str]], each inner list is a list of measures used to extract clusters.
Returns:
-
output(DataFrame) –Object describing the clusters and the changes in the metric behavior between them.
Examples:
anomeda.compare_clusters(
data,
period1='dt < 10',
period2='dt >= 10',
breakdowns='no'
)
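The idea of a two-period comparison can be illustrated without the library. The sketch below is pure Python and does not call anomeda; the record layout, the `summarize` helper and the lambda predicates standing in for the `period1` and `period2` queries are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical records: (dt, cluster, metric value).
records = [
    (8, "a", 10.0), (9, "a", 12.0), (10, "a", 11.0), (11, "a", 13.0),
    (8, "b", 5.0), (9, "b", 5.0), (10, "b", 9.0), (11, "b", 11.0),
]

def summarize(rows, in_period):
    """Return {cluster: mean metric} for rows whose dt satisfies in_period."""
    values = defaultdict(list)
    for dt, cluster, value in rows:
        if in_period(dt):
            values[cluster].append(value)
    return {c: sum(v) / len(v) for c, v in values.items()}

# Stand-ins for period1='dt < 10' and period2='dt >= 10'.
before = summarize(records, lambda dt: dt < 10)
after = summarize(records, lambda dt: dt >= 10)

# Change of the mean metric per cluster between the two periods.
change = {c: after[c] - before[c] for c in before if c in after}
largest = max(change, key=lambda c: abs(change[c]))
print(change)   # {'a': 1.0, 'b': 5.0}
print(largest)  # b
```

Here cluster "b" moved the most between the periods, which is the kind of conclusion the returned DataFrame is meant to support.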
extract_trends
extract_trends(x: numpy.ndarray[int] | pandas.DatetimeIndex, y: numpy.ndarray[float], freq: 'frequency unit for pandas.DatetimeIndex' = None, propagation_strategy: 'zeros' | 'ffil' | None = None, max_trends: int | 'auto' = 'auto', min_var_reduction: float[0, 1] | None = 0.5, verbose: bool = False)
Fit and return automatically fitted linear trends for the given X and Y.
The method can extract more than one trend if the metric significantly changed its behavior. The sensitivity of the method to trend changes is set by the parameters "max_trends" and "min_var_reduction".
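One plausible reading of "variance reduction" is sketched below in pure Python: compare the variance of the residuals of a single fitted line with the variance of the residuals of two lines fitted on either side of a break. This is not the library's implementation; the helper names and the hard-coded break position are assumptions, and a real implementation would have to search for the break.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(xs, ys, line):
    """Differences between the observed values and the fitted line."""
    slope, intercept = line
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def variance(values):
    """Population variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

xs = list(range(10))
# The metric grows by 1 per step, then drops and grows by 3 per step.
ys = [float(x) for x in range(5)] + [3.0 * x - 25.0 for x in range(5, 10)]

# Residuals of one trend over the whole series.
one = residuals(xs, ys, fit_line(xs, ys))
# Residuals of two trends split at index 5 (break position assumed known).
two = (residuals(xs[:5], ys[:5], fit_line(xs[:5], ys[:5]))
       + residuals(xs[5:], ys[5:], fit_line(xs[5:], ys[5:])))

reduction = 1 - variance(two) / variance(one)
print(reduction)  # close to 1: the second trend explains almost all of the error
```

Under this reading, a threshold such as min_var_reduction=0.5 would be satisfied by the two-trend fit, so no further trends would be needed.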
Parameters:
-
x(ndarray[int] | DatetimeIndex) –Indices corresponding to time points. Must be an increasing array of integers. Some of the values may be omitted, e.g. the following x is valid: [0, 1, 5, 10].
-
y(ndarray[float]) –Metric values corresponding to time points.
-
propagation_strategy('zeros' | 'ffil' | None, default:None) –How to propagate the aggregated time series over missing index values.
- zeros: Set the metric to 0 at each missing index. For example, the aggregated metric values '2024-01-01': 1, '2024-01-03': 2 are propagated as '2024-01-01': 1, '2024-01-02': 0, '2024-01-03': 2.
- ffil: Set the metric at each missing index to the last observed value. For example, the aggregated metric values '2024-01-01': 1, '2024-01-03': 2 are propagated as '2024-01-01': 1, '2024-01-02': 1, '2024-01-03': 2.
- None: Use only the metric and index values that are present.
-
max_trends(int | 'auto', default:'auto') –Number of trends to extract. If int, the method extracts at most that many trends. Fewer trends may be extracted if no more trends are found or if min_var_reduction is reached, which means that the variance is already explained by that number of trends. If 'auto', the method determines the number of trends automatically using the min_var_reduction parameter. Default is 'auto'.
-
min_var_reduction(float[0, 1] | None, default:0.5) –Share of the approximation-error variance that must be removed by adding trends, compared to the variance of the initial approximation with one trend. Values closer to 1 cause more trends to be extracted, since more trends reduce the variance further. Values closer to 0 produce fewer trends. If max_trends is set and reached, the extraction stops regardless of the variance. If None, the parameter is not used. Default is 0.5.
-
verbose(bool, default:False) –Whether to produce more logs. Default is False.
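The effect of the propagation_strategy values can be sketched in a few lines of pure Python. The `propagate` helper below is hypothetical and works on an increasing integer index rather than dates; it only mirrors the behavior described for 'zeros', 'ffil' and None and is not the library's code.

```python
def propagate(index, values, strategy=None):
    """Fill gaps in an increasing integer index.

    strategy is 'zeros', 'ffil' or None, following the parameter values above.
    """
    if strategy is None:
        # Use only the index and metric values that are present.
        return list(index), list(values)
    if strategy not in ("zeros", "ffil"):
        raise ValueError("unknown strategy: %r" % (strategy,))
    known = dict(zip(index, values))
    full_index = list(range(index[0], index[-1] + 1))
    filled, last = [], None
    for i in full_index:
        if i in known:
            last = known[i]
            filled.append(last)
        else:
            # Missing index: 0 for 'zeros', last observed value for 'ffil'.
            filled.append(0 if strategy == "zeros" else last)
    return full_index, filled

index, values = [0, 1, 3], [1, 5, 2]
print(propagate(index, values, "zeros"))  # ([0, 1, 2, 3], [1, 5, 0, 2])
print(propagate(index, values, "ffil"))   # ([0, 1, 2, 3], [1, 5, 5, 2])
print(propagate(index, values, None))     # ([0, 1, 3], [1, 5, 2])
```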
Returns:
-
trends(dict) –Dict containing the extracted trends in the format { trend_id: (xmin_inc, xmax_exc, (trend_slope, trend_intercept), (n_samples, metric_mean, mae, metric_sum)), ... }
Examples:
>>> x = np.array([0, 1, 4, 5])
>>> y = np.array([11.2, 10.4, 3.4, 3.1])
>>> anomeda.extract_trends(x, y, max_trends=2)
{
0: (0, 4, (-0.7999999999999989, 11.2), (2, 10.8, 0.0, 21.6)), # trend 1, for date points from 0 (inc) to 4 (excl)
# with slope of about -0.8 and intercept 11.2
# consisting of 2 samples,
# metric mean over date points is 10.8,
# mae for fitting trend over date points is 0.0
# sum for all metric values is 21.6
1: (4, 6, (-0.2999999999999998, 4.6), (2, 3.25, 0.0, 6.5))
}
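To show how the returned dict can be consumed, the sketch below evaluates the fitted trend at a given index using only the tuple layout described above. The `trend_value` helper is hypothetical; the dict literal is copied from the example output, so anomeda itself is not needed to run it.

```python
# Output copied from the example above.
trends = {
    0: (0, 4, (-0.7999999999999989, 11.2), (2, 10.8, 0.0, 21.6)),
    1: (4, 6, (-0.2999999999999998, 4.6), (2, 3.25, 0.0, 6.5)),
}

def trend_value(trends, x):
    """Return the fitted value at x, or None if no trend covers x."""
    for xmin_inc, xmax_exc, (slope, intercept), _stats in trends.values():
        # Each trend covers the half-open interval [xmin_inc, xmax_exc).
        if xmin_inc <= x < xmax_exc:
            return slope * x + intercept
    return None

for x in (0, 1, 4, 5, 6):
    print(x, trend_value(trends, x))
```

Index 6 falls outside both intervals, so the helper returns None for it, while indices 1 and 5 give values close to the observed 10.4 and 3.1.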
find_anomalies
find_anomalies(data: anomeda.DataFrame | (numpy.ndarray[int], numpy.ndarray[float]), breakdowns: 'no' | 'all' | list[str] | list[list[str]] = 'no', anomalies_conf: dict = {'p_large': 1, 'p_low': 1, 'n_neighbors': 3}, return_all_points: bool = False, trend_fitting_conf: dict = None)
Find metric anomalies by looking for the most extreme metric changes.
The method computes the differences between the actual metric and the fitted trends, finds the points with extreme differences and marks them as anomalies. You can find anomalies for automatically extracted clusters only if you pass an anomeda.DataFrame.
Parameters:
-
data(DataFrame | (ndarray[int], ndarray[float])) –Object containing metric values to be analyzed. Trends must be fitted for the object with anomeda.fit_trends() method if anomeda.DataFrame is passed.
-
breakdowns('no' | 'all' | list[str] | list[list[str]], default:'no') –If 'no', the metric is grouped by date points only. If 'all', all combinations of measures are used to extract clusters. If list[str], only the clusters specified in the list are used. If list[list[str]], each inner list is a list of measures used to extract clusters.
-
anomalies_conf(dict, default:{'p_large': 1., 'p_low': 1., 'n_neighbors': 3}) –Dict containing 'p_large', 'p_low' and 'n_neighbors' values. 'p_large' and 'p_low' are floats between 0 and 1 giving the share of the anomalies with the largest and lowest metric values to be returned. For example, if you set 'p_low' to 0, no points with abnormally low metric values are returned; if 0.5, then 50% of the points with abnormally low values are returned, etc. If a key is missing or None, 1 is assumed. 'n_neighbors' is the number-of-neighbors parameter of the sklearn.neighbors.LocalOutlierFactor class, which is used to find points with abnormally large MAE. Typically, the larger the value, the less sensitive the model is to anomalies.
-
return_all_points(bool, default:False) –If False, only anomaly points are returned. If True, all points are returned together with their anomaly marks. Default is False.
-
trend_fitting_conf(dict, default:None) –Used only if data is not anomeda.DataFrame, but numpy arrays, to run anomeda.fit_trends method for them. Parameters are similar to those you would pass to the argument anomeda.fit_trends(..., trend_fitting_conf=...).
Returns:
-
res(DataFrame) –A DataFrame containing fields 'cluster', 'index', 'metric_value', 'fitted_trend_value', 'anomaly'.
Examples:
>>> anomeda.fit_trends(data)
>>> anomeda.find_anomalies(data)
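The general approach (compare the metric with its fitted trend, then keep only a share of the extreme points on each side) can be sketched in pure Python. Note the substitutions: the sketch uses a simple median-absolute-deviation rule instead of sklearn.neighbors.LocalOutlierFactor, and the `flag_anomalies` helper, its threshold `k` and its reading of 'p_large' and 'p_low' are assumptions, not the library's behavior.

```python
import math
from statistics import median

def flag_anomalies(values, fitted, p_large=1.0, p_low=1.0, k=5.0):
    """Return indices of points far from the fitted trend (MAD rule, not LOF)."""
    resid = [v - f for v, f in zip(values, fitted)]
    center = median(resid)
    mad = median(abs(r - center) for r in resid)
    scale = mad if mad > 0 else 1e-12  # avoid division by zero
    candidates = [i for i, r in enumerate(resid) if abs(r - center) / scale > k]
    # Split candidates into abnormally high and abnormally low points,
    # most extreme first.
    high = sorted((i for i in candidates if resid[i] > 0), key=lambda i: -resid[i])
    low = sorted((i for i in candidates if resid[i] < 0), key=lambda i: resid[i])
    # Keep only the requested share of each group.
    keep = high[:math.ceil(p_large * len(high))] + low[:math.ceil(p_low * len(low))]
    return sorted(keep)

fitted = [10.0 + 0.5 * x for x in range(10)]
noise = [0.1, -0.2, 0.0, 0.2, -0.1, 0.1, 0.0, -0.1, 0.2, -0.2]
values = [f + n for f, n in zip(fitted, noise)]
values[3] += 8.0  # abnormally high point
values[7] -= 6.0  # abnormally low point

print(flag_anomalies(values, fitted))             # [3, 7]
print(flag_anomalies(values, fitted, p_low=0.0))  # [3]
```

Setting p_low to 0 drops the abnormally low point, which matches the description of 'p_low' in the parameter list.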
fit_trends
fit_trends(data: anomeda.DataFrame | (numpy.ndarray[int], numpy.ndarray[float]) | (pandas.DatetimeIndex, numpy.ndarray[float]), trend_fitting_conf: dict = {'max_trends': 'auto', 'min_var_reduction': 0.75}, save_trends: bool = True, breakdowns: 'no' | 'all' | list[str] | list[list[str]] = 'no', metric_propagate: 'zeros' | 'ffil' | None = None, min_cluster_size: int | None = None, max_cluster_size: int | None = None, plot: bool = False, df: bool = True, verbose: bool = False)
Fit trends for a time series.
Fit trends using the data from an anomeda.DataFrame or a numpy.ndarray with metric values. You can fit trends for automatically extracted clusters only if you pass an anomeda.DataFrame. If an anomeda.DataFrame is passed and "save_trends" is True, the trends are stored in the anomeda.DataFrame._trends attribute every time the method is called. The method returns a pandas.DataFrame describing the trends and/or plots the trends.
Parameters:
-
data(DataFrame | (ndarray[int], ndarray[float]) | (DatetimeIndex, ndarray[float])) –Object containing metric values. If numpy.ndarray, a tuple of arrays corresponding to x (data points) and y (metric values) respectively.
-
trend_fitting_conf(dict, default:{'max_trends': 'auto', 'min_var_reduction': 0.75}) –Parameters for calling the anomeda.extract_trends() function. It consists of the 'max_trends' parameter, which sets the maximum number of trends to identify, and the 'min_var_reduction' parameter, which sets the share of the variance that must be removed by estimating trends. Values close to 1 produce more trends, since more trends reduce the variance more significantly. Default is {'max_trends': 'auto', 'min_var_reduction': 0.75}.
-
save_trends(bool, default:True) –If False, return pandas.DataFrame with trends description without assigning it to the anomeda.DataFrame._trends.
-
breakdowns('no' | 'all' | list[str] | list[list[str]], default:'no') –If 'no', the metric is grouped by date points only. If 'all', all combinations of measures are used to extract clusters. If list[str], only the clusters specified in the list are used. If list[list[str]], each inner list is a list of measures used to extract clusters.
-
metric_propagate('zeros' | 'ffil' | None, default:None) –How to propagate the aggregated time series over missing index values.
- zeros: Set the metric to 0 at each missing index. For example, the aggregated metric values '2024-01-01': 1, '2024-01-03': 2 are propagated as '2024-01-01': 1, '2024-01-02': 0, '2024-01-03': 2.
- ffil: Set the metric at each missing index to the last observed value. For example, the aggregated metric values '2024-01-01': 1, '2024-01-03': 2 are propagated as '2024-01-01': 1, '2024-01-02': 1, '2024-01-03': 2.
- None: Use only the metric and index values that are present.
-
min_cluster_size(int, default:None) –Skip clusters whose total size among all date points is less than the value.
-
max_cluster_size(int, default:None) –Skip clusters whose total size among all date points is more than the value.
-
plot(bool, default:False) –Whether to plot the fitted trends. anomeda.plot_trends is responsible for plotting if the flag is True.
-
df(bool, default:True) –Whether to return a pandas.DataFrame containing the fitted trends.
-
verbose(bool, default:False) –Whether to print additional output.
Returns:
-
resp(DataFrame) –An object containing information about trends
Examples:
>>> fitted_trends = anomeda.fit_trends(
data,
trend_fitting_conf={'max_trends': 3},
breakdowns='all',
metric_propagate='zeros',
min_cluster_size=3,
plot=True,
df=True
)
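The min_cluster_size and max_cluster_size filters can be pictured with a small pure-Python sketch. The `filter_clusters` helper is hypothetical, and it assumes that the "total size" of a cluster is the number of records that belong to it across all date points.

```python
from collections import Counter

def filter_clusters(cluster_labels, min_cluster_size=None, max_cluster_size=None):
    """Return {cluster: size} for clusters whose size is within the bounds."""
    sizes = Counter(cluster_labels)
    kept = {}
    for cluster, size in sizes.items():
        if min_cluster_size is not None and size < min_cluster_size:
            continue  # too small, skipped
        if max_cluster_size is not None and size > max_cluster_size:
            continue  # too large, skipped
        kept[cluster] = size
    return kept

# One label per record; cluster "b" has only two records.
labels = ["a"] * 5 + ["b"] * 2 + ["c"] * 40
print(filter_clusters(labels, min_cluster_size=3))
# {'a': 5, 'c': 40}
print(filter_clusters(labels, min_cluster_size=3, max_cluster_size=10))
# {'a': 5}
```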
plot_clusters
plot_clusters(data: anomeda.DataFrame, breakdowns: 'no' | 'all' | list[str] | list[list[str]] = 'no', metric_propagate: 'zeros' | 'ffil' | None = None, min_cluster_size: int | None = None, max_cluster_size: int | None = None, colors: dict = None, ax: matplotlib.axes.Axes = None)
Plot metric in clusters.
Plot the metric for clusters extracted from an anomeda.DataFrame instance.
Parameters:
-
data(DataFrame) –Object containing clusters to be plotted.
-
breakdowns('no' | 'all' | list[str] | list[list[str]], default:'no') –If 'no', the metric is grouped by date points only. If 'all', all combinations of measures are used to extract and plot clusters. If list[str], only the clusters specified in the list are plotted. If list[list[str]], each inner list is a list of measures used to extract clusters.
-
metric_propagate('zeros' | 'ffil' | None, default:None) –How to propagate the aggregated time series over missing index values.
- zeros: Set the metric to 0 at each missing index. For example, the aggregated metric values '2024-01-01': 1, '2024-01-03': 2 are propagated as '2024-01-01': 1, '2024-01-02': 0, '2024-01-03': 2.
- ffil: Set the metric at each missing index to the last observed value. For example, the aggregated metric values '2024-01-01': 1, '2024-01-03': 2 are propagated as '2024-01-01': 1, '2024-01-02': 1, '2024-01-03': 2.
- None: Use only the metric and index values that are present.
-
min_cluster_size(int, default:None) –Skip clusters whose total size among all date points is less than the value.
-
max_cluster_size(int, default:None) –Skip clusters whose total size among all date points is more than the value.
-
colors(dict, default:None) –Dictionary with a mapping between clusters and colors used in matplotlib.
-
ax(Axes, default:None) –Axes to use for plotting. If None, a new object is created.
Returns:
-
None–
Examples:
>>> anomeda.plot_clusters(
...     data,  # anomeda.DataFrame
...     breakdowns=[['measure_a'], ['measure_a', 'measure_b']]  # plot clusters extracted using measure_a and a combination of measure_a and measure_b
... )
>>> anomeda.plot_clusters(
...     data,  # anomeda.DataFrame
...     breakdowns=['measure_a==1', 'measure_a == 1 and measure_b == 2']  # plot two specified clusters
... )
>>> anomeda.plot_clusters(
...     data,  # anomeda.DataFrame
...     breakdowns='no'  # plot a metric grouped by index values only
... )
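The different forms of the breakdowns argument can be summarized with a pure-Python sketch that expands a value into the lists of measures to group by. The `measure_groups` helper is hypothetical, and it assumes that 'all' means every non-empty combination of the available measures; the list[str] form (explicit cluster queries) is deliberately not expanded here.

```python
from itertools import combinations

def measure_groups(measures, breakdowns="no"):
    """Expand a breakdowns value into lists of measures to group by."""
    if breakdowns == "no":
        return [[]]  # group by index values only
    if breakdowns == "all":
        # Every non-empty combination of measures (assumed meaning of 'all').
        return [list(c)
                for r in range(1, len(measures) + 1)
                for c in combinations(measures, r)]
    if all(isinstance(b, (list, tuple)) for b in breakdowns):
        # list[list[str]]: each inner list is one set of measures.
        return [list(b) for b in breakdowns]
    raise ValueError("list[str] cluster queries are not expanded in this sketch")

measures = ["measure_a", "measure_b"]
print(measure_groups(measures, "no"))
# [[]]
print(measure_groups(measures, "all"))
# [['measure_a'], ['measure_b'], ['measure_a', 'measure_b']]
print(measure_groups(measures, [["measure_a"], ["measure_a", "measure_b"]]))
# [['measure_a'], ['measure_a', 'measure_b']]
```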
plot_trends
plot_trends(data: 'anomeda.DataFrame | pandas.DataFrame returned from anomeda.fit_trends()', breakdowns: 'no' | 'all' | list[str] | list[list[str]] = 'no', show_metric=True, colors: dict = None, ax: matplotlib.axes.Axes = None)
Plot fitted trends.
Plot trends either from anomeda.DataFrame instance or using a response from anomeda.fit_trends().
Parameters:
-
data(anomeda.DataFrame | pandas.DataFrame returned from anomeda.fit_trends()) –Object containing trends to be plotted.
-
breakdowns('no' | 'all' | list[str] | list[list[str]], default:'no') –If 'no', the metric is grouped by date points only. If 'all', all combinations of measures are used to extract and plot clusters. If list[str], only the clusters specified in the list are plotted. If list[list[str]], each inner list is a list of measures used to extract clusters.
-
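show_metric(bool, default:True) –Whether to show the actual metric on the plots.
-
colors(dict, default:None) –Dictionary with a mapping between clusters and colors used in matplotlib.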
-
ax(Axes, default:None) –Axes to use for plotting. If None, a new Axes is created.
Returns:
-
None–
Examples:
>>> anomeda.fit_trends(data)
>>> anomeda.plot_trends(data)