pimmslearn.plotting package#
- pimmslearn.plotting.make_large_descriptors(size='xx-large')[source]#
Helper function to set very large titles, labels and tick texts for matplotlib plots by default.
- size: str
fontsize or allowed category. Change default if necessary, default ‘xx-large’
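The effect described above can be approximated with matplotlib's rcParams. The following is a minimal sketch, not the pimmslearn implementation: the function name `make_large_descriptors_sketch` and the exact set of rcParams keys are assumptions chosen for illustration.

```python
import matplotlib


def make_large_descriptors_sketch(size="xx-large"):
    """Set large default font sizes for titles, labels, ticks and legends."""
    # Hypothetical stand-in: the keys below are an assumption about which
    # text elements should be enlarged, not the library's actual list.
    matplotlib.rcParams.update(
        {
            "axes.titlesize": size,
            "axes.labelsize": size,
            "xtick.labelsize": size,
            "ytick.labelsize": size,
            "legend.fontsize": size,
            "figure.titlesize": size,
        }
    )


make_large_descriptors_sketch("x-large")
```

Because rcParams are global, a call like this affects every figure created afterwards in the same session.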
- pimmslearn.plotting.plot_cutoffs(df: pd.DataFrame, feat_completness_over_samples: int = None, min_feat_in_sample: int = None) tuple[matplotlib.figure.Figure, np.array[matplotlib.axes.Axes]][source]#
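Plot the number of available features along index and columns (features vs. samples), optionally including cutoffs.
- Parameters:
df (pd.DataFrame) – data whose available features are counted along index and columns
feat_completness_over_samples (int, optional) – cutoff to include in the plot, by default None
min_feat_in_sample (int, optional) – cutoff to include in the plot, by default None
- Returns:
Figure and array of Axes.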
- Return type:
tuple[matplotlib.figure.Figure, np.array[matplotlib.axes.Axes]]
- pimmslearn.plotting.plot_feat_counts(df_counts: DataFrame, feat_name: str, n_samples: int, ax=None, figsize=(15, 10), count_col='counts', **kwargs)[source]#
- pimmslearn.plotting.plot_rolling_error(errors: DataFrame, metric_name: str, window: int = 200, min_freq=None, freq_col: str = 'freq', colors_to_use=None, ax=None)[source]#
- pimmslearn.plotting.savefig(fig, name, folder: Path = '.', pdf=True, dpi=300, tight_layout=True)#
Save matplotlib Figure (having method savefig) as pdf and png.
- pimmslearn.plotting.select_dates(date_series: Series, max_ticks=30) array[source]#
Get unique dates (single days) for selection in pd.plot.line with xticks argument.
- Parameters:
date_series (pd.Series) – datetime series to use (values, not index)
max_ticks (int, optional) – maximum number of unique ticks to select, by default 30
- Returns:
Unique dates (single days) selected from the series, at most max_ticks of them.
- Return type:
np.array
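One way to select at most `max_ticks` unique days from a datetime series is sketched below. This is an illustration under assumptions, not the library's code: the name `select_dates_sketch` and the even-stride thinning strategy are hypothetical.

```python
import math

import numpy as np
import pandas as pd


def select_dates_sketch(date_series: pd.Series, max_ticks: int = 30) -> np.ndarray:
    # Reduce timestamps to calendar days and keep each day once, in order.
    days = date_series.dt.floor("D").drop_duplicates().sort_values()
    # Assumption: thin out evenly by taking every step-th day,
    # so that at most max_ticks days remain.
    step = max(1, math.ceil(len(days) / max_ticks))
    return days.to_numpy()[::step]


# 400 timestamps, six hours apart, cover 100 distinct days.
timestamps = pd.Series(
    pd.date_range("2021-01-01", periods=400, freq=pd.Timedelta(hours=6))
)
ticks = select_dates_sketch(timestamps, max_ticks=30)
```

The resulting array can be passed as the `xticks` argument of a pandas line plot.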
- pimmslearn.plotting.select_xticks(ax: Axes, max_ticks: int = 50) list[source]#
Limit the number of xticks displayed.
- Parameters:
ax (matplotlib.axes.Axes) – Axes object to manipulate
max_ticks (int, optional) – maximum number of set ticks on x-axis, by default 50
- Returns:
list of current ticks for x-axis. Either new or old (depending if something was changed).
- Return type:
list
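Limiting the number of x-ticks on an Axes can be sketched as follows. The helper `select_xticks_sketch` and its thinning rule (keep every step-th tick) are assumptions for illustration; the library function may choose ticks differently.

```python
import math

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def select_xticks_sketch(ax, max_ticks: int = 50) -> list:
    ticks = list(ax.get_xticks())
    if len(ticks) > max_ticks:
        # Assumption: keep every step-th tick so at most max_ticks remain.
        step = math.ceil(len(ticks) / max_ticks)
        ticks = ticks[::step]
        ax.set_xticks(ticks)
    # Either the new ticks or the unchanged old ones.
    return ticks


fig, ax = plt.subplots()
ax.plot(range(200), range(200))
ax.set_xticks(list(range(200)))  # deliberately far too many ticks
xticks = select_xticks_sketch(ax, max_ticks=50)
```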
Submodules#
pimmslearn.plotting.data module#
Plot data distribution based on pandas DataFrames or Series.
- pimmslearn.plotting.data.get_min_max_iterable(series: Iterable[Series]) Tuple[int][source]#
Get the min and max as integer from an iterable of pandas.Series.
- pimmslearn.plotting.data.min_max(s: Series) Tuple[int][source]#
Get the min and max as integer from a pandas.Series.
- Parameters:
s (pd.Series) – Series of intensities.
- Returns:
Minimum and maximum of the Series as integers.
- Return type:
Tuple[int]
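A minimal sketch of both helpers in this module follows. The rounding direction is an assumption (floor for the minimum, ceil for the maximum, so the integer range covers all values), and the `_sketch` names are hypothetical stand-ins rather than the library functions.

```python
import math
from typing import Iterable, Tuple

import pandas as pd


def min_max_sketch(s: pd.Series) -> Tuple[int, int]:
    # Assumption: round outwards; pandas skips NaN in min() and max().
    return math.floor(s.min()), math.ceil(s.max())


def get_min_max_iterable_sketch(series: Iterable[pd.Series]) -> Tuple[int, int]:
    # Overall bounds are the smallest minimum and the largest maximum.
    bounds = [min_max_sketch(s) for s in series]
    return min(lo for lo, _ in bounds), max(hi for _, hi in bounds)


a = pd.Series([21.3, 25.8, float("nan")])
b = pd.Series([18.6, 30.1])
bounds_a = min_max_sketch(a)
bounds_all = get_min_max_iterable_sketch([a, b])
```

Integer bounds like these are convenient for building equally spaced histogram bins with `range`.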
- pimmslearn.plotting.data.plot_feat_median_over_prop_missing(data: DataFrame, type: str = 'scatter', ax: Axes | None = None, s: int = 1, return_plot_data: bool = False) Axes | Tuple[Axes, DataFrame][source]#
Plot feature median over proportion missing in that feature. Sorted by feature median into bins.
- pimmslearn.plotting.data.plot_histogram_intensities(s: Series, interval_bins=1, min_max: Tuple[int] | None = None, ax=None, **kwargs) Tuple[Axes, range][source]#
Plot intensities in Series in a certain range and equally spaced intervals.
- pimmslearn.plotting.data.plot_missing_dist_boxplots(data: DataFrame, min_feat_per_sample=None, min_samples_per_feat=None) Figure[source]#
- pimmslearn.plotting.data.plot_missing_dist_highdim(data: DataFrame, min_feat_per_sample: int | None = None, min_samples_per_feat: int | None = None) Figure[source]#
Plot missing distribution (cdf) in high dimensional data.
- Parameters:
data (pd.DataFrame) – Intensity table with samples in rows and features in columns.
min_feat_per_sample (int, optional) – Show the minimum required features a sample has to have, by default None
min_samples_per_feat (int, optional) – Show the minimum required number of samples a feature has to be found in, by default None
- Returns:
Figure with two plots (Axes).
- Return type:
matplotlib.figure.Figure
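The quantities behind such a plot can be sketched with pandas alone. The example below assumes that "missing distribution (cdf)" refers to the empirical CDF of non-missing counts per sample and per feature; the helper name `ecdf_of_counts` is hypothetical and the library may compute its curves differently.

```python
import numpy as np
import pandas as pd


def ecdf_of_counts(data: pd.DataFrame, axis: int) -> pd.Series:
    # Non-missing values per sample (axis=1) or per feature (axis=0).
    counts = data.notna().sum(axis=axis).sort_values()
    # Empirical CDF: share of rows/columns with at most that many observations.
    cdf = np.arange(1, len(counts) + 1) / len(counts)
    return pd.Series(cdf, index=counts.to_numpy())


# Samples in rows, features in columns.
data = pd.DataFrame(
    {
        "f1": [1.0, np.nan, 3.0, 4.0],
        "f2": [np.nan, np.nan, 2.0, 1.0],
        "f3": [1.0, 2.0, 3.0, np.nan],
    }
)
per_sample = ecdf_of_counts(data, axis=1)
per_feature = ecdf_of_counts(data, axis=0)

# Share of samples that would pass a cutoff of two features per sample.
min_feat_per_sample = 2
kept = (data.notna().sum(axis=1) >= min_feat_per_sample).mean()
```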
- pimmslearn.plotting.data.plot_missing_pattern_histogram(data: DataFrame, bins: int = 20, min_feat_per_sample=None, min_samples_per_feat=None) Figure[source]#
- pimmslearn.plotting.data.plot_missing_pattern_violinplot(data: DataFrame, min_feat_per_sample=None, min_samples_per_feat=None) Figure[source]#
- pimmslearn.plotting.data.plot_observations(df: DataFrame, ax: Axes | None = None, title: str = '', axis: int = 1, size: int = 1, ylabel: str = 'Frequency', xlabel: str | None = None) Axes[source]#
Plot non-missing observations by row (axis=1) or column (axis=0), ordered by the number of available observations. No binning is applied; only counts of non-missing values are plotted.
- Parameters:
df (pd.DataFrame) – DataFrame on which notna is applied
ax (Axes, optional) – Axes to plot on, by default None
title (str, optional) – Axes title, by default ‘’
axis (int, optional) – dimension to sum over, by default 1
ylabel (str, optional) – y-Axis label, by default ‘Frequency’
xlabel (str, optional) – x-Axis label, by default None
- Returns:
Axes on which plot was plotted
- Return type:
Axes
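The counting and ordering described above can be sketched as follows. This is an illustrative approximation: `count_observations_sketch` is a hypothetical helper, and the scatter call is only one plausible way to draw the counts.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def count_observations_sketch(df: pd.DataFrame, axis: int = 1) -> pd.Series:
    # Count non-missing values per row (axis=1) or per column (axis=0),
    # then order from fewest to most observations.
    return df.notna().sum(axis=axis).sort_values().reset_index(drop=True)


df = pd.DataFrame(
    {
        "f1": [1.0, np.nan, 3.0],
        "f2": [np.nan, np.nan, 2.0],
        "f3": [1.0, 2.0, 3.0],
    }
)
counts = count_observations_sketch(df, axis=1)

fig, ax = plt.subplots()
ax.scatter(counts.index, counts.to_numpy(), s=1)  # one point per row, no binning
ax.set_ylabel("Frequency")
```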
pimmslearn.plotting.defaults module#
pimmslearn.plotting.errors module#
Plot errors based on DataFrame with model predictions.
- pimmslearn.plotting.errors.get_data_for_errors_by_median(errors: DataFrame, feat_name: str, metric_name: str, model_column: str = 'model', seed: int = 42) DataFrame[source]#
Extract bars with confidence intervals from a seaborn plot for seaborn 0.13 and above. Confidence intervals are calculated with bootstrapping (sampling the mean).
- Parameters:
errors (pd.DataFrame) – DataFrame created by the plot_errors_by_median function
feat_name (str) – feature name assigned (was transformed to 'intensity binned by median of {feat_name}')
metric_name (str) – Metric used to calculate errors (MAE, MSE, etc.) of intensities in bin
model_column (str) – model_column in errors, defining model names
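To make "bootstrapping (sampling the mean)" concrete, the sketch below computes a percentile bootstrap confidence interval for a mean with NumPy. It is not the library's or seaborn's code: the function `bootstrap_mean_ci`, its defaults (1000 resamples, 95 % interval) and the example error values are assumptions for illustration.

```python
import numpy as np


def bootstrap_mean_ci(values, n_boot=1000, ci=95, seed=42):
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # Percentile interval around the bootstrapped means.
    tail = (100 - ci) / 2
    lower, upper = np.percentile(boot_means, [tail, 100 - tail])
    return values.mean(), lower, upper


# Hypothetical absolute errors of one model within one intensity bin.
abs_errors = np.array([0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.6, 0.2])
mean, lower, upper = bootstrap_mean_ci(abs_errors)
```

Fixing the seed makes the interval reproducible across runs, which is presumably the purpose of the `seed` parameter in the signature above.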
- pimmslearn.plotting.errors.plot_errors_binned(pred: DataFrame, target_col='observed', ax: Axes | None = None, palette: dict | None = None, metric_name: str | None = None, errwidth: float = 1.2) Axes[source]#