pimmslearn.plotting package

pimmslearn.plotting.make_large_descriptors(size='xx-large')[source]#

Helper function to set very large titles, labels and tick texts as the default for matplotlib plots.

size: str: font size or allowed size category. Change the default if necessary; default ‘xx-large’
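A minimal sketch of how such a helper could work, assuming it updates matplotlib's rcParams. The function name `large_descriptors_sketch` and the choice of rcParams keys are illustrative assumptions, not the package's implementation.

```python
import matplotlib


def large_descriptors_sketch(size="xx-large"):
    """Set a large default font size for titles, labels, ticks and legends."""
    # Assumption: these are the rcParams entries worth enlarging.
    keys = [
        "axes.titlesize",
        "axes.labelsize",
        "xtick.labelsize",
        "ytick.labelsize",
        "legend.fontsize",
    ]
    matplotlib.rcParams.update({key: size for key in keys})
    return keys


large_descriptors_sketch("xx-large")
```

Any plot created afterwards in the same session picks up the new defaults.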

pimmslearn.plotting.plot_cutoffs(df: pd.DataFrame, feat_completness_over_samples: int = None, min_feat_in_sample: int = None) → tuple[matplotlib.figure.Figure, np.array[matplotlib.axes.Axes]][source]#

Plot the number of available features along index and columns (features vs. samples), optionally including cutoffs.

Parameters:

df (pd.DataFrame) – DataFrame in wide data format.
feat_completness_over_samples (int, optional) – horizontal line to plot as cutoff for features, by default None
min_feat_in_sample (int, optional) – horizontal line to plot as cutoff for samples, by default None

Returns:

The Figure and the array of Axes the counts are plotted on.

Return type:

tuple[matplotlib.figure.Figure, np.array[matplotlib.axes.Axes]]
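The counts behind such a plot can be sketched with plain pandas. The toy table is invented, and treating a cutoff as "keep everything at or above the line" is an assumption, since the cutoffs are only described as horizontal lines.

```python
import pandas as pd

# Toy wide-format table: rows are samples, columns are features.
df = pd.DataFrame(
    {
        "feat_a": [1.0, 2.0, None],
        "feat_b": [None, 3.0, None],
        "feat_c": [4.0, 5.0, 6.0],
    },
    index=["s1", "s2", "s3"],
)

present = df.notna()
samples_per_feat = present.sum(axis=0)  # samples in which each feature was observed
feats_per_sample = present.sum(axis=1)  # features observed in each sample

# Illustrative cutoffs, named after the function's parameters.
feat_completness_over_samples = 2
min_feat_in_sample = 2

# Assumption: entries at or above the cutoff are the ones that would be kept.
kept_features = samples_per_feat[
    samples_per_feat >= feat_completness_over_samples
].index.tolist()
kept_samples = feats_per_sample[feats_per_sample >= min_feat_in_sample].index.tolist()
```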

pimmslearn.plotting.plot_feat_counts(df_counts: DataFrame, feat_name: str, n_samples: int, ax=None, figsize=(15, 10), count_col='counts', **kwargs)[source]#

pimmslearn.plotting.plot_rolling_error(errors: DataFrame, metric_name: str, window: int = 200, min_freq=None, freq_col: str = 'freq', colors_to_use=None, ax=None)[source]#

pimmslearn.plotting.savefig(fig, name, folder: Path = '.', pdf=True, dpi=300, tight_layout=True)#: Save matplotlib Figure (having method savefig) as pdf and png.

pimmslearn.plotting.select_dates(date_series: Series, max_ticks=30) → array[source]#

Get unique dates (single days) for selection in pd.plot.line with xticks argument.

Parameters:

date_series (pd.Series) – datetime series to use (values, not index)
max_ticks (int, optional) – maximum number of unique ticks to select, by default 30

Returns:

Array of selected unique dates.

Return type:

np.array
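One way to pick a limited number of unique days is sketched below. The helper name `select_dates_sketch` and the even spacing of the selected days are assumptions; the package may thin the dates differently.

```python
import numpy as np
import pandas as pd


def select_dates_sketch(date_series: pd.Series, max_ticks: int = 30) -> np.ndarray:
    """Return at most max_ticks unique days from a datetime Series."""
    # Drop the time of day, then keep each day once, in chronological order.
    days = np.asarray(date_series.dt.floor("D").drop_duplicates().sort_values())
    if len(days) <= max_ticks:
        return days
    # Assumption: keep evenly spaced days, including the first and the last.
    idx = np.linspace(0, len(days) - 1, num=max_ticks).round().astype(int)
    return days[idx]


# Hourly timestamps covering ten days.
start = pd.Timestamp("2021-01-01")
stamps = pd.Series(start + pd.to_timedelta(range(240), unit="h"))
ticks = select_dates_sketch(stamps, max_ticks=3)
```

The resulting array can then be passed as the xticks argument of a pandas line plot.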

pimmslearn.plotting.select_xticks(ax: Axes, max_ticks: int = 50) → list[source]#

Limit the number of xticks displayed.

Parameters:

ax (matplotlib.axes.Axes) – Axes object to manipulate
max_ticks (int, optional) – maximum number of set ticks on x-axis, by default 50

Returns:

List of current ticks for the x-axis, either new or old (depending on whether something was changed).

Return type:

list
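The thinning step can be sketched independently of matplotlib. The helper `thin_ticks` is hypothetical, and keeping every n-th tick is an assumption about how the limit is enforced.

```python
def thin_ticks(ticks: list, max_ticks: int = 50) -> list:
    """Return the ticks unchanged if few enough, otherwise every n-th tick."""
    if max_ticks < 1:
        raise ValueError("max_ticks must be at least 1")
    if len(ticks) <= max_ticks:
        return list(ticks)
    step = -(-len(ticks) // max_ticks)  # ceiling division
    return list(ticks[::step])


old_ticks = list(range(120))
new_ticks = thin_ticks(old_ticks, max_ticks=50)
```

With a matplotlib Axes, the reduced list would be applied with `ax.set_xticks(new_ticks)`.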

Submodules#

pimmslearn.plotting.data module#

Plot data distribution based on pandas DataFrames or Series.

pimmslearn.plotting.data.get_min_max_iterable(series: Iterable[Series]) → Tuple[int][source]#: Get the min and max as integer from an iterable of pandas.Series.

pimmslearn.plotting.data.min_max(s: Series) → Tuple[int][source]#

Get the min and max as integer from a pandas.Series.

Parameters: s (pd.Series) – Series of intensities.
Returns: Minimum and maximum of the Series as integers.
Return type: Tuple[int]

pimmslearn.plotting.data.plot_feat_median_over_prop_missing(data: DataFrame, type: str = 'scatter', ax: Axes | None = None, s: int = 1, return_plot_data: bool = False) → Axes | Tuple[Axes, DataFrame][source]#: Plot feature median over proportion missing in that feature. Sorted by feature median into bins.

pimmslearn.plotting.data.plot_histogram_intensities(s: Series, interval_bins=1, min_max: Tuple[int] | None = None, ax=None, **kwargs) → Tuple[Axes, range][source]#: Plot intensities in Series in a certain range and equally spaced intervals.
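The integer range and the equally spaced bin edges can be sketched in plain Python. Rounding the minimum down and the maximum up is an assumption (the reference only says "as integer"), and both helper names are hypothetical.

```python
import math


def min_max_int(values):
    """Integer bounds enclosing all non-missing values."""
    finite = [v for v in values if not math.isnan(v)]
    # Assumption: floor the minimum and ceil the maximum so no value is cut off.
    return math.floor(min(finite)), math.ceil(max(finite))


def equally_spaced_bins(values, interval_bins=1):
    """Bin edges from the integer minimum up to at least the integer maximum."""
    lo, hi = min_max_int(values)
    return range(lo, hi + interval_bins, interval_bins)


intensities = [21.4, 23.9, float("nan"), 27.2, 25.0]
bounds = min_max_int(intensities)
bins = equally_spaced_bins(intensities, interval_bins=2)
```

A `range` like this can be passed as the `bins` argument of a histogram call.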

pimmslearn.plotting.data.plot_missing_dist_boxplots(data: DataFrame, min_feat_per_sample=None, min_samples_per_feat=None) → Figure[source]#

pimmslearn.plotting.data.plot_missing_dist_highdim(data: DataFrame, min_feat_per_sample: int | None = None, min_samples_per_feat: int | None = None) → Figure[source]#

Plot missing distribution (cdf) in high dimensional data.

Parameters:

data (pd.DataFrame) – Intensity table with samples in rows and features in columns.
min_feat_per_sample (int, optional) – Show the minimum required features a sample has to have, by default None
min_samples_per_feat (int, optional) – Show the minimum required number of samples a feature has to be found in, by default None

Returns:

Figure with two plots (Axes).

Return type:

matplotlib.figure.Figure
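The cumulative distribution behind such a plot can be sketched with pandas. The helper `count_cdf` and the toy data are illustrative; the exact quantity the package plots on each of the two Axes is not stated here, so this is only one plausible reading.

```python
import pandas as pd


def count_cdf(counts: pd.Series) -> pd.Series:
    """Empirical CDF: share of entries whose count is <= each observed value."""
    return counts.value_counts().sort_index().cumsum() / len(counts)


# Toy intensity table: samples in rows, features in columns.
data = pd.DataFrame(
    {
        "f1": [1.0, 1.0, 1.0, 1.0],
        "f2": [1.0, None, 1.0, None],
        "f3": [None, 1.0, 1.0, None],
        "f4": [None, None, None, 1.0],
    }
)

samples_per_feat = data.notna().sum(axis=0)  # f1: 4, f2: 2, f3: 2, f4: 1
cdf = count_cdf(samples_per_feat)
```

Applying the same helper to `data.notna().sum(axis=1)` gives the corresponding distribution over samples.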

pimmslearn.plotting.data.plot_missing_pattern_histogram(data: DataFrame, bins: int = 20, min_feat_per_sample=None, min_samples_per_feat=None) → Figure[source]#

pimmslearn.plotting.data.plot_missing_pattern_violinplot(data: DataFrame, min_feat_per_sample=None, min_samples_per_feat=None) → Figure[source]#

pimmslearn.plotting.data.plot_observations(df: DataFrame, ax: Axes | None = None, title: str = '', axis: int = 1, size: int = 1, ylabel: str = 'Frequency', xlabel: str | None = None) → Axes[source]#

Plot non-missing observations by row (axis=1) or column (axis=0) in order of number of available observations. No binning is applied, only counts of non-missing values are plotted.

Parameters:

df (pd.DataFrame) – DataFrame on which notna is applied
ax (Axes, optional) – Axes to plot on, by default None
title (str, optional) – Axes title, by default ‘’
axis (int, optional) – dimension to sum over, by default 1
ylabel (str, optional) – y-Axis label, by default ‘Frequency’
xlabel (str, optional) – x-Axis label, by default ‘Samples ordered by number of features’

Returns:

Axes on which plot was plotted

Return type:

Axes

pimmslearn.plotting.defaults module#

class pimmslearn.plotting.defaults.ModelColorVisualizer(models, palette)[source]#

Bases: object

as_hex()[source]#: Return a color palette with hex codes instead of RGB values.
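Converting an RGB palette to hex codes can be sketched as follows. The helper `rgb_to_hex` and the model names are hypothetical, and the sketch assumes RGB channels given as floats between 0 and 1.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a hex colour code."""
    r, g, b = (round(channel * 255) for channel in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"


# Illustrative palette mapping model names to RGB colours.
palette = {"model_a": (0.0, 0.2, 1.0), "model_b": (1.0, 0.0, 0.0)}
hex_palette = {model: rgb_to_hex(color) for model, color in palette.items()}
```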

pimmslearn.plotting.defaults.assign_colors(models)[source]#

pimmslearn.plotting.errors module#

Plot errors based on DataFrame with model predictions.

pimmslearn.plotting.errors.get_data_for_errors_by_median(errors: DataFrame, feat_name: str, metric_name: str, model_column: str = 'model', seed: int = 42) → DataFrame[source]#

Extract bars with confidence intervals from a seaborn plot for seaborn 0.13 and above. Confidence intervals are calculated with bootstrapping (sampling the mean).

Parameters:

errors (pd.DataFrame) – DataFrame created by the plot_errors_by_median function
feat_name (str) – feature name assigned (was transformed to ‘intensity binned by median of {feat_name}’)
metric_name (str) – Metric used to calculate errors (MAE, MSE, etc.) of intensities in bin
model_column (str) – model_column in errors, defining model names
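A bootstrapped confidence interval of the mean can be sketched with the standard library alone. The helper `bootstrap_ci_mean`, the number of resamples and the 95% level are assumptions for illustration; they are not taken from the package or from seaborn.

```python
import random
import statistics


def bootstrap_ci_mean(values, n_boot=1000, ci=95, seed=42):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    alpha = (100 - ci) / 200
    lower = means[int(alpha * n_boot)]
    upper = means[min(n_boot - 1, int((1 - alpha) * n_boot))]
    return lower, upper


# Illustrative absolute errors of one model within one intensity bin.
abs_errors = [0.1, 0.4, 0.2, 0.3, 0.5, 0.2]
lower, upper = bootstrap_ci_mean(abs_errors)
```

The bar height corresponds to the plain mean of the errors; `lower` and `upper` mark the ends of its error bar.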

pimmslearn.plotting.errors.plot_errors_binned(pred: DataFrame, target_col='observed', ax: Axes | None = None, palette: dict | None = None, metric_name: str | None = None, errwidth: float = 1.2) → Axes[source]#

pimmslearn.plotting.errors.plot_errors_by_median(pred: DataFrame, feat_medians: Series, target_col='observed', ax: Axes | None = None, palette: dict | None = None, feat_name: str | None = None, metric_name: str | None = None, errwidth: float = 1.2) → tuple[Axes, DataFrame][source]#

pimmslearn.plotting.errors.plot_rolling_error(errors: DataFrame, metric_name: str, window: int = 200, min_freq=None, freq_col: str = 'freq', colors_to_use=None, ax=None)[source]#
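The signature suggests a rolling average of an error metric over features ordered by their frequency. The sketch below follows that reading; sorting by the frequency column and smoothing with a rolling mean are assumptions, and the toy data and the column name `model_a` are invented.

```python
import pandas as pd

# Illustrative per-feature errors of one model, with the feature frequency.
errors = pd.DataFrame(
    {
        "freq": [5, 1, 3, 2, 4],
        "model_a": [0.5, 0.1, 0.3, 0.2, 0.4],
    }
)

window = 2  # the function's default is 200; a tiny window suits the toy data

# Assumption: order features by frequency, then average over a moving window.
rolling = (
    errors.sort_values("freq")
    .set_index("freq")["model_a"]
    .rolling(window=window)
    .mean()
)
```

The first `window - 1` entries are missing because the window is not yet full.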

pimmslearn.plotting.plotly module#

pimmslearn.plotting.plotly.apply_default_layout(fig)[source]#
