vaep.models package#

class vaep.models.Metrics[source]#

Bases: object

add_metrics(pred, key)[source]#
class vaep.models.RecorderDump(recorder, name)[source]#

Bases: object

Simple class that holds fastai Recorder callback data for serialization with pickle.

filename_tmp = 'recorder_{}.pkl'#
classmethod load(filepath, name)[source]#
plot_loss(norm_train: int64 = 1, norm_val: int64 = 1, skip_start: int = 5, with_valid: bool = True, ax: Optional[Axes] = None) Axes#

Adaptation of Recorder.plot_loss that accepts a matplotlib.axes.Axes argument, which allows building combined figures.

Parameters:
  • recorder (learner.Recorder) – fastai Recorder object, learn.recorder

  • norm_train (np.int64, optional) – Normalize epoch loss by number of training samples, by default 1

  • norm_val (np.int64, optional) – Normalize epoch loss by number of validation samples, by default 1

  • skip_start (int, optional) – Skip the first N batch metrics, by default 5

  • with_valid (bool, optional) – Add validation data loss, by default True

  • ax (plt.Axes, optional) – Axes to plot on, by default None

Returns:

Axes containing the plotted loss curves.

Return type:

plt.Axes

save(folder='.')[source]#
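
The class above pairs a filename template with save and load methods. A minimal sketch of that pattern follows, using a hypothetical LossDump class and plain lists in place of fastai Recorder data. The stored attributes and the way load interprets filepath are assumptions, not the vaep implementation.

```python
import pickle
import tempfile
from pathlib import Path


class LossDump:
    """Hypothetical stand-in for RecorderDump: holds plain loss values."""

    filename_tmp = 'recorder_{}.pkl'

    def __init__(self, losses, name):
        self.losses = list(losses)
        self.name = name

    def save(self, folder='.'):
        # Build the file name from the template and the dump's name.
        path = Path(folder) / self.filename_tmp.format(self.name)
        with open(path, 'wb') as f:
            pickle.dump({'losses': self.losses, 'name': self.name}, f)
        return path

    @classmethod
    def load(cls, filepath, name):
        # Assumption: filepath is the folder that contains the dump.
        path = Path(filepath) / cls.filename_tmp.format(name)
        with open(path, 'rb') as f:
            data = pickle.load(f)
        return cls(data['losses'], data['name'])


with tempfile.TemporaryDirectory() as tmp:
    saved_path = LossDump([0.9, 0.5, 0.3], name='ae').save(tmp)
    restored = LossDump.load(tmp, 'ae')
```

Storing only plain Python values keeps the pickle independent of the training objects, which is presumably why a separate dump class exists.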
vaep.models.calc_net_weight_count(model: Module) int[source]#
vaep.models.calculte_metrics(pred_df: DataFrame, true_col: Optional[List[str]] = None, scoring: List[Tuple[str, Callable]] = [('MSE', <function mean_squared_error>), ('MAE', <function mean_absolute_error>)]) dict[source]#

Create metrics based on predictions, a ground-truth reference and a list of named scoring functions.

Parameters:
  • pred_df (pd.DataFrame) – Prediction DataFrame containing true_col.

  • true_col (List[str], optional) – Column of ground truth values, by default None

  • scoring (List[Tuple[str, Callable]], optional) – List of (key, function) tuples. Each function takes y_true and y_pred, as all sklearn metrics do, by default MSE (mean_squared_error) and MAE (mean_absolute_error)

Returns:

Dictionary of computed metrics.

Return type:

dict

Raises:

ValueError – [description]
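
How a list of (key, function) tuples can be applied to a prediction DataFrame is sketched below. The column names, the hand-written mse and mae helpers (stand-ins for the sklearn defaults), the single-string true_col and the nested layout of the result are all assumptions for illustration.

```python
import pandas as pd


def mse(y_true, y_pred):
    # Mean squared error with the sklearn-style (y_true, y_pred) signature.
    return float(((y_true - y_pred) ** 2).mean())


def mae(y_true, y_pred):
    # Mean absolute error with the same signature.
    return float((y_true - y_pred).abs().mean())


scoring = [('MSE', mse), ('MAE', mae)]

pred_df = pd.DataFrame({
    'observed': [1.0, 2.0, 3.0, 4.0],   # ground truth (hypothetical name)
    'model_a': [1.0, 2.0, 3.0, 6.0],
    'model_b': [0.0, 2.0, None, 4.0],   # one missing prediction
})

true_col = 'observed'
metrics = {}
for model in pred_df.columns.drop(true_col):
    # Score only the rows for which this model made a prediction.
    y_pred = pred_df[model].dropna()
    y_true = pred_df.loc[y_pred.index, true_col]
    metrics[model] = {key: fct(y_true=y_true, y_pred=y_pred)
                      for key, fct in scoring}
```

Passing the scorers as named tuples means any function with the (y_true, y_pred) convention can be added without touching the loop.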

vaep.models.collect_metrics(metrics_jsons: List, key_fct: Callable) dict[source]#

Collect and aggregate metrics from multiple JSON files.

Parameters:
  • metrics_jsons (List) – list of filepaths to json metric files

  • key_fct (Callable) – Callable that creates a key from a single filepath

Returns:

Aggregated metrics dictionary with outer key defined by key_fct

Return type:

dict

Raises:

AssertionError – If an existing key would be overwritten with a different value.
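
The aggregation described above can be sketched with the standard library alone. The helper name collect_json_metrics, the file names and the flat per-file dictionaries are hypothetical; the actual function may merge nested keys differently.

```python
import json
import tempfile
from pathlib import Path


def collect_json_metrics(paths, key_fct):
    """Hypothetical sketch: merge JSON metric files under keys from key_fct."""
    collected = {}
    for path in paths:
        with open(path) as f:
            loaded = json.load(f)
        key = key_fct(path)
        if key in collected:
            # Re-adding a key is only allowed if nothing would change.
            assert collected[key] == loaded, f"Conflicting values for {key!r}"
        collected[key] = loaded
    return collected


with tempfile.TemporaryDirectory() as tmp:
    files = []
    for fname, value in [('metrics_ae.json', 0.21), ('metrics_vae.json', 0.25)]:
        path = Path(tmp) / fname
        path.write_text(json.dumps({'MSE': value}))
        files.append(path)
    # Use the model name embedded in the file name as the outer key.
    all_metrics = collect_json_metrics(
        files, key_fct=lambda p: p.stem.split('_')[-1])
```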

vaep.models.compare_indices(first_index: Index, second_index: Index) Index[source]#

Show the elements of the first index that are not in the second index. The first index should be the larger collection. This is the set difference of two Index objects.

If the second index is a superset of the first, the result will be empty even though the indices differ (default behaviour in pandas).

Parameters:
  • first_index (pd.Index) – Index, should be superset

  • second_index (pd.Index) – Index, should be the subset

Returns:

Return a new Index with elements of the first index not in second.

Return type:

pd.Index
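
The asymmetric behaviour described above can be reproduced with pandas' own Index.difference. That compare_indices calls this method internally is an assumption; the example index values are made up.

```python
import pandas as pd

first_index = pd.Index(['s1', 's2', 's3', 's4'])
second_index = pd.Index(['s2', 's4'])

# Elements of the first index that are missing from the second.
only_in_first = first_index.difference(second_index)

# With the arguments swapped, the first argument is a subset of the second,
# so the difference is empty even though the two indices are not equal.
swapped = second_index.difference(first_index)
```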

vaep.models.get_df_from_nested_dict(nested_dict, column_levels=('data_split', 'model', 'metric_name'), row_name='subset')[source]#
vaep.models.plot_loss(recorder: Recorder, norm_train: int64 = 1, norm_val: int64 = 1, skip_start: int = 5, with_valid: bool = True, ax: Optional[Axes] = None) Axes[source]#

Adaptation of Recorder.plot_loss that accepts a matplotlib.axes.Axes argument, which allows building combined figures.

Parameters:
  • recorder (learner.Recorder) – fastai Recorder object, learn.recorder

  • norm_train (np.int64, optional) – Normalize epoch loss by number of training samples, by default 1

  • norm_val (np.int64, optional) – Normalize epoch loss by number of validation samples, by default 1

  • skip_start (int, optional) – Skip the first N batch metrics, by default 5

  • with_valid (bool, optional) – Add validation data loss, by default True

  • ax (plt.Axes, optional) – Axes to plot on, by default None

Returns:

Axes containing the plotted loss curves.

Return type:

plt.Axes
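
The benefit of an optional ax argument is that several loss plots can share one figure. A minimal sketch of that pattern with a hypothetical helper and made-up loss values is shown below. It does not use fastai and does not reproduce the normalization or skip_start handling of plot_loss.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_losses_on_ax(losses, ax=None):
    """Hypothetical helper: draw named loss curves on a given or new Axes."""
    if ax is None:
        _, ax = plt.subplots()
    for label, values in losses.items():
        ax.plot(values, label=label)
    ax.set_xlabel('step')
    ax.set_ylabel('loss')
    ax.legend()
    return ax


# Two models side by side in one figure.
fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
ax_a = plot_losses_on_ax({'train': [1.0, 0.6, 0.4], 'valid': [0.9, 0.7, 0.6]},
                         ax=axes[0])
ax_b = plot_losses_on_ax({'train': [1.2, 0.8, 0.5]}, ax=axes[1])
```

Returning the Axes lets the caller add titles or save the combined figure afterwards.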

vaep.models.plot_training_losses(learner: Learner, name: str, ax=None, norm_factors=array([1, 1]), folder='figures', figsize=(15, 8))[source]#
vaep.models.split_prediction_by_mask(pred: DataFrame, mask: DataFrame, check_keeps_all: bool = False) Tuple[DataFrame, DataFrame][source]#

Split predictions into two parts using a mask.

Parameters:
  • pred (pd.DataFrame) – prediction DataFrame

  • mask (pd.DataFrame) – Mask with same indices as pred DataFrame.

  • check_keeps_all (bool, optional) – if True, perform sanity checks, by default False

Returns:

Predictions for the inverted mask, and predictions for the mask

Return type:

Tuple[pd.DataFrame, pd.DataFrame]
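
A rough sketch of splitting by a boolean mask follows. The use of DataFrame.where, the example values and the form of the sanity check are assumptions; the actual function may return its two parts in a different shape.

```python
import pandas as pd

pred = pd.DataFrame({'f1': [1.0, 2.0], 'f2': [3.0, 4.0]}, index=['s1', 's2'])
mask = pd.DataFrame({'f1': [True, False], 'f2': [False, True]},
                    index=['s1', 's2'])

# Predictions where the mask is False (the inverted mask) ...
pred_inverse_mask = pred.where(~mask)
# ... and predictions where the mask is True.
pred_mask = pred.where(mask)

# Sanity check in the spirit of check_keeps_all:
# every prediction ends up in exactly one of the two parts.
n_kept = int(pred_inverse_mask.notna().sum().sum()
             + pred_mask.notna().sum().sum())
```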

Submodules#

vaep.models.ae module#

Autoencoder model trained using a denoising procedure.

The Variational Autoencoder model adapter should be moved to vaep.models.vae. Alternatively, the model class could be put somewhere else.

class vaep.models.ae.AutoEncoderAnalysis(train_df: DataFrame, val_df: DataFrame, model: Module, model_kwargs: dict, transform: Pipeline, decode: List[str], bs=64)[source]#

Bases: ModelAnalysis

dls: DataLoaders#
get_preds_from_df(df_wide: DataFrame) DataFrame[source]#
get_test_dl(df_wide: DataFrame, bs: int = 64) DataFrame[source]#
learn: Learner#
model: Module#
params: dict#
transform: VaepPipeline#
class vaep.models.ae.Autoencoder(n_features: int, n_neurons: Union[int, List[int]], activation=LeakyReLU(negative_slope=0.1), last_decoder_activation=None, dim_latent: int = 10)[source]#

Bases: Module

Autoencoder base class.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool#
class vaep.models.ae.DatasetWithTargetAdapter(*, after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_cancel_backward=None, after_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None)[source]#

Bases: Callback

after_pred()[source]#
before_batch()[source]#

Remove cont. values from batch (mask)

class vaep.models.ae.ModelAdapter(p=0.1)[source]#

Bases: DatasetWithTargetAdapter

The model's forward method expects only one input matrix. Apply mask from dataloader to both pred and targets.

Keep original dimension, i.e. also predictions for NA.

after_loss()[source]#
after_pred()[source]#
before_batch()[source]#

Remove cont. values from batch (mask)

class vaep.models.ae.ModelAdapterVAE(*, after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_cancel_backward=None, after_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None)[source]#

Bases: DatasetWithTargetAdapter

The model's forward method expects only one input matrix. Apply mask from dataloader to both pred and targets.

after_loss()[source]#
after_pred()[source]#
before_batch()[source]#

Remove cont. values from batch (mask)

vaep.models.ae.get_missing_values(df_train_wide: DataFrame, val_idx: Index, test_idx: Index, pred: Series) Series[source]#

Build missing value predictions based on a set of predictions and splits.

Parameters:
  • df_train_wide (pd.DataFrame) – Training data in wide format.

  • val_idx (pd.Index) – Indices (MultiIndex of Sample and Feature) of validation split

  • test_idx (pd.Index) – Indices (MultiIndex of Sample and Feature) of test split

  • pred (pd.Series) – MultiIndexed Series of all predictions.

Returns:

MultiIndexed Series of missing values in the training data which are not in the validation or test split.

Return type:

pd.Series
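
The selection described above amounts to index arithmetic: take the (Sample, Feature) pairs that are missing in the wide training data, remove those held out for validation and test, and look up the remaining pairs in the predictions. The sketch below uses made-up data and is not the vaep implementation.

```python
import numpy as np
import pandas as pd

names = ['Sample', 'Feature']
df_train_wide = pd.DataFrame(
    {'f1': [1.0, np.nan], 'f2': [np.nan, np.nan]},
    index=pd.Index(['s1', 's2'], name='Sample'),
)
df_train_wide.columns.name = 'Feature'

# Predictions for every (Sample, Feature) pair.
pred = pd.Series(
    [1.1, 2.2, 3.3, 4.4],
    index=pd.MultiIndex.from_product([['s1', 's2'], ['f1', 'f2']], names=names),
)
val_idx = pd.MultiIndex.from_tuples([('s2', 'f1')], names=names)
test_idx = pd.MultiIndex.from_tuples([('s1', 'f2')], names=names)

# Pairs that are missing in the training data ...
is_missing = df_train_wide.isna().stack()
missing_idx = is_missing[is_missing].index
# ... minus the simulated missing values held out for validation and test.
idx_real_na = missing_idx.difference(val_idx).difference(test_idx)
pred_real_na = pred.loc[idx_real_na]
```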

vaep.models.ae.get_preds_from_df(df: DataFrame, learn: Learner, transformer: VaepPipeline, position_pred_tuple: Optional[int] = None, dataset: Dataset = <class 'vaep.io.datasets.DatasetWithTarget'>)[source]#

Get predictions for specified DataFrame, using a fastai learner and a custom sklearn Pipeline.

Parameters:
  • df (pd.DataFrame) – DataFrame to create predictions from.

  • learn (fastai.learner.Learner) – fastai Learner with trained model

  • transformer (vaep.transform.VaepPipeline) – Pipeline with separate encode and decode

  • position_pred_tuple (int, optional) – In case the model returns multiple outputs, select the one which contains the predictions matching the target variable (VAE case), by default None

  • dataset (torch.utils.data.Dataset, optional) – Dataset to build batches from, by default vaep.io.datasets.DatasetWithTarget

Returns:

tuple of pandas DataFrames (prediction and target) based on learn.get_preds

Return type:

tuple

vaep.models.analysis module#

class vaep.models.analysis.ModelAnalysis[source]#

Bases: Analysis

Class describing which attributes a ModelAnalysis is supposed to have.

dls: DataLoaders#
learn: Learner#
model: Module#
params: dict#
transform: VaepPipeline#

vaep.models.collab module#

class vaep.models.collab.CollabAnalysis(datasplits: DataSplits, sample_column: str = 'Sample ID', item_column: str = 'peptide', target_column: str = 'intensity', model_kwargs: Optional[dict] = None, batch_size: int = 64)[source]#

Bases: ModelAnalysis

dls: DataLoaders#
learn: Learner#
model: Module#
params: dict#
transform: VaepPipeline#
vaep.models.collab.combine_data(train_df: DataFrame, val_df: DataFrame) Tuple[DataFrame, float][source]#

Helper function to combine training and validation data in long-format. The training and validation data will be mixed up in CF training as the sample embeddings have to be trained for all samples. The returned frac can be used to have the same number of (non-missing) validation samples as before.

Parameters:
  • train_df (pd.DataFrame) – Consecutive training data in long-format, each row having (unit, feature, value)

  • val_df (pd.DataFrame) – Consecutive validation data in long-format, each row having (unit, feature, value)

Returns:

Pandas DataFrame of concatenated samples of training and validation data. Fraction of samples originally in validation data.

Return type:

Tuple[pd.DataFrame, float]
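
A minimal sketch of the combination step is given below, with made-up long-format data and the default column names of CollabAnalysis. Computing frac as the share of validation rows in the combined table is an assumption based on the description above.

```python
import pandas as pd

train_df = pd.DataFrame({
    'Sample ID': ['s1', 's1', 's2'],
    'peptide': ['p1', 'p2', 'p1'],
    'intensity': [20.1, 22.3, 19.8],
})
val_df = pd.DataFrame({
    'Sample ID': ['s2'],
    'peptide': ['p2'],
    'intensity': [21.0],
})

# Stack both splits into one long-format table.
combined = pd.concat([train_df, val_df], ignore_index=True)
# Share of rows that originally came from the validation split.
frac = len(val_df) / len(combined)
```

A random validation split of the combined table with this fraction then holds out as many rows as the original validation data contained.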

vaep.models.collab.get_missing_values(df_train_long: DataFrame, val_idx: Index, test_idx: Index, analysis_collab: CollabAnalysis) Series[source]#

Helper function to get missing values from predictions. Excludes simulated missing values from validation and test data.

Parameters:
  • df_train_long (pd.DataFrame) – Training data in long-format, each row having (unit, feature, value)

  • val_idx (pd.Index) – Validation index (unit, feature)

  • test_idx (pd.Index) – Test index (unit, feature)

  • analysis_collab (CollabAnalysis) – CollabAnalysis object

Returns:

Predicted values for missing values in training data (unit, feature, value)

Return type:

pd.Series

vaep.models.collect_dumps module#

Collects metrics and config files from the experiment directory structure.

vaep.models.collect_dumps.collect(paths: Iterable, load_fn: Callable[[Path], dict]) dict[source]#
vaep.models.collect_dumps.collect_configs(paths: Iterable, load_fn: Callable[[Path], dict]) dict#
vaep.models.collect_dumps.collect_metrics(paths: Iterable, load_fn: Callable[[Path], dict]) dict#
vaep.models.collect_dumps.load_config_file(fname: Path, first_split='config_') dict[source]#
vaep.models.collect_dumps.load_metric_file(fname: Path, first_split='metrics_') dict[source]#
vaep.models.collect_dumps.select_content(s: str, first_split)[source]#

vaep.models.vae module#

VAE implementation based on ronaldiscool/VAETutorial

Adapted to the setup of learning missing values.

  • funnel architecture (or fixed hidden layer layout)

  • loss is adapted to the Dataset and fastai adaptations

  • batchnorm1D for now (not weight norm)

class vaep.models.vae.VAE(n_features: int, n_neurons: List[int], activation=LeakyReLU(negative_slope=0.1), last_decoder_activation=None, dim_latent: int = 10)[source]#

Bases: Module

decode(z)[source]#
encode(x)[source]#
forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_mu_and_logvar(x, detach=False)[source]#
reparameterize(mu, logvar)[source]#
training: bool#
vaep.models.vae.compute_kld(z_mu, z_logvar)[source]#
vaep.models.vae.gaussian_log_prob(z, mu, logvar)[source]#
vaep.models.vae.loss_fct(pred, y, reduction='sum', results: Optional[List] = None, freebits=0.1)[source]#
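
The names reparameterize, compute_kld and gaussian_log_prob, each taking a mean and a log-variance, match the usual building blocks of a Gaussian VAE. For orientation, the textbook forms are given below, with logvar read as log σ². This is an assumption about what the functions compute. The reduction over the batch and the way freebits enters loss_fct are not documented here.

```latex
\begin{aligned}
z &= \mu + \sigma \odot \varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, I),
  \qquad \sigma = \exp\!\left(\tfrac{1}{2}\log\sigma^2\right) \\
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu, \operatorname{diag}\sigma^2)\,\|\,\mathcal{N}(0, I)\right)
  &= -\tfrac{1}{2}\sum_{j}\left(1 + \log\sigma_j^2 - \mu_j^2 - \sigma_j^2\right) \\
\log \mathcal{N}\!\left(z;\, \mu, \operatorname{diag}\sigma^2\right)
  &= -\tfrac{1}{2}\sum_{j}\left(\log 2\pi + \log\sigma_j^2
     + \frac{(z_j - \mu_j)^2}{\sigma_j^2}\right)
\end{aligned}
```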