vaep.models package#
- class vaep.models.RecorderDump(recorder, name)[source]#
Bases:
object
Simple Class to hold fastai Recorder Callback data for serialization using pickle.
- filename_tmp = 'recorder_{}.pkl'#
- plot_loss(norm_train: int64 = 1, norm_val: int64 = 1, skip_start: int = 5, with_valid: bool = True, ax: Optional[Axes] = None) Axes #
Adapted from Recorder.plot_loss to accept a matplotlib.axes.Axes argument, which allows building combined figures.
- Parameters:
recorder (learner.Recorder) – fastai Recorder object, learn.recorder
norm_train (np.int64, optional) – Normalize epoch loss by number of training samples, by default 1
norm_val (np.int64, optional) – Normalize epoch loss by number of validation samples, by default 1
skip_start (int, optional) – Skip N first batch metrics, by default 5
with_valid (bool, optional) – Add validation data loss, by default True
ax (plt.Axes, optional) – Axes to plot on, by default None
- Returns:
The Axes object the losses were plotted on.
- Return type:
plt.Axes
- vaep.models.calculte_metrics(pred_df: DataFrame, true_col: Optional[List[str]] = None, scoring: List[Tuple[str, Callable]] = [('MSE', <function mean_squared_error>), ('MAE', <function mean_absolute_error>)]) dict [source]#
Create metrics based on predictions, a ground-truth reference and a list of named scoring functions.
- Parameters:
pred_df (pd.DataFrame) – Prediction DataFrame containing true_col.
true_col (List[str], optional) – Column of ground truth values, by default None
scoring (List[Tuple[str, Callable]], optional) – List of (key, function) tuples. Each function takes y_true and y_pred, as all sklearn metrics do, by default scoring
- Returns:
Computed metrics, with the result of each scoring function stored under its name.
- Return type:
dict
- Raises:
ValueError – [description]
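The scoring convention described above can be sketched as follows. This is a minimal illustration and not the package's implementation: the helper `score_predictions`, the column names and the hand-written `mse` and `mae` functions (stand-ins for the sklearn metrics) are assumptions, `true_col` is taken to be a single column name, and the handling of `true_col=None` or of invalid input is not shown.

```python
from typing import Callable, Dict, List, Tuple

import pandas as pd


def mse(y_true, y_pred) -> float:
    """Mean squared error, same (y_true, y_pred) argument order as sklearn."""
    return float(((y_true - y_pred) ** 2).mean())


def mae(y_true, y_pred) -> float:
    """Mean absolute error, same (y_true, y_pred) argument order as sklearn."""
    return float((y_true - y_pred).abs().mean())


def score_predictions(pred_df: pd.DataFrame,
                      true_col: str,
                      scoring: List[Tuple[str, Callable]]
                      ) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: score every prediction column against true_col."""
    metrics = {}
    for col in pred_df.columns:
        if col == true_col:
            continue
        # compare only rows where both truth and prediction are present
        both = pred_df[[true_col, col]].dropna()
        metrics[col] = {name: fct(both[true_col], both[col])
                        for name, fct in scoring}
    return metrics


pred_df = pd.DataFrame({'observed': [1.0, 2.0, 3.0],
                        'model_a': [1.0, 2.0, 5.0],
                        'model_b': [2.0, 3.0, 4.0]})
metrics = score_predictions(pred_df, 'observed', [('MSE', mse), ('MAE', mae)])
```

Any callable following the `(y_true, y_pred)` convention can be added to the `scoring` list under a name of choice.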
- vaep.models.collect_metrics(metrics_jsons: List, key_fct: Callable) dict [source]#
Collect and aggregate metrics from several JSON files.
- Parameters:
metrics_jsons (List) – List of filepaths to JSON metric files
key_fct (Callable) – Callable which creates a key from a single filepath
- Returns:
Aggregated metrics dictionary with outer key defined by key_fct
- Return type:
dict
- Raises:
AssertionError – If a key would be overwritten with a different value.
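The aggregation pattern described above might look like the following sketch. It is an illustration under assumptions, not the package's code: the helper name `collect_json_metrics`, the file names and the choice of the file stem as key are hypothetical, and the real function may nest or merge the loaded metrics differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def collect_json_metrics(paths: Iterable[Path], key_fct: Callable) -> dict:
    """Hypothetical helper: aggregate JSON metric files under keys from key_fct."""
    all_metrics = {}
    for path in paths:
        key = key_fct(path)
        with open(path) as f:
            loaded = json.load(f)
        if key in all_metrics:
            # refuse to silently replace existing metrics with different values
            assert all_metrics[key] == loaded, f"Conflicting metrics for key {key!r}"
        all_metrics[key] = loaded
    return all_metrics


with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, mse in [('run_a', 0.5), ('run_b', 0.7)]:
        fname = Path(tmp) / f'{name}.json'
        fname.write_text(json.dumps({'MSE': mse}))
        files.append(fname)
    # use the file stem as outer key (an arbitrary choice for this sketch)
    collected = collect_json_metrics(files, key_fct=lambda p: Path(p).stem)
```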
- vaep.models.compare_indices(first_index: Index, second_index: Index) Index [source]#
Show the elements of the first index that are missing from the second. The first index should be the larger collection, i.e. a superset of the second. This is the set difference of two Index objects.
If the second index is a superset of the first, the result will be empty, although the indices differ (default behaviour in pandas).
- Parameters:
first_index (pd.Index) – Index, should be superset
second_index (pd.Index) – Index, should be the subset
- Returns:
Return a new Index with elements of the first index not in second.
- Return type:
pd.Index
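The set-difference behaviour described above can be reproduced with plain pandas. The sketch below uses `pd.Index.difference` directly and made-up index values; it does not call `compare_indices` itself.

```python
import pandas as pd

first = pd.Index(['a', 'b', 'c', 'd'])
second = pd.Index(['b', 'd', 'e'])

# elements of first that are not in second; 'e' is ignored
only_in_first = first.difference(second)

# if the second index is a superset, the difference is empty
# even though the two indices are not equal
superset_case = pd.Index(['a', 'b']).difference(pd.Index(['a', 'b', 'c']))
```

The second case is the pitfall mentioned above: an empty result does not imply that both indices are identical.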
- vaep.models.get_df_from_nested_dict(nested_dict, column_levels=('data_split', 'model', 'metric_name'), row_name='subset')[source]#
- vaep.models.plot_loss(recorder: Recorder, norm_train: int64 = 1, norm_val: int64 = 1, skip_start: int = 5, with_valid: bool = True, ax: Optional[Axes] = None) Axes [source]#
Adapted from Recorder.plot_loss to accept a matplotlib.axes.Axes argument, which allows building combined figures.
- Parameters:
recorder (learner.Recorder) – fastai Recorder object, learn.recorder
norm_train (np.int64, optional) – Normalize epoch loss by number of training samples, by default 1
norm_val (np.int64, optional) – Normalize epoch loss by number of validation samples, by default 1
skip_start (int, optional) – Skip N first batch metrics, by default 5
with_valid (bool, optional) – Add validation data loss, by default True
ax (plt.Axes, optional) – Axes to plot on, by default None
- Returns:
The Axes object the losses were plotted on.
- Return type:
plt.Axes
- vaep.models.plot_training_losses(learner: Learner, name: str, ax=None, norm_factors=array([1, 1]), folder='figures', figsize=(15, 8))[source]#
- vaep.models.split_prediction_by_mask(pred: DataFrame, mask: DataFrame, check_keeps_all: bool = False) Tuple[DataFrame, DataFrame] [source]#
Split predictions into two DataFrames according to a mask.
- Parameters:
pred (pd.DataFrame) – prediction DataFrame
mask (pd.DataFrame) – Mask with same indices as pred DataFrame.
check_keeps_all (bool, optional) – if True, perform sanity checks, by default False
- Returns:
Predictions for the inverted mask, and predictions for the mask
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
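One way to picture the split is with `DataFrame.where`. This is a sketch under assumptions: the helper `split_by_mask` is hypothetical, the mask is taken to be boolean, and the real function may return the two parts in a different layout. The final count comparison is only one possible sanity check; whether `check_keeps_all` does the same is an assumption.

```python
import pandas as pd


def split_by_mask(pred: pd.DataFrame, mask: pd.DataFrame):
    """Hypothetical helper: split predictions by a boolean mask of equal shape."""
    pred_inverse = pred.where(~mask)  # keeps values where mask is False
    pred_mask = pred.where(mask)      # keeps values where mask is True
    return pred_inverse, pred_mask


pred = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                    index=['s1', 's2'], columns=['f1', 'f2'])
mask = pd.DataFrame([[True, False], [False, True]],
                    index=['s1', 's2'], columns=['f1', 'f2'])

pred_inverse, pred_mask = split_by_mask(pred, mask)

# sanity check: every prediction ends up in exactly one of the two parts
n_kept = int(pred_inverse.notna().sum().sum() + pred_mask.notna().sum().sum())
n_total = int(pred.notna().sum().sum())
```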
Submodules#
vaep.models.ae module#
Autoencoder model trained using denoising procedure.
The Variational Autoencoder model adapter should be moved to vaep.models.vae, or the model class could be put somewhere else.
- class vaep.models.ae.AutoEncoderAnalysis(train_df: DataFrame, val_df: DataFrame, model: Module, model_kwargs: dict, transform: Pipeline, decode: List[str], bs=64)[source]#
Bases:
ModelAnalysis
- dls: DataLoaders#
- learn: Learner#
- transform: VaepPipeline#
- class vaep.models.ae.Autoencoder(n_features: int, n_neurons: Union[int, List[int]], activation=LeakyReLU(negative_slope=0.1), last_decoder_activation=None, dim_latent: int = 10)[source]#
Bases:
Module
Autoencoder base class.
- forward(x)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class vaep.models.ae.DatasetWithTargetAdapter(*, after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_cancel_backward=None, after_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None)[source]#
Bases:
Callback
- class vaep.models.ae.ModelAdapter(p=0.1)[source]#
Bases:
DatasetWithTargetAdapter
The model's forward method expects only one input matrix. Applies the mask from the dataloader to both predictions and targets.
Keeps the original dimensions, i.e. predictions for NA entries are retained as well.
- class vaep.models.ae.ModelAdapterVAE(*, after_create=None, before_fit=None, before_epoch=None, before_train=None, before_batch=None, after_pred=None, after_loss=None, before_backward=None, after_cancel_backward=None, after_backward=None, before_step=None, after_cancel_step=None, after_step=None, after_cancel_batch=None, after_batch=None, after_cancel_train=None, after_train=None, before_validate=None, after_cancel_validate=None, after_validate=None, after_cancel_epoch=None, after_epoch=None, after_cancel_fit=None, after_fit=None)[source]#
Bases:
DatasetWithTargetAdapter
The model's forward method expects only one input matrix. Applies the mask from the dataloader to both predictions and targets.
- vaep.models.ae.get_missing_values(df_train_wide: DataFrame, val_idx: Index, test_idx: Index, pred: Series) Series [source]#
Build missing value predictions based on a set of prediction and splits.
- Parameters:
df_train_wide (pd.DataFrame) – Training data in wide format.
val_idx (pd.Index) – Indices (MultiIndex of Sample and Feature) of validation split
test_idx (pd.Index) – Indices (MultiIndex of Sample and Feature) of test split
pred (pd.Series) – MultiIndexed Series of all predictions.
- Returns:
MultiIndexed Series of predictions for values missing in the training data which are in neither the validation nor the test split.
- Return type:
pd.Series
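The selection described above can be sketched with plain pandas. The example assumes that validation and test values were removed from the training data and therefore appear as missing there; all data, index names and variable names are made up, and the sketch does not call `get_missing_values` itself.

```python
import numpy as np
import pandas as pd

names = ['Sample', 'Feature']
df_train_wide = pd.DataFrame({'f1': [1.0, np.nan], 'f2': [np.nan, np.nan]},
                             index=pd.Index(['s1', 's2'], name='Sample'))
df_train_wide.columns.name = 'Feature'

# simulated missing values that were moved to the validation and test splits
val_idx = pd.MultiIndex.from_tuples([('s1', 'f2')], names=names)
test_idx = pd.MultiIndex.from_tuples([('s2', 'f1')], names=names)

# predictions for every (Sample, Feature) combination
pred = pd.Series([1.1, 2.2, 3.3, 4.4],
                 index=pd.MultiIndex.from_product([['s1', 's2'], ['f1', 'f2']],
                                                  names=names))

# positions which are missing in the wide training data
is_missing = df_train_wide.isna().stack()
idx_missing = is_missing[is_missing].index

# drop positions that belong to the validation or test split
idx_real_na = idx_missing.difference(val_idx).difference(test_idx)
pred_missing = pred.loc[idx_real_na]
```

Only `('s2', 'f2')` remains, as it is missing in the training data without being part of the validation or test split.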
- vaep.models.ae.get_preds_from_df(df: DataFrame, learn: Learner, transformer: VaepPipeline, position_pred_tuple: Optional[int] = None, dataset: Dataset = <class 'vaep.io.datasets.DatasetWithTarget'>)[source]#
Get predictions for specified DataFrame, using a fastai learner and a custom sklearn Pipeline.
- Parameters:
df (pd.DataFrame) – DataFrame to create predictions from.
learn (fastai.learner.Learner) – fastai Learner with trained model
transformer (vaep.transform.VaepPipeline) – Pipeline with separate encode and decode
position_pred_tuple (int, optional) – In case the model returns multiple outputs, select the one which contains the predictions matching the target variable (VAE case), by default None
dataset (torch.utils.data.Dataset, optional) – Dataset to build batches from, by default vaep.io.datasets.DatasetWithTarget
- Returns:
Tuple of pandas DataFrames (prediction and target) based on learn.get_preds
- Return type:
tuple
vaep.models.analysis module#
vaep.models.collab module#
- class vaep.models.collab.CollabAnalysis(datasplits: DataSplits, sample_column: str = 'Sample ID', item_column: str = 'peptide', target_column: str = 'intensity', model_kwargs: Optional[dict] = None, batch_size: int = 64)[source]#
Bases:
ModelAnalysis
- dls: DataLoaders#
- learn: Learner#
- transform: VaepPipeline#
- vaep.models.collab.combine_data(train_df: DataFrame, val_df: DataFrame) Tuple[DataFrame, float] [source]#
Helper function to combine training and validation data in long-format. The training and validation data will be mixed up in CF training as the sample embeddings have to be trained for all samples. The returned frac can be used to have the same number of (non-missing) validation samples as before.
- Parameters:
train_df (pd.DataFrame) – Consecutive training data in long-format, each row having (unit, feature, value)
val_df (pd.DataFrame) – Consecutive validation data in long-format, each row having (unit, feature, value)
- Returns:
Pandas DataFrame of concatenated samples of training and validation data. Fraction of samples originally in validation data.
- Return type:
Tuple[pd.DataFrame, float]
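The combination step can be pictured as a simple concatenation. This is a sketch under assumptions: the column names follow the CollabAnalysis defaults, the data is made up, and the way the real function handles the row index or computes the fraction may differ.

```python
import pandas as pd

train_df = pd.DataFrame({'Sample ID': ['s1', 's1', 's2'],
                         'peptide': ['p1', 'p2', 'p1'],
                         'intensity': [20.1, 22.3, 19.8]})
val_df = pd.DataFrame({'Sample ID': ['s2'],
                       'peptide': ['p2'],
                       'intensity': [21.0]})

# stack training and validation rows into one long-format table
combined = pd.concat([train_df, val_df], ignore_index=True)

# share of rows that originally came from the validation data
frac = len(val_df) / len(combined)
```

A splitter that holds out a share `frac` of the combined rows then yields as many validation rows as `val_df` had.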
- vaep.models.collab.get_missing_values(df_train_long: DataFrame, val_idx: Index, test_idx: Index, analysis_collab: CollabAnalysis) Series [source]#
Helper function to get missing values from predictions. Excludes simulated missing values from validation and test data.
- Parameters:
df_train_long (pd.DataFrame) – Training data in long-format, each row having (unit, feature, value)
val_idx (pd.Index) – Validation index (unit, feature)
test_idx (pd.Index) – Test index (unit, feature)
analysis_collab (CollabAnalysis) – CollabAnalysis object
- Returns:
Predicted values for missing values in training data (unit, feature, value)
- Return type:
pd.Series
vaep.models.collect_dumps module#
Collects metrics and config files from the experiment directory structure.
vaep.models.vae module#
VAE implementation based on ronaldiscool/VAETutorial
Adapted to the setup of learning missing values.
- funnel architecture (or fixed hidden layer layout)
- loss is adapted to Dataset and FastAI adaptions
- batchnorm1D for now (not weight norm)
- class vaep.models.vae.VAE(n_features: int, n_neurons: List[int], activation=LeakyReLU(negative_slope=0.1), last_decoder_activation=None, dim_latent: int = 10)[source]#
Bases:
Module
- forward(x)[source]#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.