pimmslearn.pandas package
- pimmslearn.pandas.calc_errors_per_feat(pred: DataFrame, freq_feat: Series, target_col='observed') → DataFrame
Calculate absolute errors and sort them by the frequency of the features.
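The following is a minimal sketch of the idea, not the library's implementation. It assumes that `pred` has a (sample, feature) MultiIndex, one column with the observed values (`target_col`) plus one column per model, and that `freq_feat` maps each feature to its frequency. Averaging the errors per feature is also an assumption. The function name `calc_errors_per_feat_sketch` is hypothetical.

```python
import pandas as pd


def calc_errors_per_feat_sketch(pred: pd.DataFrame, freq_feat: pd.Series,
                                target_col: str = 'observed') -> pd.DataFrame:
    """Hypothetical sketch: mean absolute error per feature, sorted by frequency.

    Assumes a (sample, feature) MultiIndex on `pred`, the observed values in
    `target_col` and one column of predictions per model.
    """
    # absolute error of every model column against the observed values
    errors = pred.drop(columns=target_col).sub(pred[target_col], axis=0).abs()
    # aggregate per feature (assumption: mean over samples)
    errors = errors.groupby(level='feature').mean()
    # attach the feature frequency and put the most frequent features first
    errors['freq'] = freq_feat
    return errors.sort_values('freq', ascending=False)
```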
- pimmslearn.pandas.flatten_dict_of_dicts(d: dict, parent_key: str = '') → dict
Build tuple keys from nested dictionaries for use as a pandas.MultiIndex.
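A sketch of how nested dictionaries can be flattened into tuple keys. The helper `flatten_nested_dict` is hypothetical, and it deliberately uses a tuple (not a string) as the accumulated parent key, so it does not mirror the documented signature.

```python
def flatten_nested_dict(d: dict, parent_key: tuple = ()) -> dict:
    """Hypothetical helper: map every leaf of a nested dict to a tuple key."""
    flat = {}
    for key, value in d.items():
        new_key = parent_key + (key,)
        if isinstance(value, dict):
            # recurse into sub-dictionaries, extending the key path
            flat.update(flatten_nested_dict(value, new_key))
        else:
            flat[new_key] = value
    return flat
```

If all keys end up with the same length, `pandas.MultiIndex.from_tuples(flat.keys())` (or simply `pandas.Series(flat)`) turns the result into a MultiIndex.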
- pimmslearn.pandas.get_columns_accessor_from_iterable(cols: Iterable[str], all_lower_case=False) → OmegaConf
- pimmslearn.pandas.get_columns_namedtuple(df: DataFrame) → namedtuple
Create a namedtuple instance of the column names. Spaces in column names are replaced with underscores in the attribute names used for look-up.
- Parameters:
df (pd.DataFrame) – A pandas DataFrame
- Returns:
NamedTuple instance with columns as attributes.
- Return type:
namedtuple
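A minimal sketch of the look-up behaviour using `collections.namedtuple`: attribute names are the column names with spaces replaced by underscores, and each attribute holds the original column name. The function `columns_namedtuple_sketch` is hypothetical; it accepts any iterable of names, so `df.columns` can be passed in.

```python
from collections import namedtuple


def columns_namedtuple_sketch(columns):
    """Hypothetical sketch: attribute = name with '_' for ' ', value = original name."""
    columns = list(columns)
    fields = [col.replace(' ', '_') for col in columns]
    Columns = namedtuple('Columns', fields)
    return Columns(*columns)
```

For example, `cols = columns_namedtuple_sketch(df.columns)` allows `df[cols.sample_id]` for a column named `'sample id'`. Names that are still not valid identifiers after the replacement (e.g. starting with a digit) raise a `ValueError` in `namedtuple`.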
- pimmslearn.pandas.get_counts_per_bin(df: DataFrame, bins: range, columns: List[str] | None = None) → DataFrame
Return counts per bin for the selected columns of the DataFrame.
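A sketch of counting values per bin with `pandas.cut`, assuming that `bins` is used directly as the bin edges (right-closed intervals, values outside the edges are not counted). This is an illustration, not the library's code; `counts_per_bin_sketch` is a hypothetical name.

```python
import pandas as pd


def counts_per_bin_sketch(df, bins, columns=None):
    """Hypothetical sketch: count the values of each column per bin.

    `bins` serves as the bin edges, e.g. range(0, 6, 2) -> (0, 2], (2, 4].
    """
    if columns is None:
        columns = df.columns
    edges = list(bins)
    counts = {
        # sort=False keeps the bins in ascending order, empty bins included
        col: pd.cut(df[col], bins=edges).value_counts(sort=False)
        for col in columns
    }
    return pd.concat(counts, axis=1)
```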
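- pimmslearn.pandas.get_last_index_matching_proportion(df_counts: DataFrame, prop: float = 0.25, prop_col: str = 'proportion') → object
df_counts needs to be sorted by prop_col in descending order.
- Parameters:
df_counts (pd.DataFrame) – counts, sorted by prop_col (descending)
prop (float, optional) – proportion used as cutoff, by default 0.25
prop_col (str, optional) – name of the column holding the proportions, by default ‘proportion’
- Returns:
Index value for cutoff
- Return type:
object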
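One possible reading of the cutoff, sketched under an explicit assumption: the result is the last index whose value in `prop_col` is still greater than or equal to `prop`. Because `df_counts` is sorted in descending order, all rows meeting the cutoff come first. The helper name is hypothetical and the comparison rule may differ from the library.

```python
import pandas as pd


def last_index_matching_proportion_sketch(df_counts, prop=0.25,
                                          prop_col='proportion'):
    """Hypothetical sketch: last index whose proportion is still >= prop.

    Assumes df_counts is sorted by prop_col in descending order.
    Raises IndexError if no row meets the cutoff.
    """
    meets_cutoff = df_counts[prop_col] >= prop
    return df_counts.index[meets_cutoff.to_numpy()][-1]
```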
- pimmslearn.pandas.get_unique_non_unique_columns(df: DataFrame) → SimpleNamespace
Return a namespace holding a column Index of the unique columns and one of the non-unique columns.
- Parameters:
df (pd.DataFrame)
- Returns:
SimpleNamespace with the attributes unique and non_unique, each an Index of column names.
- Return type:
SimpleNamespace
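A sketch that assumes a column counts as "unique" when it holds a single distinct value (for example a constant metadata column). This criterion is an assumption, and `unique_non_unique_columns_sketch` is a hypothetical name.

```python
from types import SimpleNamespace

import pandas as pd


def unique_non_unique_columns_sketch(df):
    """Hypothetical sketch: 'unique' assumed to mean one distinct value."""
    is_unique = (df.nunique() == 1).to_numpy()
    return SimpleNamespace(unique=df.columns[is_unique],
                           non_unique=df.columns[~is_unique])
```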
- pimmslearn.pandas.highlight_min(s: Series) → list
Highlight the minimum of a Series in yellow, for use with pandas.DataFrame.style.
- Parameters:
s (pd.Series) – Pandas Series
- Returns:
list of strings containing the background color for each value in s. Use as pandas.DataFrame.style.apply(highlight_min).
- Return type:
list
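A sketch of such a styling function: `Styler.apply` expects one CSS string per element, and an empty string means no styling. The name `highlight_min_sketch` is hypothetical.

```python
import pandas as pd


def highlight_min_sketch(s: pd.Series) -> list:
    """Hypothetical sketch: yellow background for the minimum value(s) of s."""
    is_min = s == s.min()
    return ['background-color: yellow' if flag else '' for flag in is_min]
```

With the default `axis=0`, `df.style.apply(highlight_min_sketch)` calls the function once per column, so the minimum of each column is highlighted; ties are all highlighted.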
- pimmslearn.pandas.interpolate(wide_df: DataFrame, name='interpolated') → DataFrame
Interpolate NA values using the values before and after them (n=3 replicates). For the first row, the replicates are the two following rows; for the last row, they are the two preceding rows.
- Parameters:
wide_df (pd.DataFrame) – rows are samples, columns are measurements
name (str, optional) – name for the measurement column, by default ‘interpolated’
- Returns:
pd.DataFrame in long-format
- Return type:
pd.DataFrame
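A sketch of the described neighbour-based filling, assuming the neighbouring values are combined by their mean and that only the filled positions are returned in long format. Both points are assumptions, and `interpolate_sketch` is a hypothetical name; it also assumes at least three rows of numeric data.

```python
import pandas as pd


def interpolate_sketch(wide_df: pd.DataFrame, name: str = 'interpolated') -> pd.DataFrame:
    """Hypothetical sketch: fill each NA with the mean of its neighbouring rows.

    Middle rows use the row before and after, the first row the two following
    rows and the last row the two preceding rows.
    """
    mask = wide_df.isna()
    # centred window of 3 rows; NaNs (incl. the value to fill) are skipped
    filled = wide_df.rolling(window=3, center=True, min_periods=1).mean()
    # edge rows: use the two following / the two preceding rows instead
    filled.iloc[0] = wide_df.iloc[1:3].mean()
    filled.iloc[-1] = wide_df.iloc[-3:-1].mean()
    # keep only positions that were NA and reshape to long format
    long = filled.where(mask).stack().dropna()
    return long.rename(name).to_frame()
```

Positions whose neighbours are all missing stay unfilled and are dropped from the result.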
- pimmslearn.pandas.length(x)
Length function that returns 0 if the object (probably np.nan) has no length; otherwise it returns the length of a list, pandas.Series, numpy.array, dict, etc.
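A minimal sketch of this behaviour: `len` raises `TypeError` for objects without a length, such as the float `np.nan`, so that case maps to 0. The name `length_sketch` is hypothetical.

```python
def length_sketch(x) -> int:
    """Hypothetical sketch: len(x), or 0 for objects without a length (e.g. np.nan)."""
    try:
        return len(x)
    except TypeError:
        # floats such as np.nan (and other scalars or None) define no __len__
        return 0
```

A typical use is `series.apply(length_sketch)` on a column that holds lists for some rows and NaN for others.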
- pimmslearn.pandas.parse_query_expression(s: str, printable: str = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ') → str
Convert a query expression for pd.DataFrame.query into a file name by removing all characters not listed in printable.
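A sketch of the character filtering. The default set of allowed characters (digits, ASCII letters and the space) is rebuilt from the `string` module; `query_to_filename_sketch` is a hypothetical name.

```python
import string

# digits, lower- and upper-case ASCII letters and the space character
PRINTABLE = string.digits + string.ascii_lowercase + string.ascii_uppercase + ' '


def query_to_filename_sketch(s: str, printable: str = PRINTABLE) -> str:
    """Hypothetical sketch: keep only the characters listed in `printable`."""
    return ''.join(char for char in s if char in printable)
```

For instance, the query `"a > 5 and b == 'x'"` becomes `"a  5 and b  x"`: operators and quotes are removed, while spaces are kept.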
- pimmslearn.pandas.replace_with(string_key: str, replace: str = '()/', replace_with: str = '') → str
- pimmslearn.pandas.select_max_by(df: DataFrame, grouping_columns: list, selection_column: str) → DataFrame
- pimmslearn.pandas.unique_cols(s: Series) → bool
Check whether all entries in a pandas.Series are equal.
Ref: https://stackoverflow.com/a/54405767/968487
- Parameters:
s (pandas.Series) – Series to check uniqueness
- Returns:
Boolean indicating whether all values are equal.
- Return type:
bool
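A sketch in the spirit of the referenced Stack Overflow answer: compare every entry with the first one via NumPy. The name `all_equal_sketch` is hypothetical. Note that an empty Series raises `IndexError` here, and NaN never compares equal, so an all-NaN Series yields False.

```python
import pandas as pd


def all_equal_sketch(s: pd.Series) -> bool:
    """Hypothetical sketch: True if every entry equals the first one."""
    values = s.to_numpy()
    return bool((values[0] == values).all())
```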
Submodules
pimmslearn.pandas.calc_errors module
- pimmslearn.pandas.calc_errors.calc_errors_per_bin(pred: DataFrame, target_col='observed') → DataFrame
Calculate absolute errors, bin them by the integer value of the simulated NA and provide the count per bin.
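A sketch of the binning idea, assuming `pred` holds the simulated NA values in `target_col` and one column of predictions per model. The bin is taken as the integer part of the observed value and the count is attached to every row; these layout choices and the name `calc_errors_per_bin_sketch` are hypothetical.

```python
import pandas as pd


def calc_errors_per_bin_sketch(pred: pd.DataFrame, target_col: str = 'observed') -> pd.DataFrame:
    """Hypothetical sketch: absolute errors, binned by the integer part of the target."""
    errors = pred.drop(columns=target_col).sub(pred[target_col], axis=0).abs()
    # integer part of the simulated NA (observed) value defines the bin
    errors['bin'] = pred[target_col].astype(int)
    # number of simulated NAs falling into each bin, repeated per row
    errors['n_obs'] = errors['bin'].map(errors['bin'].value_counts())
    return errors
```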
pimmslearn.pandas.missing_data module
Functionality related to analyzing missing values in a pandas DataFrame.
- pimmslearn.pandas.missing_data.decompose_NAs(data: DataFrame, level: int | str, label: int = 'summary') → DataFrame
Decompose missing values by a level into real and indirectly imputed missing values. Real missing values are missing for all samples in a group. In MS-based proteomics data, indirectly imputed missing values are those that would be filled with the mean (or median) of the observed values in their group if the mean (or median) is used for imputation.
- Parameters:
data (pd.DataFrame) – data with missing values
level (int | str) – level used to define the groups
label (optional) – name of the summary column, by default ‘summary’
- Returns:
One column DataFrame with summary information about missing values.
- Return type:
pd.DataFrame
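A sketch of the decomposition, assuming a wide layout where rows are samples, the index level `level` carries the group labels and columns are features. The row labels of the summary (`total_missing`, `real_missing`, `indirectly_imputed`) and the name `decompose_nas_sketch` are hypothetical and need not match the library's output.

```python
import pandas as pd


def decompose_nas_sketch(data: pd.DataFrame, level, label: str = 'summary') -> pd.DataFrame:
    """Hypothetical sketch: split missing values into real and indirectly imputed."""
    is_na = data.isna()
    grouped = is_na.groupby(level=level)
    n_missing = grouped.sum()   # missing values per group and feature
    n_samples = grouped.size()  # samples per group
    # feature missing in every sample of the group -> real missing values
    all_missing = n_missing.eq(n_samples, axis=0)
    real = int(n_missing.where(all_missing, 0).to_numpy().sum())
    # partly observed in the group -> would be filled by the group mean/median
    indirect = int(n_missing.where(~all_missing, 0).to_numpy().sum())
    summary = {'total_missing': int(is_na.to_numpy().sum()),
               'real_missing': real,
               'indirectly_imputed': indirect}
    return pd.Series(summary, name=label).to_frame()
```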
- pimmslearn.pandas.missing_data.get_record(data: DataFrame, columns_sample=False) → dict
Get summary record of data.