vaep.pandas package
- vaep.pandas.flatten_dict_of_dicts(d: dict, parent_key: str = '') → dict
Build tuples for nested dictionaries for use as pandas.MultiIndex.
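The idea of turning nested dictionary keys into tuples can be illustrated with a minimal sketch. The helper `flatten_nested_dict` below is hypothetical and is not the library's implementation; in particular it takes a tuple as `parent_key`, whereas the documented signature takes a string.

```python
import pandas as pd


def flatten_nested_dict(d: dict, parent_key: tuple = ()) -> dict:
    """Hypothetical helper: map each leaf value to the tuple of keys leading to it."""
    items = {}
    for key, value in d.items():
        new_key = parent_key + (key,)
        if isinstance(value, dict):
            # descend into the nested dictionary, extending the key tuple
            items.update(flatten_nested_dict(value, new_key))
        else:
            items[new_key] = value
    return items


nested = {
    "model_a": {"train": 0.1, "test": 0.2},
    "model_b": {"train": 0.3, "test": 0.4},
}
flat = flatten_nested_dict(nested)
# tuple keys are interpreted by pandas as the levels of a MultiIndex
series = pd.Series(flat)
```

Because every key is a tuple of equal length, the resulting Series carries a two-level MultiIndex.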
- vaep.pandas.get_columns_accessor_from_iterable(cols: Iterable[str], all_lower_case=False) → OmegaConf
- vaep.pandas.get_columns_namedtuple(df: DataFrame) → namedtuple
Create namedtuple instance of column names. Spaces in column names are replaced with underscores in the look-up.
- Parameters:
df (pd.DataFrame) – A pandas DataFrame
- Returns:
NamedTuple instance with columns as attributes.
- Return type:
namedtuple
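A minimal sketch of the look-up described above, assuming only that spaces are replaced by underscores. The function name `columns_as_namedtuple` and the type name `Columns` are hypothetical; column names that are not valid Python identifiers after the replacement would need extra handling.

```python
from collections import namedtuple

import pandas as pd


def columns_as_namedtuple(df: pd.DataFrame):
    """Hypothetical helper: expose column names as attributes of a namedtuple."""
    # attribute names: spaces replaced by underscores
    fields = [str(col).replace(" ", "_") for col in df.columns]
    Columns = namedtuple("Columns", fields)
    # attribute values: the original column names
    return Columns(*df.columns)


df = pd.DataFrame({"sample id": [1, 2], "intensity": [0.5, 0.7]})
cols = columns_as_namedtuple(df)
selected = df[cols.sample_id]  # same as df["sample id"]
```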
- vaep.pandas.get_counts_per_bin(df: DataFrame, bins: range, columns: Optional[List[str]] = None)
Return counts per bin for selected columns in DataFrame.
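Counting values per bin can be sketched with `pandas.cut`. This is an illustration under assumptions, not the library's code: the name `counts_per_bin` is hypothetical, and the right-closed intervals are simply the `pandas.cut` default.

```python
import pandas as pd


def counts_per_bin(df: pd.DataFrame, bins: range, columns=None) -> pd.DataFrame:
    """Hypothetical helper: count values per interval for the selected columns."""
    if columns is None:
        columns = df.columns.to_list()
    counts = {
        # pd.cut assigns each value to an interval; empty intervals are kept with count 0
        col: pd.cut(df[col], bins=list(bins)).value_counts().sort_index()
        for col in columns
    }
    return pd.DataFrame(counts)


df = pd.DataFrame({"a": [0.5, 1.5, 1.7, 2.2], "b": [0.1, 0.2, 2.9, 2.5]})
counts = counts_per_bin(df, bins=range(0, 4))  # intervals (0, 1], (1, 2], (2, 3]
```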
- vaep.pandas.get_last_index_matching_proportion(df_counts: DataFrame, prop: float = 0.25, prop_col: str = 'proportion') → object
df_counts needs to be sorted by “prop_col” (descending).
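- Parameters:
df_counts (pd.DataFrame) – counts, sorted by prop_col in descending order
prop (float, optional) – by default 0.25
prop_col (str, optional) – by default ‘proportion’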
- Returns:
Index value for cutoff
- Return type:
object
- vaep.pandas.get_unique_non_unique_columns(df: DataFrame) → SimpleNamespace
Return a namespace holding one column Index each for the unique and the non-unique columns.
- Parameters:
df (pd.DataFrame) –
- Returns:
SimpleNamespace with the attributes unique and non_unique, each an Index of column names.
- Return type:
SimpleNamespace
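A minimal sketch of such a split, under the assumption that a "unique" column is one whose entries are all equal (the check that unique_cols performs). The helper name `split_columns_by_uniqueness` is hypothetical.

```python
from types import SimpleNamespace

import pandas as pd


def split_columns_by_uniqueness(df: pd.DataFrame) -> SimpleNamespace:
    """Hypothetical helper: separate constant columns from varying ones."""
    # assumption: "unique" means the column holds a single distinct value
    is_constant = (df.nunique(dropna=False) == 1).to_numpy()
    return SimpleNamespace(
        unique=df.columns[is_constant],
        non_unique=df.columns[~is_constant],
    )


df = pd.DataFrame({"run": ["r1", "r1", "r1"], "value": [1, 2, 3]})
columns = split_columns_by_uniqueness(df)
```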
- vaep.pandas.highlight_min(s: Series) → list
Highlight the minimum of a Series in yellow, for use with pandas.DataFrame.style.
- Parameters:
s (pd.Series) – Pandas Series
- Returns:
list of strings containing the background color for the values specified. Intended to be used as pandas.DataFrame.style.apply(highlight_min).
- Return type:
list
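A styling function of this kind can be sketched as follows. The name `highlight_min_sketch` is hypothetical and the exact CSS string returned by the library is an assumption; the sketch returns one CSS string per entry, which is what `Styler.apply` expects.

```python
import pandas as pd


def highlight_min_sketch(s: pd.Series) -> list:
    """Hypothetical helper: yellow background for the minimum, no style otherwise."""
    is_min = s == s.min()
    return ["background-color: yellow" if flag else "" for flag in is_min]


df = pd.DataFrame({"a": [3, 1, 2], "b": [0.5, 0.9, 0.1]})
styles = highlight_min_sketch(df["a"])
# applied column-wise to a whole frame (needs the optional jinja2 dependency):
# styled = df.style.apply(highlight_min_sketch)
```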
- vaep.pandas.interpolate(wide_df: DataFrame, name='interpolated') → DataFrame
Interpolate NA values using the values before and after. Uses n=3 replicates. For the first row, the replicates are the two following rows; for the last row, they are the two preceding rows.
- Parameters:
wide_df (pd.DataFrame) – rows are samples, columns are measurements
name (str, optional) – name for measurement in columns, by default ‘interpolated’
- Returns:
pd.DataFrame in long-format
- Return type:
pd.DataFrame
- vaep.pandas.length(x)
Length function that returns 0 if the object (probably np.nan) has no length. Otherwise returns the length of the list, pandas.Series, numpy.array, dict, etc.
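The described behaviour can be sketched by catching the `TypeError` that `len` raises for objects without a length. The name `safe_length` is hypothetical, and whether the library uses exception handling or another check is not stated here.

```python
import pandas as pd


def safe_length(x) -> int:
    """Hypothetical helper: len(x), or 0 if x has no length (e.g. a float NaN)."""
    try:
        return len(x)
    except TypeError:
        return 0


# typical use: a column mixing lists and missing values
peptides = pd.Series([["AAK", "LLR"], float("nan"), ["GGK"]])
n_peptides = peptides.apply(safe_length)
```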
- vaep.pandas.parse_query_expression(s: str, printable: str = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ') → str
Parse a query expression for pd.DataFrame.query to a file name. Removes all characters not listed in printable.
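Filtering a query string down to an allowed character set can be sketched as below. The name `query_to_filename` is hypothetical; the allowed characters mirror the documented default for `printable` (digits, ASCII letters and the space).

```python
import string

# same characters as the documented default: digits, letters, space
ALLOWED = string.digits + string.ascii_letters + " "


def query_to_filename(query: str, allowed: str = ALLOWED) -> str:
    """Hypothetical helper: drop every character that is not in `allowed`."""
    return "".join(ch for ch in query if ch in allowed)


name = query_to_filename("intensity > 5 & group == 'A'")
```

Operators and quotes are dropped while the spaces around them stay, so the result can contain consecutive spaces.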
- vaep.pandas.replace_with(string_key: str, replace: str = '()/', replace_with: str = '') → str
- vaep.pandas.select_max_by(df: DataFrame, grouping_columns: list, selection_column: str) → DataFrame
- vaep.pandas.unique_cols(s: Series) → bool
Check whether all entries in a pandas.Series are equal.
Ref: https://stackoverflow.com/a/54405767/968487
- Parameters:
s (pandas.Series) – Series to check uniqueness
- Returns:
True if all values are equal, otherwise False.
- Return type:
bool
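One common way to test for equal entries is to compare every value against the first one. The sketch below uses the hypothetical name `all_equal` and is not necessarily the library's code; it assumes a non-empty Series, and NaN values never compare equal.

```python
import pandas as pd


def all_equal(s: pd.Series) -> bool:
    """Hypothetical helper: True if every entry equals the first entry."""
    values = s.to_numpy()
    # element-wise comparison against the first value (requires a non-empty Series)
    return bool((values[0] == values).all())


constant = all_equal(pd.Series(["r1", "r1", "r1"]))
varying = all_equal(pd.Series([1, 2, 1]))
```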
Submodules
vaep.pandas.calc_errors module
- vaep.pandas.calc_errors.calc_errors_per_bin(pred: DataFrame, target_col='observed') → DataFrame
Calculate absolute errors. Bin by integer value of simulated NA and provide count.
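The computation can be sketched as follows, assuming that `pred` holds the target column plus one column of predictions per model. The function name `abs_errors_per_bin` and the output columns `bin` and `n_obs` are assumptions made for illustration, not the library's documented output.

```python
import pandas as pd


def abs_errors_per_bin(pred: pd.DataFrame, target_col: str = "observed") -> pd.DataFrame:
    """Hypothetical helper: absolute errors with an integer bin and the bin's count."""
    model_cols = pred.columns.drop(target_col)
    # absolute difference between each model's prediction and the target
    errors = pred[model_cols].sub(pred[target_col], axis=0).abs()
    # integer part of the target value defines the bin
    errors["bin"] = pred[target_col].astype(int)
    # number of observations falling into the same bin
    errors["n_obs"] = errors.groupby("bin")["bin"].transform("count")
    return errors


pred = pd.DataFrame(
    {"observed": [20.3, 20.9, 25.1], "model_a": [20.0, 21.4, 24.6]}
)
errors = abs_errors_per_bin(pred)
```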