Package ubiquerg Documentation
Package Overview
Ubiquerg is a utility package that collects broadly useful helper functions. The name combines work (erg) and everywhere (ubique): these are low-level functions intended for use in many different places.
Installation
pip install ubiquerg
API Reference
CLI Tools
cli_tools
Functions for working with command-line interaction
VersionInHelpParser
VersionInHelpParser(version=None, **kwargs)
Bases: ArgumentParser
Overrides the inherited initializer. Saves the version as an object attribute for later use.
arg_defaults
arg_defaults(subcommand=None, unique=False, top_level=False)
Get argument defaults by subcommand from a parser.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subcommand | str \| None | subcommand to get defaults for | None |
| unique | bool | whether to return only a flat dict mapping unique dests to their defaults | False |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict[str, Any] \| dict[str, dict[str, Any]] | defaults by subcommand |
dests_by_subparser
dests_by_subparser(subcommand=None, top_level=False)
Get argument dests by subcommand from a parser.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| subcommand | str \| None | subcommand to get dests for | None |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | list[str] \| dict[str, list[str]] | dests by subcommand |
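To illustrate what "dests by subcommand" means, here is a minimal sketch built on plain argparse. It is not ubiquerg's implementation: the function name dests_by_subcommand_sketch and the demo parser are hypothetical, and the sketch reads private argparse attributes purely for illustration.

```python
import argparse


def dests_by_subcommand_sketch(parser):
    """Map each subcommand name to the dests of its arguments (illustrative only)."""
    result = {}
    for action in parser._actions:  # private argparse attribute, used for illustration
        if isinstance(action, argparse._SubParsersAction):
            for name, subparser in action.choices.items():
                # Skip the automatically added help action.
                result[name] = [a.dest for a in subparser._actions if a.dest != "help"]
    return result


# Hypothetical parser with two subcommands.
parser = argparse.ArgumentParser(prog="demo")
commands = parser.add_subparsers(dest="command")
commands.add_parser("build").add_argument("--jobs", type=int, default=1)
commands.add_parser("clean").add_argument("--force", action="store_true")

dests = dests_by_subcommand_sketch(parser)
```

Here dests maps "build" to ["jobs"] and "clean" to ["force"].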
format_help
format_help()
Add version information to help text.
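The pattern described above (store the version at construction, then surface it in the help text) can be sketched with a small ArgumentParser subclass. This is an assumption-laden illustration, not the library's code: the class name VersionHelpParserSketch is hypothetical, and where the version line appears in the help text is a guess.

```python
import argparse


class VersionHelpParserSketch(argparse.ArgumentParser):
    """Hypothetical stand-in; not ubiquerg's actual class."""

    def __init__(self, version=None, **kwargs):
        super().__init__(**kwargs)
        # Keep the version on the instance so other methods can use it.
        self.version = version

    def format_help(self):
        # Prepend a version line to the help text argparse generates.
        help_text = super().format_help()
        if self.version is None:
            return help_text
        return f"version: {self.version}\n{help_text}"


parser = VersionHelpParserSketch(version="0.1.0", prog="demo")
parser.add_argument("--name", default="world")
help_text = parser.format_help()
```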
subcommands
subcommands()
Get subcommands defined by a parser.
Returns:
| Type | Description |
|---|---|
| list[str] | subcommands defined within this parser |
subparsers
subparsers()
Get the subparser associated with a parser.
Returns:
| Type | Description |
|---|---|
| _SubParsersAction | action defining the subparsers (an argparse._SubParsersAction) |
suppress_defaults
suppress_defaults()
Change parser defaults to argparse.SUPPRESS.
This prevents them from showing up in the argparse.Namespace object after argument parsing.
top_level_args
top_level_args()
Get actions not associated with any subparser.
Help and version are also excluded.
Returns:
| Type | Description |
|---|---|
| list[Any] | actions not associated with any subparser |
convert_value
convert_value(val)
Convert string to the most appropriate type.
Converts to one of: bool, str, int, None or float
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| val | Any | the string to convert | required |
Returns:
| Type | Description |
|---|---|
| bool \| str \| int \| float \| None | the string converted to the most appropriate type |
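A conversion of this kind can be sketched as a cascade of attempts, falling back to the original string. The exact rules ubiquerg applies (accepted spellings of booleans and None, whitespace handling) are not stated here, so everything in convert_value_sketch is an assumption.

```python
def convert_value_sketch(val):
    """Hypothetical illustration; the library's exact rules may differ."""
    if not isinstance(val, str):
        return val  # Non-strings are passed through unchanged.
    lowered = val.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if lowered in ("none", "null"):
        return None
    # Try the narrower numeric type first, then the wider one.
    for caster in (int, float):
        try:
            return caster(val)
        except ValueError:
            continue
    return val


converted = [convert_value_sketch(s) for s in ("42", "3.5", "True", "None", "abc")]
```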
query_yes_no
query_yes_no(question, default='no')
Ask a yes/no question via input() and return the user's answer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| question | str | a string that is presented to the user | required |
| default | str | the presumed answer if the user just hits Enter | 'no' |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True for "yes" or False for "no" |
Collection Utilities
collection
Tools for working with collections
deep_update
deep_update(old, new)
Recursively update nested dict, modifying in place.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| old | dict[Any, Any] | dict to update | required |
| new | Mapping[Any, Any] | dict with new values | required |
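The difference between a recursive update and a plain dict.update is easiest to see on nested data. The following is a minimal sketch of the idea, not the library's code; deep_update_sketch and the config values are hypothetical.

```python
from collections.abc import Mapping


def deep_update_sketch(old, new):
    """Recursively merge new into old, modifying old in place (illustrative only)."""
    for key, value in new.items():
        if isinstance(value, Mapping) and isinstance(old.get(key), dict):
            # Both sides are dict-like: descend instead of replacing wholesale.
            deep_update_sketch(old[key], value)
        else:
            old[key] = value


config = {"db": {"host": "localhost", "port": 5432}, "debug": False}
deep_update_sketch(config, {"db": {"port": 6543}, "debug": True})
```

After the call, config["db"] still contains "host"; a plain config.update(...) would have replaced the whole "db" dict.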
is_collection_like
is_collection_like(c)
Determine whether an object is collection-like.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| c | Any | Object to test as collection | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | Whether the argument is a (non-string) collection |
merge_dicts
merge_dicts(x, y)
Merge dictionaries.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | dict[Any, Any] | dict to merge | required |
| y | dict[Any, Any] | dict to merge | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Mapping | dict[Any, Any] | merged dict |
powerset
powerset(items, min_items=None, include_full_pop=True, nonempty=False)
Build the powerset of a collection of items.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| items | Iterable[T] | "Pool" of all items, the population for which to build the power set | required |
| min_items | int \| None | Minimum number of individuals from the population to allow in any given subset | None |
| include_full_pop | bool | Whether to include the full population in the powerset (default True to accord with genuine definition) | True |
| nonempty | bool | force each subset returned to be nonempty | False |
Returns:
| Type | Description |
|---|---|
| list[tuple[T, ...]] | Sequence of subsets of the population, in nondecreasing size order |
Raises:
| Type | Description |
|---|---|
| TypeError | if minimum item count is specified but is not an integer |
| ValueError | if minimum item count is insufficient to guarantee nonempty subsets |
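The documented behavior (subsets in nondecreasing size order, with optional bounds on subset size) can be sketched with itertools.combinations. This is an illustration under assumptions, not the library's implementation: powerset_sketch is a hypothetical name, and how the real function combines min_items with nonempty is inferred from the parameter descriptions.

```python
from itertools import chain, combinations


def powerset_sketch(items, min_items=None, include_full_pop=True, nonempty=False):
    """Build subsets of items, smallest first (illustrative only)."""
    items = list(items)
    if min_items is None:
        min_items = 1 if nonempty else 0
    elif not isinstance(min_items, int):
        raise TypeError("min_items must be an integer")
    elif nonempty and min_items < 1:
        raise ValueError("min_items must be at least 1 when nonempty is requested")
    # Dropping the full population simply lowers the largest subset size by one.
    max_items = len(items) if include_full_pop else len(items) - 1
    sizes = range(min_items, max_items + 1)
    return list(chain.from_iterable(combinations(items, k) for k in sizes))


all_subsets = powerset_sketch("abc")
pairs_and_up = powerset_sketch("abc", min_items=2)
```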
uniqify
uniqify(seq)
Return only unique items in a sequence, preserving order.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| seq | list[T] | List of items to uniqify | required |
Returns:
| Type | Description |
|---|---|
| list[T] | Original list with duplicates removed |
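One common way to remove duplicates while keeping first-seen order relies on dicts preserving insertion order. The sketch below shows that technique; it assumes hashable items and is not necessarily how ubiquerg implements uniqify.

```python
def uniqify_sketch(seq):
    """Drop duplicates, keeping the first occurrence of each item (illustrative only)."""
    # dict keys are unique and remember insertion order.
    return list(dict.fromkeys(seq))


unique_items = uniqify_sketch([3, 1, 3, 2, 1])
```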
Environment Utilities
environment
Environment-related utilities
TmpEnv
TmpEnv(overwrite=False, **kwargs)
Bases: object
Temporary environment variable setting.
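The general idea of temporary environment variables is to set them on entry and restore the previous state on exit. The sketch below shows that idea as a context manager. It is not the library's class: tmp_env_sketch is a hypothetical name, and treating overwrite=False as "refuse to replace an existing variable" is an assumption, since the semantics are not documented here.

```python
import os
from contextlib import contextmanager


@contextmanager
def tmp_env_sketch(overwrite=False, **kwargs):
    """Set environment variables for the duration of a with-block (illustrative only)."""
    if not overwrite:
        clashes = [name for name in kwargs if name in os.environ]
        if clashes:
            # Assumed behavior: refuse to clobber existing variables.
            raise ValueError(f"Already set: {clashes}")
    saved = {name: os.environ.get(name) for name in kwargs}
    os.environ.update(kwargs)
    try:
        yield
    finally:
        # Restore each variable to its previous value, or remove it.
        for name, previous in saved.items():
            if previous is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = previous


with tmp_env_sketch(SKETCH_DEMO_VAR="1"):
    inside = os.environ.get("SKETCH_DEMO_VAR")
after = os.environ.get("SKETCH_DEMO_VAR")
```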
File Operations
files
Functions facilitating file operations
checksum
checksum(path, blocksize=int(2000000000.0))
Generate an MD5 checksum for the contents of the file at the provided path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | path to file for which to generate checksum | required |
| blocksize | int | number of bytes to read per iteration, default: 2GB | int(2000000000.0) |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | checksum hash |
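Reading in blocks keeps memory use bounded regardless of file size. A minimal sketch of block-wise MD5 hashing with hashlib follows; md5_checksum_sketch and the much smaller default blocksize are choices made for this illustration, not the library's code.

```python
import hashlib
import os
import tempfile


def md5_checksum_sketch(path, blocksize=2**20):
    """Hash a file's contents block by block (illustrative only)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: handle.read(blocksize), b""):
            digest.update(block)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "wb") as handle:
        handle.write(b"hello")
    # A tiny blocksize forces several iterations.
    file_hash = md5_checksum_sketch(demo_path, blocksize=2)
```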
create_file_racefree
create_file_racefree(file)
Create a file, but fail if the file already exists.
This function succeeds only if this process actually creates the file. If the file already exists, it raises an OSError, so two processes racing to create the same file cannot both succeed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file | str | File to create | required |
Raises:
| Type | Description |
|---|---|
| OSError | if the file to be created already exists |
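Exclusive creation is typically done by asking the operating system to create the file and fail atomically if it exists. The sketch below uses os.open with O_CREAT and O_EXCL to show the technique; create_exclusive_sketch is a hypothetical name, and whether ubiquerg uses exactly these flags is an assumption.

```python
import os
import tempfile


def create_exclusive_sketch(path):
    """Create path, raising FileExistsError (an OSError) if it exists (illustrative only)."""
    # O_CREAT | O_EXCL makes "check and create" a single atomic step.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.lock")
    create_exclusive_sketch(target)
    first_created = os.path.exists(target)
    try:
        create_exclusive_sketch(target)
        second_attempt_failed = False
    except OSError:
        second_attempt_failed = True
```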
create_lock
create_lock(filepath, wait_max=10)
Securely create a lock file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | path to a file to lock | required |
| wait_max | int | max wait time if the file in question is already locked | 10 |
filesize_to_str
filesize_to_str(size)
Convert a size in bytes to a human-readable size string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| size | int \| float | file size to convert | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str \| int \| float | file size string |
make_lock_path
make_lock_path(lock_name_base)
Create the path or collection of paths to lock files, using the given name or names as the base.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lock_name_base | str \| list[str] | Lock file names | required |
Returns:
| Type | Description |
|---|---|
| str \| list[str] | Path to the lock files |
remove_lock
remove_lock(filepath)
Remove lock.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | path to the file to remove the lock for. Not the path to the lock! | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | whether the lock was found and removed |
size
size(path, size_str=True)
Get the size of a file or directory or list of them in the provided path.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| list[str] | path or list of paths to the file or directories to check size of | required |
| size_str | bool | whether the size should be converted to a human-readable string, e.g. convert B to MB | True |
Returns:
| Type | Description |
|---|---|
| int \| str \| None | file size or file size string |
untar
untar(src, dst, **kwargs)
Unpack a path to a target folder.
All the required directories will be created. Additional keyword arguments are passed through to tarfile.extractall().
Tarfile filter background (PEP 706):
Python 3.12 added a filter parameter to extractall() with three options: "fully_trusted" (no restrictions), "tar" (some restrictions), and "data" (strict: rejects absolute paths, symlinks to absolute targets, etc.). In 3.12-3.13, the default is "fully_trusted" but a DeprecationWarning is emitted if no filter is specified. In 3.14, the default changed to "data".
This matters for refgenie because refgenie server archives contain absolute symlinks (child assets like bwa_index symlink to parent assets like fasta using the build server's absolute path). These symlinks are always broken on the client anyway (the client rewrites them), but the "data" filter crashes with AbsoluteLinkError before extraction even finishes.
Callers extracting refgenie archives should pass filter="fully_trusted" to allow these absolute symlinks through. Once refgenie's archive creation stops including absolute symlinks, callers should switch to filter="data" for security hardening.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| src | str | path to unpack | required |
| dst | str | path to output folder | required |
| **kwargs | | passed to tarfile.extractall (e.g. filter="fully_trusted") | {} |
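The filter discussion above can be made concrete with the standard library alone. The sketch below forwards keyword arguments to tarfile.extractall the way the description says untar does, and only passes filter on interpreters that support it. untar_sketch and the demo archive are hypothetical; this is not ubiquerg's implementation.

```python
import os
import tarfile
import tempfile


def untar_sketch(src, dst, **kwargs):
    """Create dst if needed, then extract src into it (illustrative only)."""
    os.makedirs(dst, exist_ok=True)
    with tarfile.open(src) as archive:
        archive.extractall(path=dst, **kwargs)


# Only pass filter= on interpreters that implement PEP 706.
extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive to extract.
    member = os.path.join(tmp, "hello.txt")
    with open(member, "w") as handle:
        handle.write("hi")
    archive_path = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(member, arcname="hello.txt")

    out_dir = os.path.join(tmp, "out", "nested")
    untar_sketch(archive_path, out_dir, **extract_kwargs)
    with open(os.path.join(out_dir, "hello.txt")) as handle:
        extracted = handle.read()
```

An archive containing absolute symlinks, as described for refgenie, would need filter="fully_trusted" in place of "data".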
wait_for_lock
wait_for_lock(lock_file, wait_max=30)
Just sleep until the lock_file does not exist.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lock_file | str | Lock file to wait upon | required |
| wait_max | int | max wait time if the file in question is already locked | 30 |
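Waiting on a lock file usually amounts to polling for its existence until a deadline. The following sketch shows that loop. What the real function does when wait_max is exceeded is not documented here, so returning False on timeout, the poll_interval parameter, and the name wait_for_lock_sketch are all assumptions.

```python
import os
import tempfile
import time


def wait_for_lock_sketch(lock_file, wait_max=30, poll_interval=0.05):
    """Sleep until lock_file disappears; return False on timeout (illustrative only)."""
    deadline = time.monotonic() + wait_max
    while os.path.exists(lock_file):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


with tempfile.TemporaryDirectory() as tmp:
    lock = os.path.join(tmp, "demo.lock")
    free = wait_for_lock_sketch(lock, wait_max=1)  # No lock file yet.
    open(lock, "w").close()
    still_locked = not wait_for_lock_sketch(lock, wait_max=0.2)
```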
File Locking
file_locking
OneLocker
OneLocker(filepath, wait_max=10, strict_ro_locks=False)
A simple mutual-exclusion file locker.
Uses a single lock file for exclusive access. Unlike ThreeLocker, this does not distinguish between read and write locks: any lock is exclusive. It is simpler, and sufficient when concurrent readers are not needed.
ThreeLocker
ThreeLocker(filepath, wait_max=10, strict_ro_locks=False)
Bases: object
A class to lock files for reading and writing.
It uses a three-lock system, with separate read-lock, write-lock, and universal-lock (or lock-lock). The universal lock is used to lock the locks, to prevent race conditions between read and write locks. It allows multiple simultaneous readers, as long as there is no writer. It creates lock files in the same directory as the file to be locked.
Warning
These locks are NOT re-entrant. If a process already holds a lock on a file and tries to acquire the same lock again, it will deadlock (wait forever for itself to release the lock). Do not nest lock contexts on the same file.
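The re-entrancy warning is easier to picture with an analogy from the standard library. The sketch below uses threading locks, not ubiquerg's file locks: a plain Lock cannot be acquired again by its holder, while an RLock can. A non-blocking acquire is used so the example reports the failure instead of hanging.

```python
import threading

plain_lock = threading.Lock()  # Not re-entrant, like the file locks described above.
with plain_lock:
    # A blocking acquire here would wait forever; non-blocking just reports failure.
    nested_plain = plain_lock.acquire(blocking=False)

reentrant_lock = threading.RLock()  # Re-entrant, shown for contrast.
with reentrant_lock:
    nested_reentrant = reentrant_lock.acquire(blocking=False)
    if nested_reentrant:
        reentrant_lock.release()
```

Here nested_plain is False and nested_reentrant is True.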
create_read_lock
create_read_lock(filepath=None, wait_max=None)
Securely create a read lock file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | path to a file to lock | None |
| wait_max | int | max wait time if the file in question is already locked | None |
create_write_lock
create_write_lock(filepath=None, wait_max=None)
Securely create a write lock file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | path to a file to lock | None |
| wait_max | int | max wait time if the file in question is already locked | None |
ensure_locked
ensure_locked(lock_type=WRITE)
Decorator to apply to functions to make sure they only happen when locked.
locked_read_file
locked_read_file(filepath, create_file=False)
Read a file contents into memory after locking the file.
This will prevent other ThreeLocker-protected processes from writing to the file while it is being read.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | | path to the file that should be read | required |
| create_file | bool | whether to create the file if it doesn't exist | False |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | file contents |
make_all_lock_paths
make_all_lock_paths(filepath)
Create a collection of paths to lock files with given name as base.
read_lock
read_lock(obj)
Read-lock a filepath or object with locker attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj | object | filepath string or object with locker attribute | required |
Yields:
| Name | Type | Description |
|---|---|---|
| object | object | the locked object |
Warning
Locks are NOT re-entrant. Do not nest lock contexts on the same file, or the process will deadlock waiting for itself:

    # WRONG - will deadlock:
    with read_lock(cfg):
        with read_lock(cfg):  # Deadlock!
            ...

    # RIGHT - lock once at the top level:
    with read_lock(cfg):
        do_work(cfg)  # Pass already-locked object
wait_for_locks
wait_for_locks(lock_paths, wait_max=10)
Wait for lock files to be removed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lock_paths | list \| str | path or list of paths to the lock files to wait for | required |
| wait_max | int | max wait time if the file in question is already locked | 10 |
write_lock
write_lock(obj)
Write-lock file path or object with locker attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj | object | filepath string or object with locker attribute | required |
Yields:
| Name | Type | Description |
|---|---|---|
| object | object | the locked object |
Warning
Locks are NOT re-entrant. Do not nest lock contexts on the same file, or the process will deadlock waiting for itself:

    # WRONG - will deadlock:
    with write_lock(cfg):
        with write_lock(cfg):  # Deadlock!
            cfg.write()

    # RIGHT - lock once at the top level:
    with write_lock(cfg):
        do_work_and_write(cfg)  # Don't re-lock inside
Path Utilities
paths
Filesystem utility functions
expandpath
expandpath(path)
Expand a filesystem path that may or may not contain user/env vars.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | path to expand | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | expanded version of input path |
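Expanding user and environment variables can be done with two standard-library calls. The sketch below shows that combination; expandpath_sketch and the SKETCH_DATA_DIR variable are hypothetical, and the library may apply further normalization.

```python
import os


def expandpath_sketch(path):
    """Expand ~ and environment variables in a path (illustrative only)."""
    return os.path.expanduser(os.path.expandvars(path))


# Hypothetical variable, set only for this demonstration.
os.environ["SKETCH_DATA_DIR"] = "/data"
expanded = expandpath_sketch("$SKETCH_DATA_DIR/genomes")
unchanged = expandpath_sketch("plain/relative/path")
```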
mkabs
mkabs(path, reldir=None)
Make sure a path is absolute.
If not already absolute, it's made absolute relative to a given directory (or file). Also expands ~ and environment variables for kicks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| None | Path to make absolute | required |
| reldir | str \| None | Relative directory to make path absolute from if it's not already absolute | None |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str \| None | Absolute path |
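The rules described above (leave absolute paths alone, otherwise resolve against a directory, or against a file's parent directory) can be sketched as follows. mkabs_sketch is a hypothetical name, and passing None through unchanged is an assumption based on the documented types.

```python
import os


def mkabs_sketch(path, reldir=None):
    """Make path absolute, optionally relative to reldir (illustrative only)."""
    if path is None:
        return None  # Assumed from the "str | None" return type.
    path = os.path.expanduser(os.path.expandvars(path))
    if os.path.isabs(path):
        return path
    if reldir is None:
        return os.path.abspath(path)
    if os.path.isfile(reldir):
        # A file was given: resolve relative to the directory containing it.
        reldir = os.path.dirname(reldir)
    return os.path.join(reldir, path)


base = os.path.abspath("project")  # Hypothetical base directory.
resolved = mkabs_sketch(os.path.join("data", "x.txt"), base)
already_abs = mkabs_sketch(base)
```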
parse_registry_path
parse_registry_path(rpstring, defaults=None)
Parse a 'registry path' string into components.
A registry path is a string that is kind of like a URL, providing a unique identifier for a particular asset, like protocol::namespace/item.subitem:tag. You can use the defaults argument to change the names of the entries in the return dict, and to provide defaults in case of missing values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rpstring | str | string to parse | required |
| defaults | list[tuple[str, Any]] \| None | A list of 5 tuples, one per entry, each giving the name of the entry and a default value to use if it is missing (the default can be None) | None |
Returns:
| Type | Description |
|---|---|
| dict \| None | dict with one element for each parsed entry in the path |
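The protocol::namespace/item.subitem:tag layout lends itself to a regular expression with optional named groups. The sketch below is one way to parse that shape; the pattern, the function name parse_registry_path_sketch, and the choice of allowed characters are assumptions, and the defaults argument is not modeled.

```python
import re

# Every component except item is optional.
REGISTRY_PATTERN = re.compile(
    r"^(?:(?P<protocol>[^:/]+)::)?"
    r"(?:(?P<namespace>[^:/]+)/)?"
    r"(?P<item>[^:./]+)"
    r"(?:\.(?P<subitem>[^:./]+))?"
    r"(?::(?P<tag>[^:./]+))?$"
)


def parse_registry_path_sketch(rpstring):
    """Split a registry path into its components, or return None (illustrative only)."""
    match = REGISTRY_PATTERN.match(rpstring)
    return match.groupdict() if match else None


full = parse_registry_path_sketch("protocol::namespace/item.subitem:tag")
partial = parse_registry_path_sketch("namespace/item:tag")
invalid = parse_registry_path_sketch("a/b/c")
```

Components that are absent from the input come back as None in this sketch.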
parse_registry_path_strict
parse_registry_path_strict(input_string, require_protocol=False, require_namespace=False, require_item=True, require_subitem=False, require_tag=False)
Parse and validate a registry path with required component checks.
This function parses a registry path and returns the parsed dictionary only if all required components are present. Returns None otherwise. Can be used as a boolean check (truthy/falsy) or to get the parsed components.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_string | str | String to parse and validate as a registry path | required |
| require_protocol | bool | If True, protocol component must be present | False |
| require_namespace | bool | If True, namespace component must be present | False |
| require_item | bool | If True, item component must be present (default: True) | True |
| require_subitem | bool | If True, subitem component must be present | False |
| require_tag | bool | If True, tag component must be present | False |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Parsed registry path dict if valid and all required components present, else None |
Example
    >>> result = parse_registry_path_strict("namespace/item:tag")
    >>> result['namespace']
    'namespace'
    >>> parse_registry_path_strict("item", require_namespace=True) is None
    True

    # Can be used as a boolean check
    >>> if parse_registry_path_strict("namespace/item", require_namespace=True):
    ...     print("Valid!")
    Valid!

    # Get specific components
    >>> result = parse_registry_path_strict("protocol::namespace/item.subitem:tag", require_protocol=True)
    >>> result['protocol']
    'protocol'
System Utilities
system
System utility functions
is_command_callable
is_command_callable(cmd)
Check if command can be called.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cmd | str | actual command to check for callability | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | whether given command's call succeeded |
Raises:
| Type | Description |
|---|---|
| TypeError | if the alleged command isn't a string |
| ValueError | if the alleged command is empty |
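A related check can be sketched with shutil.which, which looks a command up on PATH without running it. Note that this is a different technique from the one documented above, which reports whether calling the command succeeded; the name is_on_path_sketch is hypothetical, and only the two documented error cases are mirrored.

```python
import shutil


def is_on_path_sketch(cmd):
    """Report whether cmd resolves to an executable on PATH (illustrative only)."""
    if not isinstance(cmd, str):
        raise TypeError(f"Command must be a string, got {type(cmd).__name__}")
    if not cmd:
        raise ValueError("Command is empty")
    # which() returns the resolved path, or None if nothing is found.
    return shutil.which(cmd) is not None


missing = is_on_path_sketch("surely-not-a-real-command-xyz123")
```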
is_writable
is_writable(folder, check_exist=False, create=False)
Make sure a folder is writable.
Given a folder, check that it exists and is writable. Errors if requested on a non-existent folder. Otherwise, make sure the first existing parent folder is writable such that this folder could be created.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder | str \| None | Folder to check for writeability | required |
| check_exist | bool | Throw an error if it doesn't exist? | False |
| create | bool | Create the folder if it doesn't exist? | False |
Web Utilities
web
Web-related utilities
has_scheme
has_scheme(maybe_url)
Check whether a string starts with a URI scheme (e.g. s3://, gs://, file://).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| maybe_url | str | string to check | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | whether string starts with a URI scheme |
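A scheme check of this kind can be sketched with a regular expression that follows the generic URI scheme syntax (a letter, then letters, digits, "+", "." or "-") followed by "://". The pattern and the name has_scheme_sketch are assumptions; the library's own pattern may differ.

```python
import re

# Scheme syntax per the generic URI grammar, followed by "://".
SCHEME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def has_scheme_sketch(maybe_url):
    """Report whether a string starts with something like s3:// (illustrative only)."""
    return bool(SCHEME_PATTERN.match(maybe_url))


checks = {
    text: has_scheme_sketch(text)
    for text in ("s3://bucket/key", "file:///tmp/x", "/local/path", "C:\\data")
}
```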
is_url
is_url(maybe_url)
Determine whether a path is a URL.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| maybe_url | str | path to investigate as URL | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | whether path appears to be a URL |