dandi_compute_code.queue

dandi_compute_code.queue.aggregate_queue_statistics(*, queue_directory, dandiset_directory, output_file_name='queue_stats.json')

Write aggregate queue statistics JSON and return the written payload.

Return type:

dict

Parameters:

queue_directory (Path)
dandiset_directory (Path)
output_file_name (str)

dandi_compute_code.queue.clean_unsubmitted_capsules(*, dandiset_directory, queue_directory)

Remove all queued (unsubmitted) capsule directories from the dandiset tree.

A capsule is considered queued (prepared but not yet submitted) when its attempt directory meets all of the following conditions: it has a code/ subdirectory, it has no non-empty logs/ subdirectory, it has no derivatives/ subdirectory, and it contains no submitted-marker file (code/submitted or code/submitted_date-*).

The function reads the queue state, then deletes each matching attempt directory tree from the DANDI archive (via dandi delete) and from the local filesystem. The local Dandiset copy is expected to be up to date before this function is called.

Parameters:

dandiset_directory (Path) – Path to a local clone of the dandiset repository used to resolve and delete matching attempt directories.
queue_directory (Path) – Path to the queue root directory.

Returns:

List of attempt directory paths that were deleted.

Return type:

list[Path]

Raises:

NotADirectoryError – If queue_directory does not exist or is not a directory.
RuntimeError – If the DANDI_API_KEY environment variable is not set or is blank.
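
The queued-capsule conditions described above can be sketched as a filesystem check. This is a minimal illustration, not the library's implementation: the helper name is_queued_attempt_dir is hypothetical, and it only inspects the local tree.

```python
from pathlib import Path


def is_queued_attempt_dir(attempt_dir: Path) -> bool:
    """Return True if the attempt directory looks prepared but not yet submitted.

    Hypothetical helper; mirrors the documented conditions only.
    """
    code_dir = attempt_dir / "code"
    if not code_dir.is_dir():
        return False

    # A non-empty logs/ directory means the job has already produced logs.
    logs_dir = attempt_dir / "logs"
    if logs_dir.is_dir() and any(logs_dir.iterdir()):
        return False

    # Any derivatives/ directory means output exists.
    if (attempt_dir / "derivatives").is_dir():
        return False

    # Submitted markers: code/submitted or code/submitted_date-*.
    if (code_dir / "submitted").exists():
        return False
    if any(code_dir.glob("submitted_date-*")):
        return False

    return True
```

An empty logs/ directory does not disqualify a capsule in this sketch, which follows the "non-empty logs/" wording above.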

dandi_compute_code.queue.dump_issues(*, dandiset_directory, queue_directory, output_file_name='issues_dump.json')

Scan Nextflow and SLURM logs and write per-capsule error lines under queue_directory.

Return type:

list[dict]

Parameters:

dandiset_directory (Path)
queue_directory (Path)
output_file_name (str)

dandi_compute_code.queue.has_pending_jobs()

Report whether any queued jobs are awaiting submission.

This is a lightweight check intended to gate queue dispatch. It inspects the DANDI assets metadata for attempt directories that contain a code/submit.sh asset without an adjacent submitted marker. It does not submit anything and does not require SLURM access.

Returns:: True when at least one job is awaiting submission, False otherwise.
Return type:: bool

class dandi_compute_code.queue.JobEntry(job, content_id, asset_size_bytes, has_code=False, has_been_submitted=False, has_output=False, has_logs=False, created_at=None, job_completion_time=None, dataset_description_path=<factory>, output_paths=<factory>, log_paths=<factory>)

Bases: object

A JobInfo (identity) plus the status fields written by write_queue_state and consumed across the queue module.

Parameters:

job (JobInfo)
content_id (str | None)
asset_size_bytes (int | None)
has_code (bool)
has_been_submitted (bool)
has_output (bool)
has_logs (bool)
created_at (str | None)
job_completion_time (str | None)
dataset_description_path (dict[str, str])
output_paths (dict[str, str])
log_paths (dict[str, str])

job: JobInfo

content_id: str | None

asset_size_bytes: int | None

has_code: bool = False

has_been_submitted: bool = False

has_output: bool = False

has_logs: bool = False

created_at: str | None = None

job_completion_time: str | None = None

dataset_description_path: dict[str, str]

output_paths: dict[str, str]

log_paths: dict[str, str]

property is_pending: bool

Code prepared but never submitted (no logs, no output yet).

property is_running: bool

Logs present but no output yet; the job is likely still executing.

property is_successful: bool

Output directory present; the job completed successfully.

property is_failed: bool

Has code and logs but no output; the job ran but did not succeed.

property identity: tuple

Stable key for matching queue/state/last-submitted entries.

Excludes codebase deliberately: an attempt is the same logical job regardless of which codebase version produced it.
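
As a rough illustration of an identity key that ignores codebase, the sketch below builds a tuple from the remaining JobInfo fields. The class name JobInfoSketch, the exact set of fields in the tuple, and their order are assumptions; the real property lives on JobEntry and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobInfoSketch:
    """Hypothetical stand-in for JobInfo, using the documented field names."""

    dandiset_id: str
    dandi_path: str
    pipeline: str
    version: str
    params: str
    config: str
    attempt: int
    codebase: str

    @property
    def identity(self) -> tuple:
        # Assumed composition: every field except codebase.
        return (
            self.dandiset_id,
            self.dandi_path,
            self.pipeline,
            self.version,
            self.params,
            self.config,
            self.attempt,
        )
```

Two entries that differ only in codebase compare equal by identity, so they can be matched across queue, state, and last-submitted records.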

attempt_dir_candidates(base_dir)

Return (flat_layout_path, legacy_nested_layout_path) for this attempt.

Parameters:: base_dir (Path) – Root of the local Dandiset tree to resolve paths under.
Raises:: ValueError – If this entry’s dandi_path is an empty string.
Return type:: tuple[Path, Path]

resolve_attempt_dir(base_dir)

Resolve the best on-disk attempt-directory path for this entry.

Return type:: Path
Parameters:: base_dir (Path)

resolve_unsubmitted_attempt_dir(base_dir)

Resolve the attempt directory only if this entry is queued but unsubmitted.

Returns None when the entry is not pending (see is_pending) or when a submitted marker (code/submitted or code/submitted_date-*) is present on disk.

Return type:: Path | None
Parameters:: base_dir (Path)

classmethod from_dict(data, /)

Construct from a raw state.jsonl entry dict.

Return type:: JobEntry
Parameters:: data (dict)

to_dict()

Serialise back to the flat dict format written to state.jsonl.

Return type:: dict

class dandi_compute_code.queue.JobInfo(dandiset_id, dandi_path, pipeline, version, params, config, attempt, codebase)

Bases: object

Parameters:

dandiset_id (str)
dandi_path (str)
pipeline (str)
version (str)
params (str)
config (str)
attempt (int)
codebase (str)

dandiset_id: str

dandi_path: str

pipeline: str

version: str

params: str

config: str

attempt: int

codebase: str

dandi_compute_code.queue.prepare_queue(*, queue_directory, pipeline_directory=None, config_key='default', content_ids=None, limit=None)

Prepare qualifying assets in bulk, based on the current queue configuration.

For every pipeline/version/params combination declared in queue_config.json, this function determines which content IDs to prepare and calls prepare_aind_ephys_job() for each asset. This generates the code/ directory and its parent directories without submitting a job.

The per-pipeline failure cap (max_fail_per_dandiset in queue_config.json) is enforced by reading the existing state.jsonl file inside queue_directory. Entries with has_code=True, has_logs=True, and has_output=False are counted as failures for the relevant pipeline, version, and source Dandiset. Run write_queue_state() beforehand to ensure state.jsonl is up to date.
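
The failure counting described above can be sketched as follows. This is an illustration under assumptions: entries are taken to be flat dicts with pipeline, version, and dandiset_id keys, the helper names are hypothetical, and whether the cap is reached at "equal to" or "greater than" the limit is not stated in this reference.

```python
from collections import Counter


def count_failures(entries: list) -> Counter:
    """Count failed entries per (pipeline, version, dandiset_id)."""
    failures: Counter = Counter()
    for entry in entries:
        # Failure rule from the text: code and logs present, no output.
        failed = (
            entry.get("has_code")
            and entry.get("has_logs")
            and not entry.get("has_output")
        )
        if failed:
            key = (entry["pipeline"], entry["version"], entry["dandiset_id"])
            failures[key] += 1
    return failures


def is_capped(failures: Counter, key: tuple, max_fail_per_dandiset: int) -> bool:
    # Assumption: the cap applies once the count reaches the limit.
    return failures[key] >= max_fail_per_dandiset
```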

Parameters:

queue_directory (Path) – Path to the queue root directory.
pipeline_directory (Path | None) – Local path to the AIND pipeline repository. Passed directly to prepare_aind_ephys_job().
config_key (str) – Key for a registered job configuration. Passed directly to prepare_aind_ephys_job().
content_ids (list[str] | None) – Explicit list of content IDs to prepare. When provided, the qualifying content IDs list is not fetched from the network and these IDs are used directly instead. Useful for targeted runs such as testing with one or more known content IDs.
limit (int | None) – If provided, stop after preparing limit assets in total (across all pipeline/version/params combinations). When qualifying IDs are fetched automatically, they are randomized in round-robin order across source Dandisets before this limit is applied. Useful for testing.

Return type:

None
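
One way to read the limit behavior is: shuffle the qualifying IDs within each source Dandiset, interleave them round-robin, then truncate. The sketch below shows that reading; the function name round_robin_sample, the seed parameter, and the per-Dandiset shuffle are assumptions rather than the library's confirmed algorithm.

```python
import random
from itertools import zip_longest
from typing import Optional


def round_robin_sample(
    ids_by_dandiset: dict,
    limit: Optional[int] = None,
    seed: Optional[int] = None,
) -> list:
    """Interleave shuffled content IDs across Dandisets, then apply limit."""
    rng = random.Random(seed)

    shuffled_groups = []
    for dandiset_id in sorted(ids_by_dandiset):
        ids = list(ids_by_dandiset[dandiset_id])  # copy; do not mutate the input
        rng.shuffle(ids)
        shuffled_groups.append(ids)

    # Take one ID from each Dandiset in turn until every group is exhausted.
    missing = object()
    ordered = [
        content_id
        for group in zip_longest(*shuffled_groups, fillvalue=missing)
        for content_id in group
        if content_id is not missing
    ]
    return ordered if limit is None else ordered[:limit]
```

Interleaving before truncation keeps a small limit from being filled entirely by one large Dandiset.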

dandi_compute_code.queue.process_queue(*, queue_directory, processing_directory, max_concurrent_aind_jobs=2, jitter_seconds=30.0, test=False)

Submit pending jobs from state.jsonl until the total number of running AIND-Ephys-Pipeline SLURM jobs reaches max_concurrent_aind_jobs.

If state.jsonl is absent, a FileNotFoundError is raised. If state.jsonl exists but is empty, a warning is emitted and the invocation returns without submitting jobs. Otherwise, squeue --me is checked for currently running AIND-Ephys-Pipeline jobs, and at most max_concurrent_aind_jobs minus that count are submitted.

A random delay of up to jitter_seconds is applied before any work is done to spread out concurrent invocations and avoid thundering-herd submission bursts.

Parameters:

queue_directory (Path) – Path to the queue root directory.
processing_directory (Path) – Path to the directory used for temporary working trees during job submission.
max_concurrent_aind_jobs (int) – Maximum number of AIND-Ephys-Pipeline jobs allowed to be running concurrently before new submissions are skipped.
jitter_seconds (float) – Maximum number of seconds to sleep before proceeding. A uniformly random duration between 0 and jitter_seconds is chosen each invocation. Set to 0 to disable jitter entirely.
test (bool) – If True, preserve temporary processing directories on success.

Returns:

"submitted" when one or more jobs were submitted, "no-pending" when no pending jobs were available to submit, and "slots-unavailable" when submission was skipped because no queue slots were available.

Return type:

Literal['submitted', 'no-pending', 'slots-unavailable']

Raises:

FileNotFoundError – If state.jsonl is not found in queue_directory.
ValueError – If jitter_seconds is negative.
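
The jitter and slot arithmetic can be sketched as below. The function plan_dispatch and its pending_count, running_count, and sleep parameters are hypothetical, and the order in which the slot check and the pending check happen is an assumption; only the return values and the jitter rules come from the description above.

```python
import random
import time
from typing import Callable, Literal

Outcome = Literal["submitted", "no-pending", "slots-unavailable"]


def plan_dispatch(
    *,
    pending_count: int,
    running_count: int,
    max_concurrent_aind_jobs: int = 2,
    jitter_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    """Decide the outcome of one queue-processing invocation (sketch only)."""
    if jitter_seconds < 0:
        raise ValueError("jitter_seconds must be non-negative")
    if jitter_seconds > 0:
        # Uniform random delay to spread out concurrent invocations.
        sleep(random.uniform(0.0, jitter_seconds))

    free_slots = max_concurrent_aind_jobs - running_count
    if free_slots <= 0:
        return "slots-unavailable"
    if pending_count == 0:
        return "no-pending"
    # A real implementation would submit min(free_slots, pending_count) jobs here.
    return "submitted"
```

Passing sleep as a parameter makes the delay easy to stub out in tests.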

class dandi_compute_code.queue.QueueState(entries)

Bases: object

Container for all entries in state.jsonl.

Replaces the scattered list[dict] reads in _prepare_queue.py, _aggregate_queue_statistics.py, _process_queue.py, and _clean_unsubmitted_capsules.py.

Parameters:: entries (list[JobEntry])

entries: list[JobEntry]

property pending: list[JobEntry]

Entries with code prepared but not yet submitted.

property running: list[JobEntry]

Entries with logs present but no output; likely still executing.

property successful: list[JobEntry]

Entries whose output directory is present.

property failed: list[JobEntry]

Entries with code and logs but no output.

property successful_asset_bytes_total: int

Total source-asset bytes across successful entries with a known size.

content_id_to_dandiset_ids()

Map each content_id to the set of source Dandiset IDs it appears under.

A content ID is expected to map to a single source Dandiset in normal operation; ambiguous mappings (more than one) are surfaced so callers can handle them conservatively.

Return type:: dict[str, set[str]]
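
A minimal sketch of this mapping, assuming entries are flat dicts with content_id and dandiset_id keys and that entries without a content ID are skipped (both assumptions). The helper names are hypothetical.

```python
from collections import defaultdict


def map_content_ids(entries: list) -> dict:
    """Map each content_id to the set of Dandiset IDs it appears under."""
    mapping = defaultdict(set)
    for entry in entries:
        content_id = entry.get("content_id")
        if content_id is None:
            continue  # assumption: unknown content IDs are not mapped
        mapping[content_id].add(entry["dandiset_id"])
    return dict(mapping)


def ambiguous_content_ids(mapping: dict) -> list:
    """Content IDs that map to more than one source Dandiset."""
    return sorted(cid for cid, ids in mapping.items() if len(ids) > 1)
```

A caller that wants to be conservative could skip every ID returned by ambiguous_content_ids.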

failures_for(*, pipeline, version)

Failed entries matching a given pipeline and version.

Return type:

list[JobEntry]

Parameters:

pipeline (str)
version (str)

entry_for(*, dandi_path, attempt=1)

Return the entry with the given dandi_path (and attempt).

Parameters:

dandi_path (str) – The dandi_path recorded on the target entry.
attempt (int) – Disambiguates scenarios that use more than one attempt of the same asset.

Raises:

KeyError – If no entry matches dandi_path and attempt.

Return type:

JobEntry

static pending_code_dirs()

Identify attempt code directories awaiting submission from DANDI assets metadata.

Loads the DANDI assets.jsonld metadata and collects every attempt directory that contains a code/submit.sh asset but no adjacent submitted-marker asset. An entry is considered submitted when a sibling submitted asset exists, or when a sibling asset whose name starts with submitted_date- exists.

Return type:: list[str]
Returns:: Sorted list of code directory paths (relative to the Dandiset root) that are pending submission. Empty when nothing is awaiting submission.
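
The selection rule can be sketched over a plain list of asset paths. The function name find_pending_code_dirs and the idea of passing paths directly are illustrative assumptions; the real method loads them from assets.jsonld.

```python
from pathlib import PurePosixPath


def find_pending_code_dirs(asset_paths: list) -> list:
    """Return code directories with a submit.sh but no submitted marker."""
    with_submit_script = set()
    with_marker = set()

    for asset_path in asset_paths:
        path = PurePosixPath(asset_path)
        if path.parent.name != "code":
            continue
        code_dir = str(path.parent)
        if path.name == "submit.sh":
            with_submit_script.add(code_dir)
        elif path.name == "submitted" or path.name.startswith("submitted_date-"):
            with_marker.add(code_dir)

    return sorted(with_submit_script - with_marker)
```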

classmethod has_pending_jobs()

Report whether any queued jobs are awaiting submission.

Lightweight check intended to gate queue dispatch: it inspects the DANDI assets metadata for attempt directories that contain a code/submit.sh asset without an adjacent submitted marker. It does not submit anything and does not require SLURM access.

Return type:: bool

classmethod submit_next(*, processing_directory, max_submissions=2, test=False)

Submit the next eligible pending entries from the DANDI assets metadata.

Identifies all attempt directories that contain a code/submit.sh asset but no adjacent submitted-marker asset (see pending_code_dirs()). For each candidate (up to max_submissions), the following steps run in order:

A temporary working directory is created inside processing_directory.
The code/ tree is downloaded via dandi download --preserve-tree.
The submission script is executed via sbatch.
A submitted marker is written adjacent to submit.sh.
The marker is pushed back to the archive via dandi upload --allow-any-path.
The temporary directory is removed on success.

Parameters:

processing_directory (Path) – Directory in which temporary per-job working trees are created.
max_submissions (int) – Maximum number of pending jobs to submit.
test (bool) – When True, leave temporary working directories on disk after successful submission for debugging.

Returns:

True if at least one job was submitted, False otherwise.

Return type:

bool

Raises:

RuntimeError – If dandi download, sbatch, or dandi upload returns a non-zero exit code for any candidate.
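
The error contract above (a RuntimeError on any non-zero exit code) can be sketched with a small subprocess wrapper. The helper name run_checked is hypothetical, and the exact arguments the library passes to dandi and sbatch are not shown in this reference, so none are assumed here.

```python
import subprocess


def run_checked(command: list) -> str:
    """Run a command and raise RuntimeError if it exits non-zero."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{command[0]} exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Each external step (download, sbatch, upload) could go through a wrapper like this so that a failure in any of them stops the submission of that candidate.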

static count_running_aind_ephys_pipeline_jobs()

Count currently running AIND Ephys pipeline jobs via the SLURM scheduler.

Calls squeue --me --format=%j and counts jobs whose name is exactly AIND-Ephys-Pipeline.

Raises:: RuntimeError – If the squeue invocation exits non-zero and writes to standard error.
Return type:: int
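
The counting step can be sketched as a pure function over the text that squeue prints. The function name count_aind_jobs is hypothetical; it takes the captured output as a string so it can be exercised without a SLURM installation.

```python
def count_aind_jobs(squeue_output: str, job_name: str = "AIND-Ephys-Pipeline") -> int:
    """Count lines whose job name matches exactly (header and other jobs are ignored)."""
    return sum(1 for line in squeue_output.splitlines() if line.strip() == job_name)
```

Exact matching means a job named AIND-Ephys-Pipeline-2 would not be counted.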

static load_queue_config(*, queue_directory)

Read and validate queue_config.json under queue_directory.

Raises:

FileNotFoundError – If queue_config.json is not found.
ValueError – If the queue configuration fails LinkML validation.

Return type:

dict

Parameters:

queue_directory (Path)

static resolve_params_key_to_id(pipeline, params_key)

Resolve a human-readable parameters key to its 7-character hash ID.

For the aind+ephys pipeline the lookup is performed against the registered params registry. For any other pipeline, or if the key is not found, params_key is returned unchanged so callers that already store raw hash IDs continue to work.

Return type:

str

Parameters:

pipeline (str)
params_key (str)
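
The fallback behavior can be sketched with a plain dictionary lookup. The registry contents below, including the key "default" and the ID "a1b2c3d", are made up for illustration; the real registry is not shown in this reference.

```python
# Hypothetical registry: human-readable key -> 7-character hash ID.
PARAMS_REGISTRY_SKETCH = {"default": "a1b2c3d"}


def resolve_params_key(pipeline: str, params_key: str) -> str:
    """Resolve a params key to its ID, returning the key unchanged on a miss."""
    if pipeline != "aind+ephys":
        return params_key
    return PARAMS_REGISTRY_SKETCH.get(params_key, params_key)
```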

classmethod from_metadata(metadata, /)

Build a queue state from indexed DANDI assets metadata.

Each entry represents one attempt capsule inferred from the derivatives/dandiset-*/.../pipeline-*/..._attempt-* path structure, with content_id / asset_size_bytes resolved from the upstream source Dandiset’s assets.jsonld.

Parameters:: metadata (AssetsJsonldMetadata) – Indexed assets metadata, as produced by from_jsonld() or from_dandi().
Return type:: QueueState

classmethod from_jsonld(*, file_path)

Build a queue state from a local DANDI assets.jsonld file.

The file should be a JSON file whose content is a list of asset dicts with path, contentSize, dateModified, and contentUrl fields (matching the assets.jsonld layout from DANDI). The .jsonld file is preferred over its assets.yaml counterpart at the same S3 location because JSON parsing is many times faster than YAML for identical content.

Parameters:: file_path (Path) – Path to a local assets JSON-LD file.
Raises:: ValueError – If the file content is not a JSON array.
Return type:: QueueState
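
The loading and validation step can be sketched as follows. The helper name load_assets is hypothetical and it returns the raw list of asset dicts rather than a QueueState.

```python
import json
from pathlib import Path


def load_assets(file_path: Path) -> list:
    """Load an assets JSON-LD file and check that it holds a JSON array."""
    with file_path.open(encoding="utf-8") as stream:
        content = json.load(stream)
    if not isinstance(content, list):
        raise ValueError(
            f"Expected a JSON array of assets in {file_path}, "
            f"got {type(content).__name__}"
        )
    return content
```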

classmethod from_dandi(*, dandiset_id='001697')

Build a queue state from a Dandiset’s remote assets.jsonld metadata.

Fetches assets.jsonld for dandiset_id from the DANDI S3 bucket over the network.

Parameters:: dandiset_id (str) – The Dandiset whose assets.jsonld is read. Defaults to the job capsules Dandiset (001697).
Return type:: QueueState

classmethod write_state(*, queue_directory, dandiset_id='001697', state_file_name='state.jsonl')

Write a queue state file from DANDI assets.jsonld metadata.

Validates queue_config.json under queue_directory, builds the state via from_dandi(), and writes it to queue_directory/state_file_name.

Parameters:

queue_directory (Path) – Path to the queue root directory.
dandiset_id (str) – The Dandiset whose assets.jsonld portrays the state.
state_file_name (str) – Name of the state file written under queue_directory.

Raises:

FileNotFoundError – If queue_config.json is not found.
ValueError – If the queue configuration fails LinkML validation.

Return type:

None

classmethod write_archive_state(*, queue_directory)

Write archive_state.jsonl from the failed runs archive assets.jsonld.

The archive counterpart to write_state(); produces an identically structured state file adjacent to state.jsonl portraying the failed runs archive Dandiset (001873) rather than the job capsules Dandiset.

Parameters:: queue_directory (Path) – Path to the queue root directory.
Return type:: None

aggregate_statistics(*, queue_directory, dandiset_directory, output_file_name='queue_stats.json')

Write aggregate queue statistics JSON and return the written payload.

Return type:

dict

Parameters:

queue_directory (Path)
dandiset_directory (Path)
output_file_name (str)

clean_unsubmitted_capsules(*, dandiset_directory)

Remove all queued (unsubmitted) capsule directories from the dandiset tree.

A capsule is queued when its attempt directory has a code/ subdirectory but no logs/ or derivatives/ content and no submitted marker. Each matching attempt directory is deleted from the DANDI archive (via dandi delete) and the local filesystem.

Parameters:: dandiset_directory (Path) – Local clone of the dandiset used to resolve and delete matching attempt directories.
Returns:: Attempt directory paths that were deleted.
Return type:: list[Path]
Raises:: RuntimeError – If DANDI_API_KEY is not set or is blank.

classmethod process_queue(*, queue_directory, processing_directory, max_concurrent_aind_jobs=2, jitter_seconds=30.0, test=False)

Submit pending jobs from state.jsonl until the total number of running AIND-Ephys-Pipeline SLURM jobs reaches max_concurrent_aind_jobs.

Parameters:

queue_directory (Path) – Path to the queue root directory.
processing_directory (Path) – Directory for temporary working trees during submission.
max_concurrent_aind_jobs (int) – Maximum concurrent AIND-Ephys-Pipeline jobs.
jitter_seconds (float) – Maximum random delay (seconds) before processing; 0 disables.
test (bool) – If True, preserve temporary processing directories on success.

Raises:

FileNotFoundError – If state.jsonl is not found in queue_directory.
ValueError – If jitter_seconds is negative or max_concurrent_aind_jobs < 1.

Return type:

Literal['submitted', 'no-pending', 'slots-unavailable']

classmethod prepare(*, queue_directory, pipeline_directory=None, config_key='default', content_ids=None, limit=None)

Prepare qualifying assets in bulk, based on the current queue configuration.

For every pipeline/version/params combination declared in queue_config.json this determines which content IDs to prepare and calls prepare_aind_ephys_job() for each asset. The per-pipeline failure cap (max_fail_per_dandiset) is enforced by reading the existing state.jsonl under queue_directory.

Parameters:

queue_directory (Path) – Path to the queue root directory.
pipeline_directory (Path | None) – Local path to the AIND pipeline repository.
config_key (str) – Key for a registered job configuration.
content_ids (list[str] | None) – Explicit content IDs to prepare; when provided, the qualifying list is not fetched from the network.
limit (int | None) – If provided, stop after preparing limit assets in total.

Return type:

None

static dump_issues(*, dandiset_directory, queue_directory, output_file_name='issues_dump.json')

Scan Nextflow and SLURM logs and write per-capsule error lines under queue_directory.

Return type:

list[dict]

Parameters:

dandiset_directory (Path)
queue_directory (Path)
output_file_name (str)

static summarize_issues(*, dandiset_directory, queue_directory, dump_output_file_name='issues_dump.json', output_file_name='issues_summary.json')

Write an error-frequency summary in descending order of count, where keys are counts and values are lists of error strings.

Return type:

dict[str, list[str]]

Parameters:

dandiset_directory (Path)
queue_directory (Path)
dump_output_file_name (str)
output_file_name (str)

classmethod from_jsonl(file_path, /)

Load from an existing state.jsonl file.

Parameters:: file_path (Path) – Path to the state.jsonl file to read.
Raises:: FileNotFoundError – If file_path does not exist.
Return type:: QueueState

to_file(file_path, /)

Write all entries to file_path as newline-delimited JSON.

Parameters:: file_path (Path) – Destination path; the file is overwritten if it already exists.
Return type:: None
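
Newline-delimited JSON stores one JSON object per line. The round trip can be sketched as below, working on plain dicts instead of JobEntry objects; the helper names write_jsonl and read_jsonl are hypothetical.

```python
import json
from pathlib import Path


def write_jsonl(entries: list, file_path: Path) -> None:
    """Write one JSON object per line, overwriting any existing file."""
    with file_path.open("w", encoding="utf-8") as stream:
        for entry in entries:
            stream.write(json.dumps(entry) + "\n")


def read_jsonl(file_path: Path) -> list:
    """Read entries back; open() raises FileNotFoundError if the file is missing."""
    with file_path.open(encoding="utf-8") as stream:
        return [json.loads(line) for line in stream if line.strip()]
```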

dandi_compute_code.queue.summarize_issues(*, dandiset_directory, queue_directory, dump_output_file_name='issues_dump.json', output_file_name='issues_summary.json')

Write an error-frequency summary in descending order of count, where keys are counts and values are lists of error strings.

Return type:

dict[str, list[str]]

Parameters:

dandiset_directory (Path)
queue_directory (Path)
dump_output_file_name (str)
output_file_name (str)
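
The shape of the summary (string count keys, lists of error strings as values, highest count first) can be sketched as below. The function name summarize_errors and the flat list input are assumptions; the real function reads from the issues dump file, and how it orders errors that share a count is not stated here.

```python
from collections import Counter, defaultdict


def summarize_errors(error_lines: list) -> dict:
    """Group error strings by how often they occur, highest count first."""
    counts = Counter(error_lines)

    grouped = defaultdict(list)
    for error, count in counts.items():
        grouped[count].append(error)

    # Keys are stringified counts; dict insertion order keeps the descending order.
    return {
        str(count): sorted(grouped[count])
        for count in sorted(grouped, reverse=True)
    }
```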

dandi_compute_code.queue.write_archive_state(*, queue_directory)

Write archive_state.jsonl from the failed runs archive assets.jsonld.

This is the archive counterpart to write_queue_state(). It produces an identically structured state file that lives adjacent to state.jsonl in queue_directory, but portrays the state of the failed runs archive Dandiset (001873) — where dandicompute archive job moves capsules — rather than the job capsules Dandiset (001697) where jobs run.

Parameters:: queue_directory (Path) – Path to the queue root directory.
Return type:: None

dandi_compute_code.queue.write_queue_state(*, queue_directory, dandiset_id='001697', state_file_name='state.jsonl')

Write a queue state file from DANDI assets.jsonld metadata.

Each state entry represents one attempt capsule inferred from the derivatives/dandiset-*/.../pipeline-*/..._attempt-* path structure in assets.jsonld. The entry fields are derived as follows:

dandi_path – The path segment between dandiset-* and pipeline-*, with .nwb appended.
has_code, has_been_submitted, has_output, has_logs – Inferred from the assets present under each attempt directory.
dataset_description_path – Maps the root-level dataset_description.json asset path to its blob ID when present.
output_paths – Maps each output asset path to its blob ID when has_output is True; otherwise an empty dict.
log_paths – Maps each log asset path to its blob ID when has_logs is True; otherwise an empty dict.

For meta-analysis dandisets, the content_id and asset_size_bytes of each attempt’s source NWB are resolved by fetching the upstream dandiset’s assets.jsonld. created_at comes from the local code/submit.sh modification time; job_completion_time is the latest modification time among log files.

Parameters:

queue_directory (Path) – Path to the queue root directory.
dandiset_id (str) – The Dandiset whose assets.jsonld is read to portray the state. Defaults to the job capsules Dandiset (001697).
state_file_name (str) – Name of the state file written under queue_directory. Defaults to state.jsonl.

Return type:

None
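
The dandi_path derivation described for write_queue_state can be sketched with simple path splitting. The helper name derive_dandi_path is hypothetical, and the example path in the usage note is invented to match the documented pattern; real capsule paths may contain different segments.

```python
from pathlib import PurePosixPath


def derive_dandi_path(attempt_path: str) -> str:
    """Join the segments between dandiset-* and pipeline-*, then append .nwb."""
    parts = PurePosixPath(attempt_path).parts

    start = next((i for i, part in enumerate(parts) if part.startswith("dandiset-")), None)
    if start is None:
        raise ValueError(f"No dandiset-* segment in {attempt_path!r}")

    end = next(
        (i for i, part in enumerate(parts) if i > start and part.startswith("pipeline-")),
        None,
    )
    if end is None or end == start + 1:
        raise ValueError(f"No path segment before pipeline-* in {attempt_path!r}")

    return "/".join(parts[start + 1 : end]) + ".nwb"
```

For a made-up path such as derivatives/dandiset-000001/sub-01/sub-01_ses-01_ecephys/pipeline-aind+ephys/version-v1_params-abc1234_attempt-1, this sketch yields sub-01/sub-01_ses-01_ecephys.nwb.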