Member

@Samoed Samoed commented Jan 3, 2026

Ref #3498

I’ve started integrating audio statistics. For now, I’ve come up with this format. Do you have any suggestions?

class AudioStatistics(TypedDict):
    """Class for descriptive statistics for audio.

    Attributes:
        total_audio_seconds_length: Total length of all audio clips in seconds
        min_audio_seconds_length: Minimum length of audio clip in seconds
        average_audio_seconds_length: Average length of audio clip in seconds
        max_audio_seconds_length: Maximum length of audio clip in seconds
        unique_audios: Number of unique audio clips
        average_sampling_rate: Average sampling rate
        sampling_rates: Dict of unique sampling rates and their frequencies
    """

    total_audio_seconds_length: float

    min_audio_seconds_length: float
    average_audio_seconds_length: float
    max_audio_seconds_length: float

    unique_audios: int

    average_sampling_rate: float
    sampling_rates: dict[int, int]
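For readers following the thread, here is a minimal sketch of how these fields could be filled in. It is not the PR's implementation: `compute_audio_statistics` is a hypothetical helper, each clip is assumed to be a dict with a mono 1-D `"array"` and an integer `"sampling_rate"` in Hz, and uniqueness is approximated by comparing the raw samples together with the sampling rate.

```python
from collections import Counter
from typing import TypedDict


class AudioStatistics(TypedDict):
    total_audio_seconds_length: float
    min_audio_seconds_length: float
    average_audio_seconds_length: float
    max_audio_seconds_length: float
    unique_audios: int
    average_sampling_rate: float
    sampling_rates: dict[int, int]


def compute_audio_statistics(audios: list[dict]) -> AudioStatistics:
    """Hypothetical helper: summarize clips shaped like {"array": ..., "sampling_rate": ...}."""
    if not audios:
        raise ValueError("need at least one audio clip")

    durations: list[float] = []
    rate_counts: Counter = Counter()
    fingerprints = set()

    for audio in audios:
        array = audio["array"]
        sampling_rate = audio["sampling_rate"]
        # Duration in seconds = number of frames / frames per second.
        durations.append(len(array) / sampling_rate)
        rate_counts[sampling_rate] += 1
        # Assumption: two clips are "the same" if samples and rate match exactly.
        fingerprints.add((sampling_rate, tuple(array)))

    return AudioStatistics(
        total_audio_seconds_length=sum(durations),
        min_audio_seconds_length=min(durations),
        average_audio_seconds_length=sum(durations) / len(durations),
        max_audio_seconds_length=max(durations),
        unique_audios=len(fingerprints),
        # Mean of the per-clip sampling rates.
        average_sampling_rate=sum(r * c for r, c in rate_counts.items()) / len(audios),
        sampling_rates=dict(rate_counts),
    )
```

Hashing the full sample tuple is fine for a sketch but may be slow on long clips; a real implementation would likely hash the underlying bytes instead.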

@Samoed Samoed added the maeb Audio extension label Jan 3, 2026
@isaac-chung
Collaborator

When I see length, I think in seconds. I like the frames approach too, and I'd like it spelled out explicitly (num_frames or whatever). I'd like to see:

  • the max/min/total number of seconds
  • the unique set of sampling rates (specify unit)

Would love to hear other feedback as well while I read into it a bit more.
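To make the frames-versus-seconds distinction concrete, here is a small sketch of the conversion. The name `frames_to_seconds` is hypothetical and the sampling rate is assumed to be in Hz; the point is only that the same frame count means different durations at different rates, which is why the unit should appear in the field name.

```python
def frames_to_seconds(num_frames: int, sampling_rate_hz: int) -> float:
    """Convert a frame (sample) count into a duration in seconds."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling_rate_hz must be positive")
    return num_frames / sampling_rate_hz


# The same 48,000 frames last 3 s at 16 kHz but only 1 s at 48 kHz.
print(frames_to_seconds(48_000, 16_000))  # 3.0
print(frames_to_seconds(48_000, 48_000))  # 1.0
```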

@Samoed
Member Author

Samoed commented Jan 3, 2026

Added seconds and sampling rates

Collaborator

@isaac-chung isaac-chung left a comment

Sorry for adding more. Revisited some papers and maybe we should use the standard measure of audio dataset size.

Collaborator

@isaac-chung isaac-chung left a comment

Just wanted to align with HF notation, plus some questions.

for audio in audios:
    array = audio["array"]
    sampling_rate = audio["sampling_rate"]
Collaborator

This line assumes there is the sampling_rate key. Based on what you mentioned, this will fail for some datasets then?

Member Author

Yes, but it's better to fix them to improve benchmark quality overall

Collaborator

Could you please open an issue to track this?

Member Author

I'm not sure what to open. Possible missing sampling rate?

Collaborator

Yeah, all audio should have sampling_rate. So if you say that's not true, then it's an issue.
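One way to scope such a tracking issue would be to list which clips lack a usable sampling rate before the statistics code hits a `KeyError`. The sketch below is only an illustration: `find_clips_missing_sampling_rate` is a hypothetical helper, and clips are assumed to be plain dicts as in the loop under review.

```python
def find_clips_missing_sampling_rate(audios: list[dict]) -> list[int]:
    """Return the indices of clips without a positive "sampling_rate" entry."""
    missing = []
    for index, audio in enumerate(audios):
        # .get() avoids the KeyError that audio["sampling_rate"] would raise.
        rate = audio.get("sampling_rate")
        if rate is None or rate <= 0:
            missing.append(index)
    return missing


clips = [
    {"array": [0.0, 0.1], "sampling_rate": 16000},
    {"array": [0.0, 0.1]},  # key absent
    {"array": [0.0, 0.1], "sampling_rate": None},  # key present but empty
]
print(find_clips_missing_sampling_rate(clips))  # [1, 2]
```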

unique_audios: int

average_sampling_rate: float
sampling_rates: dict[int, int]
Collaborator

Could this just be a unique set of sampling rates? OK either way.

Suggested change
- sampling_rates: dict[int, int]
+ sampling_rates: list[int]
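As a rough illustration of the trade-off between the two shapes: the frequency dict carries strictly more information, and the unique list can be derived from it, but not the other way round. The variable names below are made up for the example.

```python
from collections import Counter

# Hypothetical per-clip sampling rates in Hz.
rates = [16000, 16000, 44100, 16000]

# dict[int, int]: each unique rate mapped to how many clips use it.
sampling_rate_counts = dict(Counter(rates))

# list[int]: only the unique rates; the counts are lost.
unique_sampling_rates = sorted(sampling_rate_counts)

print(sampling_rate_counts)   # {16000: 3, 44100: 1}
print(unique_sampling_rates)  # [16000, 44100]
```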

@Myahr208 Myahr208 mentioned this pull request Jan 3, 2026

Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Minor things - generally think this looks good (of course Isaac's comments still apply, but nothing more to add)

Co-authored-by: Kenneth Enevoldsen
