Structure of the emgfile

What Is an emgfile

The emgfile is the main data structure used by openhdemg. In practical terms, it is a Python dictionary that stores the signals, motor unit discharge times, sampling information, analysis results, and metadata needed by the library functions.

import openhdemg.library as emg

emgfile = emg.emg_from_samplefile()

print(type(emgfile))
print(emgfile.keys())

The exact keys can differ between files. For example, a fully decomposed file can contain raw EMG, reference signals, IPTS, MU discharge times, binary firings, accuracy scores, and extras. A reference-signal-only file can contain only the source, filename, sampling frequency, reference signal, and optional extras.

The v0.2.0 Rule

Since version 0.2.0, the emgfile structure is flexible.

The important rule is:

A standard key can be absent, but when a standard key is present it must keep the standard name and expected data type.

This is different from older releases, where users were encouraged to keep every standard key in the dictionary and fill unavailable data with empty objects. In v0.2.0 and later, it is valid for a file without decomposition results to omit decomposition-specific keys such as MUPULSES, IPTS, ACCURACY, BINARY_MUS_FIRING, and NUMBER_OF_MUS.

This flexibility makes it easier to represent:

  • raw EMG files before decomposition;
  • reference-signal-only files;
  • files with multiple reference-signal channels;
  • files containing decomposition metadata;
  • files containing ground-truth or reference MU discharge times;
  • collections where shared information is stored outside individual modules.
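
The rule can be expressed as a small check. The sketch below is a hypothetical helper, not part of openhdemg. It covers only the scalar standard keys so that it runs without pandas or NumPy, and the accepted types are an assumption based on the Standard Keys table.

```python
# Hypothetical helper (not part of openhdemg): checks only scalar standard keys.
EXPECTED_SCALAR_TYPES = {
    "SOURCE": str,
    "FILENAME": str,
    "FSAMP": (int, float),
    "IED": (int, float),
    "EMG_LENGTH": int,
    "NUMBER_OF_MUS": int,
}


def find_type_errors(emgfile):
    """Return messages for standard keys that are present with a wrong type."""
    errors = []
    for key, expected in EXPECTED_SCALAR_TYPES.items():
        if key not in emgfile:
            continue  # An absent standard key is valid.
        value = emgfile[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key}: got {type(value).__name__}")
    return errors


# A reference-signal-only file may omit every decomposition-specific key.
refsig_only = {"SOURCE": "CUSTOM", "FILENAME": "trial_01", "FSAMP": 2048.0}
print(find_type_errors(refsig_only))  # []

# A key that is present with the wrong type is reported.
bad_file = {"SOURCE": "CUSTOM", "FSAMP": "2048"}
print(find_type_errors(bad_file))  # ['FSAMP: got str']
```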

Standard Keys

The following keys are recognised by openhdemg. Not every file needs every key.

| Key | Expected content | Typical type |
| --- | --- | --- |
| SOURCE | Origin of the file, such as OPENHDEMG, DEMUSE, or CUSTOMCSV. | str |
| FILENAME | Name of the source file or saved module. | str |
| RAW_SIGNAL | Raw HDsEMG signal, samples by channels. | pandas.DataFrame |
| REF_SIGNAL | Reference signal, samples by reference-signal channels. | pandas.DataFrame |
| ACCURACY | MU accuracy score, usually SIL for files where it can be computed. | pandas.DataFrame |
| IPTS | Innervation pulse trains or decomposed source signals. | pandas.DataFrame |
| MUPULSES | MU discharge times in samples. | list of 1-D numpy.ndarray |
| FSAMP | Sampling frequency in Hz. | float |
| IED | Interelectrode distance in mm. | float |
| EMG_LENGTH | Number of samples in the EMG file. | int |
| NUMBER_OF_MUS | Number of motor units represented in the file. | int |
| BINARY_MUS_FIRING | Binary representation of MU discharge times, samples by MUs. | pandas.DataFrame with np.uint8 values |
| EXTRAS | Additional custom tabular information. | pandas.DataFrame |
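
Because MUPULSES stores discharge times in samples and FSAMP is expressed in Hz, dividing by FSAMP converts the discharge times to seconds. A minimal sketch with made-up values, using plain Python lists in place of NumPy arrays so that it runs without dependencies:

```python
# Illustrative values only; a real emgfile stores MUPULSES as NumPy arrays.
fsamp = 2048.0
mupulses = [
    [1024, 1280, 1536],  # MU 0, discharge times in samples
    [2048, 2304],        # MU 1
]

# time in seconds = sample index / sampling frequency
discharge_times_s = [[p / fsamp for p in pulses] for pulses in mupulses]

# Inter-spike intervals (in seconds) of MU 0.
mu0 = discharge_times_s[0]
isi_mu0 = [later - earlier for earlier, later in zip(mu0, mu0[1:])]

print(discharge_times_s[1])  # [1.0, 1.125]
print(isi_mu0)               # [0.125, 0.125]
```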

New Optional Keys

Version 0.2.0-beta.1 adds support for optional keys used by the new decomposition, validation, and cleaning workflows.

| Key | Purpose | Typical type |
| --- | --- | --- |
| GOOD_CHANNELS | Stores channel-quality information selected with select_bad_channels(). The decomposition pipeline can use it to exclude bad channels. | dict |
| MU_LABELS | Optional labels associated with MUs. These labels are preserved or updated when MUs are sorted or deleted. | list-like object |
| REFERENCE_MUPULSES | Reference discharge times, for example ground truth or an alternative detection method. | list of 1-D numpy.ndarray |
| ROA_WITH_REFERENCE_MUPULSES | Rate of agreement between MUPULSES and REFERENCE_MUPULSES. | pandas.DataFrame |
| DECOMPOSITION_PARAMETERS | Metadata generated by the decomposition pipeline, including method and filtering settings. | dict |

Custom keys can be added when needed. The safest pattern is to use custom keys for project-specific metadata and to avoid changing the meaning of standard keys.
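
One way to follow this pattern is to refuse custom keys that collide with a recognised name. The helper below is hypothetical (not part of openhdemg), the set of recognised names is copied from the tables above, and "PROJECT_NOTES" is an arbitrary example name.

```python
# Recognised key names, copied from the tables in this tutorial.
RECOGNISED_KEYS = {
    "SOURCE", "FILENAME", "RAW_SIGNAL", "REF_SIGNAL", "ACCURACY", "IPTS",
    "MUPULSES", "FSAMP", "IED", "EMG_LENGTH", "NUMBER_OF_MUS",
    "BINARY_MUS_FIRING", "EXTRAS", "GOOD_CHANNELS", "MU_LABELS",
    "REFERENCE_MUPULSES", "ROA_WITH_REFERENCE_MUPULSES",
    "DECOMPOSITION_PARAMETERS",
}


def add_custom_key(emgfile, key, value):
    """Store project-specific metadata without overriding a recognised key."""
    if key in RECOGNISED_KEYS:
        raise KeyError(f"{key} is a recognised key; choose another name.")
    emgfile[key] = value
    return emgfile


emgfile = {"SOURCE": "CUSTOM", "FSAMP": 2048.0}
add_custom_key(emgfile, "PROJECT_NOTES", {"operator": "AB", "session": 2})
print(sorted(emgfile))  # ['FSAMP', 'PROJECT_NOTES', 'SOURCE']
```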

Inspect an emgfile

The info class remains the easiest way to inspect a file:

import openhdemg.library as emg

emgfile = emg.emg_from_samplefile()

info = emg.info()
info.data(emgfile)

You can also check keys directly:

if "REF_SIGNAL" in emgfile:
    print(emgfile["REF_SIGNAL"].head())

Checking key presence before using optional data is now the recommended pattern.
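
When a script needs several optional keys at once, the check can be collected in one place. A small sketch: missing_keys is a hypothetical helper, and the list of required keys is only an example of what an analysis might need.

```python
def missing_keys(emgfile, required):
    """Return the required keys that are absent from the emgfile, in order."""
    return [key for key in required if key not in emgfile]


# Stand-in for a raw file before decomposition (signal values replaced by None).
raw_file = {
    "SOURCE": "CUSTOM",
    "RAW_SIGNAL": None,
    "REF_SIGNAL": None,
    "FSAMP": 2048.0,
}

needed_for_mu_analysis = ["MUPULSES", "FSAMP", "REF_SIGNAL"]
absent = missing_keys(raw_file, needed_for_mu_analysis)
if absent:
    print("Cannot run the MU analysis, missing:", ", ".join(absent))
```

Reporting every missing key at once tends to be easier to debug than a KeyError raised at the first failed lookup.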

Standardise Data Types

Use standardise_emgfile_dtypes() when you create or modify an emgfile manually. It returns a deep copy of the file with recognised keys converted to the expected data types.

import openhdemg.library as emg

emgfile = emg.emg_from_samplefile()
standard_emgfile = emg.standardise_emgfile_dtypes(emgfile)

The function standardises recognised keys such as:

  • RAW_SIGNAL, REF_SIGNAL, ACCURACY, IPTS, and ROA_WITH_REFERENCE_MUPULSES as pandas DataFrames with numeric values;
  • MUPULSES and REFERENCE_MUPULSES as lists of 1-D NumPy arrays;
  • FSAMP, IED, EMG_LENGTH, and NUMBER_OF_MUS as numeric scalars;
  • BINARY_MUS_FIRING as a pandas DataFrame with np.uint8 values;
  • GOOD_CHANNELS as a dictionary with string keys and integer values.

Additional custom keys are preserved but not type-checked.
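
A practical consequence of returning a deep copy is that the input is left untouched, so the result must be assigned to a variable. The sketch below imitates that behaviour for two scalar keys only. standardise_scalars is a hypothetical stand-in, not the library's implementation, and the conversions (float for FSAMP, int for EMG_LENGTH) follow the typical types listed in the Standard Keys table.

```python
import copy


def standardise_scalars(emgfile):
    """Return a deep copy with FSAMP as float and EMG_LENGTH as int, if present."""
    out = copy.deepcopy(emgfile)
    if "FSAMP" in out:
        out["FSAMP"] = float(out["FSAMP"])
    if "EMG_LENGTH" in out:
        out["EMG_LENGTH"] = int(out["EMG_LENGTH"])
    return out


original = {"FSAMP": 2048, "EMG_LENGTH": 66560.0, "MY_NOTES": ["pilot"]}
standard = standardise_scalars(original)

# The copy is converted; the original keeps its types.
print(type(standard["FSAMP"]).__name__, type(original["FSAMP"]).__name__)  # float int

# Custom keys are copied too, so editing the copy does not touch the original.
standard["MY_NOTES"].append("edited")
print(original["MY_NOTES"])  # ['pilot']
```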

Multiple Reference Signals

REF_SIGNAL is a pandas DataFrame with one column per reference-signal channel. In v0.2.0, many functions accept a refsig_channel argument that selects the channel to use.

import openhdemg.library as emg

emgfile = emg.emg_from_samplefile()

# Plot channel 0 of REF_SIGNAL.
emg.plot_refsig(emgfile, refsig_channel=0)

# Use channel 0 when calculating basic MU properties.
results = emg.basic_mus_properties(
    emgfile=emgfile,
    mvc=634,
    refsig_channel=0,
)

Functions that transform the reference signal can process selected channels through refsig_channels:

filtered = emg.filter_refsig(
    emgfile=emgfile,
    cutoff=15,
    refsig_channels=[0],
)

offset_removed = emg.remove_offset(
    emgfile=filtered,
    auto=1024,
    refsig_channels=[0],
)

See Multiple reference signals for a dedicated workflow.

Files With No Motor Units

Some files legitimately contain no MUs. For example, a raw EMG file before decomposition can contain RAW_SIGNAL, REF_SIGNAL, and FSAMP, but no MUPULSES. A decomposition attempt can also return an emgfile with NUMBER_OF_MUS == 0.

When writing custom code, avoid assuming that MU-specific keys are always present:

if emgfile.get("NUMBER_OF_MUS", 0) > 0 and "MUPULSES" in emgfile:
    emg.plot_mupulses(emgfile)
else:
    print("No motor units are available in this file.")

Plotting functions in v0.2.0 are more robust with empty MU files, but checking your workflow explicitly makes scripts easier to understand and debug.
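
For reuse across scripts, the guard can be wrapped in a function. count_mus is a hypothetical helper, and falling back to the length of MUPULSES when NUMBER_OF_MUS is absent is an assumption of this sketch, not documented library behaviour.

```python
def count_mus(emgfile):
    """Return the number of MUs, or 0 when the file carries no MU data."""
    if "MUPULSES" not in emgfile:
        return 0
    # Prefer the explicit count; otherwise count the discharge-time arrays.
    return emgfile.get("NUMBER_OF_MUS", len(emgfile["MUPULSES"]))


raw_file = {"FSAMP": 2048.0}                            # before decomposition
empty_result = {"MUPULSES": [], "NUMBER_OF_MUS": 0}     # decomposition found nothing
decomposed = {"MUPULSES": [[10, 250], [40, 300, 610]]}  # count inferred from MUPULSES

for name, file in [("raw", raw_file), ("empty", empty_result), ("decomposed", decomposed)]:
    print(name, count_mus(file))
```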

Create Your Own emgfile

You can create a custom emgfile when importing data from an unsupported source. The minimum content depends on what you want to do.

For a complete worked example, see the tutorial Import from other software. The rest of this section outlines the general approach.

For a raw file intended for decomposition:

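import pandas as pd
import openhdemg.library as emg

# raw_signal_array and ref_signal_array are placeholders for your own data:
# raw_signal_array is shaped samples by channels, ref_signal_array holds
# one reference-signal value per sample.
raw_signal = pd.DataFrame(raw_signal_array)
raw_signal.columns = range(raw_signal.shape[1])  # columns must be base-0 integers

ref_signal = pd.DataFrame(ref_signal_array, columns=[0])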

emgfile = {
    "SOURCE": "CUSTOM",
    "FILENAME": "participant_01_trial_01",
    "RAW_SIGNAL": raw_signal,
    "EMG_LENGTH": raw_signal.shape[0],
    "REF_SIGNAL": ref_signal,
    "FSAMP": 2048.0,
    "IED": 10.0
}

emgfile = emg.standardise_emgfile_dtypes(emgfile)

For a decomposed file intended for MU analysis, also include:

  • MUPULSES;
  • EMG_LENGTH;
  • NUMBER_OF_MUS;
  • IPTS;
  • BINARY_MUS_FIRING;
  • ACCURACY.

You can create BINARY_MUS_FIRING from MUPULSES (or vice versa):

emgfile["BINARY_MUS_FIRING"] = emg.create_binary_firings(
    emg_length=emgfile["EMG_LENGTH"],
    number_of_mus=emgfile["NUMBER_OF_MUS"],
    mupulses=emgfile["MUPULSES"],
)

emgfile = emg.standardise_emgfile_dtypes(emgfile)
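
To make the relationship between the two representations concrete, the sketch below converts in both directions using plain nested lists (rows are samples, columns are MUs). It is an illustration under that simplification, not the implementation of create_binary_firings(), which works with pandas and NumPy objects. Both function names are hypothetical and the discharge samples are made up.

```python
def pulses_to_binary(emg_length, mupulses):
    """Build a samples-by-MUs matrix with 1 at each discharge sample."""
    binary = [[0] * len(mupulses) for _ in range(emg_length)]
    for mu, pulses in enumerate(mupulses):
        for sample in pulses:
            binary[sample][mu] = 1
    return binary


def binary_to_pulses(binary):
    """Recover, for each MU column, the sample indices where it fired."""
    number_of_mus = len(binary[0]) if binary else 0
    return [
        [sample for sample, row in enumerate(binary) if row[mu] == 1]
        for mu in range(number_of_mus)
    ]


mupulses = [[1, 4], [0, 2, 5]]  # two MUs, made-up discharge samples
binary = pulses_to_binary(emg_length=6, mupulses=mupulses)
for row in binary:
    print(row)

# Converting back returns the original discharge samples for this example.
print(binary_to_pulses(binary))  # [[1, 4], [0, 2, 5]]
```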

DataFrame column names

All standard DataFrames except REF_SIGNAL and EXTRAS should have columns named as base-0 integers: 0, 1, 2, and so on.

If no reference signal is available, it is good practice to include REF_SIGNAL as a one-column DataFrame named 0 and filled with zeros. This improves compatibility with functions that expect a reference signal to be present.

emgfile["REF_SIGNAL"] = pd.DataFrame(0, index=raw_signal.index, columns=[0])

Save the File

For new v0.2.0 workflows, save edited data as a binary module:

import openhdemg.library as emg

emg.save_openhdemg_module(
    emgfile=emgfile,
    path="C:/Users/.../Desktop/openhdemg_modules",
    module_name="participant_01_trial_01",
    compresslevel=1,
    add_checksum=True,
)

JSON functions are still available for compatibility, but binary modules and collections are recommended for new analyses. See Save and load binary modules and Manage collections for details.

More Questions?

We hope that this tutorial was useful. If you need additional information, read the answers or ask a question in the openhdemg discussion section. If you are not familiar with GitHub discussions, please read this post. This will allow the openhdemg community to answer your questions.