Structure of the emgfile
What Is an emgfile
The emgfile is the main data structure used by openhdemg. In practical terms, it is a Python dictionary that stores the signals, motor unit discharge times, sampling information, analysis results, and metadata needed by the library functions.
import openhdemg.library as emg
emgfile = emg.emg_from_samplefile()
print(type(emgfile))
print(emgfile.keys())
The exact keys can differ between files. For example, a fully decomposed file can contain raw EMG, reference signals, IPTS, MU discharge times, binary firings, accuracy scores, and extras. A reference-signal-only file can contain only the source, filename, sampling frequency, reference signal, and optional extras.
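Because the set of keys varies, it can help to list which standard keys a given file actually carries. The following is a minimal sketch using a plain dictionary as a stand-in for an emgfile; `summarise_keys` is a hypothetical helper and not part of openhdemg, and the key list is copied from the Standard Keys table in this tutorial.

```python
# Standard key names taken from the "Standard Keys" table in this tutorial.
STANDARD_KEYS = [
    "SOURCE", "FILENAME", "RAW_SIGNAL", "REF_SIGNAL", "ACCURACY", "IPTS",
    "MUPULSES", "FSAMP", "IED", "EMG_LENGTH", "NUMBER_OF_MUS",
    "BINARY_MUS_FIRING", "EXTRAS",
]


def summarise_keys(emgfile):
    """Hypothetical helper: split the keys of a dict into three groups."""
    present = [k for k in STANDARD_KEYS if k in emgfile]
    absent = [k for k in STANDARD_KEYS if k not in emgfile]
    other = [k for k in emgfile if k not in STANDARD_KEYS]
    return present, absent, other


# A stand-in for a reference-signal-only file (values are placeholders).
refsig_only = {
    "SOURCE": "CUSTOM",
    "FILENAME": "trial_01",
    "FSAMP": 2048.0,
    "REF_SIGNAL": [[0.0], [0.1], [0.2]],  # a DataFrame in a real emgfile
}

present, absent, other = summarise_keys(refsig_only)
print("Present:", present)
print("Absent:", absent)
print("Not in the standard list:", other)
```

The same helper can be pointed at a file loaded with `emg.emg_from_samplefile()` to see how a fully decomposed file differs.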
The v0.2.0 Rule
Since version 0.2.0, the emgfile structure is flexible.
The important rule is:
A standard key can be absent, but when a standard key is present it must keep the standard name and expected data type.
This is different from older releases, where users were encouraged to keep every standard key in the dictionary and fill unavailable data with empty objects. In v0.2.0 and later, it is valid for a file without decomposition results to omit decomposition-specific keys such as MUPULSES, IPTS, ACCURACY, BINARY_MUS_FIRING, and NUMBER_OF_MUS.
This flexibility makes it easier to represent:
- raw EMG files before decomposition;
- reference-signal-only files;
- files with multiple reference-signal channels;
- files containing decomposition metadata;
- files containing ground-truth or reference MU discharge times;
- collections where shared information is stored outside individual modules.
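The rule above (absent is fine, present must match the expected type) can be sketched as a small check. This is an illustration under stated assumptions, not the validation that openhdemg performs: `find_type_mismatches` and the `EXPECTED_TYPES` mapping are hypothetical, cover only some keys, and accept plain Python integers for FSAMP and IED for convenience.

```python
import numpy as np
import pandas as pd

# Expected types for a subset of standard keys, following the table in the
# "Standard Keys" section. This mapping is illustrative, not the library's.
EXPECTED_TYPES = {
    "SOURCE": str,
    "FILENAME": str,
    "RAW_SIGNAL": pd.DataFrame,
    "REF_SIGNAL": pd.DataFrame,
    "FSAMP": (int, float),
    "IED": (int, float),
    "EMG_LENGTH": int,
    "NUMBER_OF_MUS": int,
    "MUPULSES": list,
}


def find_type_mismatches(emgfile):
    """Hypothetical helper: list present keys whose type is unexpected.

    Absent keys are skipped, because a standard key is allowed to be missing.
    """
    mismatches = []
    for key, expected in EXPECTED_TYPES.items():
        if key in emgfile and not isinstance(emgfile[key], expected):
            mismatches.append(key)
    return mismatches


# A raw file without any decomposition-specific keys.
raw_only = {
    "SOURCE": "CUSTOM",
    "RAW_SIGNAL": pd.DataFrame(np.zeros((4, 2))),
    "FSAMP": 2048.0,
}
print(find_type_mismatches(raw_only))

# The same file with a string where a number is expected.
bad = dict(raw_only, FSAMP="2048")
print(find_type_mismatches(bad))
```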
Standard Keys
The following keys are recognised by openhdemg. Not every file needs every key.
| Key | Expected content | Typical type |
| --- | --- | --- |
| SOURCE | Origin of the file, such as OPENHDEMG, DEMUSE, CUSTOMCSV, etc. | str |
| FILENAME | Name of the source file or saved module. | str |
| RAW_SIGNAL | Raw HDsEMG signal, samples by channels. | pandas.DataFrame |
| REF_SIGNAL | Reference signal, samples by reference-signal channels. | pandas.DataFrame |
| ACCURACY | MU accuracy score, usually SIL for files where it can be computed. | pandas.DataFrame |
| IPTS | Innervation pulse trains or decomposed source signals. | pandas.DataFrame |
| MUPULSES | MU discharge times in samples. | list of 1-D numpy.ndarray |
| FSAMP | Sampling frequency in Hz. | float |
| IED | Interelectrode distance in mm. | float |
| EMG_LENGTH | Number of samples in the EMG file. | int |
| NUMBER_OF_MUS | Number of motor units represented in the file. | int |
| BINARY_MUS_FIRING | Binary representation of MU discharge times, samples by MUs. | pandas.DataFrame with np.uint8 values |
| EXTRAS | Additional custom tabular information. | pandas.DataFrame |
New Optional Keys
Version 0.2.0-beta.1 adds support for optional keys used by the new decomposition, validation, and cleaning workflows.
| Key | Purpose | Typical type |
| --- | --- | --- |
| GOOD_CHANNELS | Stores channel-quality information selected with select_bad_channels(). The decomposition pipeline can use it to exclude bad channels. | dict |
| MU_LABELS | Optional labels associated with MUs. These labels are preserved or updated when MUs are sorted or deleted. | list-like object |
| REFERENCE_MUPULSES | Reference discharge times, for example ground truth or an alternative detection method. | list of 1-D numpy.ndarray |
| ROA_WITH_REFERENCE_MUPULSES | Rate of agreement between MUPULSES and REFERENCE_MUPULSES. | pandas.DataFrame |
| DECOMPOSITION_PARAMETERS | Metadata generated by the decomposition pipeline, including method and filtering settings. | dict |
Custom keys can be added when needed. The safest pattern is to use custom keys for project-specific metadata and to avoid changing the meaning of standard keys.
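As a small illustration of that pattern, the sketch below adds project-specific metadata to a stand-in dictionary without touching any standard key. The `MYLAB_` prefix and both custom key names are hypothetical; the prefix is only a naming idea to keep custom keys visually separate, not an openhdemg convention.

```python
# A stand-in emgfile; a real one would also hold DataFrames and arrays.
emgfile = {"SOURCE": "CUSTOM", "FILENAME": "trial_01", "FSAMP": 2048.0}

# Hypothetical project-specific keys, grouped under a common prefix.
emgfile["MYLAB_PARTICIPANT_ID"] = "P01"
emgfile["MYLAB_TASK"] = {"type": "trapezoid", "target_percent_mvc": 30}

# Standard keys keep their name, meaning, and type.
assert isinstance(emgfile["FSAMP"], float)

# Custom keys are easy to find again later.
custom_keys = [k for k in emgfile if k.startswith("MYLAB_")]
print(custom_keys)
```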
Inspect an emgfile
The info class remains the easiest way to inspect a file:
import openhdemg.library as emg
emgfile = emg.emg_from_samplefile()
info = emg.info()
info.data(emgfile)
You can also check keys directly:
if "REF_SIGNAL" in emgfile:
    print(emgfile["REF_SIGNAL"].head())
Checking key presence before using optional data is now the recommended pattern.
Standardise Data Types
Use standardise_emgfile_dtypes() when you create or modify an emgfile manually. It returns a deep copy of the file with recognised keys converted to the expected data types.
import openhdemg.library as emg
emgfile = emg.emg_from_samplefile()
standard_emgfile = emg.standardise_emgfile_dtypes(emgfile)
The function standardises recognised keys such as:
- RAW_SIGNAL, REF_SIGNAL, ACCURACY, IPTS, and ROA_WITH_REFERENCE_MUPULSES as pandas DataFrames with numeric values;
- MUPULSES and REFERENCE_MUPULSES as lists of 1-D NumPy arrays;
- FSAMP, IED, EMG_LENGTH, and NUMBER_OF_MUS as numeric scalars;
- BINARY_MUS_FIRING as a pandas DataFrame with np.uint8 values;
- GOOD_CHANNELS as a dictionary with string keys and integer values.
Additional custom keys are preserved but not type-checked.
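To make the target types concrete, the sketch below performs a few of these conversions by hand with NumPy and pandas. It only illustrates what the standardised values look like; it is not the implementation of standardise_emgfile_dtypes(), and the input values are made up for the example.

```python
import numpy as np
import pandas as pd

# Values as they might look after a manual import, before standardisation.
mupulses_raw = [[10, 55, 120], (30, 90)]  # nested Python sequences
binary_raw = pd.DataFrame({0: [0.0, 1.0, 0.0], 1: [1.0, 0.0, 0.0]})
fsamp_raw = 2048  # int instead of float

# Lists of 1-D NumPy arrays for discharge times.
mupulses = [np.asarray(p).ravel() for p in mupulses_raw]

# np.uint8 values for the binary firing matrix.
binary = binary_raw.astype(np.uint8)

# A float scalar for the sampling frequency.
fsamp = float(fsamp_raw)

print([p.ndim for p in mupulses])
print(binary.dtypes.tolist())
print(type(fsamp))
```

In practice, calling standardise_emgfile_dtypes() is preferable to converting each key manually, since it covers all recognised keys in one step.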
Multiple Reference Signals
REF_SIGNAL is a pandas DataFrame. In v0.2.0, many functions can work with a selected reference-signal channel.
import openhdemg.library as emg
emgfile = emg.emg_from_samplefile()
# Plot channel 0 of REF_SIGNAL.
emg.plot_refsig(emgfile, refsig_channel=0)
# Use channel 0 when calculating basic MU properties.
results = emg.basic_mus_properties(
    emgfile=emgfile,
    mvc=634,
    refsig_channel=0,
)
Functions that transform the reference signal can process selected channels through refsig_channels:
filtered = emg.filter_refsig(
    emgfile=emgfile,
    cutoff=15,
    refsig_channels=[0],
)
offset_removed = emg.remove_offset(
    emgfile=filtered,
    auto=1024,
    refsig_channels=[0],
)
See Multiple reference signals for a dedicated workflow.
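The idea of selecting one channel, or processing only some channels, can be illustrated with pandas alone. The sketch below uses a synthetic two-channel REF_SIGNAL; the channel meanings and the offset step are assumptions for the example and do not reproduce what filter_refsig() or remove_offset() do internally.

```python
import numpy as np
import pandas as pd

# Synthetic two-channel reference signal: samples by channels.
n_samples = 5
ref_signal = pd.DataFrame({
    0: np.linspace(5.0, 45.0, n_samples),   # illustrative first channel
    1: np.linspace(10.0, 12.0, n_samples),  # illustrative second channel
})

# Selecting a single channel returns one column as a Series.
refsig_channel = 0
selected = ref_signal[refsig_channel]

# Processing only the listed channels leaves the others untouched.
# Here the first sample of each listed channel is subtracted as an offset.
refsig_channels = [0]
processed = ref_signal.copy()
subset = processed[refsig_channels]
processed[refsig_channels] = subset - subset.iloc[0]

print(selected.tolist())
print(processed)
```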
Files With No Motor Units
Some files legitimately contain no MUs. For example, a raw EMG file before decomposition can contain RAW_SIGNAL, REF_SIGNAL, and FSAMP, but no MUPULSES. A decomposition attempt can also return an emgfile with NUMBER_OF_MUS == 0.
When writing custom code, avoid assuming that MU-specific keys are always present:
if emgfile.get("NUMBER_OF_MUS", 0) > 0 and "MUPULSES" in emgfile:
    emg.plot_mupulses(emgfile)
else:
    print("No motor units are available in this file.")
Plotting functions in v0.2.0 are more robust with empty MU files, but checking your workflow explicitly makes scripts easier to understand and debug.
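If the same check is needed in several places, it can be wrapped in a small function. The sketch below is one way to do that with plain dictionaries as stand-ins; `has_motor_units` is a hypothetical helper, not an openhdemg function.

```python
def has_motor_units(emgfile):
    """Hypothetical helper: True only if MU data is present and non-empty."""
    n_mus = emgfile.get("NUMBER_OF_MUS", 0)
    mupulses = emgfile.get("MUPULSES")
    return n_mus > 0 and mupulses is not None and len(mupulses) > 0


# Three stand-in files covering the cases described above.
raw_only = {"FSAMP": 2048.0}  # before decomposition
empty_result = {"NUMBER_OF_MUS": 0, "MUPULSES": []}  # no MUs were found
decomposed = {"NUMBER_OF_MUS": 1, "MUPULSES": [[12, 480, 910]]}

for name, candidate in [
    ("raw_only", raw_only),
    ("empty_result", empty_result),
    ("decomposed", decomposed),
]:
    print(name, has_motor_units(candidate))
```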
Create Your Own emgfile
You can create a custom emgfile when importing data from an unsupported source. The minimum content depends on what you want to do.
A complete worked example is available in the tutorial Import from other software. This section outlines the underlying structure.
For a raw file intended for decomposition:
import pandas as pd
import openhdemg.library as emg
# raw_signal_array and ref_signal_array are placeholders for your own
# NumPy arrays (samples by channels, and a single reference channel).
raw_signal = pd.DataFrame(raw_signal_array)
raw_signal.columns = range(raw_signal.shape[1])  # columns must be base-0 integers
ref_signal = pd.DataFrame(ref_signal_array, columns=[0])
emgfile = {
    "SOURCE": "CUSTOM",
    "FILENAME": "participant_01_trial_01",
    "RAW_SIGNAL": raw_signal,
    "EMG_LENGTH": raw_signal.shape[0],
    "REF_SIGNAL": ref_signal,
    "FSAMP": 2048.0,
    "IED": 10.0,
}
emgfile = emg.standardise_emgfile_dtypes(emgfile)
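The example above takes raw_signal_array and ref_signal_array as given. To try the structure without real recordings, the sketch below builds the same dictionary from synthetic NumPy data. The array sizes, the random noise, and the flat reference signal are all assumptions chosen for illustration, and the final call to standardise_emgfile_dtypes() is left out so that the block runs with NumPy and pandas only.

```python
import numpy as np
import pandas as pd

fsamp = 2048.0
n_samples, n_channels = 4096, 64  # 2 s of synthetic data, 64 channels

rng = np.random.default_rng(0)
raw_signal_array = rng.normal(size=(n_samples, n_channels))  # samples x channels
ref_signal_array = np.zeros(n_samples)  # one flat reference channel

raw_signal = pd.DataFrame(raw_signal_array)  # default columns are 0..63
ref_signal = pd.DataFrame(ref_signal_array, columns=[0])

emgfile = {
    "SOURCE": "CUSTOM",
    "FILENAME": "synthetic_example",
    "RAW_SIGNAL": raw_signal,
    "EMG_LENGTH": raw_signal.shape[0],
    "REF_SIGNAL": ref_signal,
    "FSAMP": fsamp,
    "IED": 10.0,
}

# Basic consistency check: both signals must cover the same samples.
assert emgfile["RAW_SIGNAL"].shape[0] == emgfile["REF_SIGNAL"].shape[0]
print(emgfile["RAW_SIGNAL"].shape)
print(emgfile["EMG_LENGTH"] / emgfile["FSAMP"], "seconds")
```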
For a decomposed file intended for MU analysis, also include:
- MUPULSES;
- EMG_LENGTH;
- NUMBER_OF_MUS;
- IPTS;
- BINARY_MUS_FIRING;
- ACCURACY.
You can create BINARY_MUS_FIRING from MUPULSES (or vice versa):
emgfile["BINARY_MUS_FIRING"] = emg.create_binary_firings(
    emg_length=emgfile["EMG_LENGTH"],
    number_of_mus=emgfile["NUMBER_OF_MUS"],
    mupulses=emgfile["MUPULSES"],
)
emgfile = emg.standardise_emgfile_dtypes(emgfile)
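The relationship between the two representations can be shown with NumPy and pandas directly. The sketch below converts made-up discharge times to a binary matrix and back; it illustrates the data layout only and is not the implementation of create_binary_firings().

```python
import numpy as np
import pandas as pd

emg_length = 10
mupulses = [np.array([1, 4, 8]), np.array([0, 5])]  # discharge samples per MU

# MUPULSES -> binary matrix (samples by MUs, np.uint8 values).
binary = np.zeros((emg_length, len(mupulses)), dtype=np.uint8)
for mu, pulses in enumerate(mupulses):
    binary[pulses, mu] = 1
binary_mus_firing = pd.DataFrame(binary)

# Binary matrix -> MUPULSES (indices of the non-zero samples in each column).
recovered = [
    np.flatnonzero(binary_mus_firing[mu].to_numpy())
    for mu in binary_mus_firing.columns
]

print(binary_mus_firing.T)
print(recovered)
```

Each column of the binary matrix holds one MU, so the column labels are base-0 integers that match the positions in the MUPULSES list.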
DataFrame column names
All standard DataFrames except REF_SIGNAL and EXTRAS should have columns named as base-0 integers: 0, 1, 2, and so on.
If no reference signal is available, it is good practice to include REF_SIGNAL as a one-column DataFrame named 0 and filled with zeros. This improves compatibility with functions that expect a reference signal to be present.
emgfile["REF_SIGNAL"] = pd.DataFrame(0, index=raw_signal.index, columns=[0])
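A quick way to verify the column-naming convention is to compare the column labels with a range. The sketch below uses a hypothetical helper, `has_base0_columns`, and a small made-up DataFrame; it is not an openhdemg function.

```python
import pandas as pd


def has_base0_columns(df):
    """Hypothetical helper: True if the columns are exactly 0, 1, 2, ..."""
    return list(df.columns) == list(range(df.shape[1]))


# A DataFrame imported with text labels instead of integers.
ipts = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]], columns=["MU_a", "MU_b"])
print(has_base0_columns(ipts))

# Rename the columns to base-0 integers and check again.
ipts.columns = range(ipts.shape[1])
print(has_base0_columns(ipts))
```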
Save the File
For new v0.2.0 workflows, save edited data as a binary module:
import openhdemg.library as emg
emg.save_openhdemg_module(
    emgfile=emgfile,
    path="C:/Users/.../Desktop/openhdemg_modules",
    module_name="participant_01_trial_01",
    compresslevel=1,
    add_checksum=True,
)
JSON functions are still available for compatibility, but binary modules and collections are recommended for new analyses. See Save and load binary modules and Manage collections for details.
More Questions?
We hope this tutorial was useful. If you need more information, read the existing answers or ask a question in the openhdemg discussion section. If you are not familiar with GitHub discussions, please read this post first. Doing so helps the openhdemg community answer your questions.