A Python toolkit for parsing and analyzing metadata from DICOM files.

This project is built on the pydicom and fastcore libraries and borrows ideas (and some code) from the fastai.medical.imaging library.

The metadata preprocessing and series selection algorithm reimplement the approach of Gauriau et al. (reference below), who train a Random Forest classifier on DICOM metadata to predict the sequence type (e.g. T1, T2, FLAIR) of each image series in brain MRI studies. Such a classifier can be used to select the appropriate series as input to a machine learning pipeline.
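A Random Forest needs numeric inputs, so multi-valued string attributes such as ScanningSequence or SequenceVariant must be encoded before training. The sketch below shows one common way to do this: one binary indicator column per observed token, plus numeric attributes passed through as floats. It is an illustrative assumption, not necessarily the exact preprocessing used by dicomtools or by Gauriau et al.; the attribute lists MULTI_VALUED and NUMERIC and the function encode are hypothetical.

```python
from collections.abc import Mapping, Sequence

# Hypothetical attribute choice for illustration; the real feature set may differ.
MULTI_VALUED = ("ScanningSequence", "SequenceVariant", "ScanOptions")
NUMERIC = ("RepetitionTime", "EchoTime", "EchoTrainLength")


def _as_list(value) -> list:
    """Normalize a DICOM value (None, empty, a single string, or a multi-value) to a list of strings."""
    if value is None or value == "":
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value]


def encode(records: Sequence[Mapping]) -> list[dict]:
    """Turn raw metadata records into flat numeric feature dicts (multi-hot + numeric)."""
    # Vocabulary of tokens seen for each multi-valued attribute across all records.
    vocab = {
        attr: sorted({tok for r in records for tok in _as_list(r.get(attr))})
        for attr in MULTI_VALUED
    }
    features = []
    for r in records:
        row = {}
        for attr in MULTI_VALUED:
            present = set(_as_list(r.get(attr)))
            # One 0/1 indicator column per token, e.g. ScanningSequence_SE.
            for tok in vocab[attr]:
                row[f"{attr}_{tok}"] = int(tok in present)
        for attr in NUMERIC:
            value = r.get(attr)
            # Missing numeric values become NaN, to be imputed or handled downstream.
            row[attr] = float(value) if value not in (None, "") else float("nan")
        features.append(row)
    return features
```

With this encoding, a series whose ScanningSequence is ["SE", "IR"] gets 1 in both ScanningSequence_SE and ScanningSequence_IR, while a single-valued "SE" sets only ScanningSequence_SE. The resulting rows could then be fed to a classifier such as scikit-learn's RandomForestClassifier.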

Reference: Gauriau R, et al. Using DICOM Metadata for Radiological Image Series Categorization: a Feasibility Study on Large Clinical Brain MRI Datasets. Journal of Digital Imaging. 2020 Jan; 33:747–762.

Install

  1. git clone the repository
  2. cd into the repo
  3. pip install . (or pip install -e . for an editable install)

How to use

Read a DICOM file:

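from pathlib import Path
from pydicom.data import get_testdata_file
from dicomtools.core import *  # assumed to add Path.dcmread, as fastai.medical.imaging does

path = Path(get_testdata_file("MR_truncated.dcm"))
ds = path.dcmread()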
ds.file_meta
(0002, 0000) File Meta Information Group Length  UL: 190
(0002, 0001) File Meta Information Version       OB: b'\x00\x01'
(0002, 0002) Media Storage SOP Class UID         UI: MR Image Storage
(0002, 0003) Media Storage SOP Instance UID      UI: 1.3.6.1.4.1.5962.1.1.4.1.1.20040826185059.5457
(0002, 0010) Transfer Syntax UID                 UI: Explicit VR Little Endian
(0002, 0012) Implementation Class UID            UI: 1.3.6.1.4.1.5962.2
(0002, 0013) Implementation Version Name         SH: 'DCTOOL100'
(0002, 0016) Source Application Entity Title     AE: 'CLUNIE1'

Load a selected subset of DICOM metadata into a pandas.DataFrame. The subset is defined in dicomtools.core and follows the metadata used by the series selection algorithm in the paper referenced above.

import pandas as pd

df = pd.DataFrame.from_dicoms([path]).drop('fname', axis=1)  # drop the file-name column
df.T
                           0
ImageType                  [DERIVED, SECONDARY, OTHER]
SOPClassUID                MR Image Storage
PatientID                  4MR1
ContrastBolusAgent
ScanningSequence           SE
SequenceVariant            NONE
ScanOptions
MRAcquisitionType          3D
SliceThickness             0.8
RepetitionTime             4000
EchoTime                   240
EchoTrainLength            None
StudyInstanceUID           1.3.6.1.4.1.5962.1.2.4.20040826185059.5457
SeriesInstanceUID          1.3.6.1.4.1.5962.1.3.4.1.20040826185059.5457
StudyID                    4MR1
SeriesNumber               1
AcquisitionNumber          0
InstanceNumber             1
ImageOrientationPatient    [1.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000]
PhotometricInterpretation  MONOCHROME2
PixelSpacing               [0.3125, 0.3125]
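The ImageOrientationPatient value in this output, [1, 0, 0, 0, 1, 0], holds the direction cosines of the image rows and columns, and it describes an axial slice. A plane label like the plane='ax' argument of Finder.find can be derived from such a value via the slice normal, as sketched below. Whether Finder derives the plane this way is not stated here; the function name orientation_plane and the labels 'cor' and 'sag' are hypothetical (only 'ax' appears in the original API).

```python
from collections.abc import Sequence


def orientation_plane(iop: Sequence[float]) -> str:
    """Classify an ImageOrientationPatient value as 'ax', 'cor' or 'sag' (hypothetical labels)."""
    if len(iop) != 6:
        raise ValueError("ImageOrientationPatient must have 6 values")
    row = [float(v) for v in iop[:3]]  # direction cosines of the image rows
    col = [float(v) for v in iop[3:]]  # direction cosines of the image columns
    # The slice normal is the cross product of the row and column directions.
    normal = (
        row[1] * col[2] - row[2] * col[1],
        row[2] * col[0] - row[0] * col[2],
        row[0] * col[1] - row[1] * col[0],
    )
    # The patient axis the normal points along most strongly decides the plane:
    # x (left-right) -> sagittal, y (anterior-posterior) -> coronal, z (head-foot) -> axial.
    axis = max(range(3), key=lambda i: abs(normal[i]))
    return ("sag", "cor", "ax")[axis]
```

For the file above, orientation_plane([1, 0, 0, 0, 1, 0]) returns 'ax'. Oblique acquisitions are assigned to whichever standard plane their normal is closest to.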

class Finder

Finder(path)

A class for finding DICOM files of a specified sequence type from a specific .
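Finder(path) has to locate the DICOM files under path. Its own traversal logic is not shown in this documentation; one common approach, sketched here as an assumption, is to walk the directory tree and keep files whose bytes 128 to 131 spell DICM, the signature that follows the 128-byte preamble in DICOM Part 10 files. The helpers is_dicom_file and find_dicom_files are hypothetical.

```python
from pathlib import Path


def is_dicom_file(path: Path) -> bool:
    """Return True if path looks like a DICOM Part 10 file (128-byte preamble + b'DICM')."""
    try:
        with open(path, "rb") as f:
            f.seek(128)  # skip the preamble
            return f.read(4) == b"DICM"
    except OSError:
        # Unreadable or missing files are simply not DICOM candidates.
        return False


def find_dicom_files(root) -> list[Path]:
    """Recursively list files under root that carry the DICOM Part 10 signature."""
    return sorted(p for p in Path(root).rglob("*") if p.is_file() and is_dicom_file(p))
```

Checking the signature instead of the .dcm extension also catches files exported without an extension. Files written without the Part 10 preamble are missed by this check; pydicom.dcmread(path, force=True) can still attempt to read those.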

Finder.predict

Finder.predict()

Obtains predictions from the model specified in model_path.

Finder.find

Finder.find(plane='ax', seq='t1', contrast=True, thresh=0.8, **kwargs)

Returns a pandas.DataFrame of the series whose predicted sequence matches the query (plane, seq, contrast) at the specified threshold (thresh).
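The query semantics of Finder.find can be illustrated in plain Python. This sketch assumes each prediction is a record with hypothetical keys plane, seq, contrast and prob (the model's confidence) and treats thresh as an inclusive lower bound; neither detail is confirmed by the documentation above, and the real method returns a pandas.DataFrame rather than a list.

```python
from collections.abc import Iterable, Mapping


def filter_predictions(
    preds: Iterable[Mapping],
    plane: str = "ax",
    seq: str = "t1",
    contrast: bool = True,
    thresh: float = 0.8,
) -> list[Mapping]:
    """Keep records matching plane/seq/contrast whose confidence is at least thresh."""
    return [
        p
        for p in preds
        if p["plane"] == plane
        and p["seq"] == seq
        and p["contrast"] == contrast
        and p["prob"] >= thresh  # inclusive threshold is an assumption
    ]
```

Raising thresh trades recall for precision: fewer series are returned, but those that remain were predicted with higher confidence.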