In this post, we will outline some of the challenges we had and the solutions we have devised while writing the code for the deep learning model trained for classifying knee anterior cruciate ligament (ACL) injuries. If you haven’t already, please read the previous two blog posts detailing the process of coming up with a functional deep learning (DL) model.


As explained in the previous blog post, the segmentation preprocessing is done in several steps:

Each of the preprocessing steps were implemented separately, as a class which encompassed the algorithm it was created to run. All the preprocessing steps were then combined into a single preprocessor class which was included in our image processing pipeline.

Nonnegative matrix factorization basis

To generate the nonnegative matrices, we used the library scikit.learn. For better understanding of the parameters used in the NMF function, refer to the scikit-learn documentation.

# reshape image
X = np.transpose(np.array(list(map(lambda x: x.reshape(-1), X))))
# compute NMF from scikit.learn toolkit
model = NMF(n_components=self.nb_components, init='nndsvd', max_iter= 600,  random_state=0, l1_ratio = 0.8, alpha = 10)
Y = model.fit_transform(X)
Z = model.components_ # factorization matrix/dictionary
# get the most relevant values from the result.

basis = []
for i, comp in enumerate(most_relevant):
    ... # fill the basis list
basis = np.array(basis) # return the basis as a numpy array

The result is a new matrix which will be used as the input set of data for the clustering preprocessing step.

Foreground mask

The algorithm for creating the foreground mask is outlined in the code below. At this point, the premask is already created. The mask itself is a set of voxels (s,x,y) where s is slice, and (x,y) is a pixel of a foreground.

# initialize mask
mask = np.zeros((img.shape[0],img.shape[1])).astype(np.uint8)

# fill gaps between leftmost and rightmost knee border
H,W = mask.shape
for i in range(H):
    bounds = [] # store leftmost, rightmost pixel per row
    interior = 0 # check if interior, i.e. inside the bounds of the knee in the image
    for j in range(W):        
        ... # add the value to the bounds list
    if bounds:
        ... # if the bound are detected, create a mask from the bounds values

K-means clustering

After creating the nonnegative matrix factorization basis and the foreground mask, we normalize the image values and cluster the values by pixel intensity. We used the scikit-learn KMeans algorithm implementation.

# normalization
means      = np.mean(px_intensity,0)
stds       = np.std(px_intensity,0)
pixels_std = (px_intensity - means)/stds

# cluster into intensity segments
segments   = KMeans(n_clusters=self.K, random_state=self.random_state, n_jobs = -1).fit(pixels_std)
centroids  = stds*(segments.cluster_centers_) + means
labels     = segments.labels_

# add pixels to clusters            
clusters = [[] for i in range(self.K)]
for i in range(num_fg_voxels):


The theoretical side of segmentation along with the idea behind it was explained in more detail in the previous blog post entry of the series. Here, we will describe how it was implemented in code.

Following the same pattern as with preprocessing, the segmentation algorithm was also implemented as a class with each individual piece being an input parameter, thus achieving modularity.

The segmentation class uses the previously mentioned preprocessing techniques - NMF, and clustering. It combines them into a single algorithm which adds segmentation on top of it.

# get NMF basis
basis = self.NMFactorizer(X)
# cluster FG pixels of basis                
clusters,X = self.MRIClusterer(X)

# invert image -> segmentation
S,H,W = X.shape
Xtrans = np.zeros((S,H,W))
MAX_PIX_VALUE = np.max(X)
for k in range(self.kmin, self.kmax + 1, 1): # number of clusters
    for idx in clusters[k]:
        ... # invert the image

Since segmentation outputs sparse data (the only part of the image we are interested is the patient’s knee), we decided to use sparse logistic regression as our classification algorithm.

Working with the DICOM data format

DICOM or Digital Imaging and Communications in Medicine is a standard encompassing various parts related to the management of modern radiological imaging. The data format module focuses on setting a standard for storing the medical images along with all their relevant metadata.

The metadata is stored in the form of tags. While the tags have an extensive set of predefined groups and tag names, custom tags are widely used, which proved to be a problem when trying to write generic code applicable to images obtained from different MRI machines.


The first issue we ran into was having to anonymize the data contained in the files. The personally identifiable data (PII) is often stored in custom data tags so we had to create a way to delete all the predefined tags, as well as all the custom tags that hold such information.

While trawling through the DICOM standard and the datasets we had, we found a set of tags which contain PII. This set of tags was compiled into a list which was later used in the user-facing application - all tags from the list were deleted from all DICOM files before any processing or storage of the DICOM files containing those tags. The issue with PII being embedded in the image itself, or in a variety of custom tags, still remains, so we manually went through the datasets that were available to us - we anonymized the files or removed them from the dataset. The DICOM file format is very flexible, but sometimes that causes issues, like we found out with anonymization.

Detecting image plane

Another issue we came upon was having a reliable way of determining the image plane. An MRI usually consists of images in three planes - sagittal, coronal, and transverse. Medical experts we consulted suggested the use of images in the sagittal plane, as it most clearly shows the ACL pathology. We needed a way of filtering out unwanted images from our datasets (and later, user-uploaded sets).

The solution was found in the Image Orientation (Patient) tag. As the standard says: Image Orientation (0020,0037) specifies the direction cosines of the first row and the first column with respect to the patient. This means that we could detect the orientation of the patient by looking at the values in the aforementioned tag.

The Image Orientation tag value is actually two vectors: X and Y which are located in 3d space. X determines the direction of the image rows in the reference coordinate system, while Y determines the direction of the image columns in the reference coordinate system.

As and example the tag value looks like this: -0.17\0.98\0\0\0-1

We can show the values more clearly in a table:

Xx Xy Xz Yx Yy Yz
-0.17 0.98 0 0 0 -1


  • x - points from the patients right to their left
  • y - points from front to back of the patient
  • z - points from the patients feet to their head

Using the directions of the two vectors, we can determine the plane the image was taken in. In the example above, the image is in the sagittal plane, as vector X points from the patients front to their back (with a slight angle in the x direction), while the y vector points from the patients feet to their head.

Special mention to the DicomIsEasy blog. It was a huge help for us to better our understanding of the DICOM file format.

“Framework” for logging and running experiments

As running a large number of experiments is tedious, and time-consuming, we “organically” developed a framework for running the experiments.

This framework consisted of multiple “modules”, so to say, allowing us to swap out parts of the training process, such as preprocessing techniques, networks used for running the experiments, hyperparameters like the number of epochs, and whether we wanted to train the network, fine-tune it, or just extract features from it.

We found the logging module to be the most important part of the framework. Having automated, detailed logging embedded in the framework, enabled us to easily digest the information acquired from the experiment.

Since the focus of our project was on improving the deep learning model, the framework we developed was a basic one, as it fits our needs, and more importantly, our workflow.

The framework was a simple series of Python 3 scripts, and the modules were organized as follows:

  • Entry script - takes all the input parameters and runs the entire training/feature extraction process
  • Preprocessing module - allowed us to plug-in code that preprocessed the images. Examples include: image resize, image normalization, and finally, image segmentation
  • “Training” module - could be either the training module consisting of the architecture specified in the input parameters, or a feature extraction module with the specified architecture
  • Logging modules - logs AUC, loss, and precision for both training and validation sets and draws a graph of those parameters through training epochs


While deep learning is a powerful tool by itself, developing a better understanding of the problem and the tools at your disposal is the key to solving that problem when working with real-life constraints.

A quick development iteration cycle allowed us to focus on solving the problem at hand - improving the accuracy of the model. We achieved this by creating an image segmentation module with isolated the relevant image features. Although the segmentation solution increased the accuracy of the model, it can still be improved upon.

More articles

Related posts