Building an AI-based image-classifier application
Image classification models are frequently used in applied settings like machine vision, yet online resources tend to focus on the modeling part only. Here, we look at how a convolutional neural network can be embedded into a full image-classification application with a graphical user interface.
Image classification models have gained widespread attention since the successful implementation of a deep convolutional neural network for large-scale image-classification tasks. Courses and textbooks tend to focus on coding the models themselves, and less often cover how to embed them in a program that actually carries out a classification task. Such a program should ideally enable a user to pick a folder of images for classification, run the classifier model on it, and retrieve another folder with the same images sorted into sub-folders based on the model's predictions. Ideally, it would also have a graphical user interface, freeing the user from handling or manipulating code. Extended features could include, among others, choosing between different classifier models (sets of parameters and/or model architectures), training an existing model on new or updated training sets, or calculating performance statistics on a set of model classifications before and after manual validation (assuming that such validation is a key part of the image-classification pipeline and that the classifier model is used to speed up classification, not to eliminate the human component entirely).
The simplest form of a classifier program requires the following steps: i) acquiring and organizing training and test data, ii) fitting a classifier model on these data and saving the fitted parameters to disk, iii) writing a classification script that takes certain input parameters (e.g. the path to an image folder), and iv) writing a user-interface script that calls the classification script as its final step. All these tasks can be implemented with relative ease in Python. Steps i) to iii) can be implemented with standard packages like os and numpy, and with TensorFlow and Keras for training the classifier model and making predictions. Step iv) requires software that allows us to design and generate a graphical user interface; the Python package Tkinter is well suited for this purpose.
We will work here with the MNIST data set of images of hand-written digits. For step i), we can download them using the mnist.load_data() function from the "datasets" module of the keras package (a package tailored towards quick and easy implementation of neural-network-based models). Using Python's tuple-unpacking syntax, we can assign the two components of this data set directly to a training-set object and a test-set object. Each of them is a tuple of two objects: an array of images, each in gray-scale with dimensions 28x28 pixels, and a vector whose length equals the number of images, with each element being a digit, i.e. the class label of the image with the same index. We will split the test data set into two subsets: an actual test set used for validating model performance during training, and a "sample" set on which to test the final classifier application. To that end, we shuffle the test images and test labels randomly (using the same vector of random indices for both), and assign the first half of the test data to the final test set and the second half to the sample set.
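```python
import numpy as np
import os
from PIL import Image
from keras import datasets

# Load MNIST as (images, labels) tuples for training and testing
train, test = datasets.mnist.load_data()

# Shuffle test images and labels with the same vector of random indices
inds = np.random.choice(np.arange(len(test[1])), len(test[1]), replace = False)
test_img = test[0][inds]
test_lbl = test[1][inds]

# First half: test set used during training; second half: sample set for the application
test = (test_img[:5000], test_lbl[:5000])
sample = (test_img[5000:], test_lbl[5000:])
```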
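The key point of the split is that images and labels are reordered with one and the same index vector, so every image keeps its label. The following is a minimal sketch of that idea in plain Python, so that it runs without NumPy or Keras; the helper name paired_shuffle_split and the toy data are assumptions made for illustration only.

```python
import random

def paired_shuffle_split(images, labels, n_first, seed=None):
    """Shuffle images and labels with one shared index order, then split in two."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # one random order, applied to both lists below
    shuffled_images = [images[i] for i in order]
    shuffled_labels = [labels[i] for i in order]
    first = (shuffled_images[:n_first], shuffled_labels[:n_first])
    second = (shuffled_images[n_first:], shuffled_labels[n_first:])
    return first, second

# Toy stand-ins for image arrays and class labels (hypothetical values)
toy_images = ['img_a', 'img_b', 'img_c', 'img_d']
toy_labels = ['a', 'b', 'c', 'd']

test_part, sample_part = paired_shuffle_split(toy_images, toy_labels, 2, seed=0)
print(test_part)
print(sample_part)
```

Whatever order the shuffle produces, each image in either half still sits at the same position as its own label.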
We then create a folder "MNIST" with sub-folders "Train", "Test" and "Sample". Next, we create a class folder for each unique class (each unique digit from 0 to 9) in the "Train" and "Test" folders, and save the training and test images to disk in the class folders matching their labels. We would not strictly need to do this to train a classifier model on the MNIST data since, as seen above, they can be loaded directly into the Python session using the keras package. This demonstration, however, is meant to reflect the more common case of custom training data sets, which are not part of any package or are too large to be loaded into the Python session all at once. Finally, we save all "sample" images to the "Sample" folder on disk (which does not contain class sub-folders). The classification script written later will move them into a "dummy_class" sub-folder, because the class-prediction function expects the images to sit in a sub-folder of the directory it is given.
```python
os.mkdir('MNIST')
os.mkdir('MNIST/Train')
os.mkdir('MNIST/Test')
os.mkdir('MNIST/Sample')

# One class folder per digit in both the training and the test folder
digits = np.unique(train[1])
for i in range(len(digits)):
    os.mkdir('MNIST/Train/' + str(digits[i]))
    os.mkdir('MNIST/Test/' + str(digits[i]))

# Save every image into the class folder given by its label
for i in range(len(train[0])):
    img_jpg = Image.fromarray(train[0][i])
    img_jpg.save('MNIST/Train/' + str(train[1][i]) + '/img_' + str(i) + '.jpg')

for i in range(len(test[0])):
    img_jpg = Image.fromarray(test[0][i])
    img_jpg.save('MNIST/Test/' + str(test[1][i]) + '/img_' + str(i) + '.jpg')

# Sample images are saved without class sub-folders
for i in range(len(sample[0])):
    img_jpg = Image.fromarray(sample[0][i])
    img_jpg.save('MNIST/Sample/img_' + str(i) + '.jpg')
```
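After writing many files to disk, it can be useful to verify that every class folder received images. The sketch below counts the files per class folder; to stay self-contained it builds a small mock folder tree in a temporary directory instead of reading the real MNIST folders. The helper name count_images_per_class and the mock file counts are assumptions made for this example.

```python
import os
import tempfile

def count_images_per_class(split_dir):
    """Return {class_folder_name: number_of_files} for one split folder."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_path = os.path.join(split_dir, class_name)
        if os.path.isdir(class_path):
            counts[class_name] = len(os.listdir(class_path))
    return counts

# Build a tiny mock 'Train' folder with two class folders (hypothetical contents)
mock_root = tempfile.mkdtemp()
mock_train_dir = os.path.join(mock_root, 'Train')
for digit, n_files in [('0', 3), ('1', 2)]:
    os.makedirs(os.path.join(mock_train_dir, digit))
    for i in range(n_files):
        # Empty placeholder files stand in for the saved .jpg images
        open(os.path.join(mock_train_dir, digit, 'img_' + str(i) + '.jpg'), 'w').close()

mock_counts = count_images_per_class(mock_train_dir)
print(mock_counts)  # {'0': 3, '1': 2}
```

Pointing the same function at 'MNIST/Train' or 'MNIST/Test' would give the number of saved images per digit.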
Next, for step ii), we will train the classifier model. We use the Keras package for Python with the TensorFlow backend. In our training script, we first import the required packages and modules. Then, we define the directories of the image folders containing the training and test images. Next, we set up the architecture of our classifier model. We work here with a fairly simple convolutional neural network with two convolutional layers (l2 and l3), two fully-connected hidden layers (l5 and l6) and a softmax output layer (l7). Given that classifying the MNIST digits is a fairly easy task to learn, we should be able to achieve high accuracy and generalization ability (test accuracy) even with this simple architecture. We set an input image size of 27x27 pixels, which we define in the input layer (the 28x28 images on disk are resized accordingly when they are loaded, and the input has three channels because the image-data generator used for loading reads images in RGB mode by default). We also incorporate a max-pooling layer to scale down the size of the hidden representations more quickly without the need for additional layers and parameters. The classifier model is built using the models.Model() function from the Keras package, to which we pass the input and output layers (the layer definitions on their own only describe the network; models.Model() builds an actual computational graph out of this information and sets initial parameter values).

```python
# Import packages
import pandas as pd
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Set image directories
img_dir = 'MNIST'
train_dir = img_dir + '/Train'
test_dir = img_dir + '/Test'

# Define model architecture
n_digits = 10
l1 = layers.Input(shape = [27,27,3])
l2 = layers.Conv2D(30, [3,3], activation = 'relu')(l1)
l2a = layers.MaxPool2D([2,2])(l2)
l3 = layers.Conv2D(30, [3,3], activation = 'relu')(l2a)
l4 = layers.Flatten()(l3)
l5 = layers.Dense(300, activation = 'relu')(l4)
l6 = layers.Dense(50, activation = 'relu')(l5)
l7 = layers.Dense(n_digits, activation = 'softmax')(l6)

# Initialize model
classifier = models.Model([l1], [l7])
```
Then we set up a so-called "image-data generator", i.e. an object that loads batches of images from disk into the Python session. The idea behind using an image-data generator is that in many cases the number of training images is too large, and/or the resolution of the images too high, to load them into memory all at once. As we saw above, we were able to do just that with the MNIST data, but this is specific to that data set, and we cannot assume that every data set can be loaded entirely into memory. The arguments we provide to the image-data generator are the directory containing the training-image class folders, the image size and the batch size (the latter two must be adjusted depending on what is optimal for the training set and classification task at hand). We also define a similar generator for the validation images. We then compile the classifier model (providing the name of the optimizer to be used for the gradient-descent algorithm and the loss function to be computed from the class labels and the predicted labels). Finally, we fit ("train") the classifier model using the fit_generator() function, providing the image-data generators, the number of steps required to cover all training images in one training epoch (equal to the total number of training images divided by the batch size) and the number of training epochs (the number of full passes over the training set; again this depends on the data and classification task at hand, and is often a trade-off between performance and feasibility with respect to required training time). Note that fit_generator() is deprecated in recent TensorFlow releases, where fit() accepts generators directly.

```python
# Set up image-data generators
datagen = ImageDataGenerator(rescale=1./255)

train_generator = datagen.flow_from_directory(
    directory = train_dir,
    target_size = [27,27],
    batch_size = 10,
    class_mode = 'categorical')
n_train_imgs = len(train_generator.filenames)

test_generator = datagen.flow_from_directory(
    directory = test_dir,
    target_size = [27,27],
    batch_size = 1,
    class_mode = 'categorical')
n_test_imgs = len(test_generator.filenames)

# Compile model
classifier.compile(
    optimizer = 'Adam',
    loss = 'categorical_crossentropy',
    metrics = ['accuracy'])

# Fit model (steps per epoch = number of training images / batch size)
history = classifier.fit_generator(
    generator = train_generator,
    steps_per_epoch = int(n_train_imgs / 10),
    validation_data = test_generator,
    validation_steps = n_test_imgs,
    epochs = 5)
```
Finally, we save the fitted classifier model to disk as a .h5 file using its save() method. We also save the training history, i.e. the trajectory of training and validation losses and accuracies, to disk as a .csv file for later reference.

```python
classifier.save('MNIST/classifier.h5')

# Store the per-epoch losses and accuracies as a table
history = pd.DataFrame(history.history)
history.to_csv('MNIST/train_history.csv', index = False)
```
For step iii), we write a script that classifies unsorted images from a given folder using a given classifier model. This script takes its input arguments from the graphical-user-interface script that we are going to write in step iv), so when writing the classifier script we need to be aware of this and ignore warnings about undefined objects in our code editor (e.g. Spyder). The input arguments provided "from outside" are the number of classes ("n_digits"), the directory where the classifier weights are stored on disk ("weights_dir"), the name of the file containing the classifier weights ("weights_name") and the directory containing the images to be classified ("sample_dir"). It would certainly be possible to provide more flexibility by requiring more external inputs, e.g. the classifier architecture, but for illustrative purposes we will work with the four inputs described above.
We start again by loading all required packages and modules. In this case we also load the os and shutil packages, which later enable us to create directories and copy images between folders on disk (the procedures described here apply to a Linux operating system and may require modification for use with other operating systems). We then set up the classifier architecture and construct a model (i.e. a computational graph) from it. The architecture must be identical to that used in the training process, as otherwise the fitted weights we wish to use will not match in number and structure. We then load the fitted weights into the model using its load_weights() method. Here we need the path to the weights file (.h5 file), which the user provides in the GUI script (step iv)).
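Since the four inputs come from outside the script, it may be worth checking them before any model code runs. The following is a minimal sketch of such a check using only the standard library; the helper name check_inputs and the wording of its messages are assumptions made for illustration and are not part of the scripts described here.

```python
import os
import tempfile

def check_inputs(n_digits, weights_dir, weights_name, sample_dir):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    try:
        if int(n_digits) < 2:
            problems.append('n_digits must be at least 2')
    except (TypeError, ValueError):
        problems.append('n_digits is not a whole number: ' + repr(n_digits))
    if not os.path.isfile(os.path.join(weights_dir, weights_name)):
        problems.append('weights file not found')
    if not os.path.isdir(sample_dir):
        problems.append('sample folder not found')
    return problems

# Demonstration with a temporary folder standing in for the real directories
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, 'classifier.h5'), 'w').close()  # empty placeholder file

ok = check_inputs('10', demo_dir, 'classifier.h5', demo_dir)
bad = check_inputs('ten', demo_dir, 'missing.h5', os.path.join(demo_dir, 'nope'))
print(ok)   # []
print(bad)  # three messages, one per faulty input
```

A check like this only confirms that the inputs exist and have the right form; it cannot tell whether the weights file actually matches the model architecture.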
```python
import numpy as np
import pandas as pd
import os
import shutil
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# User input (provided by the GUI script; uncomment to run this script on its own)
# n_digits = 10
# weights_dir='MNIST'
# weights_name = 'classifier.h5'
# sample_dir = 'MNIST/Sample'
# End user input

# The GUI hands over the number of classes as a string
n_digits = int(n_digits)

# Same architecture as in the training script
l1 = layers.Input(shape = [27,27,3])
l2 = layers.Conv2D(30, [3,3], activation = 'relu')(l1)
l2a = layers.MaxPool2D([2,2])(l2)
l3 = layers.Conv2D(30, [3,3], activation = 'relu')(l2a)
l4 = layers.Flatten()(l3)
l5 = layers.Dense(300, activation = 'relu')(l4)
l6 = layers.Dense(50, activation = 'relu')(l5)
l7 = layers.Dense(n_digits, activation = 'softmax')(l6)
classifier = models.Model([l1], [l7])

# Load the fitted weights from disk
classifier.load_weights(weights_dir + '/' + weights_name)
```
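Because the fitted weights must match the architecture in number and structure, it can help to work out the expected layer sizes and parameter counts by hand. The sketch below does this arithmetic for the architecture above, assuming the Keras defaults that the code does not override: 'valid' padding and stride 1 for the convolutions, and a pooling stride equal to the pool size. The helper names are made up for this example.

```python
def conv_out(size, kernel):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output side length of max pooling with stride equal to the pool size."""
    return size // pool

def conv_params(kernel, channels_in, channels_out):
    """Weights plus one bias per output channel."""
    return kernel * kernel * channels_in * channels_out + channels_out

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out

side = 27
side = conv_out(side, 3)   # l2:  27 -> 25
side = pool_out(side, 2)   # l2a: 25 -> 12
side = conv_out(side, 3)   # l3:  12 -> 10
n_flat = side * side * 30  # l4:  10 * 10 * 30 = 3000

n_params = (conv_params(3, 3, 30)        # l2
            + conv_params(3, 30, 30)     # l3
            + dense_params(n_flat, 300)  # l5
            + dense_params(300, 50)      # l6
            + dense_params(50, 10))      # l7, for n_digits = 10

print(n_flat)    # 3000
print(n_params)  # 924830
```

These numbers can be compared with the output of classifier.summary(). Changing n_digits only changes the size of the last layer, which is enough to make a weights file fitted for a different number of classes incompatible.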
We then proceed by moving all the images to be classified into a sub-folder which we name dummy_class. The reason is that the Keras function that reads the images expects to find them in a sub-folder of the directory it is given (the sub-folder does not have to be named "dummy_class"). The alternative, simply providing the parent folder of the image folder as the target directory, is not useful, as the function might then find other folders, with or without image files, and stop with an error message. Therefore we move all the images into the dummy_class sub-folder using the os.rename() function (renaming a path moves the file as long as source and target are on the same file system). Here, we need the directory of the image folder provided by the user in the GUI script ("sample_dir"). Note that we do not move the images by hand: the script does this for every image folder the GUI is applied to, which relieves the GUI user, who might not be knowledgeable about or interested in the technicalities of the classification process, of this task.

```python
# Move every entry of sample_dir into a new dummy_class sub-folder
os.mkdir(sample_dir + '/dummy_class')
for i in os.listdir(sample_dir):
    if i != 'dummy_class':
        os.rename(sample_dir + '/' + i, sample_dir + '/dummy_class/' + i)
```
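The loop above moves every entry of the folder, including any non-image files. A slightly more defensive variant could move only files with image extensions and tolerate an existing dummy_class folder. The sketch below illustrates this on a temporary folder; the helper name move_images_to_subfolder and the list of accepted extensions are assumptions, not part of the original script.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed set of accepted formats

def move_images_to_subfolder(folder, subfolder='dummy_class'):
    """Move image files from folder into folder/subfolder; return the moved names."""
    target = os.path.join(folder, subfolder)
    os.makedirs(target, exist_ok=True)  # do not fail if the folder already exists
    moved = []
    for name in sorted(os.listdir(folder)):
        source = os.path.join(folder, name)
        if os.path.isfile(source) and name.lower().endswith(IMAGE_EXTENSIONS):
            os.rename(source, os.path.join(target, name))
            moved.append(name)
    return moved

# Demonstration on a temporary folder with two images and one unrelated file
demo = tempfile.mkdtemp()
for name in ['img_0.jpg', 'img_1.JPG', 'notes.txt']:
    open(os.path.join(demo, name), 'w').close()

moved = move_images_to_subfolder(demo)
print(moved)                     # ['img_0.jpg', 'img_1.JPG']
print(sorted(os.listdir(demo)))  # ['dummy_class', 'notes.txt']
```

Whether to filter by extension is a design choice: it protects against stray files, but images in formats missing from the list would silently be left behind.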
We then set up an ImageDataGenerator object that will successively load batches of the images to be classified into the Python session (i.e. into memory). This is done since in many instances the total number of images to be classified is too large to be loaded into memory all at once. The generator also divides all pixel values in every image by 255 (the maximum value of an 8-bit pixel) in order to normalize them (the classifier model was trained on such normalized values), and resizes the images to a resolution of 27x27 pixels (the optimum resolution will vary with the type of image at hand, and also affects processing speed). We set the batch size to one in order to avoid confusion about the order in which the images are loaded from disk, which matters later when the predictions are used to move the images on disk. For the same reason we set the shuffle argument to False. Working with batch sizes larger than one image and shuffling the images are only important during the model-fitting phase, where they help the model generalize, and are of no further relevance during the prediction step.

```python
# The global statements keep these names in the global namespace
# when this script is run via exec() from inside a GUI function
global datagen
datagen = ImageDataGenerator(rescale=1./255)

global sample_generator
sample_generator = datagen.flow_from_directory(
    directory = sample_dir,
    target_size = [27,27],
    batch_size = 1,
    class_mode = 'categorical',
    shuffle = False)
```
We then use the classifier model to make a class prediction for each image using the predict_generator() function, which reads the images through the image-data generator (in recent TensorFlow releases predict_generator() is deprecated, and predict() accepts generators directly). We interpret the class prediction for a given image as the index of the largest value in the model's output vector (the number of elements of this vector equals the number of classes). Hence we use the np.argmax() function to derive the class prediction from the model output vector (the actual values are the probabilities that the model assigns to each class; these are not directly indicative of the probability of a prediction being correct, though).

```python
global n_sample_imgs
n_sample_imgs = len(sample_generator.filenames)

# One row of class probabilities per image
global preds
preds = classifier.predict_generator(
    sample_generator
)

# Index of the largest probability in each row = predicted class
global preds_argmax
preds_argmax = np.argmax(preds, axis = 1)
```
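To make the argmax step concrete, here is a small stand-in written in plain Python so that it runs without NumPy or TensorFlow. The function argmax_rows is a hypothetical helper that mimics np.argmax(preds, axis = 1), and the probability vectors are made-up inputs, not model output.

```python
def argmax_rows(rows):
    """Index of the largest value in each row, like np.argmax(rows, axis=1)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]

# Made-up output vectors for three images and four classes (each row sums to 1)
example_preds = [
    [0.10, 0.70, 0.15, 0.05],
    [0.25, 0.25, 0.40, 0.10],
    [0.90, 0.05, 0.03, 0.02],
]

example_classes = argmax_rows(example_preds)
print(example_classes)  # [1, 2, 0]
```

Note that the second image is assigned class 2 although the model gives it a probability of only 0.40; the argmax discards this information, which is one reason a manual validation step can still be useful.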
Next, we want to use the class predictions to sort the images into class folders. To this end, we first create a new sub-folder named classified, located in the same directory as the dummy_class folder. We create a sub-folder for each predicted class within the classified folder. Then, we obtain the names of the image files from the image-data generator (this is why it was necessary to switch off shuffling: the file names and the predictions are then in the same order). Finally, we use the copy2() function from the shutil package to copy each image from the original image folder (dummy_class) to the appropriate class sub-folder in the classified folder, based on the class prediction (alternatively, especially when hard-drive space is an issue, we could move the images using the rename() function from the os package).

```python
os.mkdir(sample_dir + '/classified')

# One sub-folder per predicted class
for c in np.unique(preds_argmax):
    os.mkdir(sample_dir + '/classified/' + str(c))

global image_names
image_names = sample_generator.filenames

# Copy each image into the folder of its predicted class;
# os.path.basename strips the leading 'dummy_class/' from the file name
for i in range(len(image_names)):
    shutil.copy2(sample_generator.filepaths[i],
                 sample_dir + '/classified/' + str(preds_argmax[i]) + '/' + os.path.basename(image_names[i]))
```
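The sorting step can be tried out without a model by pairing a list of file paths with a list of class indices. The sketch below does this on placeholder files in a temporary directory; the function name sort_into_class_folders, the file names and the class indices are all assumptions made for illustration.

```python
import os
import shutil
import tempfile

def sort_into_class_folders(file_paths, class_ids, out_dir):
    """Copy each file into out_dir/<class_id>/ and return the new paths."""
    new_paths = []
    for path, class_id in zip(file_paths, class_ids):
        class_dir = os.path.join(out_dir, str(class_id))
        os.makedirs(class_dir, exist_ok=True)  # create class folders on demand
        new_paths.append(shutil.copy2(path, class_dir))
    return new_paths

# Three empty placeholder files stand in for the images to be sorted
src = tempfile.mkdtemp()
paths = []
for name in ['img_0.jpg', 'img_1.jpg', 'img_2.jpg']:
    path = os.path.join(src, name)
    open(path, 'w').close()
    paths.append(path)

# Hypothetical class predictions, in the same order as the file paths
out = os.path.join(src, 'classified')
copied = sort_into_class_folders(paths, [7, 3, 7], out)

print(sorted(os.listdir(out)))                     # ['3', '7']
print(sorted(os.listdir(os.path.join(out, '7'))))  # ['img_0.jpg', 'img_2.jpg']
```

The result is only correct if the paths and predictions are in the same order, which is what the non-shuffled generator guarantees in the classification script.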
Now that we have our classification script ready, we can turn to the final step iv), writing a script that builds a graphical user interface (GUI). This script should, on the one hand, generate a dialogue that prompts the user to enter their specific input for the classification script (i.e. directory names and the name of the parameter file for the classifier model) and, on the other hand, start the classification script once this input has been obtained. We will again write this script in Python, making extensive use of the Tkinter package, which was specifically designed for writing GUIs and comes with its own syntax. The basic idea behind the Tkinter logic is that the program waits for user input and only proceeds when the user pushes a button or performs a similar action, typically after having typed something into an input field. We will ask for four items of user input: a) the directory containing the images to be classified, b) the directory containing the parameter file for the classifier model (a .h5 file), c) the name of that file and d) the number of classes. Items a) and b) are determined by letting the user select the directory in a file browser, while items c) and d) are typed directly into a text field. The user confirms the input for items c) and d) by clicking a green button, and also confirms the start of the classification script (the final step of the GUI process) by clicking a button.
Let us now go through the programming of the GUI script step by step. First, we import the contents of the tkinter package, so we have all the functions for building the GUI available.
```python
from tkinter import filedialog
from tkinter import *
```
Next, we will write six functions that call one another in sequence. We start with the function that is called first, here named clicked_0 (because we assume that the GUI will be started as a program from the file browser via a double-click). This function first sets up a GUI window. We use the global statement many times throughout the functions in order to make the objects defined within them (like the window) accessible to all the other functions. We define the size of the window (here 700x200 pixels) and give it a title. Next, we define a folder-selection request using the Tkinter function filedialog.askdirectory(). This function, when called, opens an additional window prompting the user to navigate through the file tree on their computer and select the folder containing the images to be classified. We provide two arguments here: the title for this extra window (here an explanation to the user of what is being asked for) and the so-called parent GUI object, which is the window defined before (so far we have not done much with that window, but we will do so later on). We convert the output of the folder-selection request into a character string and assign it to a global object named "sample_dir". This makes the output accessible to the classification script written above, so that script knows where to look for the images to be classified. We then call the function clicked_1, which defines the next step in the user-GUI dialogue and which we will look at next. Finally, we call window.mainloop(), which starts the Tkinter event loop that keeps the window open and responsive to user actions.

```python
def clicked_0():
    global window
    global sample_dir
    # Create the main window only once
    if 'window' not in globals():
        window = Tk()
        window.geometry('700x200')
        # title() is a method; assigning a string to window.title would not set the title
        window.title('Classification_GUI')
    sample_dir = str(filedialog.askdirectory(title='Select folder of images to be classified', parent = window))+'/'
    clicked_1()
    window.mainloop()
```
The second function, clicked_1, is structured similarly to clicked_0. Another folder-selection request is made using the Tkinter function filedialog.askdirectory(). This time, the directory containing the parameter (or weights) file is requested, from which the classifier script will load the fitted parameters of the classifier model. Again, the user selects the folder containing the parameter file via a file-browser window opened by the filedialog function. The clicked_1 function then calls the clicked_2 function, which continues the GUI-user dialogue.

```python
def clicked_1():
    global weights_dir
    weights_dir = str(filedialog.askdirectory(title='Select model-weights directory', parent = window))+'/'
    clicked_2()
```
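One pitfall of appending '/' to the dialog result: to my understanding, filedialog.askdirectory() returns an empty value (an empty string, or on some platforms an empty tuple) when the user cancels the dialog, in which case the resulting path would be '/', the root of the file system. A small helper could guard against this. The sketch below runs without Tkinter; the name clean_selection is a hypothetical helper, and the cancel behavior is an assumption that should be checked on the target platform.

```python
import os

def clean_selection(selection):
    """Return a normalized directory path, or None if nothing was selected."""
    text = str(selection) if selection else ''
    if not text:
        return None
    # normpath removes a trailing separator and redundant separators
    return os.path.normpath(text)

print(clean_selection('/home/username/MNIST/Sample/'))
print(clean_selection(''))  # None
print(clean_selection(()))  # None
```

In the GUI functions, a None result could be used to ask again or to close the window instead of proceeding with an unintended directory.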
The third function, clicked_2, asks the user to supply the name of the parameter (weights) file to be used by the classifier model. Here, we do not use a folder-request function, but instead add to the original Tkinter window. We define a "label" (using the Tkinter function Label()), a line of text written into the window that explains to the user what type of input is asked of them. Then, we define an entry field (using the Tkinter function Entry()). This generates an input field like the ones you are familiar with from online search engines or online order or registration forms. Using the function's arguments we define the width and background color of the entry field and also the variable type holding the user input (here a character string). We assign both the label and the input field to an object name each so that we can adjust their positioning in the window (using their grid() method). In the case of the input field, this also enables us to extract the user input in the next function. Finally, we define a button (using the Tkinter function Button()) which, when clicked by the user, invokes the next GUI function, clicked_3. The idea behind using the button is that the user determines when to proceed with the GUI-user dialogue, which can be important when they need to check that the input is correct.

```python
def clicked_2():
    # Label explaining the requested input
    global lbl_0
    lbl_0 = Label(window, text = 'Enter model weights file name')
    lbl_0.grid(column = 0, row = 0)
    # Entry field for the file name
    global txt_0
    txt_0 = Entry(window, width = 40, bg = 'orange', textvariable = StringVar())
    txt_0.grid(column = 1, row = 1)
    # Button that continues the dialogue
    global btn_0
    btn_0 = Button(window, text = 'Confirm', bg = 'green', command = clicked_3)
    btn_0.grid(column=1, row=2)
```
The fourth function, clicked_3, asks the user to state the number of classes expected (this enables the set-up of classifier-model architectures for a variety of classification tasks, provided that matching parameter sets are available). The structure of this function closely matches that of the previous function, clicked_2, but a few initial lines of code are added that read the file name from the entry field (using its .get() method) and remove (here called "destroy") the label, button and entry field from before (using their .destroy() method). This is done in order to avoid overlapping text and buttons in the GUI window, which would look unpleasant and confusing. The new button calls the function clicked_4, which is described next.

```python
def clicked_3():
    # Remove the previous widgets, reading the file name before the entry field is destroyed
    btn_0.destroy()
    lbl_0.destroy()
    global weights_name
    weights_name = str(txt_0.get())
    txt_0.destroy()
    global lbl_1
    lbl_1 = Label(window, text = 'State number of classes')
    lbl_1.grid(column = 0, row = 0)
    global txt_1
    txt_1 = Entry(window, width = 40, bg = 'orange', textvariable = StringVar())
    txt_1.grid(column = 1, row = 1)
    global btn_1
    btn_1 = Button(window, text = 'Set', bg = 'green', command = clicked_4)
    btn_1.grid(column=1, row=2)
```
The second-to-last function, clicked_4, which is the last function serving the user-GUI dialogue, asks the user to confirm that the classification process can be started. This gives the user a chance to check the input, in case wrong input could lead to undesirable side effects, e.g. an excessively long running time or the classification of an already classified set of images. In the present simple application, the user would have to close the GUI and start from the beginning to correct the input. As with the previous function, we first "destroy" the window label and button to make space for a new label and button. Then, we extract the number of classes from the input field set up in the previous function (using the .get() method of the input field). Next, we create a new label asking the user to confirm the start of the classification procedure. Finally, we define a button that, when clicked, calls the final function defined in this script.

```python
def clicked_4():
    btn_1.destroy()
    lbl_1.destroy()
    # The number of classes is stored as a string and converted in the classification script
    global n_digits
    n_digits = str(txt_1.get())
    txt_1.destroy()
    global lbl_2
    lbl_2 = Label(window, text = 'Start classification. Application will close \n automatically once all images are classified.')
    lbl_2.grid(column = 0, row = 0)
    global btn_2
    btn_2 = Button(window, text = 'Confirm', bg = 'green', command = clicked_5)
    btn_2.grid(column=1, row=2)
```
The final function, clicked_5, executes the classification script described above using the expression exec(open("Classify.py").read()), where Classify.py is the file containing the classification script. The execution of the classification script should not lead to errors, since all the required input has been provided by the user in the preceding user-GUI dialogue (as set up by the functions clicked_0 to clicked_4). Only if the input does not make sense in the context of the classification script, e.g. because the user has misunderstood what was asked for, will the classification process be aborted (the GUI window may have to be closed manually in such an event). It is therefore important to formulate the labels displayed in the GUI window clearly, or to provide a user manual along with the GUI. If the classification script finishes normally, the GUI window is "destroyed" by invoking its .destroy() method.

```python
def clicked_5():
    btn_2.destroy()
    lbl_2.destroy()
    # Run the classification script; it reads the global variables set by the GUI
    exec(open("Classify.py").read())
    print('Classification finished')
    window.destroy()
```
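As a possible alternative to exec(open(...).read()), the standard-library function runpy.run_path() runs a script file and lets the caller hand over the input names explicitly instead of relying on global variables. The sketch below demonstrates this with a tiny stand-in script written to a temporary folder; the file name classify_demo.py and its contents are made up for this example and are not the real classification script.

```python
import os
import runpy
import tempfile

# Write a tiny stand-in script that relies on a name provided from outside
demo_dir = tempfile.mkdtemp()
script_path = os.path.join(demo_dir, 'classify_demo.py')
with open(script_path, 'w') as f:
    f.write('n_classes = int(n_digits)\n'
            'message = "would classify into " + str(n_classes) + " classes"\n')

# Pass the user input explicitly; run_path returns the script's global names
result = runpy.run_path(script_path, init_globals={'n_digits': '10'})
print(result['message'])  # would classify into 10 classes
```

With this approach the GUI function would collect the four inputs in a dictionary and pass it as init_globals, which makes it easier to see exactly what the classification script receives.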
The script structure should now look like this: i) import of the Tkinter package functions, ii) definition of the functions clicked_0 to clicked_5. In the final line, we need to invoke the function clicked_0 to start the GUI, which we do by simply calling it, i.e. by writing clicked_0(). As each function calls the next one, we do not need to write anything else. The frequent use of the global statement in the functions ensures that all variables set within the functions (i.e. the user input) exist not only there but also in the global context of the Python session, and are thus available to the classification script. This style of coding is sometimes considered inelegant, but it serves its purpose well here.

```python
# package import
# function definitions
clicked_0()
```
Now, we want to be able to run the GUI without needing to open an editor like Spyder, and without having to launch a Python environment in a terminal window and then call the GUI script via a bash command (bash is a command-line shell commonly used on Linux computers). Instead, we ideally want to be able to start the GUI by simply double-clicking on the file name of the GUI script (similar to how many programs are started on modern computers). To this end, it is important that the script knows the version of Python it should be run with, and where that version is located on the computer. Depending on which Python editor you use, it is thus necessary to modify or add the so-called shebang line, which should be the very first line in the script (no empty line should precede it). When writing Python scripts with the Spyder editor, this line is added automatically, but needs to be modified here. The shebang line starts with the symbol combination #!. This should be followed by the location of the Python version that the GUI script should be run with. This could, for example, be the location of a Python working environment, e.g. a conda environment, that also contains all the required packages (e.g. TensorFlow). These packages will be found when the correct Python location is given in the shebang line. The final line could thus look like this: #!/home/username/gui_env/bin/python. Of course, the syntax of the path will depend greatly on your operating system, and on the type of Python or working-environment installation you are using. Some knowledge of your Python installation is required to determine the correct path on your computer.

```python
#!/home/username/gui_env/bin/python
# package import
# function definitions
clicked_0()
```
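One way to find the path for the shebang line is to ask the Python interpreter itself. Run inside the environment that contains the required packages, the following sketch prints a candidate shebang line; the fallback value is an assumption for the rare case in which Python cannot determine its own location.

```python
import sys

# Absolute path of the interpreter that is running this code
interpreter = sys.executable or '/usr/bin/env python3'  # fallback is an assumption

shebang = '#!' + interpreter
print(shebang)
```

The printed line can be copied to the top of the GUI script, provided the code was run with the same interpreter that should later run the GUI.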
Finally, we need to make the GUI script executable, i.e. allow the computer to run the script as a program. This can be easily done in Linux by right-clicking on the script file name, going to "Properties" and then "Permissions", and check-marking the line Allow executing as program. Depending on the file browser you use, you may in addition need to give it the permission to run any text file as a program (in the Nautilus file browser on Ubuntu, it can be necessary). This can be done in the file-browser settings menu.
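On Linux, the same permission can also be set from code, which is the programmatic counterpart of ticking the checkbox (or of running chmod +x in a terminal). The sketch below applies it to a temporary placeholder file rather than the real GUI script; the helper name make_executable is made up for this example.

```python
import os
import stat
import tempfile

def make_executable(path):
    """Add the 'execute' permission for the file's owner, keeping other bits."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)

# Temporary placeholder standing in for the GUI script
fd, demo_script = tempfile.mkstemp(suffix='.py')
os.close(fd)

make_executable(demo_script)
is_exec = bool(os.stat(demo_script).st_mode & stat.S_IXUSR)
print(is_exec)
```

Permission bits behave differently on other operating systems, so this sketch is meant for the Linux setting assumed throughout this article.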
And thus we are done with writing the GUI! As a recap, we have i) downloaded and organized the training data (and set aside a trial sample), ii) fitted a classification model on the training data and saved it to disk, iii) written a classification script that depends on user input and classifies and sorts the images provided by the user, and iv) written a script that conducts the user-program dialogue via a graphical user interface, obtaining the user input and handing it to the classification script. Below you find a link to the final version of each script: