Ex Data, Scientia
Welcome to the exciting world of data science!
Here, you will find information on topics covering data analysis, programming, statistics and visualization.
The articles will primarily concern the ideas behind the various concepts, and their application.
The mathematical side will be treated less intensely, so don't expect to find a lot of formulas or
textbook-style definitions. Instead, this site is primarily intended to be a fun "playground" to share
all kinds of astonishing insights, in the hope that they may be of interest or use to the reader.
This website is not complete from the start. Rather, articles will be added over time, to hopefully become
a comprehensive resource for applied data science at some point. This also means that the sequence in which
articles are released follows a tutorial-like schedule. Some articles will require the reader to have more
pre-existing knowledge than others. Articles published later may explain a topic only briefly addressed in an earlier one.
So, without further ado, grab a cup of coffee, lean back and enjoy browsing the site. And don't forget to come
back regularly to find new exciting insights!
12-11-2022: Support-vector machine for data classification
The support-vector machine is a common algorithm for the classification of data. Here, we take a closer look at the properties of a linear binary support-vector machine.
30-10-2022: Building classification- and regression trees from scratch
Classification and regression trees are a common technique for "mining" complex data sets for information. Here, in order to shed some light on these often black-box-like algorithms, we take a thorough look at some custom-written trees.
24-10-2022: Fitting numerical models with Template Model Builder
Fitting the parameters of a complex numerical model can be a daunting task. The Template Model Builder (TMB) package for R is designed to optimize such parameters efficiently. Here, we will look at its implementation on an easy artificial example.
27-03-2022: Working with expressions in R
Working with expressions in R can be a powerful tool when the application of loops or functions is not useful. Here, we will explore the usage of expressions in an easy-to-follow example.
13-02-2022: Fitting non-linear regression models
Fitting non-linear regression models can be quite a daunting task from a programming perspective, especially when their complexity increases. Here, we are going to look at four methods of fitting such models.
06-02-2022: Building a deep neural network from scratch
Deep neural networks for classification or regression are typically constructed via the Keras API in Python, which is so user-friendly that it is essentially a black box. Here, we will look at a much less opaque approach using the deriv() function in R.
01-01-2022: Hierarchical clustering
Hierarchical clustering is an attractive method for assigning data to multiple clusters simultaneously, and thereby overcomes constraints posed by more traditional approaches.
01-01-2022: The domains of data science
Data science is an umbrella term for several domains of analytical or prediction-oriented techniques whose common grounds may not be immediately visible. Here, we are going to take a broad look at these domains and their relationships.
01-01-2022: 3D plots with rgl
Three-dimensional (3D) plots have a bad standing in scientific literature due to the difficulty of their interpretation, but can be useful to visualize complex relationships in an educational context. Here, we look at creating 3D surface plots in R.
02-09-2021: E-Learning with swirl and swirlify
E-Learning has been an important cornerstone in teaching programs on programming languages and statistics, not just since the Covid pandemic. Here, we are going to look at how to design e-learning lessons with the swirlify package in R.
14-08-2021: Preventing RStudio from Freezing
Sometimes, you may encounter a situation in which you open the IDE RStudio with scripts still opened, and it becomes unresponsive immediately. Read on to find out how to solve this issue!
13-06-2021: Analyzing and Visualizing Classifier Predictions - Step by Step
Designing and training a Deep Neural Network is only one part of the process of developing a classifier application. It is just as important to visualize the classifier's performance in order to judge its quality.
24-05-2021: Implementing a Deep Neural Network in Keras - Step by Step
Deep Neural Networks are on their way to dominating the field of Machine Learning, seeing increased use in classification, regression and optimization tasks. Their implementation might appear as a mystery to some, yet the implementation in the Keras API is actually fairly straightforward.
30-03-2021: Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are today's gold standard for image classification and Machine Vision in general. By mimicking the way in which visual input is processed in the human brain, CNNs often outperform traditional Deep Neural Networks.
16-03-2021: Using Reticulate for R-Python interaction
The programming languages R and Python have very complementary strengths and weaknesses. Integrating the functions of both languages for working on a specific task can thus be a beneficial venture, and is enabled through the R package reticulate.
09-03-2021: Dimensionality of Data vs Structure of Data
When starting to work with complex data like images, it is often not easy to recognize the dimensionality of the data and the structure of the data, or to tell the two apart.
07-03-2021: Cluster Analysis with Auto-Encoders
While cluster analysis has traditionally been implemented with fairly simple algorithms like K-Means and Expectation-Maximization, the relatively recent emergence of Deep Neural Networks in applied data science has brought a new, more complex method to the field: the auto-encoder.
28-02-2021: Data Investigation with Kernel-Density Estimation
Kernel-density estimation (KDE) is a methodology to detect patterns in (often multivariate) data without having to pre-define a certain number of clusters. Basically speaking, KDE tries to detect "commonness" in the data.
28-02-2021: Clustering with the Expectation-Maximization Algorithm
Expectation-Maximization (EM) is a common clustering algorithm based on probability-density calculations. It is a popular alternative to the K-means clustering algorithm.
28-02-2021: How to get started with Python
Finding the right entry point into programming with Python is not as straightforward as one might think.
There are a number of tricks that make working with Python really convenient, though.
27-02-2021: Customize your computer with bash scripts
Bash scripts, that is, scripts bearing the file-name ending ".sh", offer a convenient
way of writing executable protocols or even customizing your computer to your needs.
25-02-2021: Three ways of implementing a loop
Loops are an essential part of many programming applications, from simple file-operation algorithms
to complex numerical models. Although loops can be inefficient, some operations clearly depend on them.
21-02-2021: K-means clustering
K-Means clustering is one of the most intuitive clustering techniques due to the simplicity
and elegance of its design.
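To give a first impression of that simplicity, here is a minimal sketch of the K-means idea in plain Python. It is not the code from the article: the function name kmeans, its parameters (k, n_iter, seed) and the toy data are assumptions made purely for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
        for p in points
    ]


def kmeans(points, k, n_iter=20, seed=0):
    """Illustrative K-means for 2D points given as (x, y) tuples.

    Hypothetical helper: runs a fixed number of iterations instead of
    checking for convergence, and starts from k randomly drawn data points.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        labels = assign(points, centers)
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    # Final assignment so that the labels match the returned centers.
    return centers, assign(points, centers)


# Toy data (made up): two well-separated groups of three points each.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(toy, k=2)
print(sorted(centers))
print(labels)
```

The whole algorithm is just the two alternating steps inside the loop. A real analysis would add a convergence check and several random restarts, since the result can depend on the starting centers.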
Ex Data Scientia − what does that actually mean?
Ex data, scientia is Latin and translates to "from the data, knowledge"
(to be fair, the case form "data" is likely not correct in Latin grammar,
but the term "data science" is so common today that a different formulation
would have been less understandable to non-Latin speakers). Essentially, it
means that we can discover a whole lot of information by just analyzing data
in the right ways. This can reduce the amount of data that has to be gathered to gain insight,
e.g. through research surveys, or open up entirely new business fields, as in the branch of Machine Vision.