Life After Vectum - An alumni article by Herman Sontrop, December 23rd, 2017

My name is Herman Sontrop, I was born in Nijmegen in 1975 and I studied Operations Research at the University of Maastricht. A few weeks ago, I bumped into one of my old math teachers, Hans de Graaff, who at the time taught me the fundamentals of analysis and linear algebra. We had an invigorating talk about how we’re both doing after such a long time, after which Hans asked me to write a piece on how studying econometrics benefited my later career, hence this article.

In short, studying econometrics contributed massively to my career and my personal development, and resulted in many solid job opportunities and plenty of room for future growth. Even though my experience turned out great, it wasn’t always smooth sailing when I began my studies.

I started studying econometrics in late 1996. At the time, Microsoft Windows 95 had only recently been released, Apple was close to being dead, Steve Jobs was still alive and had just rejoined Apple, Google was only just getting started and Facebook didn’t even exist!

My first-year experience was quite different from what I expected. Back then, the university had only a small computer room, which was completely packed with students. Among econometrics students in particular, the differences in performance were large and many dropped out. I remember my first year as quite intimidating and challenging. After a year and a half of struggling and not doing so well, I decided econometrics was not for me and I quit, only to pick it up again a few years later. The second half of my studies was certainly much more pleasant and enjoyable.

During my years away I gained more confidence, both in life in general and in my own skills, and I decided to give econometrics a second chance. My motivation to do well this time around was much greater and consequently my grades improved substantially. I especially liked my final year, in which I could put much of the theory I had learned into practice.

For my master’s thesis, I worked at CQM, a small firm based in Eindhoven that creates software for solving complex scheduling problems. There I worked on creating advanced heuristics to tackle vehicle routing problems. In mid-2005, I graduated based on my thesis titled ‘Fast Ejection Chain Algorithms for Vehicle Routing with Time Windows’, under the supervision of prof. dr. Stan van Hoesel and prof. dr. Marc Uetz, who later moved to the University of Twente. Ejection chains, by the way, are methods for solving hard discrete optimization problems that first move to intermediate infeasible solutions, in order to gain access to more flexible and efficient neighbourhood structures during local search. These intermediate solutions are made feasible later, ejecting the infeasibility so to speak, hence the name.

During my time at CQM I met prof. dr. Emile Aarts, the current rector magnificus of Tilburg University. At the time, Emile was both an external advisor for CQM and vice director of scientific research at Philips Research in Eindhoven. Based on my work at CQM, he offered me a job in a bioinformatics start-up group at Philips Research, which I accepted. Over the next six years, I worked at the High Tech Campus, mostly on breast cancer prediction problems based on gene expression data.

Gene expression measurements essentially attempt to measure how active genes are in your body. Building on the Human Genome Project, a new type of measurement technology, the microarray, burst onto the scene in the early 2000s. For the first time, this technology allowed biologists to measure the activity of all genes simultaneously, instead of one by one, which was far more time-consuming and restrictive. Back then it was believed that this type of data would allow for unparalleled breakthroughs in solving diseases and that within a decade, diseases like cancer would be solved. Unfortunately, as we all know, cancer is still far from being solved.

In our group, we used gene expression measurements extracted from a primary tumour to try to predict whether a person would develop a metastasis, in other words, whether the cancer of a patient already diagnosed with breast cancer would spread to other parts of the body. For various reasons, this type of prediction problem turned out to be difficult to solve.

For one, obtaining high-quality gene expression data is extremely challenging. The measurement process is very complex, and before data can be sensibly compared between samples, it must be heavily pre-processed and undergo numerous systematic transformations and corrections based on complex models. Unfortunately, the specific choice of processing method often directly influences the subsequent analysis.

Second, in these scenarios, the number of observations (n) is often much smaller than the number of variables (p), here the number of genes. This poses a variety of statistical challenges, mostly related to the risk of overfitting a predictor. An overfit predictor performs very well on training data, yet does not validate on unseen data; in other words, it generalises poorly. It turns out that when p >> n, you can essentially always construct a predictor that is perfect on the training data. In such scenarios, proper evaluation schemes are paramount.
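
To make the p >> n point concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the analysis described in this article (which was done in R on real microarray data): the data are pure random noise, and the function name `fit_noise` and the sizes n = 20 and p = 2000 are assumptions chosen for the example. A plain least-squares fit reproduces the random training labels, while accuracy on fresh data should hover around chance level.

```python
import numpy as np


def fit_noise(n=20, p=2000, n_test=1000, seed=0):
    """Fit a linear predictor to pure-noise data with p >> n.

    Returns (training accuracy, test accuracy).
    """
    rng = np.random.default_rng(seed)

    # Hypothetical "expression" matrix: n samples, p genes, all random noise.
    X_train = rng.standard_normal((n, p))
    # Random outcome labels (+1 / -1), unrelated to X by construction.
    y_train = rng.choice([-1.0, 1.0], size=n)

    # Least squares with more unknowns than equations: lstsq returns the
    # minimum-norm weights, which reproduce y_train up to rounding error.
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_acc = np.mean(np.sign(X_train @ w) == y_train)

    # Fresh noise drawn the same way plays the role of unseen patients.
    X_test = rng.standard_normal((n_test, p))
    y_test = rng.choice([-1.0, 1.0], size=n_test)
    test_acc = np.mean(np.sign(X_test @ w) == y_test)

    return float(train_acc), float(test_acc)


train_acc, test_acc = fit_noise()
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

The gap between the two numbers is the reason evaluation has to be done on data the predictor has never seen, for example through held-out samples or cross-validation.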

In late 2009, I joined forces with TU Delft and the Academic Medical Center (AMC) to obtain a deeper understanding of the complexities of this type of data and how to evaluate it. This work formed the basis of my PhD dissertation ‘A critical perspective on microarray breast cancer gene expression profiling’, which I completed in early 2015 under the supervision of prof. dr. Marcel Reinders (TU Delft) and dr. Perry Moerland (AMC).

During this work I used R, an open-source statistical computing framework, heavily. While I was still doing my PhD I often worried whether these skills would come in handy one day. Little did I know that they would. Sometime in early 2012 I was looking at a job site and noticed a Utrecht-based company called FRISS asking for people with R skills. FRISS specializes in detecting fraud, identifying risk and securing compliance for insurance firms. After a successful job interview I joined FRISS as a consultant in 2012, while still working on my PhD dissertation, mainly in Amsterdam.

In the same year, a new web platform for R was launched, called Shiny. In short, Shiny enables you to develop advanced interactive analytic web applications without needing a deep understanding of how to build web pages. It was during this time that I discovered that the combination of data science, visualization and web development would become my new passion. After my initial introduction to building web pages, I spent much time improving my skillset and became quite good at it.

At present, I’m a senior data scientist and the product owner and team leader of the analytics department at FRISS. Every day, my team and I work hard to bring the best of web visualization, web design and machine learning to the web products FRISS offers.

Our latest mission is to bring more Artificial Intelligence (AI) into the hands of insurers, so that insurance can become more honest and less affected by fraudsters. Doing AI properly, getting clean data and picking the right techniques is quite hard at times. Separating fact from fiction and knowing the boundaries of what’s possible and what’s not is equally important. Even more difficult is providing insight and reasoning to those without a solid data science background.

Perhaps the best skill I learned at university, however, is the ability to decompose a big problem that I don’t yet fully understand into ever smaller blocks that I can eventually work out and learn to understand, which ultimately helps me to understand the bigger problem I started with. Studying econometrics certainly helped me to hone that skill and I can recommend it to anyone!