Getting Started

It might have a slight bias towards Deep Learning

Have you ever asked yourself where you currently are on your Machine Learning journey? And what is still left for you to learn?

This checklist helps you answer such questions. It provides an outline of the field, divided into three broad levels: the entry level (where everybody starts), the intermediate level (which you reach quickly), and the advanced level (where you stay for a long time). It does not list specific courses or software but focuses on general concepts:

An overview of the checklist, image by the author. It’s available on Notion here and on GitHub here.

Let’s cover the levels in more detail, starting with the entry level.

Entry level


Examine the effect of augmentations in your browser

When working with image data, practitioners often use augmentations. Augmentations are techniques that artificially and randomly alter the data to increase diversity. Applying such transformations to the training data makes the model more robust. For image data, frequently used candidates are rotating, resizing, or blurring. The effects of the transformations are easy to see and comprehend. Even multiple augmentations can be grasped quickly, as the following example shows:

Exemplary augmentations. Image created by the author after a TensorFlow tutorial.
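To make this concrete, here is a minimal sketch of such an augmentation pipeline, assuming TensorFlow 2.x with its Keras preprocessing layers (the layer choices and parameters are illustrative, not taken from the tutorial):

import tensorflow as tf

# A small augmentation pipeline: each layer randomly perturbs the input image.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),  # rotate by up to +/-10% of a full turn
    tf.keras.layers.RandomZoom(0.2),      # zoom in or out by up to 20%
])

# Apply to a batch of images with values in [0, 1]; the result differs per call.
images = tf.random.uniform((8, 224, 224, 3))
augmented = augment(images, training=True)

Because these layers are only active when called with training=True, the same model can be used unchanged at inference time.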

Such augmentations are not restricted to images (though they are pretty popular there). For audio data, there are similar ways to modify the training data. The downside is that one cannot observe…


Getting Started

A guide to the evolution of generative networks

Photo by Anders Jildén on Unsplash.

Introduction

In short, the core idea behind generative networks is capturing the underlying distribution of the data. This distribution cannot be observed directly but has to be inferred approximately from the training data. Over the years, many techniques have emerged that aim to generate data similar to the input samples.

This post intends to give you an overview of the evolution, beginning with the AutoEncoder, describing its descendant, the Variational AutoEncoder, then taking a look at the GAN, and ending with the CycleGAN as an extension to the GAN setting.

AutoEncoders

Early deep generative approaches used AutoEncoders [1]. These networks aim…
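As a rough sketch of the idea (my own illustration, not code from the post), a minimal AutoEncoder in Keras could look like this: an encoder compresses the input into a small latent code, and a decoder reconstructs the input from that code.

import tensorflow as tf

# Minimal AutoEncoder: compress 784-dimensional inputs (e.g. flattened MNIST
# images) into a 32-dimensional bottleneck and reconstruct them again.
inputs = tf.keras.Input(shape=(784,))
latent = tf.keras.layers.Dense(32, activation="relu")(inputs)        # encoder / bottleneck
outputs = tf.keras.layers.Dense(784, activation="sigmoid")(latent)   # decoder

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# Training uses the input as its own target: autoencoder.fit(x_train, x_train, ...)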


Steps you can use to pass, too

After preparing for a long time, I recently took and passed the TensorFlow Developer Certificate exam. This exam tests your proficiency in using TensorFlow for image, time-series, and text data. Beyond these domains, it also covers strategies to reduce overfitting, such as augmentations and dropout layers.
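To illustrate one of these strategies, here is a hedged sketch of a small classifier with a dropout layer (the architecture and numbers are my own, not taken from the exam material):

import tensorflow as tf

# A small image classifier; the Dropout layer randomly disables half of the
# units during training, which helps to reduce overfitting.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])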

Photo by Lewis Keegan on Unsplash

Why should you get certified?

There are two reasons to attempt the exam. First, working towards the certificate is a great incentive to learn TensorFlow. Second, it is an excellent opportunity to certify and showcase your skills.

If you do not have any previous experience with Machine Learning, then it might be better to learn about it first…


Use this template to write custom TensorFlow algorithms quickly

Custom training loops offer great flexibility. You can quickly add new functionality and gain deep insight into how your algorithm works under the hood. However, setting up custom algorithms over and over is tedious. The general layout is often the same; only small parts change.

This is where the following template comes into play: It outlines a custom and distributed training loop. All places that you have to modify to fit your task are highlighted with TODO notes.

Photo by Chris Ried on Unsplash

The general layout of custom distributed loops

A custom training loop — as opposed to calling model.fit() — is a mechanism that iterates over the datasets, updates…
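For orientation, a stripped-down, single-device version of such a loop might look like this (a sketch of the general pattern, not the distributed template from the post; model, data, and loss are placeholders):

import tensorflow as tf

# Placeholder model, optimizer, loss, and dataset; replace them with your own.
model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
train_ds = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform((64, 8)),
     tf.random.uniform((64,), maxval=10, dtype=tf.int32))
).batch(16)

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        logits = model(x, training=True)  # TODO: forward pass for your task
        loss = loss_fn(y, logits)         # TODO: task-specific loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(3):
    for x, y in train_ds:
        train_step(x, y)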


Create a simple GUI to browse large datasets

Image datasets can be explored easily. Even if we have hundreds of images, we can scroll through directories and glance at the data. This way, we quickly notice interesting properties: colours, locations, time of day. However, if we used the same strategy for audio datasets, we would not get far. Instead, we would have to listen to or skip through one file at a time. For a large number of samples, this approach is prohibitive.

A solution is to visualize the audio files. Rather than listening to the samples sequentially, we plot different characteristics. …
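As a small, self-contained sketch (using a synthetic signal instead of a real dataset), plotting a spectrogram with matplotlib could look like this:

import numpy as np
import matplotlib.pyplot as plt

# A 440 Hz tone as a stand-in for a real audio file; in practice you would
# load the waveform from disk, e.g. with soundfile or librosa.
sample_rate = 16_000
t = np.linspace(0, 1, sample_rate)
waveform = np.sin(2 * np.pi * 440 * t)

# The spectrogram shows how the frequency content evolves over time.
plt.specgram(waveform, Fs=sample_rate)
plt.xlabel("Time [s]")
plt.ylabel("Frequency [Hz]")
plt.show()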


From augmentation to hyperparameter selection

Training neural networks is a complex procedure. Many variables interact with each other, and it is often unclear which choices will work.

The following selection of tips aims to make things easier for you. It's not a must-do list but should be seen as an inspiration. You know the task at hand and can thus best select from the following techniques. They cover a wide range of topics, from augmentation to hyperparameter selection. Use this selection as a starting point for future research.

An overview of the techniques. The list is available on Notion here. Image by the author

Overfit a single batch

Use this technique to test your network’s capacity. First, take a single data batch, and make sure…
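In code, such a sanity check could be sketched as follows (my own minimal example, assuming a Keras classifier on MNIST): train on one batch only and verify that the loss approaches zero.

import tensorflow as tf

# Overfit a single batch: if the model cannot memorize 32 samples, something
# in the architecture or training setup is likely broken.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_batch = x_train[:32].astype("float32") / 255.0
y_batch = y_train[:32]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_batch, y_batch, epochs=200, verbose=0)
print(model.evaluate(x_batch, y_batch, verbose=0))  # loss should be close to zero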


It's a trillion parameters, actually.

Human language is ambiguous. When speaking (or writing), we convey individual words, tone, humour, metaphors, and many more linguistic characteristics. For computers, such properties are hard to detect in the first place and even harder to understand. Several tasks have emerged to address these challenges (one of them is sketched in code after the list):

  • Classification: This task aims to classify text into one or more of several predefined categories
  • Speech recognition and speech-to-text: These tasks deal with detecting speech in audio signals and transcribing it into textual form
  • Sentiment analysis: In this task, the sentiment of the text is determined
  • Natural language generation: This…
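To make one of these tasks tangible, here is a hypothetical sentiment-analysis snippet using the Hugging Face transformers library (not part of the original post, purely for illustration):

from transformers import pipeline

# Download a default pretrained sentiment model and classify a sentence.
classifier = pipeline("sentiment-analysis")
print(classifier("I really enjoyed reading this article!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.9998}]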


Use online courses to create your curriculum

The internet is full of courses and learning materials. You can use these resources to replicate the curriculum of an ML Master's degree.

Photo by MD Duran on Unsplash

A Bachelor's programme usually takes six semesters; a Master's takes four. But this is only a rough guideline: I've witnessed people finishing their Bachelor's in three semesters and others taking nine. Sometimes there are so many exciting courses that you voluntarily stay longer to learn it all. Therefore, I've loosely structured the recreated curriculum into four semesters. The primary focus is on Machine Learning and Deep Learning.

First semester

Artificial Intelligence

As the first course in your curriculum…


From entry to expert level

The field of Machine Learning is huge, and you can easily be overwhelmed by the amount of information out there. To keep you from getting lost, the following list helps you estimate where you are. It provides an outline of the vast Deep Learning space and does not emphasize specific resources. Where appropriate, I have included hints to help you orient yourself.

An excerpt of the list, by the author. The list is available on GitHub here and on Notion here.

Since the list has gotten rather long, I have included an excerpt above; the full list is at the bottom of this post.

Entry level

The entry level is split into five categories:

  • Data handling introduces you to small datasets
  • Classic Machine Learning covers key…

Pascal Janetzky

CS student.
