Knowledge distillation is a common way to train compressed models by transferring the knowledge learned from a large model into a smaller model. Today we’ll be taking a look at using knowledge distillation to train a model that screens for pneumonia in chest x-rays.

What is the point?

Let’s say you’re working in an area where privacy is extremely important, like healthcare. It may be the case that we cannot send patient data to the cloud where all our fancy GPUs live. This creates the need for a model that can be downloaded and run on a low-power machine.
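As a rough sketch of the core idea (the temperature and weighting values below are illustrative assumptions, not tuned choices), a distillation loss blends the hard ground-truth labels with the teacher’s softened output distribution:

```python
import math

def softmax(logits, temperature=1.0):
    # Dividing logits by a temperature > 1 softens the distribution,
    # exposing the teacher's "dark knowledge" about similar classes.
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    # Hard loss: cross-entropy against the ground-truth label.
    hard_probs = softmax(student_logits)
    hard_loss = -math.log(hard_probs[true_label])
    # Soft loss: cross-entropy against the teacher's softened outputs.
    student_soft = softmax(student_logits, temperature)
    teacher_soft = softmax(teacher_logits, temperature)
    soft_loss = -sum(t * math.log(s)
                     for t, s in zip(teacher_soft, student_soft))
    # Weighted blend of the two objectives.
    return alpha * hard_loss + (1 - alpha) * soft_loss
```

The student minimizes this blended loss, so it learns both from the labels and from how the teacher distributes probability across classes.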

So… what are we working with?

Let’s use the…

Statistics is the science of answering questions through data. Statisticians try to be as objective as possible by drawing conclusions solely from data, but in doing so they often obscure the very question they are trying to answer.

The gender pay gap debate falls victim to this. Part of the reason this debate is still ongoing, despite the abundance of data that exists, is that statisticians can’t seem to agree on how to interpret that data.

Before going further, I’d like to say that I’m not taking a position on the gender wage gap. I just want to highlight…

Helpbot is a customer support chatbot powered by machine learning that replaces the contact flow for guests trying to message Airbnb for support. We used Helpbot to provide intelligent help for COVID-19 questions and developed workflows for our Extenuating Circumstances policy. Over 50% of users engaged with the COVID-specific flow, which allowed a significant number of customers to solve their problem instantly.

Guests rely on Airbnb every day to assist them through all stages of a trip: from finding a place to stay, all the way to checking out and returning home safely.

At the same…

At past conferences, I always wished that someone attending would give an overview of a few interesting papers. Since I went this year, I thought I would give it a shot. This is a quick summary of the papers I still remember from EMNLP, and my understanding of what they cover. I wanted to get this written while it’s still fresh in my mind. The caveat is that I didn’t get a chance to review them thoroughly, so some of these overviews may be inaccurate.

Phrase-Based & Neural Unsupervised Machine Translation

Won best paper. Admittedly, this talk went a little over…

Light Bulb is a tool that helps you label, train, test, and deploy machine learning models without any coding.

Go directly to the GitHub project here.

Let’s say you want to build a photo-sharing app called Snapcat that allows users to send pictures of cats, and nothing else.


How would you go about starting this? It will probably look something like this:

  1. Collect a large set of cat and not cat photos.
  2. Manually label the posts as cat or not cat.
  3. Split the dataset into train, test, and validation sets.
  4. Train some model (let’s say a convolutional neural network) on…
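Step 3 above can be sketched as a simple shuffled split (the 70/15/15 ratios and the fixed seed are illustrative assumptions):

```python
import random

def train_val_test_split(examples, train_frac=0.7, val_frac=0.15, seed=42):
    # Shuffle a copy so the caller's original list order is untouched;
    # seeding makes the split reproducible across runs.
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder becomes the test set
    return train, val, test
```

For Snapcat you would pass in the labeled (photo, cat-or-not) pairs and train only on the first split, tuning on validation and reporting final numbers on the held-out test set.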

Recently, some of my work has involved knowledge graphs. I was somewhat surprised to discover how sparse the resources on working with them were. Most of the literature is locked in research papers that are relatively inaccessible unless you have a fair amount of time on your hands.

What is a knowledge graph?

Simply put, a knowledge graph is a collection of facts, each in the form of two entities and a relationship: (e1, r, e2). For instance, the fact that “Tom Cruise acted in Mission Impossible” would be represented as:

(“Tom Cruise”, acted_in, “Mission Impossible”)
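In code, a tiny knowledge graph can be stored as a list of such triples and queried by pattern matching (the extra entities and the `query` helper below are made up for illustration):

```python
# A tiny knowledge graph stored as (entity, relation, entity) triples.
triples = [
    ("Tom Cruise", "acted_in", "Mission Impossible"),
    ("Tom Cruise", "acted_in", "Top Gun"),
    ("Brian De Palma", "directed", "Mission Impossible"),
]

def query(triples, e1=None, r=None, e2=None):
    # Return every triple matching the pattern; None acts as a wildcard.
    return [t for t in triples
            if (e1 is None or t[0] == e1)
            and (r is None or t[1] == r)
            and (e2 is None or t[2] == e2)]
```

For example, `query(triples, e1="Tom Cruise", r="acted_in")` returns both films Tom Cruise acted in. Real systems use indexed triple stores, but the data model is exactly this.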

Here is an example knowledge graph…

In the last post, we built a framework that can define a computation graph, and perform a forward pass. In this post, we will work on the core of most deep learning frameworks: the backwards pass.

Working off the code from the last post, every type of node already knows how to compute its own value by evaluating the computation graph; now we just have to teach each one how to pass gradients backwards. Let’s think about how this could work by revisiting the computational graph for our linear model:

Our goal is to find the value of ∂Error / ∂w
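As a concrete sketch (the one-weight model y = w·x with squared error is a simplifying assumption, not the full graph from the last post), the chain rule gives ∂Error/∂w = 2(y − target)·x, which we can verify against a finite-difference estimate:

```python
def forward(w, x):
    # Prediction of a one-weight linear model.
    return w * x

def error(w, x, target):
    # Squared error between prediction and target.
    return (forward(w, x) - target) ** 2

def grad_w(w, x, target):
    # Chain rule: dError/dw = dError/dy * dy/dw = 2*(y - target) * x
    return 2 * (forward(w, x) - target) * x

# Numerical check with a central finite difference.
w, x, target, eps = 1.5, 2.0, 5.0, 1e-6
numeric = (error(w + eps, x, target) - error(w - eps, x, target)) / (2 * eps)
```

The backwards pass generalizes this: each node multiplies the gradient flowing in from above by its own local derivative and passes the product to its inputs.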

Somehow I found myself reading through the relationships between the various countries and factions that occupy the Middle East. I quickly found there were too many different relationships to understand in an afternoon of reading.

I came across this article by Slate that contained this helpful graphic:

Even this simplified graphic is difficult to understand, but one thing that stands out is that there are clear factions. Israel and the United States, for instance, have very similar relationships, which makes sense. Al-Qaida and ISIS also have similar relationships, which also makes sense.

I figured we can distill this information to…

Because backpropagation gives us a generic learning algorithm, a large part of deep learning can be distilled down to the loss function used to represent the objective. Let’s take a look at a few:

Generative Adversarial Networks

Images generated with GANs

Loss Function:
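For reference, the standard minimax objective from the original GAN paper (Goodfellow et al., 2014), in which the discriminator D and generator G play a two-player game, is:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\,[\log D(x)]
  + \mathbb{E}_{z \sim p_z(z)}\,[\log(1 - D(G(z)))]
```

D is rewarded for assigning high probability to real samples and low probability to generated ones, while G is trained to make D fail.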

Chris Zhu
