Data Scientist, developer, teacher and writer. Author of "Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide"

Update (May 18th, 2021): Today I’ve finished my book: Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide.


PyTorch is the fastest-growing Deep Learning framework, and it is also used by fast.ai in its MOOC, Deep Learning for Coders, and in its library.

PyTorch is also very Pythonic, meaning it feels natural to use if you are already a Python developer.

Besides, using PyTorch may even improve your health, according to Andrej Karpathy :-)


There are many, many PyTorch tutorials around, and its documentation is quite complete and extensive. So, why should you keep reading this step-by-step tutorial?


As I write these lines, I'm a few days away from releasing the last two chapters of my book, "Deep Learning with PyTorch Step-by-Step: A Beginner's Guide". The first two chapters were released ten months ago, in July 2020, and I've been releasing one or two chapters every month since then.

In this post, I will tell you the whole story: how it all started, how I've managed to get this far, which tools and services I used, and how it's been working for me.

Buckle up, because it's one heck of a ride :-)

How It All Started

January 2018: back then, I…

The content of this post is a partial reproduction of a chapter from the book "Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide".


What do gradient descent, the learning rate, and feature scaling have in common? Let's see…

Every time we train a deep learning model, or any neural network for that matter, we're using gradient descent (with backpropagation). We use it to minimize a loss by updating the parameters/weights of the model.

The parameter update depends on two values: a gradient and a learning rate. The learning rate gives you control of how big (or small) the updates are…
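As a rough illustration of that update rule, here is a minimal sketch in plain Python, assuming a one-feature linear regression trained with a mean squared error loss. The data, the function names, and the learning rate are made up for this example and are not taken from the post or the book.

```python
# Minimal sketch: gradient descent on a linear regression, y = b + w * x.
# Data, starting values, and the learning rate are illustrative assumptions.

def mse_gradients(b, w, xs, ys):
    """Gradients of the mean squared error with respect to b and w."""
    n = len(xs)
    errors = [(b + w * x) - y for x, y in zip(xs, ys)]
    grad_b = 2 * sum(errors) / n
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    return grad_b, grad_w

def gradient_descent(xs, ys, lr=0.1, n_epochs=1000):
    """Repeat the update: parameter = parameter - lr * gradient."""
    b, w = 0.0, 0.0
    for _ in range(n_epochs):
        grad_b, grad_w = mse_gradients(b, w, xs, ys)
        b -= lr * grad_b  # the learning rate scales the size of each step
        w -= lr * grad_w
    return b, w

# Synthetic, noise-free data generated from b = 1 and w = 2
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [1.0 + 2.0 * x for x in xs]

b, w = gradient_descent(xs, ys)
print(f"b = {b:.3f}, w = {w:.3f}")
```

A smaller `lr` takes smaller steps and needs more epochs to get close to the true parameters, while a learning rate that is too large can make the updates overshoot and diverge.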


HandySpark is a Python package designed to improve PySpark user experience, especially when it comes to exploratory data analysis, including visualization capabilities and, now, extended evaluation metrics for binary classifiers.


In my previous post, I introduced HandySpark, a package for PySpark I developed to help close the gap between pandas and Spark dataframes.

Today, I am pleased to announce the release of a new version which not only solves some performance issues with stratified operations (which should be several times faster now!), but also makes evaluating binary classifiers much easier.

A Binary Classification Task


Through food?!

That's right, through food! :-)

Imagine you are ordering a pizza and, shortly after, you get that beautiful, warm, and delicious pizza delivered to your home.

Have you ever thought about all the steps involved in the process that ends with a pizza delivered to your home? I mean the whole process, from sowing the tomatoes to the delivery person buzzing your intercom! In the end, it is not so different from a Machine Learning project.

Seriously! Check it out!

This article was inspired by a talk by Cassie Kozyrkov, Chief Decision Scientist at Google…

Through food?!

Yes, you got that right, through food! :-)

Imagine yourself ordering a pizza and, after a short while, getting that nice, warm and delicious pizza delivered to your home.

Have you ever wondered about the workflow behind getting such a pizza delivered to your home? I mean, the full workflow, from the sowing of tomato seeds to the bike rider buzzing at your door! It turns out, it is not so different from a Machine Learning workflow.

Really! Let’s check it out!

This post draws inspiration from a talk given by Cassie Kozyrkov, Chief Decision Scientist at Google, at the Data…


If you have ever trained a binary classifier, you probably used binary cross-entropy (also known as log loss) as your loss function.

But have you ever thought about what this loss function really means? These days, libraries and frameworks are so easy to use that we tend to overlook the true meaning of the loss function we are using.


I was looking for an article that would explain the concepts behind binary cross-entropy / log loss in a clear, visual, and concise way, so I could show it to my students at Data Science Retreat. …


If you are training a binary classifier, chances are you are using binary cross-entropy / log loss as your loss function.

Have you ever thought about what exactly it means to use this loss function? The thing is, given the ease of use of today’s libraries and frameworks, it is very easy to overlook the true meaning of the loss function used.


I was looking for a blog post that would explain the concepts behind binary cross-entropy / log loss in a visually clear and concise manner, so I could show it to my students at Data Science Retreat.
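To make the loss concrete, here is a small sketch of binary cross-entropy computed by hand, assuming the standard definition: the average over all points of -[y * log(p) + (1 - y) * log(1 - p)], where y is the true label (0 or 1) and p is the predicted probability of the positive class. The function name, the clipping constant, and the sample predictions are assumptions made for this example.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average negative log-likelihood of the true labels.

    y_true: list of 0/1 labels
    p_pred: list of predicted probabilities for the positive class
    eps:    clipping constant (an assumption here) to avoid log(0)
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # probabilities close to the true labels
hesitant = [0.6, 0.4, 0.6, 0.4]   # correct side of 0.5, but less certain

loss_confident = binary_cross_entropy(labels, confident)
loss_hesitant = binary_cross_entropy(labels, hesitant)
print(f"confident: {loss_confident:.4f}, hesitant: {loss_hesitant:.4f}")
```

Both sets of predictions would classify every point correctly at a 0.5 threshold, yet the hesitant one gets a higher loss: the loss penalizes low probability assigned to the true class, not just wrong classifications.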


HandySpark is a new Python package designed to improve PySpark user experience, especially when it comes to exploratory data analysis, including visualization capabilities.

Update (March 9th, 2019): version 0.2.0 was released today, including performance improvements in stratified operations and an extended version of BinaryClassificationMetrics. For more details, please check the release notes.


Apache Spark is the most popular cluster computing framework. It is listed as a required skill in about 30% of job listings.

The majority of Data Scientists use Python and Pandas, the de facto standard for manipulating data…


This is the second post of my series on hyper-parameters. In this post, I will show you the importance of properly initializing the weights of your deep neural network. We will start with a naive initialization scheme and work out its issues, such as vanishing / exploding gradients, until we (re)discover two popular initialization schemes: Xavier / Glorot and He.

I am assuming you’re already familiar with some key concepts (Z-values, activation functions and their gradients), which I covered in the first post of this series.

The plots illustrating this post were generated using my package, DeepReplay, which you can…
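For reference, the two schemes named above can be sketched in a few lines of plain Python. This assumes the commonly used normal-distribution variants: Xavier / Glorot draws weights with standard deviation sqrt(2 / (fan_in + fan_out)), and He draws them with standard deviation sqrt(2 / fan_in). The function names and layer sizes are hypothetical, and this is not the code behind the post or the DeepReplay package.

```python
import math
import random

def glorot_std(fan_in, fan_out):
    """Standard deviation for Xavier / Glorot normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))

def he_std(fan_in):
    """Standard deviation for He normal initialization."""
    return math.sqrt(2.0 / fan_in)

def init_weights(fan_in, fan_out, std, seed=42):
    """Draw a fan_in x fan_out weight matrix from N(0, std^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

# Hypothetical layer: 256 inputs, 128 units
fan_in, fan_out = 256, 128
glorot_w = init_weights(fan_in, fan_out, glorot_std(fan_in, fan_out))
he_w = init_weights(fan_in, fan_out, he_std(fan_in))

print(f"Glorot std: {glorot_std(fan_in, fan_out):.4f}")
print(f"He std:     {he_std(fan_in):.4f}")
```

Both schemes scale the weights by the layer size so that the variance of the signal stays roughly stable from layer to layer. Glorot is usually paired with tanh or sigmoid activations, and He with ReLU.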

Daniel Godoy
