Not me fanboying over the HDFS filesystem

The purpose of this article is to provide a simple, working, step-by-step tutorial on how to test for fault tolerance in a distributed system by setting up a multi-node Hadoop cluster as an example and examining the contents of its HDFS, simulated through Docker on a Mac using a publicly available Docker repository, followed by a short evaluation of how Hadoop’s HDFS fault tolerance mechanism performs under the CAP theorem. Phew.

Let’s break that statement down, shall we?

First, read through all the topics under the Relevant Concepts section. Feel free to refer to the further reading links…


A 4-person team NLP Pipeline proposal on the “Real or Not? NLP with Disaster Tweets” Kaggle Competition

“Real or Not? NLP with Disaster Tweets” Kaggle Competition Home Page

0. Introduction

Overview

This paper is based on the Kaggle competition Real or Not? NLP with Disaster Tweets. It’s an introductory challenge that serves as practice for Natural Language Processing, with a focus on text classification. The competition creators gathered 10,875 tweets that are reporting an emergency or some man-made/natural disaster; the selection process is left unspecified.



Hint: It required a bit of effort from the class participants in terms of preparation, but required even more from me to prepare the class.

Creating an NLP pipeline in Python with the NLTK package for simple sentiment analysis of movie reviews

Screenshot of me teaching the class

There’s nothing that says “I know a lot about this topic” more than creating an entire class on it and receiving overall positive feedback from the class #humblebrag.

Here’s a breakdown of my lesson plan to teach an introductory class on NLP, with emphasis on practical application and long-term retention by getting the class to create an NLP pipeline. Keep in mind that this was tailored to fit my Practical Data Science Tutorial class for…


This is a stock image of a farmer overlaid with a generic “data” background. The intention is to evoke sentiments of manual labor and intentional nurturing essential to growing your precious data science skills.

You’re a fledgling data scientist finally getting into… what data science topic is it this time?

Image Classification. Ahh yes, how practical of you.

You’ve also heard of Kaggle countless times. This time, you actually want to join one of their competitions… so you scroll through the most recent challenges in Image Classification, and select a challenge that piques your interest.

Cancer detection. Ahh yes, how humanitarian of you.

Now seems like the time. Time to fatten your scrawny body of applicable data science skills. Time to get through an entire Kaggle challenge alone, perhaps with a team of friends…

Nathan Torento

An optimistic bro who happens to be really into data science and sharing love.
