Saturday, March 19, 2016

Getting started with Data Science : Python

There is a lot of buzz about data science being a super cool field. So, I decided to do a write-up on how to get started with it. Without wasting much time, here are the details:

Step 1: Setting up your machine
Download Anaconda and install it. The installation should be fairly simple on any platform.
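
Once the installation is done, a quick sanity check can confirm that the libraries used in the later steps are importable. The snippet below is only a rough sketch: the list of package names is my own assumption about what you will need for this post, and it simply reports whatever it finds on your machine.

```python
import importlib
import sys

# Packages used in the later steps (this list is an assumption for this post).
packages = ["numpy", "pandas", "matplotlib", "sklearn"]

versions = {}
for name in packages:
    try:
        module = importlib.import_module(name)
        # Most scientific packages expose __version__; fall back if they do not.
        versions[name] = getattr(module, "__version__", "unknown")
    except ImportError:
        versions[name] = None  # package is not installed

print("Python", sys.version.split()[0])
for name, version in versions.items():
    print(name, version if version is not None else "not installed")
```

If any line says "not installed", the Anaconda setup probably did not complete or a different Python is being picked up.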

Step 2: Learn the basics of the Python language

If you want to get started with data science in Python, you need to know at least the basics of Python. Being able to write a hello world program is enough to get started. You can get the hang of the language later.
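
To give an idea of how little is needed, here is a minimal hello world in Python 3. The function name `greet` is just something I made up for illustration.

```python
def greet(name):
    # Build a greeting string for the given name.
    return "Hello, " + name + "!"

message = greet("world")
print(message)
```

Save it as a `.py` file and run it with the Python interpreter that came with your installation.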

And trust me, Python is one of the easiest languages to learn. Python 2.7 will no longer be supported after 2020, so it is better to start with Python 3.
Some references for Python 3:
Free e-book on Python:

Step 3: Learn Scientific libraries in Python – NumPy, Matplotlib and Pandas

The basics for exploratory data analysis and data handling:
NumPy - N-dimensional array structure for scientific computing.
Pandas - Data structures for easily manipulating data.
Matplotlib - One of the best-known plotting libraries in the SciPy stack.
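
The three libraries above can be tried together in a few lines. This is only a small sketch with made-up numbers (the heights and weights are invented for illustration, as are the column names), not a real analysis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# NumPy: arithmetic applies to the whole array at once.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100.0

# pandas: a labelled table built from the array plus a second column.
df = pd.DataFrame({"height_m": heights_m,
                   "weight_kg": [50.0, 60.0, 70.0, 80.0]})
df["bmi"] = df["weight_kg"] / df["height_m"] ** 2
mean_bmi = df["bmi"].mean()
print(df)
print("Mean BMI:", round(mean_bmi, 2))

# Matplotlib: a simple scatter plot of the two columns.
fig, ax = plt.subplots()
ax.scatter(df["height_m"], df["weight_kg"])
ax.set_xlabel("height (m)")
ax.set_ylabel("weight (kg)")
# Call plt.show() to open the plot window, or fig.savefig(...) to write a file.
```

The pattern is typical: NumPy holds the raw numbers, pandas gives them labels and makes derived columns easy, and Matplotlib draws the result.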

Step 4: Learn Scikit-learn and Machine Learning

This is where the fun part begins:
The scikit-learn library provides most of the common machine learning algorithms for Python.
Web site:
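
To get a feel for the library, here is a minimal sketch that trains a classifier on the iris dataset bundled with scikit-learn. The choice of a k-nearest-neighbours model, the 25% test split and the random seed are my own assumptions for illustration; any other estimator follows the same fit and score pattern.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the bundled iris dataset as feature matrix X and label vector y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a k-nearest-neighbours classifier on the training rows.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# score() returns the fraction of test rows classified correctly.
accuracy = model.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))
```

Swapping `KNeighborsClassifier` for another scikit-learn estimator usually needs no other changes, which makes it easy to compare algorithms.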

Step 5: Practice, practice and practice

I would say this is the most important step, and the one that can distinguish you from the rest.

You have to accept the fact that you won't turn into a data scientist overnight. It takes time and dedication to become one.

A frequent question is: how much time will it take?
The answer really depends on how much time you devote to the subject and how quickly you pick up the concepts.

Fortunately, there are many sites that can be used as a testing ground for your machine learning skills. They regularly run competitions in which you can participate and test your knowledge.
