2. Set up your environment

Author: Tue Nguyen

2.1. Outline

  • Anaconda

  • Git

  • Create a workspace

2.2. Overview

Before you can write your first Python code, you need to install the necessary software.

  • There are several ways to set up a Python project

  • You can install plain Python and use its bundled IDLE editor to write and run your code

  • However, a better alternative is to use Anaconda and Git Bash, which make it easier to manage packages and launch Jupyter Lab

2.3. Anaconda

2.3.1. What is Anaconda?

Anaconda is a distribution of Python and R for data science and scientific computing. The standard installation of Anaconda includes:

  • 250+ popular packages

  • A package manager

  • An environment manager

  • Jupyter Lab, a powerful web-based IDE for coding
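
Once Anaconda is installed, you can inspect what is in your environment from Python itself. The sketch below uses the standard library's importlib.metadata (Python 3.8+) to list the packages installed in the environment that runs the code. The function name installed_packages is a hypothetical choice for illustration, and the number of packages you see depends on your Anaconda release and anything you have installed since.

```python
from importlib.metadata import distributions


def installed_packages():
    """Return the sorted, de-duplicated names of packages installed in the running environment."""
    names = set()
    for dist in distributions():
        meta = dist.metadata
        # Skip entries whose metadata is missing or has no Name field
        name = meta.get("Name") if meta is not None else None
        if name:
            names.add(name)
    # Sort case-insensitively so "Jinja2" and "jupyterlab" land in a natural order
    return sorted(names, key=str.lower)
```

For example, len(installed_packages()) gives the package count, and "numpy" in installed_packages() checks whether NumPy is present (the exact name casing comes from each package's metadata).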

2.3.2. Why Anaconda?

Anaconda makes it much easier to work on data science projects:

  • Your project will often use many packages, and many of them already come with Anaconda’s standard installation

  • You can use Anaconda to install additional packages from a repository of 7,500+ open-source packages

  • Anaconda’s package manager, conda, resolves dependencies to keep the installed packages compatible with one another

  • You can also create separate environments, each with its own configuration, for different projects
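
Because each environment has its own Python interpreter and packages, it helps to confirm which one you are working in. A minimal sketch, assuming you run it with the Python you want to check: it reads sys.prefix and sys.executable, plus the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated (it is simply absent outside conda). The helper name describe_environment is hypothetical, not part of Anaconda.

```python
import os
import platform
import sys


def describe_environment():
    """Report which Python interpreter is running and which conda environment (if any) is active."""
    return {
        "python_version": platform.python_version(),
        "interpreter": sys.executable,  # path to the running python executable
        "prefix": sys.prefix,  # root folder of the active environment
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:15} {value}")
```

Inside an activated conda environment, prefix points into that environment's folder, which is a quick way to tell two environments apart.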

2.3.3. Download Anaconda

  • Search Google for “download Anaconda”

  • At the time of writing, you can download Anaconda at https://www.anaconda.com

  • Remember to choose the appropriate installer for your operating system

2.3.4. Install Anaconda

  • Follow the installer’s instructions, but pay attention to the Advanced Installation Options step

  • Select both checkboxes, as shown in the picture below, even though the first one is marked as not recommended

2.4. Git

2.4.1. What is Git?

  • Git is a version control system used to track modifications to a source code repository

  • However, that is not why we install it here

  • We install Git to get Git Bash, a great command-line app that makes launching Jupyter Lab much easier

  • Only Windows users need to download and install Git; macOS and Linux already come with capable built-in terminal apps

2.4.2. Download Git

  • Search Google for “download Git”

  • At the time of writing, you can download Git at https://git-scm.com

  • Remember to choose the appropriate installer for your operating system

2.4.3. Install Git

  • Just follow the instructions and accept all default options

  • No special modification is needed

2.5. Double-check the installation

  • Type git bash into the Windows search box. If Git Bash appears in the results, Git was installed correctly

  • Press Enter to open Git Bash. You will see a black window with a blinking cursor. This window is the Git Bash terminal

  • Now type conda --version into the terminal (on macOS or Linux, use the built-in terminal) and press Enter. If it prints something like conda 4.10.0 (your version number may differ), Anaconda was installed correctly
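
The same check can be scripted from Python. A minimal sketch, assuming both tools should be reachable on the PATH of the process that runs the script: shutil.which locates each command, and subprocess.run calls it with --version. The helper name tool_version is hypothetical.

```python
import shutil
import subprocess


def tool_version(command):
    """Return the output of `<command> --version`, or None if the command is not on PATH."""
    path = shutil.which(command)  # full path to the executable, or None if not found
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some tools print their version to stderr, so fall back to it
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    for tool in ["conda", "git"]:
        print(f"{tool}: {tool_version(tool) or 'not found on PATH'}")
```

A "not found on PATH" result does not always mean the install failed; it can also mean the tool is installed but not visible to the shell or Python you launched the script from.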

2.6. Create a workspace

  • A workspace is simply a folder (directory) on your computer that holds everything for a given project, such as code files, data, output, and documentation

  • The workspace is sometimes called the root directory

  • You will often work on multiple projects, so it’s best to keep them under one parent folder, for example D:/ds_projects/

  • Inside D:/ds_projects/, create a folder named example/ for your first project

  • You can organize your project using the following structure

example/
    |____ data/
    |____ nb/
    |____ lib/
    |____ out/
    |____ docs/

In this structure:

  • data/ holds your data files (e.g., CSV, Excel)

  • nb/ holds your analytics notebooks (covered later)

  • lib/ holds your custom modules (covered later)

  • out/ holds output (e.g., exported Excel files, graphs)

  • docs/ holds related documents (e.g., Word, PDF)

  • For simplicity, you only need the nb/ folder in this tutorial

  • The structure suggested above is a good guideline when you work on more complex projects in the future
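
If you prefer to create the folders with code rather than by hand, the sketch below uses pathlib from the standard library. create_workspace is a hypothetical helper, and the default subfolder names simply mirror the suggested structure; pass only ["nb"] for the minimal setup used in this tutorial.

```python
from pathlib import Path

# Subfolder names taken from the suggested project structure
SUBFOLDERS = ("data", "nb", "lib", "out", "docs")


def create_workspace(root, subfolders=SUBFOLDERS):
    """Create the project folder `root` and the given subfolders inside it.

    Folders that already exist are left untouched, so the function is safe to rerun.
    """
    root = Path(root)
    # parents=True also creates a missing parent folder such as ds_projects/
    root.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (root / name).mkdir(exist_ok=True)
    return root
```

For example, create_workspace("D:/ds_projects/example") builds the full structure on a Windows D: drive, while create_workspace("D:/ds_projects/example", ["nb"]) creates only the notebook folder; adjust the path for your own machine.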

2.7. Summary

A good way to start a Python project is to use Anaconda and Git Bash

Anaconda

  • Anaconda is a distribution of Python and R for data-related tasks

  • Anaconda includes

    • 250+ popular packages

    • A package manager

    • An environment manager

    • Jupyter Lab

Git

  • Git is a version control system used to track modifications to a source code repository

  • However, the reason we install Git is to get Git Bash, a great command-line app that makes launching Jupyter Lab much easier

Workspace

  • A workspace is simply a regular folder on your computer that holds everything for a given project, such as code files, data, output, and documentation

  • Here is an example structure to organize your workspace

example/
    |____ data/
    |____ nb/
    |____ lib/
    |____ out/
    |____ docs/