I like to structure my code the same way as an article, and if you have an academic background as well, you can probably relate. I usually start with a preamble, where I put all the packages and toolkits I want to use. The main part follows, and then the rest (for example, where the results of the model should go and all the connections). Especially if you are just getting started with Python, I recommend structuring and commenting your code as clearly as possible. Jupyter notebooks already come with several options for headings and comments. A clear structure and sufficient comments help you write code that you can easily pick up again (and follow) months or even years later. In my experience, good code is often (partly) recycled.
In this blog post, I summarize what I call (1) the general preamble and (2) the visualization preamble. I also post (3) my map preamble and (4) my analysis preamble, but I will cover them in more detail in forthcoming posts.
1. General preamble
import pandas as pd
import numpy as np
import datetime as dt
I always need pandas and numpy, so I always start with these two. I often work with time series, which is why I need the third line. Together with pandas, datetime lets you define dates, index your data by date, and do all other date-related manipulations.
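To make the date handling a bit more concrete, here is a minimal sketch. The toy DataFrame, its column names and the dates are made up for illustration; your own data will look different.

```python
import datetime as dt
import pandas as pd

# Hypothetical toy data: three daily observations with dates stored as strings
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-01-02", "2019-01-03"],
    "value": [10.0, 12.5, 11.0],
})

df["date"] = pd.to_datetime(df["date"])  # parse the strings into timestamps
df = df.set_index("date")                # use the date column as the index

start = dt.datetime(2019, 1, 2)          # define a date
recent = df.loc[df.index >= start]       # keep observations from that date onward
next_week = start + dt.timedelta(days=7) # simple date arithmetic

print(recent)
print(next_week)
```

Once the index is a date index, filtering and resampling by date usually become one-liners.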
2. Visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
These lines are very general: seaborn and matplotlib cover most visualizations of any kind.
Then we can become more specific and define the style and context. I prefer 'whitegrid' and 'talk' and set them at the beginning as my personalized defaults. Refer to the seaborn documentation for details and further options.
sns.set_style('whitegrid')
sns.set_context('talk')
Furthermore, if you work for a company that has a corporate identity with predefined colors, it is useful to create your own palette with the company colors. Graphs then automatically get these colors (as long as you do not override them). The corporate identity colors at my company are ordered in an agreed way, so the order of the list defines the order in which the colors are applied automatically.
mycolors = ['#0076a7', '#ae5b3a', '#CD997C', '#EDD195']  # list of colors; I prefer hex codes for colors
sns.set_palette(mycolors) # define the palette
sns.palplot(sns.color_palette()) # display
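As a quick check of the ordering, here is a minimal sketch that draws four dummy lines and reads back the colors they were given. The lines are placeholders of my own, not real data, and the snippet assumes seaborn and matplotlib are installed.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import to_hex

mycolors = ['#0076a7', '#ae5b3a', '#CD997C', '#EDD195']
sns.set_palette(mycolors)  # make the list the default color cycle

fig, ax = plt.subplots()
for i in range(len(mycolors)):
    # Dummy horizontal lines; each new line takes the next color in the cycle
    ax.plot([0, 1], [i, i], label=f"series {i + 1}")
ax.legend()

# Read the colors back as lowercase hex codes, in plotting order
applied = [to_hex(line.get_color()) for line in ax.get_lines()]
print(applied)
```

The first series gets the first color in the list, the second series the second, and so on, so put your primary corporate color first.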
3. Maps
Maps are a special kind of visualization and deserve a separate blog post (forthcoming). You will find my preamble for them below.
import folium
from folium.plugins import MarkerCluster  # if you want to cluster markers
from folium.plugins import MiniMap  # cool minimap in the lower right corner of the big map
from folium import plugins
from folium import FeatureGroup #if you customize
4. Analysis
I am a time series econometrician, so my analysis preamble is heavily biased toward time series. You start a time series analysis with tests to understand the autocorrelation structure of your variables. I will go into more detail on data analysis with Python in forthcoming posts.
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller #Augmented Dickey-Fuller unit root test
from statsmodels.graphics.tsaplots import plot_acf #autocorrelation function
from statsmodels.graphics.tsaplots import plot_pacf #partial autocorrelation
from statsmodels.tsa.api import VAR  # vector autoregression (DynamicVAR is no longer available in recent statsmodels releases)
You will find a lot of useful explanations and relevant packages on the statsmodels webpage.