Udacity Software Engineering


References

Udacity Data Scientist Nanodegree


Introduction


Course Overview

How this Course is Organized

  • Software Engineering Practices Part 1 covers how to write well documented, modularized code.
  • Software Engineering Practices Part 2 discusses testing your code and logging.
  • Introduction to Object-Oriented Programming gives you an overview of this programming style and prepares you to write your own Python package.
  • Introduction to Web Development covers building a web application data dashboard.

Course Portfolio Exercises

The software engineering course has two portfolio exercises: building a Python package and developing a web data dashboard. These exercises are NOT reviewed and are NOT required to graduate from the data scientist nanodegree program. In other words, you will not submit either of the portfolio projects to the Udacity review system. Instead, you can use these projects to practice your software engineering skills and then add the projects to your professional portfolio.

Having said that, the skills covered in this course will set you up for success in other Udacity courses with required projects. For example, the data engineering for data scientists course has a required project where you are expected to write clean, concise, and well-documented code. You will also have an easier time with that project if you understand the fundamentals of object-oriented programming and have a basic understanding of how the backend and frontend of a website work.


Software Engineering Practices, Part I

Python Tutorial

Introduction

In this lesson, you’ll learn about the following software engineering practices and how they apply in data science.

  • Writing clean and modular code
  • Writing efficient code
  • Code refactoring
  • Adding meaningful documentation
  • Using version control

In the lesson following this one (part 2), you’ll also learn about the following software engineering practices:

  • Testing
  • Logging
  • Code reviews

Clean and Modular Code

  • Production code: Software running on production servers to handle live users and data of the intended audience. Note that this is different from production-quality code, which describes code that meets expectations for production in reliability, efficiency, and other aspects. Ideally, all code in production meets these expectations, but this is not always the case.
  • Clean code: Code that is readable, simple, and concise. Clean production-quality code is crucial for collaboration and maintainability in software development.
  • Modular code: Code that is logically broken up into functions and modules. Modular production-quality code makes your code more organized, efficient, and reusable.
  • Module: A file. Modules allow code to be reused by encapsulating it in files that can be imported into other files.

Which of the following describes code that is clean? Select all the answers that apply.
Repetitive
Simple
Readable
Vague
Concise

Making your code modular makes it easier to do which of the following things? There may be more than one correct answer.
Reuse your code
Write less code
Read your code
Collaborate on your code


Refactoring Code


  • Refactoring: Restructuring your code to improve its internal structure without changing its external functionality. This gives you a chance to clean and modularize your program after you’ve got it working.
  • Since it isn’t easy to write your best code while you’re still trying to just get it working, allocating time to do this is essential to producing high-quality code. Despite the initial time and effort required, this really pays off by speeding up your development time in the long run.
  • You become a much stronger programmer when you’re constantly looking to improve your code. The more you refactor, the easier it will be to structure and write good code the first time.

Writing Clean Code


Writing clean code: Meaningful names

Use meaningful names

  • Be descriptive and imply type: For booleans, you can prefix with is_ or has_ to make it clear it is a condition. You can also use parts of speech to imply types, like using verbs for functions and nouns for variables.
  • Be consistent but clearly differentiate: age_list and age are easier to differentiate than ages and age.
  • Avoid abbreviations and single letters: Exceptions include counters and common math variables. You can decide when to make other exceptions based on the audience for your code. If you work with other data scientists, certain variable names may be common knowledge; if you work with full stack engineers, more descriptive names might be necessary in these cases as well.
  • Long names aren’t the same as descriptive names: You should be descriptive, but only with relevant information. For example, good function names describe what they do well without including details about implementation or highly specific uses.

Try testing how effective your names are by asking a fellow programmer to guess the purpose of a function or variable based on its name, without looking at your code. Coming up with meaningful names often requires effort to get right.
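
As a small illustration of the naming tips above, the sketch below contrasts terse names with descriptive ones. The names used here (f, calculate_average_age, age_list, has_minors) are made up for this example and do not come from the course material.

```python
# Hard to follow: single letters give no hint of type or purpose.
def f(l):
    return sum(l) / len(l)


# Clearer: a verb for the function, nouns for the variables,
# and a has_ prefix for the boolean.
def calculate_average_age(age_list):
    """Return the mean of a non-empty list of ages."""
    return sum(age_list) / len(age_list)


age_list = [12, 35, 47, 16]
average_age = calculate_average_age(age_list)
has_minors = any(age < 18 for age in age_list)
```

Both functions compute the same value; only the second one tells a reader what that value means without opening the function body.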


Writing clean code: Nice whitespace

Use whitespace properly.

  • Organize your code with consistent indentation: the standard is to use four spaces for each indent. You can make this a default in your text editor.
  • Separate sections with blank lines to keep your code well organized and readable.
  • Try to limit your lines to around 79 characters, which is the guideline given in the PEP 8 style guide. In many good text editors, there is a setting to display a subtle line that indicates where the 79 character limit is.

For more guidelines, check out the code layout section of PEP 8 in the following notes.


References

PEP 8 guidelines for code layout


Quiz: Clean Code

Quiz: Categorizing tasks

Imagine you are writing a program that executes a number of tasks and categorizes each task based on its execution time. Below is a small snippet of this program. Which of the following naming changes could make this code cleaner? There may be more than one correct answer.

Python
t = end_time - start  # compute execution time
c = category(t)  # get category of task
print('Task Duration: {} seconds, Category: {}'.format(t, c))

None
Rename the variable start to start_time to make it consistent with end_time
Rename the variable t to execution_time to make it more descriptive.
Rename the function category to categorize_task to match the part of speech.
Rename the variable c to category to make it more descriptive.


Quiz: Buying stocks

Imagine you analyzed several stocks and calculated the ideal price, or limit price, at which you’d want to buy each stock. You write a program to iterate through your stocks and buy each one if its current price is below or equal to the limit price you computed. Otherwise, you put it on a watchlist. Below are three ways of writing this code. Which of the following is the cleanest?

Python
# Choice A
stock_limit_prices = {'LUX': 62.48, 'AAPL': 127.67, 'NVDA': 161.24}
for stock_ticker, stock_limit_price in stock_limit_prices.items():
    # Buy when the current price is at or below the limit price.
    if stock_limit_price >= get_current_stock_price(stock_ticker):
        buy_stock(stock_ticker)
    else:
        watchlist_stock(stock_ticker)

# Choice B
prices = {'LUX': 62.48, 'AAPL': 127.67, 'NVDA': 161.24}
for ticker, price in prices.items():
    if price >= current_price(ticker):
        buy(ticker)
    else:
        watchlist(ticker)

# Choice C
limit_prices = {'LUX': 62.48, 'AAPL': 127.67, 'NVDA': 161.24}
for ticker, limit in limit_prices.items():
    if limit >= get_current_price(ticker):
        buy(ticker)
    else:
        watchlist(ticker)

Choice A
Choice B
Choice C


Writing Modular Code


Follow the tips below to write modular code.

Tip: DRY (Don’t Repeat Yourself)
Don’t repeat yourself! Modularization allows you to reuse parts of your code. Generalize and consolidate repeated code in functions or loops.
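
A minimal sketch of the DRY tip, assuming a made-up unit conversion task (the names centimeters_to_meters and measurements_cm are hypothetical): the repeated arithmetic is moved into one function and applied in a single pass.

```python
# Repetitive version: the same conversion is written out three times.
height_m = 180 / 100
width_m = 75 / 100
depth_m = 40 / 100


# DRY version: the conversion lives in one function and is applied to
# every measurement in one place.
def centimeters_to_meters(centimeters):
    return centimeters / 100


measurements_cm = {'height': 180, 'width': 75, 'depth': 40}
measurements_m = {name: centimeters_to_meters(value)
                  for name, value in measurements_cm.items()}
```

If the conversion rule ever changes, the DRY version needs a fix in one line rather than three.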

Tip: Abstract out logic to improve readability
Abstracting out code into a function not only makes it less repetitive, but also improves readability with descriptive function names. Although your code can become more readable when you abstract out logic into functions, it is possible to over-engineer this and have way too many modules, so use your judgement.

Tip: Minimize the number of entities (functions, classes, modules, etc.)
There are trade-offs to having function calls instead of inline logic. If you have broken up your code into an unnecessary number of functions and modules, you’ll have to jump around everywhere to view the implementation details of something that may be too small to be worth separating. Creating more modules doesn’t necessarily result in effective modularization.

Tip: Functions should do one thing
Each function you write should be focused on doing one thing. If a function is doing multiple things, it becomes more difficult to generalize and reuse. Generally, if there’s an “and” in your function name, consider refactoring.
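
The sketch below illustrates this tip under assumed, made-up names (clean_and_average, drop_missing, average): a function whose name needs an “and” is split into two functions that can each be reused alone.

```python
# Doing two things: the name needs an "and", and neither half
# can be reused without the other.
def clean_and_average(values):
    cleaned = [value for value in values if value is not None]
    return sum(cleaned) / len(cleaned)


# Refactored: each function does one thing.
def drop_missing(values):
    return [value for value in values if value is not None]


def average(values):
    return sum(values) / len(values)


readings = [4.0, None, 6.0, 8.0]
mean_reading = average(drop_missing(readings))
```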

Tip: Arbitrary variable names can be more effective in certain functions
Arbitrary variable names in general functions can actually make the code more readable.
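
One possible reading of this tip, sketched with a hypothetical helper: in a general-purpose math function, short conventional names such as x and y say as much as longer ones would, because the function is not tied to any particular domain.

```python
def euclidean_distance(x, y):
    """Return the distance between two equal-length sequences of numbers."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


distance = euclidean_distance([0, 0], [3, 4])
```

Names like store_location and customer_location would be misleading here, since the function works for any pair of points.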

Tip: Try to use no more than three arguments per function
Try to use no more than three arguments when possible. This is not a hard rule and there are times when it is more appropriate to use many parameters. But in many cases, it’s more effective to use fewer arguments. Remember we are modularizing to simplify our code and make it more efficient. If your function has a lot of parameters, you may want to rethink how you are splitting this up.
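
As a rough sketch of how a long parameter list can hint at a better split, the hypothetical summarize_prices function below bundles filtering, averaging, and formatting; the refactored helpers (also hypothetical) each take at most three arguments.

```python
# Five parameters: filtering, averaging, and formatting are bundled together.
def summarize_prices(prices, lower, upper, digits, label):
    in_range = [price for price in prices if lower <= price <= upper]
    mean_price = round(sum(in_range) / len(in_range), digits)
    return '{}: {}'.format(label, mean_price)


# Split up: each function needs at most three arguments.
def filter_range(values, lower, upper):
    return [value for value in values if lower <= value <= upper]


def rounded_mean(values, digits):
    return round(sum(values) / len(values), digits)


prices = [1, 5, 10, 50]
bundled_summary = summarize_prices(prices, 1, 10, 2, 'Mean price')
mean_price = rounded_mean(filter_range(prices, 1, 10), 2)
split_summary = 'Mean price: {}'.format(mean_price)
```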


Exercise: Refactoring - Wine quality

In this exercise, you’ll refactor code that analyzes a wine quality dataset taken from the UCI Machine Learning Repository. Each row contains data on a wine sample, including several physicochemical properties gathered from tests, as well as a quality rating evaluated by wine experts.

Download the notebook file refactor_wine_quality.ipynb and the dataset winequality-red.csv. Open the notebook file using the Jupyter Notebook. Follow the instructions in the notebook to complete the exercise.

Supporting Materials

Exercise - Refactoring – Wine quality


Solution: Refactoring – Wine quality

The following code shows the solution. You can download the solution notebook file that contains this code.

Python
import pandas as pd
df = pd.read_csv('winequality-red.csv', sep=';')
df.head()

# Renaming Columns

df.columns = [label.replace(' ', '_') for label in df.columns]
df.head()

# Analyzing Features

def numeric_to_buckets(df, column_name):
    median = df[column_name].median()
    for i, val in enumerate(df[column_name]):
        if val >= median:
            df.loc[i, column_name] = 'high'
        else:
            df.loc[i, column_name] = 'low'

for feature in df.columns[:-1]:
    numeric_to_buckets(df, feature)
    print(df.groupby(feature).quality.mean(), '\n')

My solution.

Python
import pandas as pd
df = pd.read_csv('winequality-red.csv', sep=';')
df.head()

# Replace the spaces in each column label with underscores.
df.columns = [label.replace(' ', '_') for label in df.columns]

# Convert the values of a column to 'low' or 'high' based on the column's median.
def numeric_to_categorical(df, label):
    median = df[label].median()  # Compute the median of the column.
    df.loc[df[label] < median, label] = 'low'  # Values below the median become 'low'.
    df.loc[df[label] != 'low', label] = 'high'  # All remaining values become 'high'.

    return df

# Bucket every feature column (all but the last) and print the mean quality per bucket.
for feature in df.columns[:-1]:
    numeric_to_categorical(df, feature)
    print(df.groupby(feature).quality.mean(), '\n')


Efficient Code

Knowing how to write code that runs efficiently is another essential skill in software development. Optimizing code to be more efficient can mean making it:

  • Execute faster
  • Take up less space in memory/storage

The project on which you’re working determines which of these is more important to optimize for your company or product. When you’re performing lots of different transformations on large


Optimizing - Common Books



Exercise: Optimizing – Common books

We provide the code your coworker wrote to find the common book IDs in books_published_last_two_years.txt and all_coding_books.txt to obtain a list of recent coding books. Can you optimize it?

Download the notebook file optimizing_code_common_books.ipynb and the text files. Open the notebook file using the Jupyter Notebook. Follow the instructions in the notebook to complete the exercise.

You can also take a look at the example notebook optimizing_code_common_books_example.ipynb to help you finish the exercise.


Supporting Materials

Exercise - Optimizing – Common books


Solution: Optimizing - Common books

The following code shows the solution. You can download the solution notebook file that contains this code.

Python
import time
import pandas as pd
import numpy as np

with open('books_published_last_two_years.txt') as f:
    recent_books = f.read().split('\n')

with open('all_coding_books.txt') as f:
    coding_books = f.read().split('\n')

# Original approach: nested membership checks against a list
start = time.time()
recent_coding_books = []

for book in recent_books:
    if book in coding_books:
        recent_coding_books.append(book)

print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))

# Tip #1: Use vector operations over loops when possible

start = time.time()
recent_coding_books = np.intersect1d(recent_books, coding_books)
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))

# Tip #2: Know your data structures and which methods are faster

start = time.time()
recent_coding_books = set(recent_books).intersection(coding_books)
print(len(recent_coding_books))
print('Duration: {} seconds'.format(time.time() - start))


Exercise: Optimizing - Holiday Gifts

In the last example, you learned that using vectorized operations and more efficient data structures can optimize your code. Let’s use these tips for one more exercise.
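
As a quick reminder of what those two tips look like in isolation, here is a small sketch on made-up data (the temperature readings and user IDs are invented for illustration and are unrelated to the exercise files).

```python
import numpy as np

# Tip 1: a vectorized operation replaces an explicit Python loop.
temperatures_c = np.array([18.5, 21.0, 25.5, 30.0])
fahrenheit_loop = [celsius * 9 / 5 + 32 for celsius in temperatures_c]
fahrenheit_vectorized = temperatures_c * 9 / 5 + 32

# Tip 2: a set answers membership questions without scanning every
# element, which is what a list has to do.
blocked_user_ids = set(['u17', 'u42', 'u99'])
is_blocked = 'u42' in blocked_user_ids
```

On arrays this small the difference is negligible; the gap usually becomes noticeable as the data grows.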

Your online gift store has one million users that each listed a gift on a wishlist. You have the prices for each of these gifts stored in gift_costs.txt. For the holidays, you’re going to give each customer their wishlist gift for free if the cost is under $25. Now, you want to calculate the total cost of all gifts under $25 to see how much you’d spend on free gifts.

Download the notebook file optimizing_code_holiday_gifts.ipynb and the gift_costs.txt file. Open the notebook file using the Jupyter Notebook. Follow the instructions in the notebook to complete the exercise.



Solution: Optimizing – Holiday gifts

The following code shows the solution. You can download the solution notebook file that contains this code.

Python
import time
import numpy as np

with open('gift_costs.txt') as f:
    gift_costs = f.read().split('\n')

gift_costs = np.array(gift_costs).astype(int)  # convert string to int

# Original approach: loop over every cost
start = time.time()

total_price = 0
for cost in gift_costs:
    if cost < 25:
        total_price += cost * 1.08  # add cost after tax

print(total_price)
print('Duration: {} seconds'.format(time.time() - start))

# Refactor Code

start = time.time()

total_price = (gift_costs[gift_costs < 25]).sum() * 1.08
print(total_price)

print('Duration: {} seconds'.format(time.time() - start))

My Solution

Python
# Refactoring Solution 1 (reuses time, np, and gift_costs from the code above)
start = time.time()

# Keep the taxed cost where the gift is under $25, otherwise contribute 0.
gift_costs_taxed = np.where(gift_costs < 25, gift_costs * 1.08, 0)
total_price = np.sum(gift_costs_taxed)  # total cost of all free gifts

print(total_price)
print('Duration: {} seconds'.format(time.time() - start))


Documentation


  • Documentation: Additional text or illustrated information that comes with or is embedded in the code of software.
  • Documentation is helpful for clarifying complex parts of code, making your code easier to navigate, and quickly conveying how and why different components of your program are used.
  • Several types of documentation can be added at different levels of your program:
    • Inline comments - line level
    • Docstrings - module and function level
    • Project documentation - project level

Inline Comments


  • Inline comments are text following hash symbols throughout your code. They are used to explain parts of your code, and really help future contributors understand your work.
  • Comments often document the major steps of complex code. Readers may not have to understand the code to follow what it does if the comments explain it. However, others would argue that this is using comments to justify bad code, and that if code requires comments to follow, it is a sign refactoring is needed.
  • Comments are valuable for explaining where code cannot. For example, the history behind why a certain method was implemented a specific way. Sometimes an unconventional or seemingly arbitrary approach may be applied because of some obscure external variable causing side effects. These things are difficult to explain with code.
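
To make the last point concrete, here is a sketch built on an invented scenario (the legacy export and the parse_store_id helper are hypothetical): the first comment merely restates the code, while the second records history that the code itself cannot express.

```python
# Less useful: the comment only repeats what the code already says.
count = 0
count += 1  # add one to count


# More useful: the comment explains why the code looks the way it does.
def parse_store_id(raw_value):
    # Store IDs in the (hypothetical) legacy export carry a trailing ".0"
    # because the column was once saved as a float; strip it before
    # converting to int.
    return int(str(raw_value).split('.')[0])


store_id = parse_store_id('8324971.0')
```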

Docstrings


Docstrings, or documentation strings, are valuable pieces of documentation that explain the functionality of any function or module in your code. Ideally, each of your functions should have a docstring.

Docstrings are surrounded by triple quotes. The first line of the docstring is a brief explanation of the function’s purpose.

One-line docstring

Python
def population_density(population, land_area):
    """Calculate the population density of an area."""
    return population / land_area

If you think that the function is complicated enough to warrant a longer description, you can add a more thorough paragraph after the one-line summary.


Multi-line docstring

Python
def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population: int. The population of the area.
        land_area: int or float. This function is unit-agnostic; if you pass
            in values in terms of square km or square miles, the function
            will return a density in those units.

    Returns:
        population_density: population/land_area. The population density of
            a particular area.
    """
    return population / land_area

The next element of a docstring is an explanation of the function’s arguments. Here, you list the arguments, state their purpose, and state what types the arguments should be. Finally, it is common to provide some description of the output of the function. Every piece of the docstring is optional; however, docstrings are a part of good coding practice.
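
One practical reason to write docstrings is that Python keeps them attached to the function at runtime, so tools and colleagues can read them without opening the source. A minimal sketch using the standard library’s inspect module (the variable names summary and density are made up here):

```python
import inspect


def population_density(population, land_area):
    """Calculate the population density of an area."""
    return population / land_area


# The docstring is stored on the function object itself.
summary = inspect.getdoc(population_density)
density = population_density(100, 4)
```

The built-in help() function displays the same text in an interactive session.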



Project Documentation

Project documentation is essential for getting others to understand why and how your code is relevant to them, whether they are potential users of your project or developers who may contribute to your code. A great first step in project documentation is your README file. It will often be the first interaction most users will have with your project.

Whether it’s an application or a package, your project should absolutely come with a README file. At a minimum, this should explain what it does, list its dependencies, and provide sufficiently detailed instructions on how to use it. Make it as simple as possible for others to understand the purpose of your project and quickly get something working.

Translating all your ideas and thoughts formally on paper can be a little difficult, but you’ll get better over time, and doing so makes a significant difference in helping others realize the value of your project. Writing this documentation can also help you improve the design of your code, as you’re forced to think through your design decisions more thoroughly. It also helps future contributors to follow your original intentions.

There is a full Udacity course on this topic.

Here are a few READMEs from some popular projects:


Quiz: Documentation

Which of the following statements about in-line comments are true? There may be more than one correct answer.
Comments are useful for clarifying complex code.
You never have too many comments.
Comments are only for unreadable parts of code.
Readable code is preferable over having comments to make your code readable.

Which of the following statements about docstrings are true?
Multiline docstrings are better than single line docstrings.
Docstrings explain the purpose of a function or module.
Docstrings and comments are interchangeable.
You can add whatever details you want in a docstring.
Not including a docstring will cause an error.


Version Control in Data Science


If you need a refresher on using Git for version control, check out the course linked in the extracurriculars. If you’re ready, let’s see how Git is used in real data science scenarios!

Version Control with Git


Scenario #1


Let’s walk through the Git commands that go along with each step in the scenario you just observed in the video.

Step 1: You have a local version of this repository on your laptop, and to get the latest stable version, you pull from the develop branch.

Switch to the develop branch
    git checkout develop
Pull the latest changes in the develop branch
    git pull

Step 2: When you start working on this demographic feature, you create a new branch called demographic, and start working on your code in this branch.

Create and switch to a new branch called demographic from the develop branch
    git checkout -b demographic
Work on this new feature and commit as you go
    git commit -m 'added gender recommendations'
    git commit -m 'added location specific recommendations'
    ...

Step 3: However, in the middle of your work, you need to work on another feature. So you commit your changes on this demographic branch, and switch back to the develop branch.

Commit your changes before switching
    git commit -m 'refactored demographic gender and location recommendations'
Switch to the develop branch
    git checkout develop

Step 4: From this stable develop branch, you create another branch for a new feature called friend_groups.

Create and switch to a new branch called friend_groups from the develop branch
    git checkout -b friend_groups

Step 5: After you finish your work on the friend_groups branch, you commit your changes, switch back to the develop branch, merge friend_groups into it, and push the result to the remote repository’s develop branch.

Commit your changes before switching
    git commit -m 'finalized friend_groups recommendations'
Switch to the develop branch
    git checkout develop
Merge the friend_groups branch into the develop branch
    git merge --no-ff friend_groups
Push to the remote repository
    git push origin develop

Step 6: Now, you can switch back to the demographic branch to continue your progress on that feature.

Switch to the demographic branch
    git checkout demographic

Scenario #2


Let’s walk through the Git commands that go along with each step in the scenario you just observed in the video.

Step 1: You check your commit history, seeing messages about the changes you made and how well the code performed.

View the log history
    git log

Step 2: The model at this commit seemed to score the highest, so you decide to take a look.

Check out a commit
    git checkout bc90f2cbc9dc4e802b46e7a153aa106dc9a88560

After inspecting your code, you realize what modifications made it perform well, and use those for your model.

Step 3: Now, you’re confident merging your changes back into the develop branch and pushing the updated recommendation engine.

Switch to the develop branch
    git checkout develop
Merge the friend_groups branch into the develop branch
    git merge --no-ff friend_groups
Push your changes to the remote repository
    git push origin develop

Scenario #3


Let’s walk through the Git commands that go along with each step in the scenario you just observed in the video.

Step 1: Andrew commits his changes to the documentation branch, switches to the develop branch, and pulls down the latest changes from the remote repository, including the change I merged previously for the friend_groups feature.

Commit the changes on the documentation branch
    git commit -m "standardized all docstrings in process.py"
Switch to the develop branch
    git checkout develop
Pull the latest changes on the develop branch down
    git pull

Step 2: Andrew merges his documentation branch into the develop branch on his local repository, and then pushes his changes up to update the develop branch on the remote repository.

Merge the documentation branch into the develop branch
    git merge --no-ff documentation
Push the changes up to the remote repository
    git push origin develop

Step 3: After the team reviews your work and Andrew’s work, they merge the updates from the develop branch into the master branch. Then, they push the changes to the master branch on the remote repository. These changes are now in production.

Merge the develop branch into the master branch
    git merge --no-ff develop
Push the changes up to the remote repository
    git push origin master

Resources

Read this great article on a successful Git branching strategy.


Note on merge conflicts

For the most part, Git makes merging changes between branches really simple. However, there are some cases where Git can become confused about how to combine two changes, and asks you for help. This is called a merge conflict.

Most commonly, this happens when two branches modify the same file.

For example, in this situation, let’s say you deleted a line that Andrew modified on his branch. Git wouldn’t know whether to delete the line or modify it. You need to tell Git which change to take, and some tools even allow you to edit the change manually. If it isn’t straightforward, you may have to consult with the developer of the other branch to handle a merge conflict.

To learn more about merge conflicts and methods to handle them, see About merge conflicts.


Model versioning

In the previous example, you may have noticed that each commit was documented with a score for that model. This is one simple way to help you keep track of model versions. Version control in data science can be tricky, because there are many pieces involved that can be hard to track, such as large amounts of data, model versions, seeds, and hyperparameters.
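
One lightweight way to keep those pieces together is to store a small record next to each commit. The sketch below is only an illustration using the standard library: the build_run_record helper, the hyperparameter names, and the score are all hypothetical placeholders, not results from a real model.

```python
import json
import random


def build_run_record(hyperparameters, seed, score):
    """Bundle what is needed to reproduce and compare one training run."""
    return {'hyperparameters': hyperparameters, 'seed': seed, 'score': score}


seed = 42
random.seed(seed)  # fix the seed so the run can be repeated

hyperparameters = {'learning_rate': 0.01, 'n_estimators': 200}  # made-up values
score = 0.87  # placeholder, not a real result

record = build_run_record(hyperparameters, seed, score)
record_json = json.dumps(record, sort_keys=True)
```

The resulting JSON string could be written to a file that is committed alongside the code, or summarized in the commit message as in the scenario above.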

The following resources offer useful methods and tools for managing model versions and large amounts of data. These are here for you to explore, but are not necessary to know now as you start your journey as a data scientist. On the job, you’ll always be learning new skills, and many of them will be specific to the processes set in your company.


Conclusion


Software Engineering Practices, part 2

Introduction


Welcome To Software Engineering Practices, Part 2
In part 2 of software engineering practices, you’ll learn about the following practices of software engineering and how they apply in data science.

  • Testing
  • Logging
  • Code reviews

Testing


Testing your code is essential before deployment. It helps you catch errors and faulty conclusions before they make any major impact. Today, employers are looking for data scientists with the skills to properly prepare their code for an industry setting, which includes testing their code.


Testing and Data Science


  • Problems that could occur in data science aren’t always easily detectable; you might have values being encoded incorrectly, features being used inappropriately, or unexpected data breaking assumptions.
  • To catch these errors, you have to check for the quality and accuracy of your analysis in addition to the quality of your code. Proper testing is necessary to avoid unexpected surprises and have confidence in your results.
  • Test-driven development (TDD): A development process in which you write tests for tasks before you even write the code to implement those tasks.
  • Unit test: A type of test that covers a “unit” of code—usually a single function—independently from the rest of the program.


Unit Tests


We want to test our functions in a way that is repeatable and automated. Ideally, we’d run a test program that runs all our unit tests and cleanly lets us know which ones failed and which ones succeeded. Fortunately, there are great tools available in Python that we can use to create effective unit tests!


Unit test advantages and disadvantages

The advantage of unit tests is that they are isolated from the rest of your program, and thus, no dependencies are involved. They don’t require access to databases, APIs, or other external sources of information. However, passing unit tests isn’t always enough to prove that our program is working successfully. To show that all the parts of our program work with each other properly, communicating and transferring data between them correctly, we use integration tests. In this lesson, we’ll focus on unit tests; however, when you start building larger programs, you will want to use integration tests as well.

To learn more about integration testing and how integration tests relate to unit tests, see Integration Testing. That article contains other very useful links as well.


Unit Testing Tools


To install pytest, run pip install -U pytest in your terminal. You can see more information on getting started here.

  • Create a test file starting with test_.
  • Define unit test functions that start with test_ inside the test file.
  • Enter pytest into your terminal in the directory of your test file and it detects these tests for you.

test_ is the default; if you wish to change this, you can learn how in this pytest configuration.
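
The steps above can be sketched as follows. The nearest_square function is a hypothetical example chosen for illustration; in a real project it would live in its own module and be imported by a test file such as test_nearest_square.py, but both parts are shown in one block here so the sketch is self-contained.

```python
# Code under test (would normally be imported from its own module).
def nearest_square(num):
    """Return the largest perfect square less than or equal to num."""
    root = 0
    while (root + 1) ** 2 <= num:
        root += 1
    return root ** 2


# Contents of the test file: each function name starts with test_
# so that pytest discovers it automatically.
def test_nearest_square_5():
    assert nearest_square(5) == 4


def test_nearest_square_9():
    assert nearest_square(9) == 9


def test_nearest_square_negative():
    assert nearest_square(-12) == 0
```

Running pytest in the directory that contains the test file would collect and run these three functions.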

In the test output, periods represent successful unit tests and Fs represent failed unit tests. Since all you see is which test functions failed, it’s wise to have only one assert statement per test. Otherwise, you won’t know exactly how many tests failed or which tests failed.

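A failed assert statement stops only the test function that contains it; pytest still runs the remaining test functions. A syntax error in the test file, however, prevents pytest from running the tests in that file.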


Exercise: Unit tests

Download README.md, compute_launch.py, and test_compute_launch.py.

Follow the instructions in README.md to complete the exercise.


Supporting Materials

Exercise - Unit tests


Test-driven development and data science


  • Test-driven development: Writing tests before you write the code that’s being tested. Your test fails at first, and you know you’ve finished implementing a task when the test passes.
  • Tests can check for different scenarios and edge cases before you even start to write your function. When you start implementing your function, you can run the tests to get immediate feedback on whether it works as you tweak it.
  • When refactoring or adding to your code, tests help you rest assured that the rest of your code didn’t break while you were making those changes. Tests also help ensure that your function’s behavior is repeatable, regardless of external parameters such as hardware and time.
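
A minimal sketch of the test-first workflow described above, using a hypothetical valid_email function: the tests are written before the implementation and describe the behavior it must satisfy. The check itself is deliberately rough and is not a complete email validator.

```python
# Step 1: write the tests first. They fail until valid_email exists
# and behaves as described.
def test_valid_email_accepts_plain_address():
    assert valid_email('ada@example.com')


def test_valid_email_rejects_missing_at_sign():
    assert not valid_email('ada.example.com')


def test_valid_email_rejects_domain_without_dot():
    assert not valid_email('ada@example')


# Step 2: implement just enough to make the tests pass.
def valid_email(address):
    """Rough check: exactly one '@', a non-empty name, a '.' in the domain."""
    if address.count('@') != 1:
        return False
    local_part, domain = address.split('@')
    return bool(local_part) and '.' in domain
```

Each new edge case you think of becomes another small test before the function is changed again.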

Test-driven development for data science is relatively new and is experiencing a lot of experimentation and breakthroughs. You can learn more about it by exploring the following resources.


Logging


Logging is valuable for understanding the events that occur while running your program. For example, if you run your model overnight and the results the following morning are not what you expect, log messages can help you understand more about the context in which those results occurred. Let’s learn about the qualities that make a log message effective.

Logging HOWTO


Log Messages

Logging is the process of recording messages to describe events that have occurred while running your software. Let’s take a look at a few examples, and learn tips for writing good log messages.

Tip: Be professional and clear

Bad: Hmmm... this isn't working???
Bad: idk.... :(
Good: Couldn't parse file.

Tip: Be concise and use normal capitalization

Bad: Start Product Recommendation Process
Bad: We have completed the steps necessary and will now proceed with the recommendation process for the records in our product database.
Good: Generating product recommendations.

Tip: Choose the appropriate level for logging

  • Debug: Use this level for anything that happens in the program.
  • Error: Use this level to record any error that occurs.
  • Info: Use this level to record all actions that are user driven or system specific, such as regularly scheduled operations.

Tip: Provide any useful information

Bad: Failed to read location data
Good: Failed to read location data: store_id 8324971
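
The tips above can be tried out with Python’s standard logging module. This is a minimal sketch: the logger name recommender_demo, the function read_location_data, and the sample data are all hypothetical. The messages are written to an in-memory stream so the result is easy to inspect.

```python
import io
import logging

# Send log output to an in-memory stream so it can be inspected afterwards.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("recommender_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)  # DEBUG messages are filtered out at this level
logger.addHandler(handler)
logger.propagate = False


def read_location_data(store_id, locations):
    """Return the location for store_id, logging an error if it is missing."""
    logger.debug("Looking up store_id %s", store_id)
    if store_id not in locations:
        # Include the identifier so the message is actionable.
        logger.error("Failed to read location data: store_id %s", store_id)
        return None
    return locations[store_id]


logger.info("Generating product recommendations.")
read_location_data(8324971, {1001: "Berlin"})
captured = log_stream.getvalue()
```

With the level set to INFO, the debug message is dropped while the info and error messages are kept; lowering the level to logging.DEBUG would record all three.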

Quiz: Logging

What are some ways this log message could be improved? There may be more than one correct answer.

ERROR - Failed to compute product similarity. I made sure to fix the error from October so not sure why this would occur again. 

Use the DEBUG level rather than the ERROR level for this log message.
Add more details about this error, such as what step or product the program was on when this occurred.
Use title case for the message.
Remove the second sentence.
None of the above: this is a great log message.


Code Reviewers


Code reviews

Code reviews benefit everyone on a team by promoting best programming practices and preparing code for production. Let’s go over what to look for in a code review and some tips on how to conduct one.


Questions to ask yourself when conducting a code review

First, let’s look over some of the questions we might ask ourselves while reviewing code. These are drawn from the concepts we’ve covered in these last two lessons.

Is the code clean and modular?

  • Can I understand the code easily?
  • Does it use meaningful names and whitespace?
  • Is there duplicated code?
  • Can I provide another layer of abstraction?
  • Is each function and module necessary?
  • Is each function or module too long?

Is the code efficient?

  • Are there loops or other steps I can vectorize?
  • Can I use better data structures to optimize any steps?
  • Can I reduce the number of calculations needed for any steps?
  • Can I use generators or multiprocessing to optimize any steps?

Is the documentation effective?

  • Are inline comments concise and meaningful?
  • Is there complex code that’s missing documentation?
  • Do functions use effective docstrings?
  • Is the necessary project documentation provided?

Is the code well tested?

  • Does the code have high test coverage?
  • Do tests check for interesting cases?
  • Are the tests readable?
  • Can the tests be made more efficient?

Is the logging effective?

  • Are log messages clear, concise, and professional?
  • Do they include all relevant and useful information?
  • Do they use the appropriate logging level?

Tips for conducting a code review

Now that we know what we’re looking for, let’s go over some tips on how to actually write your code review. When your coworker finishes up some code that they want to merge to the team’s code base, they might send it to you for review. You provide feedback and suggestions, and then they may make changes and send it back to you. When you are happy with the code, you approve it and it gets merged to the team’s code base.

As you may have noticed, with code reviews you are now dealing with people, not just computers. So it’s important to be thoughtful of their ideas and efforts. You are in a team and there will be differences in preferences. The goal of code review isn’t to make all code follow your personal preferences, but to ensure it meets a standard of quality for the whole team.

Tip: Use a code linter
This isn’t really a tip for code review, but it can save you lots of time in a code review. Using a Python code linter like pylint can automatically check for coding standards and PEP 8 guidelines for you. It’s also a good idea to agree on a style guide as a team to handle disagreements on code style, whether that’s an existing style guide or one you create together incrementally as a team.

Tip: Explain issues and make suggestions
Rather than commanding people to change their code a specific way because it’s better, it will go a long way to explain to them the consequences of the current code and suggest changes to improve it. They will be much more receptive to your feedback if they understand your thought process and are accepting recommendations, rather than following commands. They also may have done it a certain way intentionally, and framing it as a suggestion promotes a constructive discussion, rather than opposition.

BAD: Make model evaluation code its own module - too repetitive.

BETTER: Make the model evaluation code its own module. This will simplify models.py to be less repetitive and focus primarily on building models.

GOOD: How about we consider making the model evaluation code its own module? This would simplify models.py to only include code for building models. Organizing these evaluation methods into separate functions would also allow us to reuse them with different models without repeating code.

Tip: Keep your comments objective
Try to avoid using the words “I” and “you” in your comments. Comments that sound personal draw attention to the people involved, while objective wording keeps the review focused on the code.

BAD: I wouldn't groupby genre twice like you did here... Just compute it once and use that for your aggregations.

BAD: You create this groupby dataframe twice here. Just compute it once, save it as groupby_genre and then use that to get your average prices and views.

GOOD: Can we group by genre at the beginning of the function and then save that as a groupby object? We could then reference that object to get the average prices and views without computing groupby twice.

Tip: Provide code examples
When providing a code review, you can save the author time and make it easy for them to act on your feedback by writing out your code suggestions. This shows you are willing to spend some extra time to review their code and help them out. It can also just be much quicker for you to demonstrate concepts through code rather than explanations.

Let’s say you were reviewing code that included the following lines:

first_names = []
last_names = []

for name in df.name:
    first, last = name.split(' ')
    first_names.append(first)
    last_names.append(last)

df['first_name'] = first_names
df['last_name'] = last_names

BAD: You can do this all in one step by using the pandas str.split method.
GOOD: We can actually simplify this step to the line below using the pandas str.split method. Found this on this Stack Overflow post: https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns

df[['first_name', 'last_name']] = df['name'].str.split(' ', n=1, expand=True)

Linting Python in Visual Studio Code


Conclusion


Introduction to Object-Oriented Programming

Introduction


Lesson outline

  • Object-oriented programming syntax
    • Procedural vs. object-oriented programming
    • Classes, objects, methods and attributes
    • Coding a class
    • Magic methods
    • Inheritance
  • Using object-oriented programming to make a Python package
    • Making a package
    • Tour of scikit-learn source code
    • Putting your package on PyPI

Why object-oriented programming?

Object-oriented programming has a few benefits over procedural programming, which is the programming style you most likely first learned. As you’ll see in this lesson:

  • Object-oriented programming allows you to create large, modular programs that can easily expand over time.
  • Object-oriented programs hide the implementation from the end user.

Consider Python packages like Scikit-learn, pandas, and NumPy. These are all Python packages built with object-oriented programming. Scikit-learn, for example, is a relatively large and complex package built with object-oriented programming. This package has expanded over the years with new functionality and new algorithms.

When you train a machine learning algorithm with Scikit-learn, you don’t have to know anything about how the algorithms work or how they were coded. You can focus directly on the modeling.

Here’s an example taken from the Scikit-learn website:

Python
from sklearn import svm

X = [[0, 0], [1, 1]]
y = [0, 1]
clf = svm.SVC()
clf.fit(X, y)

How does Scikit-learn train the SVM model? You don’t need to know because the implementation is hidden with object-oriented programming. If the implementation changes, you (as a user of Scikit-learn) might not ever find out. Whether or not you should understand how SVM works is a different question.

In this lesson, you’ll practice the fundamentals of object-oriented programming. By the end of the lesson, you’ll have built a Python package using object-oriented programming.


Lesson files

This lesson uses classroom workspaces that contain all of the files and functionality you need. You can also find the files in the data scientist nanodegree term 2 GitHub repo.


Procedural vs. object-oriented programming

Procedural vs. object-oriented programming


Objects are defined by characteristics and actions

Here is a reminder of what is a characteristic and what is an action.

Objects are defined by their characteristics and their actions

Characteristics and actions in English grammar

You can also think about characteristics and actions in terms of English grammar. A characteristic corresponds to a noun and an action corresponds to a verb.

Let’s pick something from the real world: a dog. Some characteristics of the dog include the dog’s weight, color, breed, and height. These are all nouns. Some actions a dog can take include to bark, to run, to bite, and to eat. These are all verbs.


Quiz: Characteristics versus actions

Select the characteristics of a tree object. There may be more than one correct answer.
Height
Color
To grow
Width
To fall down
Species

Which of the following would be considered actions for a laptop computer object?
Memory
Width
To turn on
Operating system
To turn off
Thickness
Weight
To erase


Class, object, method, and attribute

Class, object, method, and attribute


Object-oriented programming (OOP) vocabulary

  • Class: A blueprint consisting of methods and attributes.
  • Object: An instance of a class. It can help to think of objects as something in the real world like a yellow pencil, a small dog, or a blue shirt. However, as you’ll see later in the lesson, objects can be more abstract.
  • Attribute: A descriptor or characteristic. Examples would be color, length, size, etc. These attributes can take on specific values like blue, 3 inches, large, etc.
  • Method: An action that a class or object could take.
  • OOP: A commonly used abbreviation for object-oriented programming.
  • Encapsulation: One of the fundamental ideas behind object-oriented programming is called encapsulation: you can combine functions and data all into a single entity. In object-oriented programming, this single entity is called a class.
  • Encapsulation allows you to hide implementation details, much like how the scikit-learn package hides the implementation of machine learning algorithms.
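
To connect the vocabulary above to code, here is a small sketch built around a hypothetical Dog class (the attribute and method names are made up for illustration). The class is the blueprint, each Dog(...) call creates an object, breed and weight_kg are attributes, and bark and eat are methods. Bundling the data with the functions that act on it is the encapsulation described above.

```python
class Dog:
    """Class: a blueprint for dog objects."""

    def __init__(self, breed, weight_kg):
        # Attributes: characteristics that take specific values per object.
        self.breed = breed
        self.weight_kg = weight_kg

    def bark(self):
        # Method: an action the object can take.
        return "Woof!"

    def eat(self, food_kg):
        # Methods can also change the object's own attributes.
        self.weight_kg += food_kg


# Objects: two separate instances of the same class.
small_dog = Dog("terrier", 6.0)
large_dog = Dog("mastiff", 70.0)

small_dog.eat(0.5)  # only small_dog's weight changes
```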

In English, you might hear an attribute described as a property, description, feature, quality, trait, or characteristic. All of these are saying the same thing.

Here is a reminder of how a class, an object, attributes, and methods relate to each other.

A class is a blueprint consisting of attributes and methods.

Match the vocabulary term on the left with the examples on the right.

TERM        EXAMPLES
Object      Stephen Hawking, Angela Merkel, Brad Pitt
Class       Scientist, chancellor, actor
Attribute   Color, size, shape
Method      To rain, to ring, to ripen
Value       Gray, large, round

OOP syntax

Object-oriented programming syntax

In this video, you’ll see what a class and object look like in Python. In the next section, you’ll have the chance to play around with the code. Finally, you’ll write your own class.


Function versus method

In the video above, at 1:44, the dialogue mistakenly calls __init__ a function rather than a method. Why is __init__ not a function?

A function and a method look very similar. They both use the def keyword. They also have inputs and return outputs. The difference is that a method is inside of a class whereas a function is outside of a class.
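
The distinction can be seen side by side in this short sketch. Both apply_discount and the one-argument Shirt class here are hypothetical and simplified for illustration; they compute the same value, but the function receives the price as an input, while the method reads it from the object it is called on.

```python
def apply_discount(price, percentage):
    """Function: defined outside of any class."""
    return price * (1 - percentage)


class Shirt:
    def __init__(self, price):
        self.price = price

    def discount(self, percentage):
        """Method: defined inside a class and called on an object."""
        return self.price * (1 - percentage)


shirt = Shirt(20)

function_result = apply_discount(20, 0.25)  # price passed in explicitly
method_result = shirt.discount(0.25)        # price comes from the object
```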


What is self?

If you instantiate two objects, how does Python differentiate between these two objects?

Python
shirt_one = Shirt('red', 'S', 'short-sleeve', 15)
shirt_two = Shirt('yellow', 'M', 'long-sleeve', 20)

That’s where self comes into play. If you call the change_price method on shirt_one, how does Python know to change the price of shirt_one and not of shirt_two?

Python
shirt_one.change_price(12)

Behind the scenes, Python is calling the change_price method:

Python
def change_price(self, new_price):
    self.price = new_price

The self argument tells Python where to look in the computer’s memory for the shirt_one object. Then, Python changes the price of the shirt_one object. When you call the change_price method, shirt_one.change_price(12), self is implicitly passed in.

The word self is just a convention. You could actually use any other name as long as you are consistent, but you should use self to avoid confusing people.
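
One way to see what self does is to pass it by hand. The sketch below assumes a simplified Shirt class (the full class from the video is not reproduced here, so the attribute names are assumptions). Calling the method on an object and calling it through the class with the object as the first argument are equivalent.

```python
class Shirt:
    def __init__(self, shirt_color, shirt_size, shirt_style, shirt_price):
        self.color = shirt_color
        self.size = shirt_size
        self.style = shirt_style
        self.price = shirt_price

    def change_price(self, new_price):
        self.price = new_price


shirt_one = Shirt('red', 'S', 'short-sleeve', 15)
shirt_two = Shirt('yellow', 'M', 'long-sleeve', 20)

# Usual form: Python passes shirt_one as self automatically.
shirt_one.change_price(12)

# Equivalent form: the object is passed explicitly as self.
Shirt.change_price(shirt_two, 18)
```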


Exercise: OOP syntax practice, part 1

Exercise: Use the Shirt class

Shirt class exercise

You’ve seen what a class looks like and how to instantiate an object. Now it’s your turn to write code that instantiates a shirt object.

You need to download three files for this exercise. These files are located on this page in the Supporting materials section.

  • Shirt_exercise.ipynb contains explanations and instructions.
  • Answer.py contains the solution to the exercise.
  • Tests.py contains tests for checking your code. You can run these tests using the last code cell at the bottom of the notebook.

Getting started

Open the Shirt_exercise.ipynb notebook file using Jupyter Notebook and follow the instructions in the notebook to complete the exercise.


Supporting Materials


Notes about OOP

Notes about OOP


Set and get methods

The last part of the video mentioned that accessing attributes in Python can be somewhat different than in other programming languages like Java and C++. This section goes into further detail.

The Shirt class has a method to change the price of the shirt: shirt_one.change_price(20). In Python, you can also change the values of an attribute with the following syntax:

Python
shirt_one.price = 10
shirt_one.price = 20
shirt_one.color = 'red'
shirt_one.size = 'M'
shirt_one.style = 'long_sleeve'

This code accesses and changes the price, color, size, and style attributes directly. Accessing attributes directly would be frowned upon in many other languages, but not in Python. In those languages, the general object-oriented programming convention is to use methods to access attributes or change attribute values. These methods are called set and get methods or setter and getter methods.

A get method is for obtaining an attribute value. A set method is for changing an attribute value. If you were writing a Shirt class, you could use the following code:

Python
class Shirt:

    def __init__(self, shirt_color, shirt_size, shirt_style, shirt_price):
        self._price = shirt_price

    def get_price(self):
        return self._price

    def set_price(self, new_price):
        self._price = new_price

Instantiating and using an object might look like the following code:

Python
shirt_one = Shirt('yellow', 'M', 'long-sleeve', 15)
print(shirt_one.get_price())
shirt_one.set_price(10)

In the class definition, the underscore in front of price is a somewhat controversial Python convention. In other languages like C++ or Java, price could be explicitly labeled as a private variable. This would prohibit an object from accessing the price attribute directly like shirt_one._price = 15. Unlike other languages, Python does not distinguish between private and public variables. Therefore, there is some controversy about using the underscore convention as well as get and set methods in Python. Why use get and set methods in Python when Python wasn’t designed to use them?

At the same time, you’ll find that some Python programmers develop object-oriented programs using get and set methods anyway. Following the Python convention, the underscore in front of price is to let a programmer know that price should only be accessed with get and set methods rather than accessing price directly with shirt_one._price. However, a programmer could still access _price directly because there is nothing in the Python language to prevent the direct access.

To reiterate, a programmer could technically still do something like shirt_one._price = 10, and the code would work. But accessing price directly, in this case, would not be following the intent of how the Shirt class was designed.

One of the benefits of set and get methods is that, as previously mentioned in the course, you can hide the implementation from your user. Perhaps, originally, a variable was coded as a list and later became a dictionary. With set and get methods, you could easily change how that variable gets accessed. Without set and get methods, you’d have to go to every place in the code that accessed the variable directly and change the code.
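
The list-to-dictionary scenario can be sketched as follows. PriceBookV1, PriceBookV2, and mark_down are hypothetical names invented for this illustration. The internal storage changes between the two versions, but code that only uses get_price and set_price keeps working unchanged.

```python
class PriceBookV1:
    """First version: prices kept in a list, looked up by position."""

    def __init__(self, prices):
        self._prices = list(prices)

    def get_price(self, shirt_id):
        return self._prices[shirt_id]

    def set_price(self, shirt_id, new_price):
        self._prices[shirt_id] = new_price


class PriceBookV2:
    """Later version: prices kept in a dictionary keyed by shirt id."""

    def __init__(self, prices):
        self._prices = dict(enumerate(prices))

    def get_price(self, shirt_id):
        return self._prices[shirt_id]

    def set_price(self, shirt_id, new_price):
        self._prices[shirt_id] = new_price


def mark_down(book, shirt_id, amount):
    """Caller code: relies only on the get and set methods."""
    book.set_price(shirt_id, book.get_price(shirt_id) - amount)
    return book.get_price(shirt_id)


old_result = mark_down(PriceBookV1([15, 20]), 1, 5)
new_result = mark_down(PriceBookV2([15, 20]), 1, 5)
```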

You can read more about get and set methods in Python on this Python Tutorial site.


Attributes

There are some drawbacks to accessing attributes directly versus writing a method for accessing attributes.

In terms of object-oriented programming, the rules in Python are a bit looser than in other programming languages. As previously mentioned, in some languages, like C++, you can explicitly state whether or not an object should be allowed to change or access an attribute’s values directly. Python does not have this option.

Why might it be better to change a value with a method instead of directly? Changing values via a method gives you more flexibility in the long-term. What if the units of measurement change, like if the store was originally meant to work in US dollars and now has to handle Euros? Here’s an example:

Example: Dollars versus Euros
If you’ve changed attribute values directly, you’ll have to go through your code and find all the places where US dollars were used, such as in the following:

Python
shirt_one.price = 10 # US dollars

Then, you’ll have to manually change them to Euros.

Python
shirt_one.price = 8 # Euros

If you had used a method, then you would only have to change the method to convert from dollars to Euros.

Python
def change_price(self, new_price):
    self.price = new_price * 0.81 # convert dollars to Euros

shirt_one.change_price(10)

For the purposes of this introduction to object-oriented programming, you don’t need to worry about updating attributes directly versus with a method; however, if you decide to further your study of object-oriented programming, especially in another language such as C++ or Java, you’ll have to take this into consideration.


Modularized code

Thus far in the lesson, all of the code has been in Jupyter Notebooks. For example, in the previous exercise, a code cell loaded the Shirt class, which gave you access to the Shirt class throughout the rest of the notebook.

If you were developing a software program, you would want to modularize this code. You would put the Shirt class into its own Python script, which you might call shirt.py. In another Python script, you would import the Shirt class with a line like from shirt import Shirt.
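
The sketch below imitates that two-file layout inside a single runnable script. It writes a stand-in shirt.py into a temporary folder, adds the folder to Python’s search path, and then imports the class. The simplified two-argument Shirt class is an assumption for illustration; in a real project you would simply create shirt.py next to your main script and skip the temporary-folder steps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Contents of the stand-in module file (a simplified, hypothetical Shirt class).
module_source = '''
class Shirt:
    def __init__(self, shirt_color, shirt_price):
        self.color = shirt_color
        self.price = shirt_price
'''

# Create shirt.py in a temporary folder and make the folder importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, "shirt.py").write_text(module_source)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# This is the line you would write in the second script.
from shirt import Shirt

shirt_one = Shirt('red', 15)
```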

For now, as you get used to OOP syntax, you’ll be completing exercises in Jupyter Notebooks. Midway through the lesson, you’ll modularize object-oriented code into separate files.


Exercise: OOP syntax practice, part 2

Exercise: Use the Pants class

Now that you’ve had some practice instantiating objects, it’s time to write your own class from scratch.

This exercise has two parts.

  • In the first part, you’ll write a Pants class. This class is similar to the Shirt class with a couple of changes. Then you’ll practice instantiating Pants objects.
  • In the second part, you’ll write another class called SalesPerson. You’ll also instantiate objects for the SalesPerson.

This exercise requires two files, which are located on this page in the Supporting Materials section.

  • exercise.ipynb contains explanations and instructions.
  • answer.py contains the solution to the exercise.

Getting started

Open the exercise.ipynb notebook file using Jupyter Notebook and follow the instructions in the notebook to complete the exercise.


Supporting Materials


Commenting object-oriented code

Commenting object-oriented code

Did you notice anything special about the answer key in the previous exercise? The Pants class and the SalesPerson class contained docstrings! A docstring is a type of comment that describes how a Python module, function, class, or method works. Docstrings are not unique to object-oriented programming.

For this section of the course, you just need to remember to use docstrings and to comment your code. It will help you understand and maintain your code and even make you a better job candidate.

From this point on, please always comment your code. Use both inline comments and document-level comments as appropriate.

To learn more about docstrings, see Example Google Style Python Docstrings.

Example Google Style Python Docstrings
Example NumPy Style Python Docstrings


Docstrings and object-oriented code

The following example shows a class with docstrings. Here are a few things to keep in mind:

  • Make sure to indent your docstrings correctly or the code will not run. A docstring should be indented one level underneath the class or method being described.
  • You don’t have to define self in your method docstrings. It’s understood that any method will have self as the first method input.
Python
class Pants:
    """The Pants class represents an article of clothing sold in a store
    """

    def __init__(self, color, waist_size, length, price):
        """Method for initializing a Pants object

        Args:
            color (str)
            waist_size (int)
            length (int)
            price (float)

        Attributes:
            color (str): color of a pants object
            waist_size (int): waist size of a pants object
            length (int): length of a pants object
            price (float): price of a pants object
        """

        self.color = color
        self.waist_size = waist_size
        self.length = length
        self.price = price

    def change_price(self, new_price):
        """The change_price method changes the price attribute of a pants object

        Args:
            new_price (float): the new price of the pants object

        Returns: None

        """
        self.price = new_price

    def discount(self, percentage):
        """The discount method outputs a discounted price of a pants object

        Args:
            percentage (float): a decimal representing the amount to discount

        Returns:
            float: the discounted price
        """
        return self.price * (1 - percentage)


Gaussian class

Gaussian class


Resources for review

The example in the next part of the lesson assumes you are familiar with Gaussian and binomial distributions.

Here are a few formulas that might be helpful:

Gaussian distribution formulas

probability density function:

$$\displaystyle f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

  • $\mu$ is the mean
  • $\sigma$ is the standard deviation
  • $\sigma^2$ is the variance
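
The density formula above translates directly into Python. This is a minimal sketch using only the standard library; the function name gaussian_pdf is an assumption and is not taken from the course files.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian probability density function at x.

    mu is the mean and sigma is the standard deviation, so sigma ** 2
    is the variance that appears in the formula.
    """
    variance = sigma ** 2
    coefficient = 1.0 / math.sqrt(2 * math.pi * variance)
    exponent = -((x - mu) ** 2) / (2 * variance)
    return coefficient * math.exp(exponent)


# Density of the standard normal distribution at its mean.
density_at_mean = gaussian_pdf(0.0, 0.0, 1.0)
```

Note that the function returns a density, not a probability. For a continuous distribution, probabilities come from the area under this curve over a range of values.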

Binomial distribution formulas

  • mean: $\displaystyle \mu = n \times p$

In other words, a fair coin has a probability of a positive outcome (heads) $p = 0.5$. If you flip a coin 20 times, the mean would be $20 * 0.5 = 10$; you’d expect to get 10 heads.

  • variance: $\displaystyle \sigma^2 = np(1 - p)$

Continuing with the coin example, $n$ would be the number of coin tosses and $p$ would be the probability of getting heads.

  • Standard deviation: $\displaystyle \sigma = \sqrt{np(1-p)}$

In other words, the standard deviation is the square root of the variance.

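probability mass function (the binomial distribution is discrete, so this gives the probability of exactly $k$ positive outcomes in $n$ trials)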

$$\displaystyle f(k, n, p) = \frac{n!}{k!(n-k)!}p^k(1-p)^{(n-k)}$$
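
The binomial formulas above can be sketched in Python as follows. The function names are assumptions made for this illustration, and math.comb requires Python 3.8 or later. The example values reuse the coin scenario from the text: 20 flips of a fair coin.

```python
import math


def binomial_mean(n, p):
    return n * p


def binomial_variance(n, p):
    return n * p * (1 - p)


def binomial_stdev(n, p):
    # The standard deviation is the square root of the variance.
    return math.sqrt(binomial_variance(n, p))


def binomial_pmf(k, n, p):
    """Probability of exactly k positive outcomes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


coin_mean = binomial_mean(20, 0.5)
coin_variance = binomial_variance(20, 0.5)
coin_stdev = binomial_stdev(20, 0.5)
prob_ten_heads = binomial_pmf(10, 20, 0.5)
```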


Further resources

If you would like to review the Gaussian (normal) distribution and binomial distribution, here are a few resources:

This free Udacity course, Intro to Statistics, has a lesson on Gaussian distributions as well as the binomial distribution.

This free course, Intro to Descriptive Statistics, also has a Gaussian distributions lesson.

There are also relevant Wikipedia articles:

Gaussian Distributions Wikipedia
Binomial Distributions Wikipedia


Quiz

How to Use and Create a Z-Table (Standard Normal Table)
Quiz - Gaussian class

Here are a few quiz questions to help you determine how well you understand the Gaussian and binomial distributions. Even if you can’t remember how to answer these types of questions, feel free to move on to the next part of the lesson; however, the material assumes you know what these distributions are and that you know the basics of how to work with them.

Assume the average weight of an American adult male is 180 pounds, with a standard deviation of 34 pounds. The distribution of weights follows a normal distribution. What is the probability that a man weighs exactly 185 pounds?

0.56
0
0.44
0.059

$\mu = 180, \sigma = 34, \sigma^2 = 34^2 = 1156$


Assume the average weight of an American adult male is 180 pounds, with a standard deviation of 34 pounds. The distribution of weights follows a normal distribution. What is the probability that a man weighs somewhere between 120 and 155 pounds?

0
0.23
0.27
0.19


Now, consider a binomial distribution. Assume that 15% of the population is allergic to cats. If you randomly select 60 people for a medical trial, what is the probability that 7 of those people are allergic to cats?

.01
.14
0
.05
.12


How the Gaussian class works


Exercise: Code the Gaussian class

In this exercise, you will use the Gaussian distribution class for calculating and visualizing a Gaussian distribution.

This exercise requires three files, which are located on this page in the Supporting materials section.

  • Gaussian_code_exercise.ipynb contains explanations and instructions.
  • Answer.py contains the solution to the exercise.
  • Numbers.txt can be read in by the read_data_file() method.

Getting started

Open the Gaussian_code_exercise.ipynb notebook file using Jupyter Notebook and follow the instructions in the notebook to complete the exercise.


Supporting Materials


Magic methods

Magic methods


Magic methods in code


Exercise: Code magic methods

Exercise: Code magic methods

Extend the code from the previous exercise by using two new methods, __add__ and __repr__.
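
As a generic illustration that is separate from the exercise solution, here is how these two magic methods behave on a hypothetical Money class. Python calls __add__ when two objects are combined with the + operator, and it calls __repr__ when it needs a text representation of an object, for example when you type the object’s name in a notebook cell.

```python
class Money:
    """Hypothetical class used only to demonstrate magic methods."""

    def __init__(self, dollars):
        self.dollars = dollars

    def __add__(self, other):
        # Called for: self + other. Returns a new Money object.
        return Money(self.dollars + other.dollars)

    def __repr__(self):
        # Called by repr() and by the interactive interpreter.
        return "Money({})".format(self.dollars)


total = Money(15) + Money(20)
total_text = repr(total)
```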

This exercise requires three files, which are located on this page in the Supporting materials section.

  • Magic_methods.ipynb contains explanations and instructions.
  • Answer.py contains the solution to the exercise.
  • Numbers.txt can be read in by the read_data_file() method.

Getting started

Open the Magic_methods.ipynb notebook file using Jupyter Notebook and follow the instructions in the notebook to complete the exercise.


Supporting Materials


Inheritance

Inheritance


Inheritance code

In the following video, you’ll see how to code inheritance using Python.


Check the boxes next to the statements that are true. There may be more than one correct answer.
Inheritance helps organize code with a more general version of a class and then specific children.
Inheritance makes code much more difficult to maintain.
Inheritance can make object-oriented programs more efficient to write.
Updates to a parent class automatically trickle down to its children.


Exercise: Inheritance with clothing

Exercise: Inheritance with clothing

Using the Clothing parent class and two children classes, Shirt and Pants, you will code a new class called Blouse.
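
Before opening the notebook, it may help to see the general shape of a parent and a child class. This is a rough sketch under assumptions: the attribute names and the long_or_short argument are illustrative and may differ from the notebook’s code. The child class reuses the parent’s initialization and inherits its methods without redefining them.

```python
class Clothing:
    """Parent class: holds what every article of clothing has in common."""

    def __init__(self, color, size, style, price):
        self.color = color
        self.size = size
        self.style = style
        self.price = price

    def change_price(self, new_price):
        self.price = new_price

    def discount(self, percentage):
        return self.price * (1 - percentage)


class Shirt(Clothing):
    """Child class: inherits from Clothing and adds one attribute."""

    def __init__(self, color, size, style, price, long_or_short):
        # Let the parent class set up the shared attributes.
        super().__init__(color, size, style, price)
        self.long_or_short = long_or_short


shirt = Shirt('red', 'M', 'casual', 20, 'short')
sale_price = shirt.discount(0.5)  # discount is inherited from Clothing
```

Because discount lives only in Clothing, a change to that method would automatically apply to Shirt and to any other child class.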

This exercise requires two files, which are located on this page in the Supporting materials section.

  • Inheritance_exercise_clothing.ipynb contains explanations and instructions.
  • Answer.py contains the solution to the exercise.

Getting started

Open the Inheritance_exercise_clothing.ipynb notebook file using Jupyter Notebook and follow the instructions in the notebook to complete the exercise.


Supporting Materials


Inheritance Gaussian class


Demo: Inheritance probability distributions

Inheritance with the Gaussian class

This is a code demonstration, so you do not need to write any code.

From the Supporting materials section on this page, download the file called inheritance_probability_distribution.ipynb.


Getting started

Open the file using Jupyter Notebook and follow these instructions:

To give another example of inheritance, read through the code in this Jupyter Notebook to see how the code works.

  • You can see the Gaussian distribution code is refactored into a generic distribution class and a Gaussian distribution class.
  • The distribution class takes care of the initialization and the read_data_file method. The rest of the Gaussian code is in the Gaussian class. You’ll use this distribution class in an exercise at the end of the lesson.
  • Run the code in each cell of this Jupyter Notebook.
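
The refactoring described above might look roughly like the following sketch. It is not the notebook’s code: the attribute names, the one-number-per-line file format, and the calculate_mean method are assumptions made for illustration. The generic Distribution class owns initialization and file reading, and Gaussian adds behavior specific to that distribution.

```python
import os
import tempfile


class Distribution:
    """Generic parent class: initialization and data loading."""

    def __init__(self, mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma
        self.data = []

    def read_data_file(self, file_name):
        # Assumes a text file with one number per line.
        with open(file_name) as file:
            self.data = [float(line) for line in file if line.strip()]


class Gaussian(Distribution):
    """Child class: only Gaussian-specific code lives here."""

    def calculate_mean(self):
        self.mean = sum(self.data) / len(self.data)
        return self.mean


# Write a small sample file, then load it through the inherited method.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("1\n3\n5\n")

gaussian = Gaussian()
gaussian.read_data_file(handle.name)
sample_mean = gaussian.calculate_mean()
os.remove(handle.name)
```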

Supporting Materials


Organizing into modules

Organizing into modules


Windows vs. macOS vs. Linux

Linux, which our Udacity classroom workspaces use, is an operating system like Windows or macOS. One important difference is that Linux is free and open source, while Windows is owned by Microsoft and macOS by Apple.

Throughout the lesson, you can do all of your work in a classroom workspace. These workspaces provide interfaces that connect to virtual machines in the cloud. However, if you want to run this code locally on your computer, the commands you use might be slightly different.

If you are using macOS, you can open an application called Terminal and use the same commands that you use in the workspace. That is because Linux and macOS are both Unix-like operating systems.

If you are using Windows, the analogous application is Command Prompt (also called the console). The Command Prompt commands can be somewhat different from the Terminal commands. Use a search engine to find the right commands in a Windows environment.

The classroom workspace has one major benefit. You can do whatever you want to the workspace, including installing Python packages. If something goes wrong, you can reset the workspace and start with a clean slate; however, always download your code files or commit your code to GitHub or GitLab before resetting a workspace. Otherwise, you’ll lose your code!


Demo: Modularized code

Demo: Modularized code

This is a code demonstration, so you do not need to write any code.

So far, the coding exercises have been in Jupyter Notebooks. Jupyter Notebooks are especially useful for data science applications because you can wrangle data, analyze data, and share a report all in one document. However, they’re not ideal for writing modular programs, which require separating code into different files.

At the bottom of this page under Supporting materials, download three files.

  • Gaussiandistribution.py
  • Generaldistribution.py
  • example_code.py

Look at how the distribution class and Gaussian class are modularized into different files.

The Gaussiandistribution.py imports the Distribution class from the Generaldistribution.py file. Note the following line of code:

Python
from Generaldistribution import Distribution

When Python runs this line, it executes the Generaldistribution.py file and makes the Distribution class available inside Gaussiandistribution.py, much as if the class had been defined at the top of that file.

The example_code.py file then imports the Gaussian class and shows an example of how to use it.
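
To make the import mechanics concrete, here is a minimal sketch that writes two tiny modules to a temporary folder and imports one from the other. The module names demo_general and demo_gaussian and the class bodies are hypothetical stand-ins chosen for illustration; they are not the contents of the course files.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-ins for Generaldistribution.py and Gaussiandistribution.py.
GENERAL_SOURCE = '''
class Distribution:
    def __init__(self, mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma
'''

GAUSSIAN_SOURCE = '''
from demo_general import Distribution  # runs demo_general.py, then binds the name

class Gaussian(Distribution):
    pass
'''


def load_demo_modules():
    """Write both modules to a temporary folder and import the Gaussian module."""
    folder = Path(tempfile.mkdtemp())
    (folder / "demo_general.py").write_text(GENERAL_SOURCE)
    (folder / "demo_gaussian.py").write_text(GAUSSIAN_SOURCE)
    sys.path.insert(0, str(folder))  # let Python find the new files
    importlib.invalidate_caches()
    return importlib.import_module("demo_gaussian")


demo_gaussian = load_demo_modules()
gaussian = demo_gaussian.Gaussian(25, 2)
print(gaussian.mean, gaussian.stdev)
print("demo_general" in sys.modules)  # imported modules are cached here
```

Because Python caches each imported module in sys.modules, importing the same module from several files runs its code only once.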

For the rest of the lesson, you’ll work with modularized code rather than a Jupyter Notebook. Go through the code in the modularized_code folder to understand how everything is organized.


Supporting Materials


Advanced OOP topics

Inheritance is the last object-oriented programming topic in the lesson. Thus far you’ve been exposed to:

  • Classes and objects
  • Attributes and methods
  • Magic methods
  • Inheritance

Classes, objects, attributes, methods, and inheritance are common to all object-oriented programming languages.

Knowing these topics is enough to start writing object-oriented software. What you’ve learned so far is all you need to know to complete this OOP lesson. However, these are only the fundamentals of object-oriented programming.

Use the following list of resources to learn more about advanced Python object-oriented programming topics.


Making a package

In the previous section, the distribution and Gaussian code was refactored into individual modules. A Python module is just a Python file containing code.

In this next section, you’ll convert the distribution code into a Python package. A package is a collection of Python modules. Although the previous code might already seem like it was a Python package because it contained multiple files, a Python package also needs an __init__.py file. In this section, you’ll learn how to create this __init__.py file and then pip install the package into your local Python installation.


What is pip?

pip is a Python package manager that helps with installing and uninstalling Python packages. You might have used pip to install packages using the command line: pip install numpy. When you execute a command like pip install numpy, pip downloads the package from a Python package repository called PyPI.

For this next exercise, you’ll use pip to install a Python package from a local folder on your computer. The last part of the lesson will focus on uploading packages to PyPI so that you can share your package with the world.

You can complete this entire lesson within the classroom using the provided workspaces; however, if you want to develop a package locally on your computer, you should consider setting up a virtual environment. That way, if you install your package on your computer, the package won’t install into your main Python installation. Before starting the next exercise, the next part of the lesson will discuss what virtual environments are and how to use them.


Object-oriented programming and Python packages

A Python package does not need to use object-oriented programming. You could simply have a Python module with a set of functions. However, most—if not all—of the popular Python packages take advantage of object-oriented programming for a few reasons:

  1. Object-oriented programs are relatively easy to expand, especially because of inheritance.
  2. Object-oriented programs hide implementation details from the user. Consider the scipy package: you don’t need to know how the underlying code works in order to use its classes and methods.

Virtual environments

Python environments

In the next part of the lesson, you’ll be given a workspace where you can upload files into a Python package and pip install the package. If you decide to install your package on your local computer, you’ll want to create a virtual environment. A virtual environment is a silo-ed Python installation apart from your main Python installation. That way you can install packages and delete the virtual environment without affecting your main Python installation.

Let’s talk about two different Python environment managers: conda and venv. You can create virtual environments with either one. The following sections describe each of these environment managers, including some advantages and disadvantages. If you’ve taken other data science, machine learning, or artificial intelligence courses at Udacity, you’re probably already familiar with conda.


Conda

Conda does two things: manages packages and manages environments.

As a package manager, conda makes it easy to install Python packages, especially for data science. For instance, typing conda install numpy installs the numpy package.

As an environment manager, conda allows you to create silo-ed Python installations. With an environment manager, you can install packages on your computer without affecting your main Python installation.

The command line code looks something like the following:

CLI
conda create --name [environmentname]
conda activate [environmentname]
conda install numpy

pip and Venv

There are other environment managers and package managers besides conda. For example, venv is an environment manager that comes preinstalled with Python 3. pip is a package manager.

pip can only manage Python packages, whereas conda is a language agnostic package manager. In fact, conda was invented because pip could not handle data science packages that depended on libraries outside of Python. If you look at the history of conda, you’ll find that the software engineers behind conda needed a way to manage data science packages (such as NumPy and Matplotlib) that relied on libraries outside of Python.

conda manages environments and packages. pip only manages packages.

To use venv and pip, the commands look something like the following:

CLI
python3 -m venv [environmentname]
source [environmentname]/bin/activate
pip install numpy

Which to choose

Whether you choose to create environments with venv or conda will depend on your use case. conda is very helpful for data science projects, but conda can make generic Python software development a bit more confusing; that’s the case for this project.

If you create a conda environment, activate the environment, and then pip install the distributions package, you’ll find that the system installs your package globally rather than in your local conda environment. This happens because a conda environment created without any packages has no pip of its own, so the pip command resolves to the one in your main installation. However, if you create the conda environment and install pip simultaneously, you’ll find that pip behaves as expected when installing packages into your local environment:

CLI
conda create --name [environmentname] pip

On the other hand, using pip with venv works as expected. pip and venv tend to be used for generic software development projects including web development. For this lesson on creating packages, you can use conda or venv if you want to develop locally on your computer and install your package.

The following video shows how to use venv, which is what we recommend for this project.


Instructions for venv

For instructions about how to set up virtual environments on a macOS, Linux, or Windows machine using the terminal, see Installing packages using pip and virtual environments.

Refer to the following notes for understanding the tutorial:

  • If you are using Python 2.7.9 or later (including Python 3), the Python installation should already come with the Python package manager called pip. There is no need to install it.
  • env is the name of the environment you want to create. You can call env anything you want.
  • Python 3 comes with a virtual environment package preinstalled. Instead of typing python3 -m virtualenv env, you can type python3 -m venv env to create a virtual environment.

Once you’ve activated a virtual environment, you can then use terminal commands to go into the directory where your Python library is stored. Then, you can run pip install . (the trailing dot refers to the current directory).

In the next section, you can practice pip installing and creating virtual environments in the classroom workspace. You’ll see that creating a virtual environment actually creates a new folder containing a Python installation. Deleting this folder removes the virtual environment.
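
The point that a virtual environment is just a folder can be checked from Python itself. The following is a minimal sketch using the standard library venv module; the folder name env and the helper name create_and_remove_env are arbitrary choices for illustration, and pip is left out of the environment only to keep the demo fast.

```python
import shutil
import tempfile
import venv
from pathlib import Path


def create_and_remove_env():
    """Create a throwaway virtual environment, inspect it, then delete it."""
    parent = Path(tempfile.mkdtemp())
    env_dir = parent / "env"  # "env" is an arbitrary environment name

    # Same idea as `python3 -m venv env`, but without installing pip.
    venv.create(env_dir, with_pip=False)
    had_config = (env_dir / "pyvenv.cfg").is_file()  # marker file of a venv

    shutil.rmtree(parent)  # deleting the folder removes the environment
    return had_config, env_dir.exists()


had_config, still_exists = create_and_remove_env()
print(had_config, still_exists)
```

Nothing outside the environment folder is touched, which is why deleting the folder is enough to get rid of the environment.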

If you install packages on the workspace and run into issues, you can always reset the workspace; however, you will lose all of your work. Be sure to download any files you want to keep before resetting a workspace.


Exercise: Making a package and pip installing

In this exercise, you will convert modularized code into a Python package.

This exercise requires three files, which are located on this page in the Supporting materials section.

  • Gaussiandistribution.py
  • Generaldistribution.py
  • 3b_answer_python_package.zip contains the solution to the exercise.

Instructions

Following the instructions from the previous video, convert the modularized code into a Python package.

On your local computer, you need to create a folder called 3a_python_package. Inside this folder, you need to create a few folders and files:

  • A setup.py file, which is required in order to use pip install.
  • A subfolder called distributions, which is the name of the Python package.
  • Inside the distributions folder, you need:
    • The Gaussiandistribution.py file (provided).
    • The Generaldistribution.py file (provided).
    • The __init__.py file (you need to create this file).
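
The layout described above can be sketched with a short script that creates the two files you have to write yourself. The file contents shown here are assumptions for illustration: the real setup.py in the solution may list different metadata, and the version number and description are made up.

```python
import tempfile
from pathlib import Path

# Assumed contents; the course's real setup.py may list other metadata.
SETUP_PY = '''from setuptools import setup

setup(
    name="distributions",
    version="0.1",
    description="Gaussian distributions",
    packages=["distributions"],
)
'''

# Assumed: re-export Gaussian so `from distributions import Gaussian` works.
INIT_PY = "from .Gaussiandistribution import Gaussian\n"


def scaffold_package(root):
    """Create setup.py and distributions/__init__.py under root."""
    root = Path(root)
    package_dir = root / "distributions"
    package_dir.mkdir(parents=True, exist_ok=True)
    (root / "setup.py").write_text(SETUP_PY)
    (package_dir / "__init__.py").write_text(INIT_PY)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))


project_root = Path(tempfile.mkdtemp()) / "3a_python_package"
created = scaffold_package(project_root)
print(created)
```

The two provided module files would then be copied into the distributions folder. Inside a package, Gaussiandistribution.py typically needs the relative form from .Generaldistribution import Distribution (note the leading dot) so that Python 3 looks for the module inside the package.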

Once everything is set up, in order to actually create the package, use your terminal window to navigate into the 3a_python_package folder.

Enter the following:

CLI
cd 3a_python_package

pip install .

If everything is set up correctly, pip installs the distributions package into the workspace. You can then start the Python interpreter from the terminal by entering:

CLI
python

Then, within the Python interpreter, you can use the distributions package by entering the following:

Python
from distributions import Gaussian

gaussian_one = Gaussian(25, 2)

gaussian_one.mean

gaussian_one + gaussian_one

In other words, you can import and use the Gaussian class because the distributions package is now officially installed as part of your Python installation.
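
If you are wondering why gaussian_one + gaussian_one works at all, the reason is the __add__ magic method. The class below is a simplified stand-in, not the course's actual Gaussian class, and it assumes the two operands are treated as independent distributions, so the means add and the variances add.

```python
import math


class Gaussian:
    """Simplified stand-in for the course's Gaussian class."""

    def __init__(self, mu=0, sigma=1):
        self.mean = mu
        self.stdev = sigma

    def __add__(self, other):
        # Sum of two independent Gaussians: means add, variances add.
        return Gaussian(
            self.mean + other.mean,
            math.sqrt(self.stdev ** 2 + other.stdev ** 2),
        )

    def __repr__(self):
        return f"mean {self.mean}, standard deviation {self.stdev}"


gaussian_one = Gaussian(25, 2)
gaussian_sum = gaussian_one + gaussian_one
print(gaussian_sum)
```

Python translates the + operator into a call to gaussian_one.__add__(gaussian_one), and the interpreter uses __repr__ to display the result.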

If you get stuck, there’s a solution provided in the Supporting materials section called 3b_answer_python_package.

If you want to install the Python package locally on your computer, you might want to set up a virtual environment first. A virtual environment is a silo-ed Python installation apart from your main Python installation. That way you can easily delete the virtual environment without affecting your Python installation.

If you want to try using virtual environments in this workspace first, follow these instructions:

  1. There is an issue with the Ubuntu operating system and Python 3, in which the venv package isn’t installed correctly. In the workspace, one way to fix this is to run conda update python in the workspace terminal and enter y when prompted. It might take a few minutes for the workspace to update. For more information, see venv doesn’t create activate script python3. If you are not using Anaconda on your local computer, you can skip this first step.
  2. Enter the following command to create a virtual environment: python -m venv [venv_name] where venv_name is the name you want to give to your virtual environment. You’ll see a new folder appear with the Python installation named venv_name.
  3. In the terminal, enter source venv_name/bin/activate. You’ll notice that the command line now shows (venv_name) at the beginning of the line to indicate you are using the venv_name virtual environment.
  4. Enter pip install python_package/. That should install your distributions Python package.
  5. Try using the package in a program to see if everything works!

Supporting Materials


Binomial class


Binomial class exercise

In the following video, you’ll get an overview of the binomial class exercise.


Exercise: Binomial class

In this exercise, you’ll extend the distributions package with a new class called Binomial.

In the Supporting materials section of this page, there is a .zip file called 4a_binomial_package.zip. Download and unzip this file.

Inside the folder called 4a_binomial_package, there is another folder and these files:

  • distributions, which contains the code for the distributions package including Gaussiandistribution.py and Generaldistribution.py code.
  • setup.py, a file needed for building Python packages with pip.
  • test.py, which contains unit tests to help you debug your code.
  • numbers.txt and numbers_binomial.txt, which are data files used as part of the unit tests.
  • Binomialdistribution.py and Binomialdistribution_challenge.py. Choose one of these files for completing the exercise. Binomialdistribution.py includes more of the code already set up for you. In Binomialdistribution_challenge.py, you’ll have to write all of the code from scratch. Both files contain instructions with TODOS to fill out.

In these files, you only need to change the following:

  • __init__.py, inside the distributions folder. You need to import the Binomial class.
  • Either Binomialdistribution.py or Binomialdistribution_challenge.py. You also need to put your Binomialdistribution.py file into the distributions folder.

When you’re ready to test out your code, follow these steps:

  1. pip install your distributions package. In the terminal, make sure you are in the 4a_binomial_package directory. If not, navigate there by entering the following at the command line:
CLI
cd 4a_binomial_package
pip install .
  2. Run the unit tests. Enter the following:
CLI
python -m unittest test

Modify the Binomialdistribution.py code until all the unit tests pass.
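
As a reference while you work, the standard formulas for a binomial distribution can be written as three small functions. This is only a sketch of the math, not the structure the exercise files expect: the function names are made up for illustration, and math.comb requires Python 3.8 or later.

```python
import math


def binomial_mean(n, p):
    """Expected number of successes in n trials with success probability p."""
    return n * p


def binomial_stdev(n, p):
    """Standard deviation of the number of successes."""
    return math.sqrt(n * p * (1 - p))


def binomial_pmf(n, p, k):
    """Probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n_trials, prob = 20, 0.4  # arbitrary example values
print(binomial_mean(n_trials, prob))
print(round(binomial_stdev(n_trials, prob), 2))
# The probabilities over all possible outcomes should sum to 1.
print(round(sum(binomial_pmf(n_trials, prob, k) for k in range(n_trials + 1)), 6))
```

In the exercise itself, these calculations would live in methods of the Binomial class rather than in standalone functions.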

If you change the code in the distributions folder after pip installing the package, Python will not know about the changes, because pip copied the files into your Python installation and imports use that copy.

When you make changes to the package files, you’ll need to run the following:

CLI
pip install --upgrade .
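
If you are unsure which copy of a package Python is importing, you can ask the import system directly. The sketch below uses importlib.util.find_spec from the standard library; the helper name installed_location is made up, and json is used only because it ships with Python.

```python
import importlib.util


def installed_location(package_name):
    """Return the file Python would import for package_name, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    return spec.origin


# "json" is used here only because it ships with Python; after
# `pip install .` you would pass "distributions" instead.
print(installed_location("json"))
print(installed_location("package_that_is_not_installed"))
```

For a pip-installed package, the path normally points into a site-packages folder rather than into the folder where you edit your code.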

In the Supporting materials section of this page, there is also a solution in the 4b_answer_binomial_package. Try not to look at the solution until your code passes all of the unit tests.


Supporting Materials


scikit-learn source code


Contributing to a GitHub project

Use the following resources to learn how to contribute to a GitHub project:


Advanced Python OOP topics

Use the following resources to learn about more advanced OOP topics that appear in the scikit-learn package:


Putting code on PyPi


PyPi vs. test PyPi

Note that pypi.org and test.pypi.org are two different websites. You’ll need to register separately at each website. If you only register at pypi.org, you will not be able to upload to the test.pypi.org repository.

Remember that your package name must be unique. If you use a package name that is already taken, you will get an error when trying to upload the package.


Summary of the terminal commands used in the video

CLI
cd binomial_package_files
python setup.py sdist
pip install twine

# commands to upload to the pypi test repository
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
pip install --index-url https://test.pypi.org/simple/ dsnd-probability

# command to upload to the pypi repository
twine upload dist/*
pip install dsnd-probability

More PyPi resources

This tutorial explains how to distribute Python packages, including more configuration options for your setup.py file. You’ll notice that the Python command to run the setup.py is slightly different, as shown in the following example:

CLI
python3 setup.py sdist bdist_wheel

This command still outputs a folder called dist. The difference is that you will get both a .tar.gz file and a .whl file. The .tar.gz file is called a source archive, whereas the .whl file is a built distribution. The .whl file is a newer type of installation file for Python packages. When you pip install a package, pip first looks for a .whl file (wheel file); if there isn’t one, it looks for the .tar.gz file.

A .tar.gz file (an sdist) contains the files needed to compile and install a Python package. A .whl file (a built distribution) only needs to be copied to the proper place for installation. Behind the scenes, pip installing a .whl file has fewer steps than installing a .tar.gz file.

Other than this command, the rest of the steps for uploading to PyPi are the same.


To learn more about PyPi, see the following resources:


Exercise: Upload to PyPi

In this part of the lesson, you’ll practice uploading a package to PyPi.

In the Supporting materials section of this page, there is a zip file called 5_exercise_upload_to_pypi.zip. Download and unzip this file.

The Python package is located in the folder 5_exercise_upload_to_pypi.

You need to create three files:

  • setup.cfg
  • README.md
  • license.txt

You also need to create accounts for the pypi test repository and pypi repository.

Don’t forget to keep your passwords; you’ll need to type them into the command line.

Once you have all the files set up correctly, you can use the following commands on the command line. You need to make the name of the package unique, so change the name of the package from distributions to something else. That means changing the information in setup.py and the folder name.

In the terminal, make sure you are in the 5_exercise_upload_to_pypi directory. If not, navigate there by entering the following at the command line:

CLI
cd 5_exercise_upload_to_pypi

python setup.py sdist

pip install twine

Commands to upload to the PyPi test repository

CLI
twine upload --repository-url https://test.pypi.org/legacy/ dist/*

pip install --index-url https://test.pypi.org/simple/ distributions

Command to upload to the PyPi repository

CLI
twine upload dist/*

pip install distributions

If you get stuck, rewatch the previous video showing how to upload a package to PyPi.


Supporting Materials


Lesson summary


What we covered in this lesson

  • Classes vs. objects
  • Methods and attributes
  • Magic methods and inheritance
  • Python packages

Web Development

HTML Tutorial

Develop a data dashboard using Flask, Bootstrap, Plotly and Pandas


Introduction

Why should a data scientist learn web development?

In this course, you are going to use Flask to build a data dashboard. You might be thinking that you already have good tools for visualizing data such as matplotlib, seaborn, or Tableau.

However, the web development skills you’ll learn in this lesson will prepare you for building other types of data science applications. Data scientists are increasingly being asked to deploy their work as an application in the cloud.

For example, consider a project where you build a model that classifies disaster relief messages into categories. With your web development skills, you could turn that model into a web app where you would input a message and display the resulting message category.

As another example, consider a system that recommends movies based on a user’s preferences. Part of the recommendation engine could include a web application that displays recommended movies based on a user ID. What you learn in this course will set you up for building the web app portion of the recommendation engine.


Lesson Overview

How to Think about This Lesson

The lesson first gives an overview of the three base languages for web development: html, css, and JavaScript. You could take an entire course just on each of these languages. The goal is for you to get comfortable writing at least some code in each language so that you understand the web template files at the end of the lesson. This lesson goes through a lot of information to get you up to speed.

To work with the web template and make a data dashboard, you will only need to write Python code. If you want to customize the dashboard, you can do so with just a few changes to the html code. But the underlying technologies of the data dashboard will be css, html, JavaScript, and Python.

Lesson Outline

  • Basics of a web app
    • html
    • css
    • javascript
  • Front-end libraries
    • bootstrap
    • plotly
  • Back-end libraries
    • flask
  • Deploy a web app to the cloud

Lesson Files

All of the lesson’s exercises are contained in classroom workspaces. You’ll even deploy a web app from the classroom workspace; however, if you prefer to work locally, you can find the lesson files in this data scientist nanodegree GitHub repo.


The Web


Components of a Web App

Front End:

  • Content: HTML
  • Design: CSS
  • Interactions: JavaScript

Back End:

  • Server
  • Database

The Front End


Front End: HTML


HTML Document Example

Here is an example of HTML code

<!DOCTYPE html>

<html>

<head>
<title>Page Title</title>
</head>

<body>
<h1>A Photo of a Beautiful Landscape</h1>
<a href="https://www.w3schools.com/tags">HTML tags</a>
<p>Here is the photo</p>
<img src="photo.jpg" alt="Country Landscape">
</body>

</html>

Explanation of the HTML document

As you progress through the lesson, you’ll find that the <head> tag is mostly for housekeeping like specifying the page title and adding meta tags. Meta tags are in essence information about the page that web crawlers see but users do not. The head tag also contains links to javascript and css files, which you’ll see later in the lesson.

The website content goes in the <body> tag. The body tag can contain headers, paragraphs, images, links, forms, lists, and a handful of other tags. Of particular note in this example are the link tag <a> and the image tag <img>.

Both of these tags link to external information outside of the html doc. In the html code above, the link <a> tag links to an external website called w3schools. The href is called an attribute, and in this case href specifies the link.

The image <img> tag displays an image called “photo.jpg”. In this case, the jpg file and the html document are in the same directory, but they do not have to be. The src attribute specifies the path to the image file relative to the html document. The alt attribute contains text that gets displayed in case the image cannot be found.
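
Because the rest of this course is written in Python, it can help to see tags and attributes from Python's point of view. The following sketch uses the standard library html.parser module to pull the href, src, and alt attributes out of the body of the example document. The class name LinkAndImageCollector is made up for illustration.

```python
from html.parser import HTMLParser

EXAMPLE_HTML = """
<body>
<h1>A Photo of a Beautiful Landscape</h1>
<a href="https://www.w3schools.com/tags">HTML tags</a>
<p>Here is the photo</p>
<img src="photo.jpg" alt="Country Landscape">
</body>
"""


class LinkAndImageCollector(HTMLParser):
    """Record href values of <a> tags and (src, alt) pairs of <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as (name, value) pairs
        if tag == "a":
            self.links.append(attributes.get("href"))
        elif tag == "img":
            self.images.append((attributes.get("src"), attributes.get("alt")))


collector = LinkAndImageCollector()
collector.feed(EXAMPLE_HTML)
print(collector.links)
print(collector.images)
```

Each attribute is simply a name and value pair attached to an opening tag, which is also how a browser reads them.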


Full List of Tags and How to Use Them

This is a link to one of the best references for html. Use this website to look up html tags and how to use them. W3Schools HTML Tags

In fact, the W3Schools website has a lot of free information about web development syntax.

Checking your HTML

It’s a good idea to check the validity of your HTML. Here is a website that checks your HTML for syntax errors: W3C Validator. Try pasting your HTML code here and running the validator. You can read through the error messages and fix your HTML.


Exercise: HTML

<!DOCTYPE html>
<html lang="en-US">

<head>
<!-- TODO: Add a title tag and use the title 'My Udacity Task List' -->
<title>My Udacity Task List</title>
</head>

<body>
<p style="color:red">All instructions are included in the index.html file</p>
<!-- TODO: Add a header tag, h1. The h1 should say "Today's TODO list' -->
<h1>Today's TODO list</h1>
<!-- TODO: Notice that the workspace folder contains the Udacity logo in a file called udacity_logo.png. Insert the image here -->
<img src="udacity_logo.png" alt="Udacity Logo">
<!-- TODO: Use a link tag to link to the Udacity website https://www.udacity.com --Make sure to add text in-between the opening and closing tags.-->
<p><a href="https://www.udacity.com">Udacity website</a></p>
<!-- TODO: Use a paragraph tag. Inside the paragraph tag, introduce yourself -->
<p>Hi, my name is Zacks. Here are my tasks in this week:</p>
<!-- TODO: Make an unordered list containing at least three items that you plan to do this week to progress in the nanodegree. Look up the syntax for unordered lists if you're not sure how to do this. -->
<ul>
<li>Learn HTML</li>
<li>Learn CSS</li>
<li>Learn JavaScript</li>
</ul>
<!-- TODO: Get creative and add anything else you would like to add. The W3Schools website has a wealth of information about html tags. See: https://www.w3schools.com/tags -->
<p>
<a href="https://www.linkedin.com/in/zacks-shen/">
<img src="https://raw.githubusercontent.com/ZacksAmber/PicGo/master/img/20211109001011.jpeg" alt="My LinkedIn" width="400" height="300">
</a>
</p>
</body>

</html>

Div and Span


Summary of Div and Span Elements

You can use div elements to split off large chunks of html into sections. Span elements, on the other hand, are for small chunks of html. You generally use span elements in the middle of a piece of text in order to apply a specific style to that text. You’ll see how this works a bit later in the CSS portion of the lesson.

<div>
<p>This is an example of when to use a div element versus a span element. A span element goes around <span>a small chunk of html</span></p>
</div>

IDs and Classes

An id identifies one specific element and should be unique within a page, whereas the same class can be shared by any number of elements. In the following example, each div has its own id, while the paragraph classes are reused in both sections.

<div id="top">
<p class="first_paragraph">First paragraph of the section</p>
<p class="second_paragraph">Second paragraph of the section</p>
</div>

<div id="bottom">
<p class="first_paragraph">First paragraph of the section</p>
<p class="second_paragraph">Second paragraph of the section</p>
</div>

Exercise: HTML Div, Span, IDs, Classes

<!DOCTYPE html>

<html lang="en-US">

<head>
<title>Udacity Task List</title>
<!-- Ignore the following link to the css stylesheet. You will learn more about this in the next exercise. -->
<link rel="stylesheet" type="text/css" href="style.css">
</head>

<body>
<h1>Today's TODO list</h1>
<img src="udacity_logo.png" alt="Udacity Logo">
<!-- TODO: add an id to the Udacity link tag -->
<a id="main-link" href="https://www.udacity.com">Udacity</a>
<!-- NOTE - Adding id and class attributes to HTML does not change the appearance of the page. The changes made in this exercise affect how the page is displayed because this page has been linked to a style sheet. You'll be learning more about that shortly. -->
<!-- TODO: Wrap the following paragraphs and list with a
div tag. Add an id to the div tag called main-content -->
<!-- TODO: add a class to the next two paragraphs
after this comment. Call the class bold-paragraph -->
<div id="main-content">
<p class="bold-paragraph">Hi, my name is Andrew.</p>
<p class="bold-paragraph">I am a Udacity student from Los Angeles, California</p>
<!-- TODO: add a span around the terms data scientist after this comment. Add a class attribute called green-text -->
<p>I'm currently studying for the <span class="green-text">data scientist</span> nanodegree program</p>
<p>These are my tasks:</p>
<ul>
<li>Watch ten videos</li>
<li>Answer all the quizzes</li>
<li>Work on the project for 2 hours</li>
</ul>
<p>Here is a table of the tasks that I've completed this week</p>
</div>
<table>
<caption>Breakdown of Tasks Completed</caption>
<tr>
<th>Day</th>
<th>Tasks Completed</th>
</tr>
<tr>
<td>Monday</td>
<td>Completed five concepts</td>
</tr>
<tr>
<td>Tuesday</td>
<td>Did three quizzes</td>
</tr>
</table>

<br>
<nav>
<a href="https://www.w3schools.com/html/">HTML</a> |
<a href="https://www.w3schools.com/w3css/">CSS</a> |
<a href="https://www.w3schools.com/js/default.asp">JavaScript</a> |
<a href="https://www.w3schools.com/Jquery/default.asp">jQuery</a>
</nav>
</body>

</html>

Front End: CSS


CSS and this Lesson

To build the data dashboard at the end of this lesson, you won’t need to actually write any CSS. Instead, you’ll use libraries that take care of the CSS for you. In this case, that would be the Bootstrap library.

But if you are interested in understanding what Bootstrap is doing under the hood, then you need to understand how to style a website with CSS. This page has a summary of some important aspects of CSS programming.


What is the Purpose of CSS?

In most professional websites, css is kept in a separate stylesheet. This makes it easier to separate content (html) from style (css). Code becomes easier to read and maintain.

If you’re interested in the history of css and how it came about, here is an interesting link: history of css.

CSS stands for cascading style sheets. The “cascading” refers to how rules trickle down to the various layers of an html tree. For example, you might specify that all paragraphs have the same font type. But then you want to override one of the paragraphs to have a different font type. How does a browser decide which rules apply when there is a conflict? That’s based on the cascade order. You can read more about that here.


Different ways to write CSS

As discussed in the video, there are essentially two ways to write CSS: inline or with a stylesheet.

Inline means that you specify the CSS directly inside of an html tag like so:

<p style="font-size:20px;">This is a paragraph</p>

Alternatively, you can put the CSS in a stylesheet. The stylesheet can go underneath an html head tag like so:

...
<head>
<style>
p {font-size: 20px;}
</style>
</head>

Or the css can go into its own separate css file (extension .css). Then you can link to the css file within the html head tag like so:

<head>
<link rel="stylesheet" type="text/css" href="style.css">
</head>

where style.css is the path to the style.css file. Inside the style.css file would be the style rules such as

p {
color:red;
}


CSS Rules and Syntax

CSS is essentially a set of rules that you can use to stylize html. The W3 Schools CSS Website is a good place to find all the different rules you can use. These include rules for styling text, links, margins, padding, images, icons, and background colors, among other options.

The general syntax is that you:

  1. select the html element, id, and/or class of interest
  2. specify what you want to change about the element
  3. specify a value, followed by a semi-colon

For example

a {
text-decoration:none;
}

where a is the element of interest, text-decoration is what you want to change, and none is the value. You can write multiple rules within one set of brackets like:

a {
text-decoration:none;
color:blue;
font-weight:bold;
}

You can also select elements by their class or id.

To select by class name, you use a dot like so:

.class_name {
color: red;
}

To select by id name, you use the pound sign:

#id_name {
color: red;
}

You can make more complex selections as well, such as “select paragraphs inside the div with id div_top”. If your html looks like this,

<div id="div_top">
<p>This is a paragraph</p>
</div>

then the CSS would be like this:

div#div_top p {
color: red;
}
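
When several of these selectors match the same element, one ingredient the browser uses to pick a winner is specificity: id selectors outrank class selectors, which in turn outrank element selectors. The Python sketch below counts these three parts for the simple selectors used on this page. It is a deliberately simplified illustration that ignores inline styles, !important, attribute selectors, pseudo-classes, and source order, and the function name specificity is just a label chosen here.

```python
import re


def specificity(selector):
    """Return (ids, classes, elements) for a simple CSS selector."""
    ids = classes = elements = 0
    # Split on whitespace and the combinators >, + and ~.
    for compound in re.split(r"[\s>+~]+", selector.strip()):
        if not compound:
            continue
        ids += len(re.findall(r"#[\w-]+", compound))
        classes += len(re.findall(r"\.[\w-]+", compound))
        if re.match(r"[A-Za-z]", compound):  # starts with an element name
            elements += 1
    return (ids, classes, elements)


for rule in ["p", ".class_name", "#id_name", "div#div_top p"]:
    print(rule, specificity(rule))
```

Python compares tuples from left to right, which mirrors how specificity is compared: a single id outweighs any number of classes in this model, and a single class outweighs any number of element names.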

Margins and Padding

The difference between margin and padding is a bit tricky. Margin rules specify a spatial buffer on the outside of an element. Padding specifies an internal spatial buffer.

These examples below show how this works. They use a div element with a border. Here is the div without any margin or padding:

<div style="border:solid red 1px;">
Box
</div>

Margin

In this case, the div has a margin of 40 pixels. This creates a spatial buffer on the outside of the div element.

<div style="border:solid red 1px;margin:40px;">
Box
</div>

Padding

This next case has a padding of 40px. In the case of padding, the spatial buffer is internal.

<div style="border:solid red 1px;padding:40px;">
Box
</div>

Margin and Padding

In this case, the div element has both a margin of 40 pixels and a padding of 40 pixels.

<div style="border:solid red 1px;margin:40px;padding:40px;">
Box
</div>
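
To see how margin, border, and padding add up, here is a small arithmetic sketch in Python. It assumes the default content-box sizing, in which each of the three is added on both sides of the content, and it ignores margin collapsing between neighboring elements. The content width of 100 pixels and the function name outer_size are hypothetical values chosen for illustration.

```python
def outer_size(content, padding=0, border=0, margin=0):
    """Space an element occupies along one axis, in pixels.

    Assumes the default content-box sizing, where padding, border and
    margin are each added on both sides of the content.
    """
    return content + 2 * (padding + border + margin)


content_width = 100  # hypothetical content width in pixels
print(outer_size(content_width, border=1))
print(outer_size(content_width, border=1, margin=40))
print(outer_size(content_width, border=1, padding=40))
print(outer_size(content_width, border=1, margin=40, padding=40))
```

Margin and padding contribute the same amount of space. The difference is where the space goes: padding sits inside the red border, and margin sits outside it.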

Specifying Size: Pixels versus Percent versus EM Units

In CSS there are various ways to define sizes, widths, and heights. The three main ones are pixels, percentages, and em units.

When you use px, you’re defining the exact number of pixels an element should use in terms of size. So

<p style="font-size: 12px;">

means the font-size will be exactly 12 pixels.

The percent and em units have a similar function. They dynamically change sizing based on a browser’s default values. For example

<p style="font-size: 100%"> 

means to use the default browser font size. 150% would be 1.5 times the default font size. 50% would be half. Similarly, 1em would be 1 x the default font size, so 2em would be 2 x the default font size, and so on. The advantage of using percents and em is that your web pages become dynamic. The document adapts to the default settings of whatever device someone is using, be that a desktop, laptop, or mobile phone.

As an aside, percentages and em units actually calculate sizes relative to parent elements in the HTML tree. For example, if you specify a font size in a body tag, then the percentages will be relative to the body element:

<body style="font-size: 20px">
<p style="font-size:80%">This is a paragraph</p>
...
</body>


Because different browsers might render HTML and CSS differently, there isn’t necessarily a right or wrong way to specify sizes. The best choice depends on who will use your website and on what type of devices. You can read more here. You won’t need to worry about all of this because in the web app, you’re going to use a CSS framework that takes care of it for you.
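
The relative-size arithmetic described above can be sketched in a few lines of Python. This is only an illustration of how px, em, and percent font sizes resolve against a parent font size; the function name resolve_font_size is hypothetical, and the 16px value used as a browser default is an assumption (it is a common default, but browsers and users can change it).

```python
def resolve_font_size(parent_px, value):
    """Resolve a CSS font-size value such as '80%', '2em' or '12px' to pixels.

    For font-size, both em and % are relative to the parent's font size.
    """
    if value.endswith("px"):
        return float(value[:-2])
    if value.endswith("em"):
        return parent_px * float(value[:-2])
    if value.endswith("%"):
        return parent_px * float(value[:-1]) / 100
    raise ValueError(f"unsupported unit in {value!r}")


# The body/paragraph example from the text: 80% of a 20px parent.
print(resolve_font_size(20, "80%"))

# Assumed 16px browser default as the parent size.
print(resolve_font_size(16, "150%"))
print(resolve_font_size(16, "2em"))
```

In the body example, the paragraph ends up at 80% of 20px, which is 16px.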


Exercise: CSS

<!DOCTYPE html>

<html lang="en-US">
<head>
<title>Udacity Task List</title>
<link rel="stylesheet" type="text/css" href="style.css">
<!-- TODO: include a link here to the css style sheet titled style.css. You'll need to use the link tag with the rel and href attributes. Then open the style.css file and follow the instructions there -->
</head>
<body>
<h1>Today's TODO list</h1>
<img src="udacity_logo.png" alt="Udacity Logo">
<a id="main-link" href="https://www.udacity.com">Udacity</a>
<div id="main-content">
<p class="bold-paragraph">Hi, my name is Andrew.</p>
<p class="bold-paragraph">I am a Udacity student from Los Angeles, California</p>
<p>I'm currently studying for the data scientist nanodegree program</p>
<p>These are my tasks:</p>
<ul>
<li>Watch ten videos</li>
<li>Answer all the quizzes</li>
<li>Work on the project for 2 hours</li>
</ul>
<p>Here is a table of the tasks that I've completed this week</p>
</div>
<table>
<caption>Breakdown of Tasks Completed</caption>
<tr>
<th>Day</th>
<th>Tasks Completed</th>
</tr>
<tr>
<td>Monday</td>
<td>Completed five concepts</td>
</tr>
<tr>
<td>Tuesday</td>
<td>Did three quizzes</td>
</tr>
</table>
<br>
<nav>
<a href="https://www.w3schools.com/html/">HTML</a> |
<a href="https://www.w3schools.com/w3css/">CSS</a> |
<a href="https://www.w3schools.com/js/default.asp">JavaScript</a> |
<a href="https://www.w3schools.com/Jquery/default.asp">jQuery</a>
</nav>
</body>

</html>
/* TODO:
- add a left margin of 25px and a right margin of 25px to the body tag */
body {
margin-left: 25px;
margin-right: 25px;
}

/* TODO: h1 header
- change the h1 header to all capital letters
- add a top and bottom margin of 20px
hint: https://www.w3schools.com/cssref/pr_text_text-transform.asp*/
h1 {
text-transform: uppercase;
margin-top: 20px;
margin-bottom: 20px;
}
/* TODO: img
- make the Udacity logo only half the width of the screen
hint: https://www.w3schools.com/css/css_dimension.asp
*/
img {width: 50%}
/* TODO: Udacity link
- make the Udacity link on its own line instead of next to the image
- give the link a top and bottom margin of 20px
- remove the underline
- increase the font to 45px
- change the font color to gray
hint: the block value might be of interest
https://www.w3schools.com/cssref/pr_class_display.asp
hint: make sure to specify the Udacity link using the id; otherwise all links will be styled like the Udacity link
*/
a#main-link {
display: block;
margin-top: 20px;
margin-bottom: 20px;
text-decoration: none;
font-size: 45px;
color: gray;
}

/* TODO: Div main-content id
- change the font of all elements inside the #main-content div to helvetica
hint: https://www.w3schools.com/cssref/pr_font_font-family.asp
*/
div#main-content {
font-family: helvetica;
}

/* TODO: bold-paragraph class
- for the paragraphs with the bold-paragraph class, make the text bold
*/
p.bold-paragraph {
font-weight: bold;
}

/* TODO: table
- draw a black border around the td elements in the table
hint: https://www.w3schools.com/css/css_border.asp
*/
td {
border: solid black 1px;
}

Front End: Bootstrap Library


Documentation References

Here are some key parts of the Bootstrap documentation for your reference:


Why Bootstrap?

Bootstrap is one of the easier front-end frameworks to work with. Bootstrap largely eliminates the need to write your own CSS or JavaScript. Instead, you can style your websites by adding Bootstrap’s classes to your HTML. You’ll be able to design sleek, modern-looking websites more quickly than if you were coding the CSS and JavaScript directly.


Exercise: Bootstrap

<!doctype html>
<html lang="en">

<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

<!-- Bootstrap CSS -->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<!-- TODO: Change the title of the page to Data Dashboard -->
<title>Data Dashboard</title>
</head>

<body>
<!-- TODO: add a navigation bar to the top of the web page
The navigation bar should include
- a link with a navbar-brand class. The link should say World Bank Dashboard and the href attribute should be equal to "#", which means that if somebody clicks on the link, the browser remains on the same page
- a link to the Udacity data science nanodegree website: https://www.udacity.com/course/data-scientist-nanodegree--nd025
- a link to the World Bank data website: https://data.worldbank.org/
- any other links you'd like to add
- align the Udacity and World Bank links to the right side of the navbar (hint: use ml-auto)
HINT: If you get stuck, re-watch the previous video and/or use an example from the documentation on the Bootstrap website: https://getbootstrap.com/docs/4.0/components/navbar/#nav
-->
<nav class="navbar navbar-expand-lg navbar-dark bg-dark">
<a class="navbar-brand" href="#">World Bank Dashboard</a>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>

<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav ml-auto">
<li class="nav-item active">
<a class="nav-link" href="https://www.udacity.com/course/data-scientist-nanodegree--nd025">Udacity <span class="sr-only">(current)</span></a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://data.worldbank.org/">World Bank</a>
</li>
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
About Author
</a>
<div class="dropdown-menu" aria-labelledby="navbarDropdown">
<a class="dropdown-item" href="https://zacks.one">Blog</a>
<a class="dropdown-item" href="https://www.linkedin.com/in/zacks-shen/">LinkedIn</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="https://github.com/ZacksAmber">GitHub</a>
</div>
</li>
</ul>
<form class="form-inline my-2 my-lg-0">
<input class="form-control mr-sm-2" type="search" placeholder="Search" aria-label="Search">
<button class="btn btn-outline-success my-2 my-sm-0" type="submit">Search</button>
</form>
</div>
</nav>

<!-- TODO: Create a div with the row class. Inside this row, there should be three columns of the following size and in the following order:
- col-1
- col-1
- col-10
-->
<div class="row ml-1 mt-2">
<div class="col-1">
<a href="https://github.com/ZacksAmber">
<img class="img-fluid" src="assets/githublogo.png" alt="github logo">
</a>
</div>
<div class="col-1 border-right">
<a href="https://www.linkedin.com/in/zacks-shen/">
<img class="mb-3 img-fluid" src="assets/linkedinlogo.png" alt="linkedin logo">
</a>
<a href="https://www.instagram.com">
<img class="img-fluid" src="assets/instagram.png" alt="instagram logo">
</a>
</div>
<div class="col-10">
<h2>World Bank Data Dashboard</h2>
<h4 class="text-muted">Land Use Visualizations</h4>

<div class="container">
<div class="row mb-3">
<div class="col-4">
<img class="img-fluid" src="assets/plot1.png" alt="chart one">
</div>
<div class="col-4">
<img class="img-fluid" src="assets/plot2.png" alt="chart two">
</div>
<div class="col-4">
<img class="img-fluid" src="assets/plot3.png" alt="chart three">
</div>
</div>
<div class="row">
<div class="col-6">
<img class="img-fluid" src="assets/plot4.png" alt="chart four">
</div>
<div class="col-6">
<img class="img-fluid" src="assets/plot5.png" alt="chart five">
</div>
</div>

</div>
</div>
</div>

<!-- TODO: In the first column, put a link to your github profile and use the github logo image in the asset folder. Make sure to use the img-fluid class in the img tags -->
<!-- TODO: In the second column, put a link to your linkedin profile and your instagram account. Use the images in the asset folder. If you don't have these accounts, then add the links anyway and in the href attribute, put href="#". That tells the browser to do nothing when the link is clicked. Also, add a border to the right side of the column -->
<!-- TODO: In the third column,
add an h2 header that says World Bank Data Dashboard and an h4 header that says Land Use Visualizations. Change the color of the h4 font using the text-muted class. Remove the Hello World Header -->


<!-- TODO: In the third column underneath the h2 and h4 tags, you'll place the five visualizations plot1.png, plot2.png...plot5.png. First, wrap them in a container class.

Put the visualizations in two rows such that the first row contains plot1.png, plot2.png, and plot3.png, spaced evenly into three columns. The second row should contain plot4.png, plot5.png, evenly spaced into three columns. The final result should be:
plot1.png plot2.png plot3.png
plot4.png plot5.png -->

<!-- TODO: Add margins and padding where appropriate. In Bootstrap you can add margins using classes: mr-3 would be margin-right 3. mr-5 would be a larger right margin. You can use mt, mb, ml, mr, pt, pb, pl, pr where t, b, l and r stand for top, bottom, left and right.
-->

<!-- TODO: paste your HTML into the W3C html validator. Fix any errors that come up: https://validator.w3.org/#validate_by_input -->

<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.2.1.slim.min.js" integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
</body>

</html>


Front End: JavaScript


JavaScript and this Lesson

To build the data dashboard at the end of this lesson, you won’t need to write any JavaScript at all. That’s because you’ll use libraries (Bootstrap and Plotly) that take care of the JavaScript for you.

You won’t need to get into the details of JavaScript syntax, but it’s good to have at least an idea of what is happening under the hood.


What is JavaScript?

  • JavaScript is a high-level language like Python, PHP, Ruby, and C++. It was specifically developed to make the front end of a web application more dynamic; however, you can also use JavaScript to program the back end of a website with the Node.js runtime environment.
  • Java and JavaScript are two completely different languages that happen to have similar names.
  • JavaScript syntax, especially for front-end web development, is a bit tricky. It’s much easier to write front-end JavaScript code using a library such as jQuery.

Basic JavaScript Syntax

Here are a few rules to keep in mind when writing JavaScript:

  • a line of code ends with a semicolon ;
  • () parentheses are used when calling a function, much like in Python
  • {} curly braces surround blocks of code or are used when initializing objects, which play the role of Python dictionaries
  • [] square brackets are used for accessing values from arrays or objects, much like in Python

Here is an example of a JavaScript function that sums the elements of an array.

function addValues(x) {
  var sum_array = 0;
  for (var i=0; i < x.length; i++) {
    sum_array += x[i];
  }
  return sum_array;
}

addValues([3,4,5,6]);
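
For readers coming from Python, it may help to see the same function written line for line in Python. This is a deliberately literal translation for comparing syntax (index-based loop, explicit accumulator); the name add_values is just the Python-style spelling of the JavaScript name, and idiomatic Python would simply call the built-in sum().

```python
def add_values(x):
    # Mirrors the JavaScript version: start at zero, then add each element by index.
    sum_array = 0
    for i in range(len(x)):
        sum_array += x[i]
    return sum_array


print(add_values([3, 4, 5, 6]))  # prints 18
```

Note what changes between the two languages: the curly braces and semicolons disappear, indentation defines the blocks, and the three-part for loop becomes a loop over range().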

What is jQuery?

jQuery is a JavaScript library that makes developing the front end easier. jQuery specifically helps with manipulating HTML elements. We are showing you jQuery because the Bootstrap library you’ll be using depends on it. But you won’t need to write any jQuery yourself.

Here is a link to the documentation of the core functions in jQuery: jQuery API documentation

jQuery came out in 2006. There are newer JavaScript tools out there like React and Angular.

As a data scientist, you probably won’t need to use any of these tools. But if you work in a startup environment, you’ll most likely hear front-end engineers talking about these tools.


jQuery Syntax

The jQuery library simplifies JavaScript quite a bit. Compare these two examples from the video for changing the h1 title element when clicking on the image.

This is pure JavaScript code for changing the words in the h1 title element.

function headFunction() {
  document.getElementsByTagName("h1")[0].innerHTML =
    "A Photo of a Breathtaking View";
}

This code searches the HTML document for all h1 tags, grabs the first h1 tag in the array of h1 tags, and then changes its HTML. Note that the above code is only the function. You’d also have to add an onclick attribute to the image tag like so:

<img src="image.jpg" onclick="headFunction()">

The jQuery code is more intuitive. Once the document has loaded, the following code adds an onclick event to the image. Once the image is clicked, the h1 tag’s text is changed.

$(document).ready(function(){
  $("img").click(function(){
    $("h1").text("A Photo of a Breathtaking View");
  });
});

The dollar sign $ is jQuery syntax that says “grab this element, class or id”. That part of the syntax should remind you somewhat of CSS. For example, $("p#first") means find the paragraph with id="first". $("#first") would work as well.

JavaScript has something called callback functions, which can make learning JavaScript a bit tricky. Callback functions are essentially functions that can be inputs into other functions. In the above code, there is the ready() function that waits for the HTML document to load. Then there is another function being passed into the ready() function. This second function adds an on-click event to an image tag. Then there’s another function passed into the click() function, which changes the h1 text.


Exercise: JavaScript

In the next exercise, you’ll write a bit of jQuery just so that you can see how it works and what it does. This is the only time in the lesson you’ll actually write any JavaScript.

/*
TODO: Currently, the opacity of the ul element is set to zero.
You can see this in the style.css file. Hence, the tasks are not showing up.

Write jQuery code that does the following:
- when clicking on the word "tasks" in the sentence, "these are my tasks", fade in the ul element
- hint: if you look at the html, the word tasks is surrounded by a span element with an id called "fade_in_tasks".
- hint: https://api.jquery.com/id-selector/
- hint: https://api.jquery.com/click/
- hint: http://api.jquery.com/fadein/
- hint: Don't forget to write code that waits for the html document to load. Re-watch the javascript screencast if you're stuck.
*/

$(document).ready(function() {
  $("#fade_in_tasks").click(function() {
    $("ul").fadeIn("slow");
  });
});

Front End: Plotly

Python Plotly
Plotly JavaScript Open Source Graphing Library


Chart Libraries

There are many web chart libraries out there for all types of use cases. When choosing a library, you should consider checking whether or not the library is still being actively developed.

d3.js is one of the most popular (and complex!) JavaScript data visualization libraries. This library is still actively being developed, which you can tell because the latest commit to the d3 GitHub repository is fairly recent.

Other options include chart.js, Google Charts, and nvd3.js, which is built on top of d3.js.


Why Plotly

For this lesson, we’ve chosen plotly for a specific reason: Plotly, although a private company, provides open source libraries for both JavaScript and Python.

Because the web app you’re developing will have a Python back-end, you can use the Python library to create your charts. Rather than having you learn more JavaScript syntax, you can use the Python syntax that you already know. However, you haven’t built a back-end yet, so for now, you’ll see the basics of how Plotly works using the JavaScript library. The syntax between the Python and JavaScript versions is similar.

Later in the lesson, you’ll switch to the Python version of the Plotly library so that you can prepare visualizations on the back-end of your web app. Yet you could write all the visualization code in JavaScript if you wanted to. Watch the screencast below to learn the basics of how Plotly works, and then continue on to the Plotly exercise.

Here are a few links to some helpful parts of the plotly documentation:


Exercise: Plotly

plot1.js

var year = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015];

var arable_land_brazil = [30.924583699244103,
30.990028882024003,
31.0554740648039,
31.1207996037396,
31.198209170939897,
31.275618738140302,
31.521965413357503,
31.8094695709811,
32.1206632097572,
32.558918611078504,
32.5948835506464,
32.6369263974999,
32.4998145520415,
32.7225913899504,
32.7273771437186,
32.7181645677148,
32.9466843101456,
32.974680969689395,
33.3576728793727,
33.8100342899258,
33.8100342899258];

var country_name_brazil = 'Brazil';

var trace1 = {
/* TODO: Use the year, arable_land_brazil, and country_name_brazil to create a trace for a line chart */
x: year,
y: arable_land_brazil,
mode: 'lines',
type: 'scatter',
name: country_name_brazil
};

var arable_land_germany = [49.67917502148379,
49.6634105817984,
49.6404526572124,
49.776517105037,
49.1489483638031,
48.912451640636206,
48.822012037833204,
48.6355558103537,
48.7400017201342,
48.7799982796686,
48.8330083725198,
48.5948612066988,
48.61330197608051,
48.535696870607794,
48.4380826711798,
47.9100324181656,
47.9659169153087,
47.8108681930338,
47.8588626461821,
47.9363714531384,
47.9592041483809];

var country_name_germany = 'Germany';
var trace2 = {
/* TODO: Create another trace for the Germany data using a line chart */
x: year,
y: arable_land_germany,
mode: 'lines',
type: 'scatter',
name: country_name_germany
};


var arable_land_china = [55.6902039146848,
55.6944173715386,
55.7435214092539,
55.7808021320313,
55.7222181390954,
55.601913887829596,
55.3795417237072,
55.2323417623281,
54.9767049909297,
55.0086611269185,
55.115181785736894,
54.763679479991296,
54.810017687289296,
54.80799387248529,
54.8084187711588,
54.8080992214598,
54.8084187711588,
54.8084187711588,
54.8084187711588,
54.8084187711588,
56.2229587724434];
var country_name_china = 'China';
var trace3 = {
/* TODO: Create another trace for the China data using a line chart */
x: year,
y: arable_land_china,
mode: 'lines',
type: 'scatter',
name: country_name_china
};


var layout = {
title:'Percent of Land Used for Agriculture <br> 1995-2015',
};

var data = [trace1, trace2, trace3];

Plotly.newPlot('plot1', data, layout);
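
To see why the Python and JavaScript versions feel similar, here is the Brazil trace from plot1.js written as plain Python dictionaries and lists. This is a sketch of the data structure only: it does not import Plotly, it uses just the first three data points from the listing above to stay short, and the variable name figure is an assumption made for illustration.

```python
# First three years and values copied from plot1.js above.
year = [1995, 1996, 1997]
arable_land_brazil = [30.924583699244103, 30.990028882024003, 31.0554740648039]
country_name_brazil = 'Brazil'

# Same keys as the JavaScript trace object; in Python they become dict keys.
trace1 = {
    'x': year,
    'y': arable_land_brazil,
    'mode': 'lines',
    'type': 'scatter',
    'name': country_name_brazil,
}

layout = {'title': 'Percent of Land Used for Agriculture'}

# A figure is the list of traces plus the layout.
figure = {'data': [trace1], 'layout': layout}
print(figure['data'][0]['name'], len(figure['data'][0]['x']), 'points')
```

The only real differences are the quoted keys and the absence of var and semicolons.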

plot2.js

var year = [2015];
var arable_land_brazil = [33.8100342899258];
var country_name_brazil = 'Brazil';

var arable_land_germany = [47.9592041483809];
var country_name_germany = 'Germany';

var arable_land_china = [56.2229587724434];
var country_name_china = 'China';

var trace1 = {
/* TODO: Use the country name and arable land data to make a bar chart */
x: [country_name_brazil, country_name_germany, country_name_china],
y: [arable_land_brazil[0], arable_land_germany[0], arable_land_china[0]],
type: 'bar'
};

var layout = {
/* TODO: include a title for the chart */
title: 'Percent of Land Area Used for <br> Agriculture in 2015'
};

var data = [trace1];

Plotly.newPlot('plot2', data, layout);


The Backend


In this next part of the lesson, you’ll build a backend using Flask. Because Flask is written in Python, you can use any Python library in your backend including pandas and scikit-learn.

In this part of the lesson, you’ll practice

  • setting up the backend
  • linking the backend and the frontend together
  • deploying the app to a server so that the app is available from a web address

What is Flask?

Flask is a web framework. A web framework takes care of all the routing needed to organize a web page so that you don’t have to write the code yourself!

When you type “http://www.udacity.com” into a browser, your computer sends out a request to another computer (i.e., the server) where the Udacity website is stored. Then the Udacity server sends you the files needed to render the website in your browser. The Udacity computer is called a server because it “serves” you the files that you requested.

The HTTP part of the web address stands for Hypertext Transfer Protocol. HTTP defines a standard way of sending and receiving messages over the internet.

When you hit enter in your browser, your computer says “get me the files for the web page at www.udacity.com”, except that the message is sent to the server with the syntax governed by HTTP. Then the server sends out the files via the protocol as well.

There needs to be some software on the server that can interpret these HTTP requests and send out the correct files. That’s where a web framework like Flask comes into play. A framework abstracts the code for receiving requests as well as interpreting the requests and sending out the correct files.
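
As a rough picture of what "interpreting a request and sending out the correct file" means, here is a toy sketch in plain Python. It is not how Flask is implemented; the ROUTES table, the file names, and the handle_request function are all hypothetical, and the sketch only looks at the first line of the request.

```python
# Hypothetical mapping from URL paths to files; a framework builds this for you.
ROUTES = {"/": "index.html", "/about": "about.html"}


def handle_request(raw_request):
    """Read the request line of an HTTP request and decide what to send back.

    Returns a (status_code, file_name) pair; file_name is None on errors.
    """
    request_line = raw_request.split("\r\n", 1)[0]   # e.g. "GET / HTTP/1.1"
    method, path, _version = request_line.split(" ")
    if method != "GET":
        return 405, None          # this toy server only answers GET
    if path not in ROUTES:
        return 404, None          # no page registered for this path
    return 200, ROUTES[path]


request = "GET / HTTP/1.1\r\nHost: www.udacity.com\r\n\r\n"
print(handle_request(request))
```

A real framework also handles headers, other HTTP methods, errors, and building the full response, which is exactly the work you avoid writing yourself.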


Why Flask?

  • First and foremost, you’ll be working with Flask because it is written in Python. You won’t need to learn a new programming language.
  • Flask is also a relatively simple framework, so it’s good for making a small web app.
  • Because Flask is written in Python, you can use Flask with any other Python library including pandas, numpy and scikit-learn. In this lesson, you’ll be deploying a data dashboard and pandas will help get the data ready.

Continue on to start building the backend.


Backend: Flask


Using Flask in the Classroom Workspace

In the next part of the lesson, you’ll see a classroom workspace. The classroom workspace already has Flask set up for you. So for now, all you need to do to run the Flask app is to open a Terminal and type:

python worldbank.py

That assumes you are in the default workspace directory within Terminal. That will get the server running.


Seeing your App in the Workspace

Once the server is running, open a new terminal window and type

env | grep WORK

This command will return the Linux environment variables that contain information about your classroom workspace. The env command lists all the environment variables. The | symbol is a pipe for sending output from one command to another. The grep command searches text, so grep WORK will search for any text containing the word WORK.

The command should return two variables:

WORKSPACEDOMAIN=udacity-student-workspaces.com
WORKSPACEID=viewc7f3319f2

Your WORKSPACEID variable will be different but the WORKSPACEDOMAIN should be the same. Now, open a new web browser window, and type the following in the address bar:

http://WORKSPACEID-3001.WORKSPACEDOMAIN

In this example, that would be: https://viewc7f3319f2-3001.udacity-student-workspaces.com/

DON’T FORGET TO INCLUDE -3001. You should be able to see the web app. The number 3001 represents the port for accessing your web app.
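
If you prefer to let Python assemble the address, the sketch below builds it from the two variables. The function name workspace_url is hypothetical, and the https scheme follows the example address above; by default the function reads the real environment, but here it is called with the example values from this section.

```python
import os


def workspace_url(environ=None, port=3001):
    """Build the workspace web address from WORKSPACEID and WORKSPACEDOMAIN."""
    environ = os.environ if environ is None else environ
    workspace_id = environ["WORKSPACEID"]
    workspace_domain = environ["WORKSPACEDOMAIN"]
    return f"https://{workspace_id}-{port}.{workspace_domain}/"


# Example values copied from the output shown above; yours will differ.
example_env = {
    "WORKSPACEDOMAIN": "udacity-student-workspaces.com",
    "WORKSPACEID": "viewc7f3319f2",
}
print(workspace_url(example_env))
```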


Creating New Pages

To create a new web page, you first need to specify the route in routes.py as well as the name of the HTML template.

@app.route('/new-route')
def render_the_route():
    return render_template('new_route.html')

The route name, function name, and template name do not have to match; however, it’s good practice to make them similar so that the code is easier to follow.

The new_route.html file must go in the templates folder. Flask automatically looks for html files in the templates folder.


What is @app.route?

Python Decorators

To use Flask, you don’t necessarily need to know what @app.route is doing. You only have to remember that the path you place inside of @app.route() will be the web address. And then the function you write below @app.route is used to render the correct html template file for the web address.

In Python, the @ symbol is used for decorators. Decorators are a shorthand way to input a function into another function. Take a look at this code. Python allows you to use a function as an input to another function:

def decorator(input_function):
    return input_function

def input_function():
    print("I am an input function")

decorator_example = decorator(input_function)
decorator_example()

Running this code will print the string:

I am an input function

Decorators provide a shorthand for the same idea. In this version, the decorator also prints a message of its own so that you can see when it runs:

def decorator(input_function):
    print("Decorator function")
    return input_function

@decorator
def input_function():
    print("I am an input function")

input_function()

This code will print out:

Decorator function
I am an input function

Instead of using the @decorator syntax, you could get the same behavior by defining input_function without the @decorator line and then running the following code:

input_function = decorator(input_function)
input_function()

Because @app.route() uses dot notation, app is an object (in Flask, an instance of the Flask class) and route is a method of that object. Hence a function written underneath @app.route() gets handed to the decorator that the route method returns. The purpose of @app.route() is to make sure the correct web address gets associated with the correct HTML template. This code

@app.route('/homepage')
def some_function():
    return render_template('index.html')

is ensuring that the web address ‘www.website.com/homepage’ is associated with the index.html template.
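
To make the idea concrete, here is a toy version of an app object with a route method. This is a simplified sketch for illustration, not Flask's actual implementation: the ToyApp class and its routes dictionary are hypothetical, and the view function returns a template name instead of calling render_template.

```python
class ToyApp:
    """A tiny stand-in for a web app that remembers which function serves which path."""

    def __init__(self):
        self.routes = {}

    def route(self, path):
        # route() returns the actual decorator, which receives the function below it.
        def register(view_function):
            self.routes[path] = view_function
            return view_function      # hand the function back unchanged
        return register


app = ToyApp()


@app.route('/')
@app.route('/homepage')
def some_function():
    return 'index.html'


# Both paths now point at the same function.
print(sorted(app.routes))
print(app.routes['/homepage']())
```

Because the decorator returns the function unchanged, several @app.route lines can be stacked on one function, which is the pattern used for '/' and '/index' in the exercises.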

If you’d like to know more details about decorators and how @app.route() works, check out these tutorials:


Exercise: Flask

File: /home/workspace/1_flask_exercise/worldbankapp/templates/new_route.html

<!doctype html>

<html>
<head>
<title>New Route Page</title>
</head>

<body>
<h1>The new_route.html page</h1>
</body>
</html>

File: /home/workspace/1_flask_exercise/worldbankapp/routes.py

from worldbankapp import app

from flask import render_template

@app.route('/')
@app.route('/index')
def index():
    return render_template('index.html')

@app.route('/project-one')
def project_one():
    return render_template('project_one.html')

# TODO: Add another route. You can use any names you want
# Then go into the templates folder and add an html file that matches the file name you put in the render_template method. You can create a new file by going to the + sign at the top of the workspace and clicking on Create New File. Make sure to place the new html file in the templates folder.
@app.route('/new-route')
def render_the_route():
    return render_template('new_route.html')

# TODO: Start the web app per the instructions in the instructions.md file and make sure your new html file renders correctly.


Backend: Flask + Pandas


Code from the Screencast

Here is the code from the routes.py file before refactoring.

The data set comes from this link at the World Bank’s data repository: link to dataset

from worldbankapp import app
from flask import render_template
import pandas as pd

df = pd.read_csv('data/API_SP.RUR.TOTL.ZS_DS2_en_csv_v2_9948275.csv', skiprows=4)

# Filter for 1990 and 2015, top 10 economies
df = df[['Country Name','1990', '2015']]
countrylist = ['United States', 'China', 'Japan', 'Germany', 'United Kingdom', 'India', 'France', 'Brazil', 'Italy', 'Canada']
df = df[df['Country Name'].isin(countrylist)]

# melt year columns and convert year to date time
df_melt = df.melt(id_vars='Country Name', value_vars = ['1990', '2015'])
df_melt.columns = ['country','year', 'variable']
df_melt['year'] = df_melt['year'].astype('datetime64[ns]').dt.year

# add column names
df_melt.columns = ['country', 'year', 'percentrural']

# prepare data into x, y lists for plotting
df_melt.sort_values('percentrural', ascending=False, inplace=True)

data = []
for country in countrylist:
    x_val = df_melt[df_melt['country'] == country].year.tolist()
    y_val = df_melt[df_melt['country'] == country].percentrural.tolist()
    data.append((country, x_val, y_val))
    print(country, x_val, y_val)

@app.route('/')
@app.route('/index')
def index():
    return render_template('index.html')

@app.route('/project-one')
def project_one():
    return render_template('project_one.html')
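
If the melt step is unfamiliar, the following sketch shows what it does to the shape of the data, using plain Python instead of pandas. The rows are placeholders shaped like the World Bank table: the country names and numbers are made up for illustration and are not real World Bank values, and the melt function here is a hypothetical stand-in rather than the pandas method. Unlike the screencast code, the sketch skips the sort step.

```python
# Placeholder "wide" rows: one row per country, one column per year.
# The values are invented for illustration only.
wide_rows = [
    {"Country Name": "Country A", "1990": 40.0, "2015": 30.0},
    {"Country Name": "Country B", "1990": 25.0, "2015": 20.0},
]


def melt(rows, id_var, value_vars):
    """Turn wide rows into long (id, year, value) tuples, one per year column."""
    long_rows = []
    for var in value_vars:
        for row in rows:
            long_rows.append((row[id_var], int(var), row[var]))
    return long_rows


long_rows = melt(wide_rows, "Country Name", ["1990", "2015"])

# Same final shape as in routes.py: one (country, x values, y values) tuple each.
data = []
for country in ["Country A", "Country B"]:
    x_val = [year for name, year, value in long_rows if name == country]
    y_val = [value for name, year, value in long_rows if name == country]
    data.append((country, x_val, y_val))

print(data)
```

The long format makes it easy to pull out one list of years and one list of values per country, which is the form a line chart needs.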

Exercise

The next exercise will be after the section on using Plotly, Pandas, and Flask together. For now, the next part of the lesson has the refactored code shown in this screencast so that you can explore it in more detail. You’ll find it in the 2_flask+pandas_example folder.


Backend: Flask + Plotly + Pandas Part 1

In this next video, you’ll see an example of how to pass data from the back end to the front end of the web app. In the next four parts of this lesson, you’ll get a sense for how data and Plotly code can be taken from the back end, sent to the front end, and then used to render plots on the front end. The goal of these next few videos is to show you how the web template works, which you’ll be using later in the final exercise.


Summary Part 1

The purpose of this section is to give you an idea of how the final web app works in terms of passing information back and forth between the back end and front end. The web template you’ll be using at the end of the lesson will already provide the code for sharing information between the back and front ends. Your task will be to wrangle data and set up the plotly visualizations using Python. But it’s important to get a sense for how the web app works.

In the video above, the data set was sent from the back end to the front end. This was accomplished by including a variable in the render_template() function like so:

data = data_wrangling()

@app.route('/')
@app.route('/index')
def index():
    return render_template('index.html', data_set = data)

This code first loads the data using the data_wrangling function from wrangling.py. This data gets stored in a variable called data.

In render_template, that data is sent to the front end via a variable called data_set. Now the data is available to the front end in the data_set variable.

In the index.html file, you can access the data_set variable using the following syntax:

{{ data_set }}

You can do this because Flask comes with a template engine called Jinja. Jinja also allows you to put control flow statements in your html using the following syntax:

{% for tuple in data_set %}
    <p>{{tuple}}</p>
{% endfor %}

The logic is:

  1. Wrangle data in a file (aka Python module). In this case, the file is called wrangling.py. The wrangling.py has a function that returns the clean data.
  2. Execute this function in routes.py to get the data in routes.py
  3. Pass the data to the front-end (index.html file) using the render_template method.
  4. Inside of index.html, you can access the data variable with the squiggly bracket syntax {{ }}
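
The four steps above can be sketched in a single file. This is only an illustration under stated assumptions: data_wrangling here is a hypothetical stand-in for the function in wrangling.py, the sample values are made up, and the template is an inline string rendered with Flask's render_template_string so that the sketch does not need a separate index.html file.

```python
from flask import Flask, render_template_string

app = Flask(__name__)


def data_wrangling():
    """Hypothetical stand-in for wrangling.py: (country, years, values) tuples."""
    # Made-up sample values, for illustration only
    return [
        ("Brazil", [1990, 2015], [1.0, 2.0]),
        ("India", [1990, 2015], [3.0, 4.0]),
    ]


# Step 2: run the wrangling function once, when the routes module loads
data = data_wrangling()

# Inline stand-in for index.html (step 4: Jinja syntax reads data_set)
INDEX_TEMPLATE = """
{% for row in data_set %}
<p>{{ row[0] }}: {{ row[2] }}</p>
{% endfor %}
"""


@app.route("/")
@app.route("/index")
def index():
    # Step 3: pass the data to the template under the name data_set
    return render_template_string(INDEX_TEMPLATE, data_set=data)


def render_index():
    """Request the index page with Flask's test client and return the HTML."""
    with app.test_client() as client:
        return client.get("/").get_data(as_text=True)


print(render_index())
```

In the real app, render_template('index.html', data_set=data) plays the same role; only the template source differs.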

Next

In the next part, you’ll see how to create a Plotly visualization on the back end and then send the visualization code to the front end for rendering.


Backend: Flask + Plotly + Pandas Part 2

In this section, you’ll see how to create a Plotly visualization on the back end and then send the information to the front end for rendering.


Summary Part 2

In the second part, a Plotly visualization was set up on the back-end inside the routes.py file using Plotly’s Python library. The Python Plotly code is a dictionary of dictionaries. The Python dictionary is then converted to JSON and sent to the front-end via the render_template() method.

At the same time, a list of ids is created for the plots. This list is also sent to the front-end using the render_template() method.

On the front-end, the ids and visualization code (JSON) are then used with the Plotly JavaScript library to render the plots.

In summary:

  1. Python is used to set up a Plotly visualization
  2. An id is created associated with each visualization
  3. The Python Plotly code is converted to JSON
  4. The ids and JSON are sent to the front end (index.html).
  5. The front end then uses the ids, JSON, and JavaScript Plotly library to render the plots.
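
As a rough sketch of steps 1 to 3, the snippet below builds a figure from plain Python dictionaries, creates one id per figure, and converts the figures to JSON with the standard library. The function name build_figures, the id pattern, and the sample values are assumptions made for illustration. If the traces are Plotly graph objects rather than plain dictionaries, the standard JSON encoder is not enough; Plotly provides plotly.utils.PlotlyJSONEncoder for that case.

```python
import json


def build_figures(data):
    """Build one Plotly figure (a dict of dicts) with one line per country."""
    traces = [
        dict(type="scatter", mode="lines", name=country, x=years, y=values)
        for country, years, values in data
    ]
    layout = dict(title="Sample chart", xaxis=dict(title="Year"))
    return [dict(data=traces, layout=layout)]


# Made-up sample values, for illustration only
sample_data = [
    ("Brazil", [1990, 2015], [1.0, 2.0]),
    ("India", [1990, 2015], [3.0, 4.0]),
]

figures = build_figures(sample_data)

# One id per figure; the front end needs a div with the same id
ids = ["figure-{}".format(i) for i, _ in enumerate(figures)]

# Convert the Python dictionaries to a JSON string for the template
figuresJSON = json.dumps(figures)

print(ids)
print(figuresJSON)
```

Both ids and figuresJSON would then be passed as keyword arguments to render_template().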

JavaScript or Python

You could actually do all of this with only JavaScript. You would read the data, wrangle the data, and then create the plots all using JavaScript; however, to do all of this in JavaScript, you’d need to learn more about JavaScript programming. Instead, you can use the pandas and Python skills you already have to wrangle data on the back-end.


Backend: Flask + Plotly + Pandas Part 3

Here, the screencast video shows how to make more complex visualizations in Plotly. This example shows a line chart containing a unique line for each country in the data set.


Summary Part 3

In part 3, the code iterated through the data set to create a visualization with multiple lines: one for each country.

The original code for a line chart with a single line was:

graph_one = [go.Scatter(
    x = data[0][1],
    y = data[0][2],
    mode = 'lines',
    name = country
)]

To make a visualization with multiple lines, graph_one will be a list of line charts. This was accomplished with the following code:

graph_one = []
for data_tuple in data:
    graph_one.append(go.Scatter(
        x = data_tuple[1],
        y = data_tuple[2],
        mode = 'lines',
        name = data_tuple[0]
    ))

Next

In the last section on Flask, Plotly, and pandas, you’ll see how to add more visualizations to the data dashboard. Then you’ll see some example code, and finally you will practice using Flask, Plotly, and pandas together.


Backend: Flask + Plotly + Pandas Part 4

In this next section, you’ll see how to add more visualizations in the back end code and then render those visualizations on the front end.


Summary Part 4

In the last part, three more visualizations were added to the wrangling Python module. The wrangling included reading in the data, cleaning the data, and preparing the Plotly code. Each visualization’s code was appended to a list called figures. These visualizations were then imported into the routes.py file. This figures list was sent from the back end to the front end via the render_template method. A list of ids was also sent from the back end to the front end.

Then on the front end (index.html), a div was created for each visualization’s id. With help from the JavaScript Plotly library, each visualization was rendered inside the appropriate div.


Beyond a CSV file

Besides storing data in a local csv file (or text, json, etc.), you could also store the data in a database such as a SQL database.

The database could be local to your website meaning that the database file is stored on the same server as your website; alternatively, the database could be stored somewhere else like on a separate database server or with a cloud service like Amazon AWS.
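
As a minimal sketch of the local-database option, the snippet below uses Python's built-in sqlite3 module. The table name land_use, its columns, and the sample values are hypothetical, and an in-memory database is used so the example is self-contained; a real app would connect to a database file or a separate database server instead.

```python
import sqlite3


def load_rows(connection):
    """Return (country, year, value) rows ordered by country and year."""
    cursor = connection.execute(
        "SELECT country, year, value FROM land_use ORDER BY country, year"
    )
    return cursor.fetchall()


# In-memory database for illustration; a file path would make it persistent
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE land_use (country TEXT, year INTEGER, value REAL)"
)
# Made-up sample values, for illustration only
connection.executemany(
    "INSERT INTO land_use VALUES (?, ?, ?)",
    [("Brazil", 1990, 1.0), ("Brazil", 2015, 2.0), ("India", 2015, 4.0)],
)
connection.commit()

rows = load_rows(connection)
print(rows)
```

The wrangling function could return rows like these in place of the values it currently reads from a csv file.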

Using a database with your web app goes beyond the scope of this introduction to web development, but here are a few resources for using databases with Flask apps:


Next Steps

In the next part of the lesson, you can look at the code and try running the web app from the classroom. Then in the next exercise, you’ll practice adding another visualization to the web app.


Exercise: Flask + Plotly + Pandas

Index.html

<!DOCTYPE html>
<html>
<head>

<title>World Bank Data Dashboard</title>

<!--import script files needed from plotly and bootstrap-->
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<script src="https://code.jquery.com/jquery-3.3.1.min.js" integrity="sha384-tsQFqpEReu7ZLhBV2VZlAu7zcOV+rXbYlF2cqB8txI/8aZajjp4Bqd+V6D5IgvKT" crossorigin="anonymous"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>

</head>

<body>

<!--navbar links-->
<nav class="navbar navbar-expand-lg navbar-dark bg-dark sticky-top">
<a class="navbar-brand" href="#">World Bank Dashboard</a>
<button class="navbar-toggler" type="button" data-toggle="collapse"
data-target="#navbarTogglerDemo02"
aria-controls="navbarTogglerDemo02" aria-expanded="false"
aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>

<div class="collapse navbar-collapse" id="navbarTogglerDemo02">
<ul class="navbar-nav ml-auto mt-2 mt-lg-0">
<li class="nav-item">
<a class="nav-link" href="https://www.udacity.com">Udacity</a>
</li>
<li class="nav-item">
<a class="nav-link" href="https://data.worldbank.org/">World Bank Data</a>
</li>
</ul>
</div>
</nav>

<!--middle section-->
<div class="row">

<!--social media buttons column-->
<div class="col-1 border-right">
<div id="follow-me" class="mt-3">
<a href="#">
<img src="/static/img/linkedinlogo.png" alt="linkedin" class="img-fluid mb-4 ml-2">
</a>
<a href="#">
<img src="/static/img/githublogo.png" alt="github" class="img-fluid ml-2">
</a>
</div>
</div>

<!--visualizations column-->
<div class="col-11">

<!--chart descriptions-->
<div id="middle-info" class="mt-3">

<h2 id="tag-line">World Bank Data Dashboard</h2>
<h4 id="tag-line" class="text-muted">Top 10 World Economies Land Use</h4>

</div>

<!--charts-->
<div id="charts" class="container mt-3 text-center">

<!--top two charts-->
<div class="row">
<div class="col-6">
<div id="{{ids[0]}}"></div>
</div>
<div class="col-6">
<div id="{{ids[1]}}"></div>
</div>
</div>

<!--bottom two charts-->
<div class="row mb-6">
<div class="col-6">
<div id="chart3">
<div id="{{ids[2]}}"></div>
</div>
</div>
<div class="col-6">
<div id="chart4">
<div id="{{ids[3]}}"></div>
</div>
</div>
</div>

<!--TODO: Create another row and place a fifth chart in that row-->
<div class="row mb-12">
<div class="col-6">
<div id="chart4">
<div id="{{ids[4]}}"></div>
</div>
</div>
</div>
</div>
</div>
</div>

<!--footer section-->
<div id="footer" class="container"></div>

</body>


<footer>

<script type="text/javascript">
// plots the figure with id
// id must match the div id above in the html
var figures = {{figuresJSON | safe}};
var ids = {{ids | safe}};
for(var i in figures) {
Plotly.plot(ids[i],
figures[i].data,
figures[i].layout || {});
}
</script>

</footer>


</html>

wrangle_data.py

import pandas as pd
import plotly.graph_objs as go

# TODO: Scroll down to the fifth-chart TODOs in return_figures() and set up a fifth visualization for the data dashboard

def cleandata(dataset, keepcolumns = ['Country Name', '1990', '2015'], value_variables = ['1990', '2015']):
    """Clean world bank data for a visualization dashboard

    Keeps data range of dates in keepcolumns variable and data for the top 10 economies
    Reorients the columns into a year, country and value

    Args:
        dataset (str): name of the csv data file
        keepcolumns (list): columns to keep from the csv file
        value_variables (list): year columns to melt into rows

    Returns:
        DataFrame: cleaned data with the columns country, year and variable

    """
    df = pd.read_csv(dataset, skiprows=4)

    # Keep only the columns of interest (years and country name)
    df = df[keepcolumns]

    top10country = ['United States', 'China', 'Japan', 'Germany', 'United Kingdom', 'India', 'France', 'Brazil', 'Italy', 'Canada']
    df = df[df['Country Name'].isin(top10country)]

    # melt year columns and convert year to date time
    df_melt = df.melt(id_vars='Country Name', value_vars = value_variables)
    df_melt.columns = ['country','year', 'variable']
    df_melt['year'] = df_melt['year'].astype('datetime64[ns]').dt.year

    # return the cleaned data frame
    return df_melt

def return_figures():
    """Creates five plotly visualizations

    Args:
        None

    Returns:
        list (dict): list containing the five plotly visualizations

    """

    # first chart plots arable land from 1990 to 2015 in top 10 economies
    # as a line chart

    graph_one = []
    df = cleandata('data/API_AG.LND.ARBL.HA.PC_DS2_en_csv_v2.csv')
    df.columns = ['country','year','hectaresarablelandperperson']
    df.sort_values('hectaresarablelandperperson', ascending=False, inplace=True)
    countrylist = df.country.unique().tolist()

    for country in countrylist:
        x_val = df[df['country'] == country].year.tolist()
        y_val = df[df['country'] == country].hectaresarablelandperperson.tolist()
        graph_one.append(
            go.Scatter(
                x = x_val,
                y = y_val,
                mode = 'lines',
                name = country
            )
        )

    layout_one = dict(title = 'Change in Hectares Arable Land <br> per Person 1990 to 2015',
                      xaxis = dict(title = 'Year',
                                   autotick=False, tick0=1990, dtick=25),
                      yaxis = dict(title = 'Hectares'),
                      )

    # second chart plots arable land for 2015 as a bar chart
    graph_two = []
    df = cleandata('data/API_AG.LND.ARBL.HA.PC_DS2_en_csv_v2.csv')
    df.columns = ['country','year','hectaresarablelandperperson']
    df.sort_values('hectaresarablelandperperson', ascending=False, inplace=True)
    df = df[df['year'] == 2015]

    graph_two.append(
        go.Bar(
            x = df.country.tolist(),
            y = df.hectaresarablelandperperson.tolist(),
        )
    )

    layout_two = dict(title = 'Hectares Arable Land per Person in 2015',
                      xaxis = dict(title = 'Country',),
                      yaxis = dict(title = 'Hectares per person'),
                      )


    # third chart plots percent of population that is rural from 1990 to 2015
    graph_three = []
    df = cleandata('data/API_SP.RUR.TOTL.ZS_DS2_en_csv_v2_9948275.csv')
    df.columns = ['country', 'year', 'percentrural']
    df.sort_values('percentrural', ascending=False, inplace=True)
    for country in countrylist:
        x_val = df[df['country'] == country].year.tolist()
        y_val = df[df['country'] == country].percentrural.tolist()
        graph_three.append(
            go.Scatter(
                x = x_val,
                y = y_val,
                mode = 'lines',
                name = country
            )
        )

    layout_three = dict(title = 'Change in Rural Population <br> (Percent of Total Population)',
                        xaxis = dict(title = 'Year',
                                     autotick=False, tick0=1990, dtick=25),
                        yaxis = dict(title = 'Percent'),
                        )

    # fourth chart shows rural population vs forested area
    graph_four = []

    valuevariables = [str(x) for x in range(1995, 2016)]
    keepcolumns = [str(x) for x in range(1995, 2016)]
    keepcolumns.insert(0, 'Country Name')

    df_one = cleandata('data/API_SP.RUR.TOTL_DS2_en_csv_v2_9914824.csv', keepcolumns, valuevariables)
    df_two = cleandata('data/API_AG.LND.FRST.K2_DS2_en_csv_v2_9910393.csv', keepcolumns, valuevariables)

    df_one.columns = ['country', 'year', 'variable']
    df_two.columns = ['country', 'year', 'variable']

    # after the merge, variable_x is rural population and variable_y is forest area
    df = df_one.merge(df_two, on=['country', 'year'])

    for country in countrylist:
        x_val = df[df['country'] == country].variable_x.tolist()
        y_val = df[df['country'] == country].variable_y.tolist()
        years = df[df['country'] == country].year.tolist()
        country_label = df[df['country'] == country].country.tolist()

        # hover text for each marker: country name and year
        text = []
        for label, yr in zip(country_label, years):
            text.append(str(label) + ' ' + str(yr))

        graph_four.append(
            go.Scatter(
                x = x_val,
                y = y_val,
                mode = 'markers',
                text = text,
                name = country,
                textposition = 'top'
            )
        )

    layout_four = dict(title = 'Rural Population versus <br> Forested Area (Square Km) 1990-2015',
                       xaxis = dict(title = 'Rural Population'),
                       yaxis = dict(title = 'Forest Area (square km)'),
                       )

    # TODO: Make a fifth chart from the data in API_SP.RUR.TOTL_DS2_en_csv_v2_9914824.csv
    # This csv file contains data about the total rural population for various countries over many years
    # Make a bar chart showing the rural population of these countries ['United States', 'China', 'Japan', 'Germany', 'United Kingdom', 'India', 'France', 'Brazil', 'Italy', 'Canada'] in the year 2015.

    # HINT: you can use the cleandata() function. You'll need to specify the path to the csv file, and which columns you want to keep. The chart 2 code might help with understanding how to code this.
    df_five = cleandata('data/API_SP.RUR.TOTL_DS2_en_csv_v2_9914824.csv', ['Country Name', '2015'], ['2015'])
    df_five.columns = ['country','year','ruralpopulation']
    df_five.sort_values('ruralpopulation', ascending=False, inplace=True)

    # TODO: once the data is clean, make a list called graph_five and append the plotly graph to this list.
    graph_five = []
    graph_five.append(
        go.Bar(
            x = df_five.country.tolist(),
            y = df_five.ruralpopulation.tolist(),
        )
    )

    # TODO: fill a layout variable for the fifth visualization
    layout_five = dict(title = 'Rural Population in 2015',
                       xaxis = dict(title = 'Country',),
                       yaxis = dict(title = 'Rural Population'))

    # append all charts to the figures list
    figures = []
    figures.append(dict(data=graph_one, layout=layout_one))
    figures.append(dict(data=graph_two, layout=layout_two))
    figures.append(dict(data=graph_three, layout=layout_three))
    figures.append(dict(data=graph_four, layout=layout_four))

    # TODO: append the figure five information to the figures list
    figures.append(dict(data=graph_five, layout=layout_five))

    return figures

Deployment

Note: In the classroom workspace, do not update Python with the conda update python command. The pip freeze > requirements.txt command is likewise not required in the workspace, because we provide a requirements.txt file containing the bare minimum package list.


Instructions Deploying from the Classroom

Here is the code used in the screencast to get the web app running:

  1. Create a new folder, web_app, and move all of the application folders and files to the new folder:

    cd 5_deployment/
    mkdir web_app
    mv -t web_app/ data/ worldbankapp/ worldbank.py wrangling_scripts/ requirements.txt runtime.txt
  2. [Applicable only for the Local practice. Not for the workspace.] Create a virtual environment and then activate the environment:

    # Update Python
    conda update python
    # Run the following from the Exercise folder
    # Create a virtual environment
    python3 -m venv worldbankvenv
    # Activate the new environment (Mac/Linux)
    source worldbankvenv/bin/activate

    An environment created with venv starts nearly empty; it does not include the data science packages that come with Anaconda. So pip install the specific Python packages needed for the web app:

    pip install flask==0.12.5 pandas==0.23.3 plotly==2.0.15 gunicorn==19.10.0
  3. Install the Heroku command-line tools. The classroom workspace already has Heroku installed.

    # Verify the installation
    heroku --version
    # Install, if Heroku not present
    curl https://cli-assets.heroku.com/install-ubuntu.sh | sh

    For your local installation, you can refer to the official installation instructions. Then log in to Heroku with the following command:

    heroku login -i

    Heroku asks for your account email address and password, which you type into the terminal and press enter.

  4. The next steps involve some housekeeping:

    • remove app.run() from worldbank.py

    • type cd web_app into the Terminal so that you are inside the folder with your web app code.

      Create a Procfile, which tells Heroku what to do when starting your web app:

      touch Procfile

      Then open the Procfile and type:

      web: gunicorn worldbank:app
  5. [Applicable only for the Local practice. Not for the workspace.] Create a requirements.txt file, which lists all of the Python packages that your app depends on:

    pip freeze > requirements.txt

    For workspace users, the requirements.txt is already available in the exercise folder. In addition, we have also provided a runtime.txt file in the exercise folder, that declares the exact Python version number to use. Heroku supports these Python runtimes.

  6. Initialize a git repository and make a commit:

    # Run it just once, in the beginning
    git init
    # For the first time commit, you need to configure the git username and email:
    git config --global user.email "you@example.com"
    git config --global user.name "Your Name"

    Whenever you change the contents of your web_app folder, you will have to run the git add and git commit commands.

    # Every time you make any edits to any file in the web_app folder
    git add .
    # Check which files are ready to be committed
    git status
    git commit -m "your message"
  7. Now, create a Heroku app:

    heroku create my-app-name --buildpack heroku/python
    # For example,
    # heroku create sudkul-web-app --buildpack heroku/python
    # The output will be like:
    # https://sudkul-web-app.herokuapp.com/ | https://git.heroku.com/sudkul-web-app.git

    where my-app-name is a unique name that nobody else on Heroku has already used. You can optionally define the build environment using the option --buildpack heroku/python. The heroku create command should create a git repository on Heroku and a web address for accessing your web app. You can check that a remote repository was added to your git repository with the following terminal command:

    git remote -v
  8. Before you finally push your local git repository to the remote Heroku repository, you will need to set the following environment variables (a kind of secret) to send along:

    # Set any environment variable to pass along with the push
    heroku config:set SLUGIFY_USES_TEXT_UNIDECODE=yes
    heroku config:set AIRFLOW_GPL_UNIDECODE=yes
    # Verify the variables
    heroku config

    If your code uses any confidential variable value, you can use this approach to send those values secretly. These values will not be visible to the public users. Now, push your local repo to the remote Heroku repo:

    # Syntax
    # git push <remote name> <local branch name>
    git push heroku main

    Other useful commands are:

    # Clear the build cache
    heroku plugins:install heroku-builds
    heroku builds:cache:purge -a <app-name> --confirm <app-name>
    # Permanently delete the app
    heroku apps:destroy <app-name> --confirm <app-name>

Now, you can type your web app’s address, such as https://sudkul-web-app.herokuapp.com/, in the browser to see the results.


Other Services Besides Heroku

Heroku is just one option of many for deploying a web app, and Heroku is actually owned by Salesforce.com.

The big internet companies offer similar services like Amazon’s Lightsail, Microsoft’s Azure, Google Cloud, and IBM Cloud (formerly IBM Bluemix). However, these services tend to require more configuration. Most of these also come with either a free tier or a limited free tier that expires after a certain amount of time.


Virtual Environments vs. Anaconda

Virtual environments and Anaconda serve a very similar purpose.

  • Anaconda is a distribution of Python (and the analytics language R) specifically for data science. Anaconda comes installed with a package and environment manager called conda.

  • To ensure that your app only installs necessary packages, you should create a virtual Python environment. A virtual Python environment is a separate Python installation on your computer that you can easily remove and won’t interfere with your main Python installation. You can create separate environments using conda. These environments automatically come with default Python packages meant for data science. However, there can be additional packages that you’d want to install in the new environment.

When deploying a web app to a server, you should only include the necessary packages for running your web app. Otherwise, you’d be installing Python packages that you don’t need. We have already provided the package list in the requirements.txt in the workspace above. However, you can create one yourself using the pip freeze > requirements.txt command from the new environment.


Creating a Virtual Environment Locally on Your Computer

You can develop your app using the classroom workspace. If you decide to develop your app locally on your computer, you should set up a virtual environment there as well. Different versions of Python have different ways of setting up virtual environments. The venv module allows us to create lightweight virtual environments:

# Optional - Update Python installation
conda update python
# Create a virtual environment
python3 -m venv myenv
# Activate the new environment
source myenv/bin/activate
# The new environment starts nearly empty; pip install the packages your app needs.

On Windows, the commands are:

> py -3 -m venv myvenv
> myvenv\Scripts\activate.bat

For more information, read through this link.


Databases for Your App

The web app in this lesson does not need a database. All of the data is stored in CSV files; however, it is possible to include a database as part of a Flask app. One common use case would be to store user login information such as username and password.

Flask is database agnostic meaning Flask can work with a number of different database types. If you are interested in learning about how to include a database as part of a Flask app, here are some resources:


Deployment

In the next part of the lesson, you’ll find a workspace where you can practice deploying the world bank web app. Set up an account on Heroku and then follow the instructions shown in this part of the lesson.

You’ll need to use a different name for the web app since the one used in this lesson is already taken.


Lesson Summary


Portfolio Exercise: Deploy a Data Dashboard

Introduction

Personal portfolios are an excellent way to demonstrate your knowledge and creativity. In fact, they are little by little becoming a must-have for people working in the tech industry. In this portfolio building exercise, you will create a data dashboard using Bootstrap, Plotly, Flask and Heroku.

Note that a portfolio exercise like this is not reviewed. So you will not submit your work on this, and you do not need to complete this assignment in order to graduate.

Your main job will be to write Python code that reads in data, cleans the data, and then uses the data to make Plotly visualizations. This is your opportunity to show off your Python coding ability and visualization encoding skills.

In the next part of the lesson, you’ll find a workspace where you can develop the web app. Note that there is also an optional advanced version of the project where you’re encouraged to pull data from an API. You’ll see in this lesson that there are a few sections with “[advanced version]” in the title. If you’d like to do the advanced version, then you’ll want to go through this entire lesson before starting to develop your app.


General Instructions

Develop and deploy a data dashboard. The Web Development lesson has all of the information you need. If you are new to web development, you might have to go back to the concepts and rewatch some of the videos. The “deployment” parts of the lesson should be especially helpful. The video in that part of the lesson shows how to deploy a web app to Heroku. And the associated exercise has a complete, functioning web app with visualizations.

Most of the work will involve:

  1. Wrangling your chosen data set to get the data in the format you want
  2. Writing Python code to read in the data set and set up Plotly plots
  3. Tweaking HTML so that the website has the design and information that you want.

We are providing a template that uses the Bootstrap library and Flask framework. The template is the same one used to build the app in the course except the name of the app has been changed. In the template, everything has the generic name “myapp” instead of “worldbankapp”. The template is set up so that you can use pandas for loading the data and Python to create the dictionaries needed for plotly.

You’ll only need to modify the following files:

  • wrangle_data.py
  • index.html

Although the front-end is already set up for you, you should change the links and titles in index.html. If you want to add more visualizations or remove visualizations, you’ll need to adjust the front-end code in index.html accordingly. That will involve adding or removing rows and columns in the HTML file.

For deployment, you can use a back-end service like Heroku.


How to Build the App

You’ll find a workspace in the next part of the lesson. The workspace already contains the template code with a working web app. The web app has a back-end and front-end. Recall that you can run the web app from the workspace:

To run the app from the workspace, open a terminal and type env | grep WORK. Note the WORKSPACEDOMAIN and WORKSPACEID. To start the web app, type python myapp.py.

You can open a new browser window and go to the address: http://WORKSPACEID-3001.WORKSPACEDOMAIN, replacing WORKSPACEID and WORKSPACEDOMAIN with your values.

However, there is no data for the visualizations. You’ll need to write a Python script that reads in the data files of your choosing and sets up the plots for Plotly. The process will be exactly the same as the one presented in the web development course.

If you need to upload any files to the workspace, you can do so by clicking on the plus (+) sign and choosing “add file” or “add folder”.

The template code is also available on GitHub as part of the data scientist nanodegree term 2 repo.

Test your app in the workspace to make sure that everything is working. You’ll see that if you start the app without modifying any of the code, the app currently works.

You should also save your work to a GitHub or GitLab repository so that you can use your code as part of your professional portfolio.

Once you’re ready to deploy the app, don’t forget to remove the app.run() line of code in the myapp.py file (In the web development lesson, myapp.py was called worldbank.py). You’ll need to add a Procfile and requirements.txt file as well. Follow the instructions in the web development lesson to learn how to deploy the app from the classroom. And always comment your code :-)!

Also, at the end of this page you’re reading, you’ll find information about a more advanced version of the data dashboard that you can build.


Steps

Here is a reminder of the steps you’ll need to do:

  • find a data set or a few data sets that you’re interested in
  • explore and clean the data set
  • put the data into a csv file or files - you can use pandas or spreadsheet software to do this
  • upload your data sets to the correct folder
  • write a Python script to read in the data set and set up the Plotly visualizations
  • set up a virtual environment and install the necessary libraries to run your app
  • run your web app locally to make sure that everything works
  • deploy the app to Heroku or some other back-end service

Where to Build the Web App

We are providing a workspace containing a web app template. You can use this template to build and deploy your web app within the classroom.

The classroom has an Ubuntu Linux environment. Developing the app locally on macOS should be very similar. On a Windows machine, the commands are slightly different and you’ll need to use the command prompt. This link contains a comparison of MS-DOS vs Linux commands.

To install the Heroku command line interface on a Windows machine, follow the instructions here on the Heroku website.


Advanced Version of the Exercise

If you’d like an extra challenge, consider using an API to obtain your data. API stands for Application Programming Interface. An API provides a convenient way for two applications to communicate with each other. To be more concrete, you can pull data directly from the World Bank API, clean the data in the back-end using pandas, and then display the results on your front-end, instead of using a csv file for your data.

The benefit is that if the data ever changes, your web app will automatically have the correct data. Many companies provide APIs for accessing their data including Facebook, Twitter, Google among others. As an example, here is an API for pulling data about DVDs, movies, books, and games.

After the workspace, you’ll find a set of concepts that explain how to use the World Bank API. Go through that material if you’d like an extra challenge for building your web app.


APIs [advanced version]


What is an API?

Instead of downloading World Bank data via a csv file, you’re going to download the data using the World Bank API.

API is an acronym that stands for application programming interface. APIs provide a standardized way for two applications to talk to each other. For this project, the applications communicating with each other are the server application where the World Bank stores data and your web application.

If you wanted to pull data directly from the World Bank’s server, you’d have to know what database system the World Bank was using. You’d also need permission to log in directly to the server, which would be a security risk for the World Bank. And if the World Bank ever migrated its data to a new system, you would have to rewrite all of your code again.

The API sits between your web app and the World Bank server. And the API allows you to execute code on the World Bank server without getting direct access.

All sorts of companies have public facing APIs including Facebook, Twitter, Google and Pinterest. You can pull data from these companies to create your own applications.

In the next section, you’ll get practice using Python to pull data from the World Bank API. This will set you up for creating the web app with data from the API instead of using data from a csv file.


APIs Besides the World Bank

All types of companies have APIs. Some of these APIs are only for internal company use while other APIs help the public consume data. A few examples of public APIs include the Twitter API, the Google Maps API, the Facebook Graph API, and the US Government Data APIs.

In addition, oftentimes you can find open source libraries or development kits for connecting to an API. For example, here is an open source Python development kit for the Facebook Graph API.

Some APIs might be used for pulling data from a database. But other APIs are for adding data to a database. For example, you might make an application that automatically tweets the current weather. In that case, you would use the Twitter API to post a tweet, which in reality inserts a tweet into Twitter’s database.


Using an API

In the next few parts of the lesson, you’ll see how to use the World Bank API. This API is relatively straightforward to use. Each API, however, will have a different set up and only allow you to take certain actions. In general, you send a request via a web url that specifies the information you want. You receive data back typically in XML or JSON.
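The request-and-response cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the endpoint layout, the indicator code SP.POP.TOTL, and the sample response body are assumptions made for the example, so check them against the World Bank API documentation before relying on them.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint layout; confirm against the World Bank API documentation.
BASE_URL = "https://api.worldbank.org/v2"


def build_url(country, indicator, **options):
    """Return a request URL for one country and one indicator."""
    query = urlencode({"format": "json", **options})
    return f"{BASE_URL}/country/{country}/indicator/{indicator}?{query}"


url = build_url("br", "SP.POP.TOTL", date="2015")
print(url)

# To send the request for real (requires network access):
#     from urllib.request import urlopen
#     with urlopen(url) as response:
#         payload = json.load(response)

# Hand-written stand-in for a response body (made-up values),
# so the parsing step can run offline.
sample_body = '[{"page": 1, "pages": 1}, [{"date": "2015", "value": 123}]]'
metadata, records = json.loads(sample_body)
for record in records:
    print(record["date"], record["value"])
```

The url encodes what you are asking for, and the JSON that comes back becomes ordinary Python lists and dictionaries once it has been parsed.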

XML grew out of SGML, a markup standard developed in the 1970s and 1980s. XML itself was standardized in 1998 and soon became a common way to transfer data over the web. JSON followed in the early 2000s. Over time, JSON has increased in popularity relative to XML, perhaps because JSON is easier to parse.
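To get a feel for the difference in parsing effort, here is a small sketch that reads the same record from both formats using only the Python standard library. The record and its values are made up for illustration and are not real API output.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written sample payloads with made-up values.
json_text = '{"country": "Brazil", "year": 2015, "value": 100}'
xml_text = """
<record>
  <country>Brazil</country>
  <year>2015</year>
  <value>100</value>
</record>
""".strip()

# JSON maps directly onto Python dicts, lists, strings, and numbers.
from_json = json.loads(json_text)

# XML parses into a tree of elements. Every value arrives as text,
# so numeric fields have to be converted by hand.
root = ET.fromstring(xml_text)
from_xml = {
    "country": root.findtext("country"),
    "year": int(root.findtext("year")),
    "value": int(root.findtext("value")),
}

print(from_json == from_xml)
```

Both routes end at the same dictionary, but the XML version needs explicit lookups and type conversions that the JSON version gets for free.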

Some APIs require authentication; essentially the company with the API gives you ‘credentials’ so that they can track how you are using the API and ensure you have the proper permissions.

Some APIs might let you extract data from a database. Other APIs might even let you insert data into a database, depending on the use case. Most APIs come with extensive documentation that explains which requests are available and how to make them.
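As a rough sketch of what an authenticated request that inserts data might look like, the example below builds (but does not send) a POST request with Python's standard library. The endpoint, the token, and the bearer-token header scheme are all hypothetical choices for illustration; each real API documents its own url, credentials, and authentication method.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and token, for illustration only.
API_URL = "https://api.example.com/v1/posts"
API_TOKEN = "replace-with-your-token"


def build_post_request(text):
    """Build (but do not send) an authenticated request that adds a record."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            # The credentials travel with the request in a header.
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )


request = build_post_request("Currently 18 degrees and cloudy")
print(request.get_method(), request.full_url)
# Passing `request` to urllib.request.urlopen would actually send it.
```

Reading data usually uses a GET request, while adding data usually uses a POST request like this one, with the new record carried in the request body.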

If you ever can’t figure out how to use an API, search online for examples. You can search for something like, “Examples for using the World Bank API” or “Examples for using the Facebook API”.

Move on to the next section to see how to use the World Bank API and incorporate it into a web app.


World Bank API [advanced version]

The World Bank API


REST Architecture

REST is a software architecture for the web. You don’t need to understand how REST works in order to use an API, but you will see the term used quite frequently when working with APIs. Modern web APIs are often called RESTful to indicate that they conform to a REST architecture.


World Bank API

Here is the website where the csv files were downloaded for the World Bank web app: World Bank Indicator Data

And here is the link to the World Bank API documentation: World Bank API Documentation

One tricky aspect of working with the World Bank API is that, by default, it only gives back 50 results at a time. There is an option called per_page that allows you to return up to 1000 results. However, some queries might have more than 1000 results. That’s where the page option comes into play. You’ll notice that at the very beginning of the returned data, there is a variable called page and another one called pages. If page=1 and pages=4, then the results are spread over 4 pages, and you’d need to send 4 queries with the options page=1, page=2, page=3 and page=4.
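The paging logic can be sketched as a loop that keeps requesting pages until the page number reaches pages. This is a minimal sketch under stated assumptions: it assumes a successful response is a two-element JSON list (metadata first, then records), and the function names fetch_page and fetch_all are invented for this example. The demonstration at the bottom uses a stand-in fetcher so that it runs without network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, page, per_page=1000):
    """Request one page of results and return (metadata, records)."""
    query = urlencode({"format": "json", "per_page": per_page, "page": page})
    with urlopen(f"{base_url}?{query}") as response:
        # Assumes the response is [metadata, records]; an error response
        # may have a different shape and would need separate handling.
        metadata, records = json.load(response)
    return metadata, records


def fetch_all(base_url, fetch=fetch_page):
    """Collect the records from every page, using `pages` to know when to stop."""
    all_records = []
    page = 1
    while True:
        metadata, records = fetch(base_url, page)
        all_records.extend(records or [])
        if page >= metadata["pages"]:
            break
        page += 1
    return all_records


# Offline demonstration: a stand-in fetcher that pretends there are 4 pages.
def fake_fetch(base_url, page):
    return {"page": page, "pages": 4}, [f"record from page {page}"]


print(fetch_all("https://example.invalid/indicator", fetch=fake_fetch))
```

Passing the fetcher in as an argument keeps the loop itself easy to test. In real use you would call fetch_all with only the url and let it use fetch_page.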

Next, you’ll practice pulling data from the API using Python code.


Python and APIs [advanced version]

Python and APIs


World Bank Data Dashboard [advanced version]

Link to the Code

You can find code for this data dashboard here on GitHub.

How the Filter Works

This version of the web app has a filter made with a form. When you check the boxes on the form and click submit, the form gets submitted to the index.html page. It’s essentially a loop: index.html loads, the form gets submitted back to index.html itself, and then index.html loads again with the filtered results. With a web form, you could also submit the form to a different web page.

On the back-end, routes.py can access the information that was submitted with the web form; the front-end receives information about which boxes were checked.
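The back-end half of this loop can be sketched without any web framework. The example below is an illustration only: the country list, the function names, and the choice to show every country when nothing is checked are assumptions, not the project's actual code. If the back-end is a Flask app, the submitted values would typically be read in routes.py with request.form.getlist and the results passed on to the template.

```python
# Illustrative data; the real app would use data loaded from a csv file or an API.
ALL_COUNTRIES = ["Brazil", "China", "India", "United States"]


def apply_filter(submitted, all_countries=ALL_COUNTRIES):
    """Return the countries to plot, given the checkbox values the form sent.

    An empty submission (for example, the first page load) falls back to
    showing every country. That fallback is an assumption of this sketch.
    """
    selected = [c for c in all_countries if c in submitted]
    return selected or list(all_countries)


def checkbox_state(selected, all_countries=ALL_COUNTRIES):
    """Pair every country with True/False so the page can re-check its boxes."""
    return [(country, country in selected) for country in all_countries]


# In a Flask route, `submitted` would typically come from
# request.form.getlist("<checkbox name>") when the request method is POST.
selected = apply_filter(["China", "India"])
print(selected)
print(checkbox_state(selected))
```

The first function decides what data to plot, and the second gives the front-end what it needs to redraw the form with the same boxes checked.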

Code your Project

Start working on your project! Go back to the “Workspace Portfolio Exercise” with the template code. You’ll find it earlier in this Portfolio Exercise lesson. Here are a few APIs that you might find interesting to work with:

Many government and city agencies have APIs where you can access city data.