Coding Temple's Data Analytics Program PreWork

Getting Started

Before you begin the exercises, please ensure the following:

You have created an account with Kaggle, Tableau, GitHub
You have downloaded MySQL Installer pgadmin4, R, R Studio, Tableau Public, and GitHub Desktop
You have access to Microsoft Excel, whether locally or online. Excel requires a subscription on your local machine, but offers a free month trial! Online, access to Excel is free.
You have created a GitHub Repository for all your precourse work to go into.
Watch this video walkthrough on how to create a GitHub Repository

Excel

What is Excel?

Microsoft Excel is a powerful spreadsheet software that allows users to create, edit, organize, and analyze data. It uses a grid of cells arranged in rows and columns, where users can enter numbers, text, and formulas to perform calculations, make charts and graphs, and manage data. Excel is a popular tool used in businesses, schools, and personal life for a variety of purposes, such as budgeting, project management, data analysis, and reporting. It offers a wide range of features and functions, including mathematical functions, formatting options, conditional formatting, data validation, pivot tables, and macros. Excel's ease of use, versatility, and flexibility have made it an essential tool for many professionals across industries, and it continues to evolve with new features and capabilities to meet the changing needs of users.

Loading a File

After launching Excel, we are greeted by the Home screen. In this screen, we will see three different panes off to the left hand side. Home, New, and Open. We will click on the Open button to open the Open dialog box. Once we select the open button, we will want to click on Browse option to begin browsing the files on our computer, and select the file we want to open in Excel.

With .csv files, Excel will infer the format it imports in. You can verify that the data has been loaded correctly by seeing the column names in the top row followed by data in each other row. Each row should have a value and each column should be a variable.

Creating Visualizations

Excel is a powerful tool that can be used to create compelling visualizations of data with minimal effort. Whether you're looking to create charts, graphs, or pivot tables, Excel offers a wide range of options to help you visualize your data in a way that is both informative and visually appealing. To get started with creating visualizations in Excel, you will need to select the rows or columns of data that you want to work with. This can be done by simply clicking and dragging the mouse over the cells that contain the data you want to use. Once you have selected the data, you can begin creating visualizations using the Insert tab in the Excel ribbon. This tab contains a range of options for creating visualizations, including charts, graphs, and pivot tables.

For Example: If I wanted to create a bar chart of my data, I would select the cells containing the data I want to chart, then click on the Insert tab, selecting the Bar Chart option from the Charts section. Excel will automatically generate a bar chart based on the data that I have selected! We can use the Design tab to change the chart type to any available charts from

Exercises

Complete the following exercises in an .xlsx file.
Upload the file to github.com and submit the link to the Git Repository to the Google Classroom Assignment.

Exercise 1:

Using Excel, load in the dataset located here.

Exercise 2:

Create a bar chart of the Fiscal Year and Revenue columns.

MySQL

Walk-Through

Create a Database

After installation of MySQL, we can open the Workbench, our GUI (Graphic User Interface) for MySQL operations. When opened, we are greeted by the MySQL welcome screen. Don't worry too much about the GUI yet, we will be covering over it in class.

On the screen in bold letters is MySQL Connections. Next to this, is a small + button. This is how we will set up our new connection. Once we click on this, we will focus on the Stored Connection and the Test Connection variables. Click on Connection Name and name your connection test. After that, click on the Test Connection button to verify that you have a connection to your database you are trying to create. In the event that the connection fails, you may want to double check your firewall or antivirus. You should also make sure that you installed MySQL using the MySQL Installer. You may also need to open the port on your computer to accept connections!

Once we test the connection and have verified that it works, click the OK button to create your first database. Then we can click on the connection name and open our database.

Create a Table

Structured Query Language (SQL) is a programming language used to manage and manipulate data in relational database systems. SQL is a standard language used across many database management systems, including MySQL, Oracle, Microsoft SQL Server, and PostgreSQL. To interact with a MySQL database, you use SQL commands to create, read, update, and delete data. SQL provides a powerful set of commands that allow you to query data, modify tables, and perform various operations on the data stored in a MySQL database. To create a table, we specify using SQL keywords CREATE and TABLE, giving a table name and the names of columns we wish to add to the table.

For Example:

            CREATE TABLE IF NOT EXISTS products(
                id INT,
                product VARCHAR(50),
                quantity INT,
            )

In this example, we create the table products only if the table does not already exist. We instantiate the table with three different columns and give each column a name and the data type we expect to see in the column. You can read more about creating a table in SQL here

Inserting Data

Once you have created a table in SQL, the next step is to add data to it. This is typically done using the INSERT keyword, which allows you to specify the values you want to insert into the table. To use the INSERT keyword, you will need to specify the name of the table you want to add data to, followed by the columns that you want to insert data into. You can then provide the values you want to insert into each column, either as a single value or as a set of values separated by commas.

For Example: Let's say I had a products table with the columns id, product_name, and quantity. To add new products to this table, I would use the following statement.

            INSERT INTO products
            VALUES(
                1,
                'banana',
                200
            )

In this example, we insert into the table products the values for the id column (1), the product_name column (banana), and the quantity column (200). For more on how to utilize the INSERT statement in MySQL, check out this resource.

Querying Data

Now that we have a table and data inside that table, what if I wanted to view my data? This is where queries come in! Queries are how we access and return data from a table. A query utilizes the SQL keyword SELECT to select the data that you want to have returned from the table to you or a user.

For Example:

            SELECT * FROM products

* in SQL has a special meaning. It will return all the data within a table! So, when we run our statement above, we will be returning all data available within our table.

Exercises

Complete the following exercises in MySQL
Save your code for all exercises to a single .sql file and upload the file to your GitHub Repository

Exercise 1:

Create a table named customers. Your table should have three columns in it's creation.

cust_id
cust_name
address

Exercise 2:

Now that you have created a customers table, insert in three rows of data You are free to choose the data that you insert into your table.

Exercise 3:

Query all the data from your database.

R

Download the Textbook
Read Chapter 1
Read Chapter 2

Follow Along

After installing R and R Studio, we will launch R Studio. R Studio has a lot in it's GUI, all of which we will be covering over together during our course. Today, we will focus on the new file button, located in the top left corner of R Studio. We will create a new .R, or R Script file. Now that we have our new file, we can begin to work with R!

Variable Assignment

In R, a variable is a name that is assigned to a value or an object. Variable assignment is the process of assigning a value or an object to a variable name. This allows you to store data and use it later in your code. The most basic form of variable assignment in R is to use the assignment operator, which is represented by the <- symbol. Notice that the assignment operand (<-) points to the object recieving the value of the expression. While the = can be used to assign variables as well, it is frowned upon to use for variable assignments and is improper syntax. Assignments can also be made in the opposite direction, with value -> variable_name. Variables can take on any type of value; from lists, to arrays, to vectors, to characters, to integers, to double precision values!

For Example:

x

3

            > x <- 3
            > print(x)
            3

            # Can also look like:
            > 3 -> x
            > x
            3

This code creates a variable named x and assigns it the value of 3! We can then use the variable in our code to reference the value that was assigned to it. We can view that value by calling to the variable or wrapping it in a print statement.

We can also assign variables using the assign() function! This is an equivalent to the code mentioned above.

            > assign("x", 3)
            > print(x)
            3

Working with Numerical Values

R allows users to perform a variety of operands with integer values. Some examples of these operands would be: +, -, *, /, %%, %/%. Let's take a look at how we can apply these to some numerical values:

            # Create two numerical variables
            num_1 <- 3
            num_2 <- 5
            
            # Perform addition
            num_1 + num_2
            
            # Perform subtraction
            num_1 - num_2
            
            # Perform multiplication
            num_1 * num_2
            
            # Perform division
            num_1 / num_2
            
            # Perform floor division, returning the product of the division without a remainder
            num_2 %/% num_1

            # Perform modulus, returning the remainder of the division
            num_2 %% num_1

Character Manipulation

In programming, strings are a fundamental data type that represent a sequence of characters. Python and R are two popular programming languages used in data analytics and statistical computing, and they both have different terminologies for strings. In Python, strings are referred to as "strings". A string is a sequence of characters enclosed within single quotes, double quotes, or triple quotes. Strings can be manipulated using various built-in string methods such as slicing, concatenation, and formatting. In contrast, R uses the term "Character" to refer to a string-type object. In R, a character is a sequence of characters enclosed within double quotes. Like Python, R also provides several built-in functions to manipulate character strings, such as nchar() to count the number of characters in a string, substr() to extract a substring from a character string, and paste() to concatenate multiple strings.

What if I wanted to create two character strings and save them to the variables my_first_name and my_last_name? The code for that would look like:

            > my_first_name <- "Alex"
            > my_last_name <- "Lucchesi"

Now, we have our two variables containing our name. How can I change this into a single variable named my_full_name instead? In R, we can utilize the paste function. paste concatenates values given, spearating them by a sep argument.

For Example:

            > my_full_name <- paste(my_first_name, my_last_name, sep=' ')
            > print(my_full_name)
            Alex Lucchesi

We can also use the print() function to concatenate pieces of our print statement with variables we have created.

            # Print a character string that has "My name is:" and our full name
            > print("My name is:", my_full_name)
            My name is: Alex Lucchesi

Vectors

R operates on named data structures. The simplest data structure in R is the numeric vector, which consists of a single entity of an ordered collection of numbers. In R, we will work a lot with vectors. A vector is like a list in Python, but can only hold a single data type. Vectors are highly functional and an important part of Data Analysis.

To create a vector, we use the c() function in R, passing in the values we wish to add to the vector, separated by a comma.

For Example:

            # Create a numerical vector and save it to the variable my_vec
            > my_vec <- c(1,2,3,4,5,6,7,8,9)
            > print(my_vec)
            1 2 3 4 5 6 7 8 9

Numerical vectors support element-wise operations; meaning we can add, subtract, multiply, and divide vectors across each value in each vector.

For Example:

            # Create a new numerical vector and save it to the variable my_vec2
            > my_vec2 <- c(9,8,7,6,5,4,3,2,1)
            
            # Create a new variable named my_added_vec which adds both vectors together
            > my_added_vec <- my_vec + my_vec2
            
            # View the new vector
            > print(my_added_vec)
            10, 10, 10, 10, 10, 10, 10, 10, 10

            # Square a vector
            > my_added_vec ^ 2
            100, 100, 100, 100, 100, 100, 100, 100, 100

Vectors also support common arithmetic functions, such as log, exp, sin, cos, tan, and sqrt. We can also use aggregate statistical functions such as max, min, length, sum, prod, mean, and median.

We can also recycle vectors. Vector recycling refers to using a shorter vector and a longer vector to complete computations. When we do this, we recycle the shorter vector, causing it to repeat itself from the beginning of the vector until it reaches the end of the second vector!

For Example:

            > vec_1 <- c(1,2)
            > vec_2 <- c(1,2,3,4,5,6)
            > vec_1 * vec_2
            1, 4, 3, 8, 5, 12

R can also generate sequences of numbers! A sequence is just like the range() function in Python, which generates number between a start and a stop value. We can define a sequence a couple of different ways. One is to use the seq() function to generate a sequence. Generating a sequence using this function allows you to also specify a step, or by argument. This argument specifies how much we want to increase the values within the sequence by. The other would be to use the : operand to create the sequence

For Example:

            # Generate a simple sequence using the seq() function
            > seq(1, 10)
            1,2,3,4,5,6,7,8,9,10

            # Generate a sequence using the by argument
            > seq(1,10, by=.2)
            1, 1.2, 1.4, 1.8, ..., 9.6, 9.8, 10
            
            # Generate a sequence using the : operand
            > 1:10
            1,2,3,4,5,6,7,8,9,10

Two things to note here are that using the by argument, we can assign a decimal value and return values that are instead decimal(double precision) values. The other thing to note is that when using a generator, it is inclusive of the stop value. Meaning the final value you give the generator will be returned as well, which differs from Python, where the range() function is non-inclusive of the stop value!

A function that is related to the sep() function is the rep() function. This function allows us to repeat or replicate values or objects in various ways.

For Example:

            # Use the rep() function to replicate a single value
            # Times allows us to specify the number of times we want a value to be replicated
            > rep(5, times = 5)
            5, 5, 5, 5, 5

            # Use the rep() function to replicate a vector
            # Each allows us to specify the number of times we want each value in the vector to be replicated before moving onto the next value in the object.
            > rep(my_added_vec, times = 5, each = 5)
            10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10
            10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10
            10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10
            10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10
            10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10

Sometimes, parts of a vector may not be completely known. When a value is "not available", or a "missing value", the vector assigns it the value of NA. Anytime we attempt to perform an operation on a NA value, we will return a NA, since the value is unknown. We can check if a null value exists using the function is.na(). This will return a logical vector where each instance of a null value is listed as TRUE and each non-null value is returned as FALSE.

            # Create a vector with a null value
            > v <- c(1,2,NA,3)

            # Check for null values
            > is.na(v)
            FALSE, FALSE, TRUE, FALSE

Sometimes, we may not want to return all the values in a vector. We may only be interested in a couple values from it. We can use indexing to return a subsection of a vector. Indexing a vector is done by calling to the vector and using square brackets immedietely after the expression. Indexing can use the position in the vector that the value we want to call to is located at. For example, in our vector v, if we wanted to access the value of NA, we would call the index position of 3, as it is the third value inside the vector! We can also give a range of numbers as an indexor, returning all the values within that range!

            # Index a vector and grab the NA value
            # Note that index positions start at the value of 1!
            > v[3]
            NA

            # We can also use negative numbers to index a vector.
            # Using a negative number allows us to access the last value within the vector
            > v[-1]
            3

            # Index a vector using a range
            > v[1:3]
            1, 2, NA

            # Index a vector using a negative range
            > v[-(1:3)]
            2, NA, 3

We can get even more functionality out of indexing a vector. Vectors can be accessed using a logical statement as well, where the values that match the conditional statement are what is returned to us! To do so, we would place the conditional statement inside the square brackets instead of an index position number.

            # Return only the values in v that are not NA values
            > v[!is.na(v)]
            1, 2, 3

Exercises

Complete the following exercises in a .r file.
Upload the file to your GitHub Repository

Exercise 1:

Create three variables; x, y, and z. Assign each of these variables a numerical value, then perform each of the operands you have learned about so far to all three variables.

Exercise 2:

Create two strings with your first and last name. Concatenate the strings and print the new value as "My name is your newly created concatenated variable will go here"

Exercise 3:

Create two numerical vectors of equal length. Multiply these vectors together and save the resulting vector to the variable mult_vect

Exercise 4:

Using your vector, mult_vect, index the vector and return the last 4 values.

Exercise 5:

Generate a sequence of numerical values. Save this sequence to the variable my_first_sequence

Python 101

Task 1: Read Textbook

Download e-book
Watch the Video Lesson
Read Chapter 2

Practice what you read
Watch the Video Lesson
Chapter 2 Quiz

Read Chapter 3

Practice what you read
Watch the Video Lesson

Read Chapter 4

Task 2: Read Textbook

Download e-book
Read Chapter 5

Practice what you read
Watch the Video Lesson
Chapter 5 Quiz

Read Chapter 6

Practice what you read
Watch the Video Lesson
Chapter 6 Quiz

Read Chapter 7

Practice what you read
Watch the Video Lesson
Chapter 7 Quiz

Read Chapter 8

Practice what you read
Watch the Video Lesson
Chapter 8 Quiz

Task 3: Coding Questions

Complete the Coding Questions below in one .pyfile
Upload this file to your GitHub Repository and submit the link to your repository to Google Classroom
Watch this video walkthrough

You will be turing in this assignment to you google classroom. Please save your 5 functions to one .py file demark the question numbers and the question in a comment above it's respective function

Question 1:

Write a function to print "hello_USERNAME!" USERNAME is the input of the function. The first line of the code has been defined as below.

            def first_odds():
            .....

Question 2:

Write a python function, first_odds that prints the odd numbers from 1-100 and returns nothing

            def first_odds():
            .....

Question 3:

Please write a Python function, max_num_in_list to return the max number of a given list. The first line of the code has been defined as below.

            def max_num_in_list():
            .....

Question 4:

Write a function to return if the given year is a leap year. A leap year is divisible by 4, but not divisible by 100, unless it is also divisible by 400. The return should be boolean Type (true/false).

            def is_leap_year():
            .....

Question 5:

Write a function to check to see if all numbers in list are consecutive numbers. For example, [2,3,4,5,6,7] are consecutive numbers, but [1,2,4,5] are not consecutive numbers. The return should be boolean Type.

            def is_consecutive(a_list):
            .....