Before you begin the exercises, please ensure the following:
Microsoft Excel is a powerful spreadsheet software that allows users to create, edit, organize, and analyze data. It uses a grid of cells arranged in rows and columns, where users can enter numbers, text, and formulas to perform calculations, make charts and graphs, and manage data. Excel is a popular tool used in businesses, schools, and personal life for a variety of purposes, such as budgeting, project management, data analysis, and reporting. It offers a wide range of features and functions, including mathematical functions, formatting options, conditional formatting, data validation, pivot tables, and macros. Excel's ease of use, versatility, and flexibility have made it an essential tool for many professionals across industries, and it continues to evolve with new features and capabilities to meet the changing needs of users.
After launching Excel, we are greeted by the Home
screen. In this screen, we will see three
different panes off to the left
hand side. Home
, New
, and Open
. We will click on the Open
button to open the Open
dialog box.
Once we select the open button, we will want to click on Browse
option to begin browsing the files
on our computer,
and select the file we want to open in Excel.
With .csv
files, Excel will infer the format it imports in. You can verify that the data has been
loaded correctly by seeing the column names in the top row followed by data in each other row.
Each row should have a value and each column should be a variable.
Excel is a powerful tool that can be used to create compelling visualizations of data with minimal effort.
Whether you're looking to create charts, graphs,
or pivot tables, Excel offers a wide range of options to help you visualize your data in a way that is both
informative and visually appealing.
To get started with creating visualizations in Excel, you will need to select the rows or columns of data that
you want to work with.
This can be done by simply clicking and dragging the mouse over the cells that contain the data you want to use.
Once you have selected the data, you can begin creating visualizations using the Insert
tab in the
Excel ribbon.
This tab contains a range of options for creating visualizations, including charts, graphs, and pivot tables.
For Example:
If I wanted to create a bar chart of my data, I would select the cells containing the data I want to chart, then
click on the Insert
tab, selecting the Bar Chart
option from the Charts
section. Excel will automatically
generate a bar chart based on the data that I have selected!
We can use the Design
tab to change the chart type to any available charts from
.xlsx
file.Using Excel, load in the dataset located here.
Create a bar chart of the Fiscal Year
and Revenue
columns.
After installation of MySQL, we can open the Workbench, our GUI (Graphic User Interface) for MySQL operations. When opened, we are greeted by the MySQL welcome screen. Don't worry too much about the GUI yet, we will be covering over it in class.
On the screen in bold letters is MySQL Connections. Next to this, is a small +
button.
This is how we will set up our new connection. Once we click on this, we will focus on the
Stored Connection
and the Test Connection
variables. Click on Connection Name
and name your connection
test
. After that, click on the
Test Connection
button to verify that you have a connection to your database you are trying to
create. In the event that the connection fails,
you may want to double check your firewall or antivirus. You should also make sure that you installed MySQL
using the MySQL Installer.
You may also need to open the port on your computer to accept connections!
Once we test the connection and have verified that it works, click the OK
button to create your
first database.
Then we can click on the connection name and open our database.
Structured Query Language (SQL) is a programming language used to manage and manipulate data in relational
database systems. SQL is a standard language used across many database management systems,
including MySQL, Oracle, Microsoft SQL Server, and PostgreSQL.
To interact with a MySQL database, you use SQL commands to create, read, update, and delete data. SQL provides a
powerful set of commands that allow you to
query data, modify tables, and perform various operations on the data stored in a MySQL database.
To create a table, we specify using SQL keywords CREATE
and TABLE
,
giving a table name and the names of columns we wish to add to the table.
For Example:
CREATE TABLE IF NOT EXISTS products( id INT, product VARCHAR(50), quantity INT, )
In this example, we create the table products
only if the table does not already exist. We
instantiate the table with three different columns
and give each column a name and the data type we expect to see in the column. You can read more about creating a
table in SQL here
Once you have created a table in SQL, the next step is to add data to it. This is typically done using the
INSERT
keyword,
which allows you to specify the values you want to insert into the table. To use the INSERT
keyword, you will need to specify
the name of the table you want to add data to, followed by the columns that you want to insert data into. You
can then provide the values you want to insert into each column,
either as a single value or as a set of values separated by commas.
For Example:
Let's say I had a products table with the columns id
, product_name
, and
quantity
. To add new products to
this table, I would use the following statement.
INSERT INTO products VALUES( 1, 'banana', 200 )
In this example, we insert into the table products
the values for the id column (1), the
product_name column (banana), and the quantity column (200).
For more on how to utilize the INSERT
statement in MySQL, check out this resource.
Now that we have a table and data inside that table,
what if I wanted to view my data? This is where queries come in! Queries are how we access and return data from
a table. A query utilizes the SQL keyword SELECT
to select the data that you want to have returned from the table to you or a user.
SELECT * FROM products
*
in SQL has a special meaning. It will return all the data within a table!
So, when we run our statement above, we will be returning all data available within our table.
Create a table named customers
. Your table should have three columns in it's creation.
Now that you have created a customers
table, insert in three rows of data
You are free to choose the data that you insert into your table.
Query all the data from your database.
After installing R and R Studio, we will launch R Studio. R Studio has a lot in it's GUI, all of
which we will be covering over together during our course. Today, we will focus on the new file
button, located in the top left corner of R Studio. We will create a new .R, or R Script
file.
Now that we have our new file, we can begin to work with R!
In R, a variable is a name that is assigned to a value or an object. Variable assignment is the process of
assigning a value or an object to a variable name.
This allows you to store data and use it later in your code. The most basic form of variable assignment in R
is to use the assignment operator, which is represented by the <-
symbol.
Notice that the assignment operand (<-
) points to the object recieving the value of the
expression. While the =
can be used to assign variables as well,
it is frowned upon to use for variable assignments and is improper syntax. Assignments can also be made in
the opposite direction, with value -> variable_name
. Variables can take on any type of value;
from lists, to arrays, to vectors, to characters, to integers, to double precision values!
x
and save the value 3
to it.
The code would look like:
> x <- 3 > print(x) 3 # Can also look like: > 3 -> x > x 3
This code creates a variable named x
and assigns it the value of 3
! We can then
use the variable in our code to reference the value that
was assigned to it. We can view that value by calling to the variable or wrapping it in a print statement.
We can also assign variables using the assign()
function! This is an equivalent to the code
mentioned above.
> assign("x", 3) > print(x) 3
R allows users to perform a variety of operands with integer values. Some examples of these operands would
be: +
,
-
, *
, /
, %%
, %/%
. Let's take a look at how
we can apply these to some numerical values:
# Create two numerical variables num_1 <- 3 num_2 <- 5 # Perform addition num_1 + num_2 # Perform subtraction num_1 - num_2 # Perform multiplication num_1 * num_2 # Perform division num_1 / num_2 # Perform floor division, returning the product of the division without a remainder num_2 %/% num_1 # Perform modulus, returning the remainder of the division num_2 %% num_1
In programming, strings are a fundamental data type that represent a sequence of characters. Python and R
are two popular programming languages used in data
analytics and statistical computing, and they both have different terminologies for strings. In Python,
strings are referred to as "strings". A string is a sequence of
characters enclosed within single quotes, double quotes, or triple quotes. Strings can be manipulated using
various built-in string methods such as slicing, concatenation, and formatting.
In contrast, R uses the term "Character" to refer to a string-type object. In R, a character is a sequence
of characters enclosed within double quotes. Like Python, R also provides
several built-in functions to manipulate character strings, such as nchar()
to count the number
of characters in a string, substr()
to extract a substring from a character string, and
paste()
to concatenate multiple strings.
What if I wanted to create two character strings and save them to the variables
my_first_name
and my_last_name
? The code for that would look like:
> my_first_name <- "Alex" > my_last_name <- "Lucchesi"
Now, we have our two variables containing our name. How can I change this into a single variable
named my_full_name
instead? In R, we can utilize the paste
function.
paste
concatenates values given, spearating them by a sep
argument.
For Example:
> my_full_name <- paste(my_first_name, my_last_name, sep=' ') > print(my_full_name) Alex Lucchesi
We can also use the print() function to concatenate pieces of our print statement with variables we have created.
# Print a character string that has "My name is:" and our full name > print("My name is:", my_full_name) My name is: Alex Lucchesi
R operates on named data structures. The simplest data structure in R is the numeric vector, which consists of a single entity of an ordered collection of numbers. In R, we will work a lot with vectors. A vector is like a list in Python, but can only hold a single data type. Vectors are highly functional and an important part of Data Analysis.
To create a vector, we use the c()
function in R, passing in the values we wish to add to the
vector,
separated by a comma.
For Example:
# Create a numerical vector and save it to the variable my_vec > my_vec <- c(1,2,3,4,5,6,7,8,9) > print(my_vec) 1 2 3 4 5 6 7 8 9
Numerical vectors support element-wise operations; meaning we can add, subtract, multiply, and divide vectors across each value in each vector.
For Example:
# Create a new numerical vector and save it to the variable my_vec2 > my_vec2 <- c(9,8,7,6,5,4,3,2,1) # Create a new variable named my_added_vec which adds both vectors together > my_added_vec <- my_vec + my_vec2 # View the new vector > print(my_added_vec) 10, 10, 10, 10, 10, 10, 10, 10, 10 # Square a vector > my_added_vec ^ 2 100, 100, 100, 100, 100, 100, 100, 100, 100
Vectors also support common arithmetic functions, such as log, exp, sin, cos, tan, and sqrt. We can also use aggregate statistical functions such as max, min, length, sum, prod, mean, and median.
We can also recycle vectors. Vector recycling refers to using a shorter vector and a longer vector to complete computations. When we do this, we recycle the shorter vector, causing it to repeat itself from the beginning of the vector until it reaches the end of the second vector!
For Example:
> vec_1 <- c(1,2) > vec_2 <- c(1,2,3,4,5,6) > vec_1 * vec_2 1, 4, 3, 8, 5, 12
R can also generate sequences of numbers! A sequence is just like the range()
function in
Python, which generates number between a start and a stop value.
We can define a sequence a couple of different ways. One is to use the seq()
function to
generate a sequence. Generating a sequence using this function allows you to also specify
a step, or by argument. This argument specifies how much we want to increase the
values within the sequence by.
The other would be to use the :
operand to create the sequence
For Example:
# Generate a simple sequence using the seq() function > seq(1, 10) 1,2,3,4,5,6,7,8,9,10 # Generate a sequence using the by argument > seq(1,10, by=.2) 1, 1.2, 1.4, 1.8, ..., 9.6, 9.8, 10 # Generate a sequence using the : operand > 1:10 1,2,3,4,5,6,7,8,9,10
Two things to note here are that using the by argument, we can assign a decimal value and
return values that are instead decimal(double precision) values.
The other thing to note is that when using a generator, it is inclusive of the stop value. Meaning
the final value you give the generator will be returned as well,
which differs from Python, where the range()
function is non-inclusive of the stop value!
A function that is related to the sep()
function is the rep()
function. This
function allows us to repeat or replicate values or objects in various ways.
For Example:
# Use the rep() function to replicate a single value # Times allows us to specify the number of times we want a value to be replicated > rep(5, times = 5) 5, 5, 5, 5, 5 # Use the rep() function to replicate a vector # Each allows us to specify the number of times we want each value in the vector to be replicated before moving onto the next value in the object. > rep(my_added_vec, times = 5, each = 5) 10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10 10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10 10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10 10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10 10,10,10,10,10,10,10,10,10,10, ..., 10,10,10,10,10,10,10,10,10,10
Sometimes, parts of a vector may not be completely known. When a value is "not available", or a "missing
value", the vector assigns it the value of NA.
Anytime we attempt to perform an operation on a NA value, we will return a NA, since the value
is unknown.
We can check if a null value exists using the function is.na()
. This will return a logical
vector where each instance of a null value is listed as TRUE
and each non-null value is returned as FALSE.
# Create a vector with a null value > v <- c(1,2,NA,3) # Check for null values > is.na(v) FALSE, FALSE, TRUE, FALSE
Sometimes, we may not want to return all the values in a vector. We may only be interested in a couple
values from it.
We can use indexing to return a subsection of a vector. Indexing a vector is done by calling to the vector
and using square brackets immedietely after the
expression. Indexing can use the position in the vector that the value we want to call to is located at. For
example, in our vector v
,
if we wanted to access the value of NA
, we would call the index position of 3,
as it is the third value inside the vector!
We can also give a range of numbers as an indexor, returning all the values within that range!
# Index a vector and grab the NA value # Note that index positions start at the value of 1! > v[3] NA # We can also use negative numbers to index a vector. # Using a negative number allows us to access the last value within the vector > v[-1] 3 # Index a vector using a range > v[1:3] 1, 2, NA # Index a vector using a negative range > v[-(1:3)] 2, NA, 3
We can get even more functionality out of indexing a vector. Vectors can be accessed using a logical statement as well, where the values that match the conditional statement are what is returned to us! To do so, we would place the conditional statement inside the square brackets instead of an index position number.
# Return only the values in v that are not NA values > v[!is.na(v)] 1, 2, 3
.r
file.Create three variables; x, y, and z. Assign each of these variables a numerical value, then perform each of the operands you have learned about so far to all three variables.
Create two strings with your first and last name. Concatenate the strings and print the new value as
"My name is your newly created concatenated variable will go here
"
Create two numerical vectors of equal length. Multiply these vectors together and save the resulting vector
to the variable mult_vect
Using your vector, mult_vect
, index the vector and return the last 4 values.
Generate a sequence of numerical values. Save this sequence to the variable my_first_sequence
.py
fileYou will be turing in this assignment to you google classroom. Please save your 5 functions to one .py file demark the question numbers and the question in a comment above it's respective function
Write a function to print "hello_USERNAME!" USERNAME is the input of the function. The first line of the code has been defined as below.
def first_odds(): .....
Write a python function, first_odds that prints the odd numbers from 1-100 and returns nothing
def first_odds(): .....
Please write a Python function, max_num_in_list to return the max number of a given list. The first line of the code has been defined as below.
def max_num_in_list(): .....
Write a function to return if the given year is a leap year. A leap year is divisible by 4, but not divisible by 100, unless it is also divisible by 400. The return should be boolean Type (true/false).
def is_leap_year(): .....
Write a function to check to see if all numbers in list are consecutive numbers. For example, [2,3,4,5,6,7] are consecutive numbers, but [1,2,4,5] are not consecutive numbers. The return should be boolean Type.
def is_consecutive(a_list): .....