TDM 10100: Project 6 — Fall 2022
Dataset(s)
The following questions will use the following dataset(s):
-
/anvil/projects/tdm/data/olympics/athlete_events.csv
-
/anvil/projects/tdm/data/death_records/DeathRecords.csv
Questions
ONE
Read the dataset /anvil/projects/tdm/data/olympics/athlete_events.csv
, into a data.frame called eventsDF
. (We do not need the tapply
function for Question 1.)
-
What are the years included in this data.frame?
-
What are the different countries participating in the Olympics?
-
How many times is each country represented?
-
Code used to solve this problem.
-
Output from running the code.
-
Answers to the code above.
TWO
-
What is the average height of participants from each country?
-
What are the oldest ages of the athletes from each country?
-
What is the sum of the weights of all participants from each country?
-
Code used to solve this problem.
-
Output from running the code.
-
Answers to the code above
THREE
Read the dataset /anvil/projects/tdm/data/death_records/DeathRecords.csv
into a data.frame called deathrecordsDF
. (We do not need the tapply
function for Question 3.)
-
What are the column names in this dataframe?
-
Change the column "DayOfWeekOfDeath" from numbers to weekdays
-
How many people died in total on each day of the week?
-
Code used to solve this problem.
-
Output from running the code.
-
Answers to the questions above
FOUR
-
What is the average age of Females versus Males at death?
-
What is the number of Females who are married? Divorced? Widowed? Single? Now find the analogous numbers for Males.
-
Now solve both questions from 4b at one time, i.e., use one command to find the number of Females who are married, divorced, widowed, or single, and the number of Males in each of these four categories. You can compute all eight numbers with just one
tapply
command.
-
Code used to solve this problem.
-
Output from running the code.
-
Answers to the question above
FIVE
-
Using the two data sets create two separate graphs or plots on the data that you find interesting (one graph or plot for each of the two data sets in this project). Write 1-2 sentences on each one and why you found it interesting/what you noticed in the dataset.
-
Code used to solve this problem.
-
Output from running the code.
Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project. |