STAT 19000: Project 12 — Fall 2020
Motivation: In the previous project you were forced to do a little bit of date manipulation. Dates can be very difficult to work with, regardless of the language you are using. lubridate
is a package within the famous tidyverse, that greatly simplifies some of the most common tasks one needs to perform with date data.
Context: We’ve been reviewing topics learned this semester. In this project we will continue solving data-driven problems, wrangling data, and creating graphics. We will introduce a tidyverse package that adds great stand-alone value when working with dates.
Scope: r
Questions
Question 1
Let’s continue our exploration of the Zillow time series data. A useful package for dealing with dates is called lubridate
. This is part of the famous tidyverse suite of packages. Run the code below to load it. Read the /class/datamine/data/zillow/State_time_series.csv
dataset into a data.frame named states
. What class and type is the column Date
?
library(lubridate)
-
R code used to solve the question.
-
class
andtypeof
columnDate
.
Question 2
Convert column Date
to a corresponding date format using lubridate
. Check that you correctly transformed it by checking its class like we did in question (1). Compare and contrast this method of conversion with the solution you came up with for question (3) in the previous project. Which method do you prefer?
Take a look at the following functions from |
Here is a video about |
-
R code used to solve the question.
-
class
of modified columnDate
. -
1-2 sentences stating which method you prefer (if any) and why.
Question 3
Create 3 new columns in states
called year
, month
, day_of_week
(Sun-Sat) using lubridate
. Get the frequency table for your newly created columns. Do we have the same amount of data for all years, for all months, and for all days of the week? We did something similar in question (3) in the previous project — specifically, we broke each date down by year. Which method do you prefer and why?
Take a look at functions |
You may find the argument of |
Here is a video about |
-
R code used to solve the question.
-
Frequency table for newly created columns.
-
1-2 sentences answering whether or not we have the same amount of data for all years, months, and days of the week.
-
1-2 sentences stating which method you prefer (if any) and why.
Question 4
Is there a better month or set of months to put your house on the market? Use tapply
to compare the average DaysOnZillow_AllHomes
for all months. Make a barplot showing our results. Make sure your barplot includes "all of the fixings" (title, labeled axes, legend if necessary, etc. Make it look good.).
If you want to have the month’s abbreviation in your plot, you may find both the |
This video might help with Question 4. |
-
R code used to solve the question.
-
The barplot of the average
DaysOnZillow_AllHomes
for all months. -
1-2 sentences answering the question "Is there a better time to put your house on the market?" based on your results.
Question 5
Filter the states
data to contain only years from 2010+ and call it states2010plus
. Make a lineplot showing the average DaysOnZillow_AllHomes
by Date
using states2010plus
data. Can you spot any trends? Write 1-2 sentences explaining what (if any) trends you see.
-
R code used to solve the question.
-
The time series lineplot for the average
DaysOnZillow_AllHomes
per date. -
1-2 sentences commenting on the patterns found in the plot, and your impressions of it.
Question 6
Do homes sell faster in certain states? For the following states: 'California', 'Indiana', 'NewYork' and 'Florida', make a lineplot for DaysOnZillow_AllHomes
by Date
with one line per state. Use the states2010plus
dataset for this question. Make sure to have each state line colored differently, and to add a legend to your plot. Examine the plot and write 1-2 sentences about any observations you have.
You may want to use the |
Make sure to fix the y-axis limits using the |
You may find the argument |
To make your legend fit, consider using the states abbreviation, and the arguments |
-
R code used to solve the question.
-
The time series lineplot for
DaysOnZillow_AllHomes
per date for the 4 states. -
1-2 sentences commenting on the patterns found in the plot, and your answer to the question "Do homes sell faster in certain states rather than others?".