Lubridate & Tidytext

Athanasios Paschos, Florian Nemetz & Kaja Benz

3th of June 2019

Introduction - Specifying Dates

Example:

Introduction - Specifying Dates

Example:

x <- "23-10-2018"

str(x)   
##  chr "23-10-2018"
#  first possible function to import dates 
y <- as.Date(x) 
y
## [1] "0023-10-20"
str(y) 
##  Date[1:1], format: "0023-10-20"

Introduction - With the right format

library(tidyverse)
x <- "2013-07-13"
  
y <- as.Date(x)

y
## [1] "2013-07-13"
str(y)    #   structure of y 
##  Date[1:1], format: "2013-07-13"
jan_12 <- as.Date("2015-01-12")
jan_12
## [1] "2015-01-12"
unclass(jan_12)
## [1] 16447

Introduction - Arithmetic and logical operators

x <- as.Date(c("2019-06-03", "2015-01-01", "1985-04-24", "2010-03-31", "1945-10-23"))
x
## [1] "2019-06-03" "2015-01-01" "1985-04-24" "2010-03-31" "1945-10-23"
min(x)
## [1] "1945-10-23"
x[1] > x[4]
## [1] TRUE
y <- x[1] - x[3]
y
## Time difference of 12458 days

Introduction- Getting datetimes into R

Examples

x <- as.POSIXct(c("2003-10-01 22:12:00", "2012-03-24 11:34:00"))
x
## [1] "2003-10-01 22:12:00 CEST" "2012-03-24 11:34:00 CET"
x1 <- as.POSIXct(c("2010-10-01 12:12:00", "2012-03-24 11:34:00"), tz = "America/Los_Angeles")
x1
## [1] "2010-10-01 12:12:00 PDT" "2012-03-24 11:34:00 PDT"
str(x1)
##  POSIXct[1:2], format: "2010-10-01 12:12:00" "2012-03-24 11:34:00"
unclass(x1)
## [1] 1285960320 1332614040
## attr(,"tzone")
## [1] "America/Los_Angeles"

Lubridate - Introduction

Lubridate has the advantage that:

  1. is simple to use,

  2. parses formats that don’t have the ISO - 8601 format

  3. allows parsing more than one date format in one vector

  4. All functions return the same time-zone “UTC”.

  5. Ignores seperators

Lubridate - Parsing functions - 1st option

Examples

library(lubridate)
x1 <- mdy(c("12.23.2009", "2/13/2000", "10231998"))  # without separators

x1
## [1] "2009-12-23" "2000-02-13" "1998-10-23"
x2 <- ymd("2018:11:30")
    
x2
## [1] "2018-11-30"
str(x2)
##  Date[1:1], format: "2018-11-30"

Examples

# two different separators 
    
x3 <- dmy("27-2.1999")   
    
x3
## [1] "1999-02-27"
x4 <- dmy("12.5.2000", "12.23.2010") 
## Warning: 1 failed to parse.
x4
## [1] "2000-05-12" NA

Lubridate - Parsing Functions - 2nd option

There is the function parse_date_time() which takes an additional argument, orders,

where you can specify the order of the components in the date.

General Format : parse_date_time(x, orders = " " or c( " " ), tz = “UTC”, truncated = ,…)

Example :

# parse "2010 September 20th" 

parse_date_time("2010 September 20th", orders = "ymd")
## [1] "2010-09-20 UTC"

that would be equivalent to using the ymd() function from the previous exercise.

Lubridate - Parsing Functions - 2nd option (2)

Examples :

x <- "Monday June 1st 2001 at 5pm"

parse_date_time(x, orders = "AmdyIp")
## Warning: All formats failed to parse. No formats found.
## [1] NA
parse_date_time(c("10.23.2010", "2019-10-23"), order = c("mdy","ymd"))
## [1] "2010-10-23 UTC" "2019-10-23 UTC"

Examples

#  Specify order to include both "mdy" and "dmy"

dates <- c("October 23, 2011", "September 10, 2010", "April 26, 2012", "13 July 2011", "3 November 2014")

parse_date_time(dates, orders = c("mdy", "dmy"))
## [1] "2011-10-23 UTC" "2010-09-10 UTC" "2012-04-26 UTC" "2011-07-13 UTC"
## [5] "2014-11-03 UTC"
dates <- c("October 23, 2011", "September 10, 2010", "April 26, 2012", "13 July 2011", "3.10.2014")

parse_date_time(dates, orders = c("mdy", "dmy"))
## [1] "2011-10-23 UTC" "2010-09-10 UTC" "2012-04-26 UTC" "2011-07-13 UTC"
## [5] "2014-03-10 UTC"
parse_date_time(dates, orders = c("dmy", "mdy"))
## [1] "2011-10-23 UTC" "2010-09-10 UTC" "2012-04-26 UTC" "2011-07-13 UTC"
## [5] "2014-10-03 UTC"

Lubridate - Parsing date-times

dmy_hm() ydm_hms() mdy_h()

Examples

x <- c("2014-04-14-04-35-59", "2010-04-01-12-00-00")
y <- ymd_hms(x)
y 
## [1] "2014-04-14 04:35:59 UTC" "2010-04-01 12:00:00 UTC"
parse_date_time(c("13.4.1994 12:00:00", "2000-23-9 03:30:40","5.16.2008 11:25:30"),
                     orders = c("dmy_hms", "ydm_HMS","mdy_HMS"))
## Warning: hms, hm and ms usage is deprecated, please use HMS, HM or MS
## instead. Deprecated in version '1.5.6'.
## [1] "1994-04-13 12:00:00 UTC" "2000-09-23 03:30:40 UTC"
## [3] "2008-05-16 11:25:30 UTC"
parse_date_time(c("13.4.1994 12:00:00", "2000-23-9 03:30:40","5.16.2008 11:25:30"),
                                            orders = c("dmy_HMS", "ydm_HMS","mdy_HMS"))
## [1] "1994-04-13 12:00:00 UTC" "2000-09-23 03:30:40 UTC"
## [3] "2008-05-16 11:25:30 UTC"

Lubridate - One tricky form for incomplete dates

ymd_hms(…, tz = NULL, truncated = 0)

Examples

x <- c("2011-12-31 12:59:59",   "2010-01-01 12:11", "2010-01-01 12",  "2010-11")
ymd_hms(x, truncated = 4)
## [1] "2011-12-31 12:59:59 UTC" "2010-01-01 12:11:00 UTC"
## [3] "2010-01-01 12:00:00 UTC" "2010-11-01 00:00:00 UTC"
x1 <- c("2011-12-31",   "2010-01-01", "2010-01-01",   "2010-11")
ymd(x1, truncated = 1)
## [1] "2011-12-31" "2010-01-01" "2010-01-01" "2010-11-01"
ymd_hms(x, truncated = 3)
## [1] "2011-12-31 12:59:59 UTC" "2010-01-01 12:11:00 UTC"
## [3] "2010-01-01 12:00:00 UTC" "2020-10-11 00:00:00 UTC"
ymd_hms(x1, truncated = 1)
## Warning: All formats failed to parse. No formats found.
## [1] NA NA NA NA

Lubridate - What can we extract?

Components of a datetime can be extracted by lubridate functions with the same name like:

They all work the same way just pass in a datetime or vector of datetimes. There are also a few useful functions that return other aspects of a datetime like if it occurs in the morning am(), in a leap_year(), or which quarter() or semester() it occurs in.

Examples

x <- mdy_hms("12.23.2018 12:34:45",  "2-28-2015 02:34:02", "4-15-2013 11:56:34",   "3-5-2011 06:12:00")

month(x)
## [1] 12  2  4  3
year(x)
## [1] 2018 2015 2013 2011
semester(x)
## [1] 2 1 1 1
yday(x)
## [1] 357  59 105  64
wday(x)
## [1] 1 7 2 7

Lubridate - Adding useful labels

month(x) and wday(x)

  1. label and
  2. abbr

Examples

x <- mdy_hms("12.23.2018 12:34:45", "2-28-2015 02:34:02", "4-15-2013 11:56:34", "3-5-2011 06:12:00")
x
## [1] "2018-12-23 12:34:45 UTC" "2015-02-28 02:34:02 UTC"
## [3] "2013-04-15 11:56:34 UTC" "2011-03-05 06:12:00 UTC"
month(x)
## [1] 12  2  4  3
month(x, label = TRUE, abbr = FALSE)
## [1] Dezember Februar  April    März    
## 12 Levels: Jänner < Februar < März < April < Mai < Juni < ... < Dezember
wday(x, label = TRUE, abbr = TRUE ) # more readable  with %>% table()
## [1] So\\. Sa\\. Mo\\. Sa\\.
## Levels: So\\. < Mo\\. < Di\\. < Mi\\. < Do\\. < Fr\\. < Sa\\.
wday(x)
## [1] 1 7 2 7

Lubridate - Adding useful labels

For “R” the the first day of the week is on Sunday

library(lubridate)

wday(x, label = TRUE)
## [1] So\\. Sa\\. Mo\\. Sa\\.
## Levels: So\\. < Mo\\. < Di\\. < Mi\\. < Do\\. < Fr\\. < Sa\\.
wday(x)
## [1] 1 7 2 7
##          ymd weekday nr_weekday revenue_euro
## 1 2019-05-06   Mo\\.          2        10000
## 2 2019-05-07   Di\\.          3        12000
## 3 2019-05-08   Mi\\.          4        12500
## 4 2019-05-09   Do\\.          5         9000
## 5 2019-05-10   Fr\\.          6        13000

Lubridate - Adding useful labels

Example

ymd <- ymd(c("2019-05-06", "2019-05-07", "2019-05-08", "2019-05-09","2019-05-10"))

weekday <- wday(ymd, label = TRUE, week_start = getOption("lubridate.week.start",1))
weekday
## [1] Mo\\. Di\\. Mi\\. Do\\. Fr\\.
## Levels: Mo\\. < Di\\. < Mi\\. < Do\\. < Fr\\. < Sa\\. < So\\.
nr_weekday <- wday(ymd, week_start = getOption("lubridate.week.start",1))
nr_weekday
## [1] 1 2 3 4 5
##          ymd weekday nr_weekday revenue_euro
## 1 2019-05-06   Mo\\.          1        10000
## 2 2019-05-07   Di\\.          2        12000
## 3 2019-05-08   Mi\\.          3        12500
## 4 2019-05-09   Do\\.          4         9000
## 5 2019-05-10   Fr\\.          5        13000

Lubridate - Plotting

url <- "https://archive.org/download/BildungAnDeutschenHochschulenAll2/Bildung%20an%20deutschen%20Hochschulen_all2.xlsx"
GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
## Response [https://ia801506.us.archive.org/0/items/BildungAnDeutschenHochschulenAll2/Bildung%20an%20deutschen%20Hochschulen_all2.xlsx]
##   Date: 2019-06-03 08:51
##   Status: 200
##   Content-Type: application/octet-stream
##   Size: 37.9 kB
## <ON DISK>  C:\Users\flone\AppData\Local\Temp\RtmpuKUnII\file1fbc34923f34.xlsx
Dataset_Hochschule <- read_excel(tf)

# Original plot
plot_Hochschule <-ggplot(Dataset_Hochschule, aes(x = Year, y = Numbers/1000 , color = Citizenship, shape = Status)) +
  geom_line(alpha = 0.4, shape = 5, size =1) +
  geom_jitter(width = 0.2) +
  facet_wrap(.~Gender) +
ggtitle("
Developement of german and non-german HEI personnel from 2005-2017
by Gender, Citizenship and occupational status group") +
  scale_y_continuous("Numbers in thousands", seq(0, max(Dataset_Hochschule$Numbers), 10)) + 
   scale_x_continuous(limits = c(2005, 2017), breaks = seq(2005,2017,2)) + theme_bw() + 
  
 theme(panel.spacing = unit(1.5, "lines"))
## Warning: Ignoring unknown parameters: shape
ggdraw(add_sub(plot_Hochschule, 
"Note: 
The shape of the data points shows the occupation 
and the color of the lines shows the nationality of the persons. 
The category 'Professor' includes full-time professors and junior professors. 
The category 'Non-Professor' includes full-time scientific and artistic personnel.  
It consists of docents and assistants, scientific and artistic co-workers
and staff for other special tasks.",
x=0.6, vjust=0.5, size = 9, fontface = "bold",  vpadding = grid::unit(1, "lines")))
## Warning: Removed 5 rows containing missing values (geom_point).

Lubridate - Time zones

Time zones represent the same across different geographic locations.

Examples

#  to find a specific time zone:   "Lisbon"

grep("Lisbon",OlsonNames(),value=TRUE) 
## [1] "Europe/Lisbon"
#  To find the time in Lisbon in specific time: for example right now
t <- with_tz(now(), tz = "Europe/Lisbon")
t
## [1] "2019-06-03 09:51:53 WEST"
# to change the timezone without changing the clock time.  →  in Jamaica
time <- dmy_hms("23.07.2019 23:12:45")

jam <- force_tz(time, tz = "Jamaica")
jam
## [1] "2019-07-23 23:12:45 EST"

Lubridate - Rounding

Examples

time  <- ymd_hms("2017-06-23 07:13:28 UTC")
wday(time)
## [1] 6
floor_date(time, unit = "month")  # ( weird function ) # Round down to month
## [1] "2017-06-01 UTC"
ceiling_date(time, unit = "week") #Round up to week
## [1] "2017-06-25 UTC"
round_date(time, unit = "6 minutes")  # Round to nearest 6 minutes
## [1] "2017-06-23 07:12:00 UTC"
round_date(time, unit = "3 years") # Round to nearest 3 years
## [1] "2016-01-01 UTC"

Lubridate- Taking differences of daytimes

To get finer control over a difference between datetimes we use the base function difftime(). For example instead of time1 - time2 , we use difftime(time1, time2)

Examples

date1<- mdy("July 23, 1985")
date2<- mdy_hms("Sep 20th , 1990, 02:56:15")
# How many days between date1 and date2 ?
difftime(date2, date1, unit = "days")
## Time difference of 1885.122 days
# How many weeks between date1 and date2 ?
difftime(date2, date1, unit = "weeks")
## Time difference of 269.3032 weeks

Lubridate - Time spans

Creating a time span - Periods

years() , months() , weeks() , days() , hours() minutes() , seconds()

Examples :

weeks(5)
## [1] "35d 0H 0M 0S"
hours(10)
## [1] "10H 0M 0S"
days(30)
## [1] "30d 0H 0M 0S"
weeks(3)
## [1] "21d 0H 0M 0S"

Creating a time span - Durations

dyears() , dweeks() , ddays() , dhours() , dminutes() , dseconds()

Examples

ddays(4)
## [1] "345600s (~4 days)"
dyears(2)  
## [1] "63072000s (~2 years)"
dweeks(4)  
## [1] "2419200s (~4 weeks)"
ddays(23)
## [1] "1987200s (~3.29 weeks)"

Lubridate - Time spans - Extra

Examples - Another ways to create periods

#Separate period and units vectors

period(c(90, 5), c("second", "minute"))
## [1] "5M 90S"
# Units as arguments

period(second = 3, minute = 1, hour = 2, day = 13, week = 1)
## [1] "20d 2H 1M 3S"

Examples

days() + minutes()   
## [1] "1d 0H 1M 0S"
ymd_hms("2015.2.3 21:23:00") + weeks(3)
## [1] "2015-02-24 21:23:00 UTC"
dmy("23th Sep 2014") + dhours(81)
## [1] "2014-09-26 09:00:00 UTC"
c(1:3) * hours(1)
## [1] "1H 0M 0S" "2H 0M 0S" "3H 0M 0S"
x <- ymd("2016-10-03")
x + days(2) + hours(8) + minutes(45)
## [1] "2016-10-05 08:45:00 UTC"

Lubridate - Generating sequences of datetimes

today() + 1:10 * days(1)
##  [1] "2019-06-04" "2019-06-05" "2019-06-06" "2019-06-07" "2019-06-08"
##  [6] "2019-06-09" "2019-06-10" "2019-06-11" "2019-06-12" "2019-06-13"

Lubridate - The tricky thing about months

In general lubridate returns the same day of the month in the next month,

but since the 31st of February doesn’t exist lubridate returns a missing value, NA

jan <- ymd_hms("2010-01-31 03:04:05")
jan + months(1:3)  
## [1] NA                        "2010-03-31 03:04:05 UTC"
## [3] NA

%m+% and %m-% that have different behavior.

Examples

jan <- ymd_hms("2010-01-31 03:04:05")
jan + months(1:3) 
## [1] NA                        "2010-03-31 03:04:05 UTC"
## [3] NA
jan1 <- ymd("2012-01-31")
jan1 %m+% months(1:3)
## [1] "2012-02-29" "2012-03-31" "2012-04-30"
apr1 <- ymd("2018.04.30")
apr1 - months(1:3)
## [1] "2018-03-30" NA           "2018-01-30"
apr1 %m-% months(1:3)
## [1] "2018-03-30" "2018-02-28" "2018-01-30"

Lubridate - Roll Back Date To Last Day Of Previous Month

rollback(dates, roll_to_first = FALSE, preserve_hms = TRUE) Dates: A POSIXct, POSIXlt or Date class object.

Examples

date <- ymd("2010-03-03")
dates <- date + months(0:2)
dates
## [1] "2010-03-03" "2010-04-03" "2010-05-03"
rollback(dates)
## [1] "2010-02-28" "2010-03-31" "2010-04-30"
date <- ymd_hms("2010-03-03 12:44:22")

rollback(date, roll_to_first = TRUE)
## [1] "2010-03-01 12:44:22 UTC"
rollback(date, preserve_hms = FALSE)
## [1] "2010-02-28 UTC"
rollback(date, roll_to_first = TRUE, preserve_hms = FALSE)
## [1] "2010-03-01 UTC"

Lubridate - Comparing intervals and datetimes

y2001 <- ymd("2001-01-01") %--% ymd("2001-12-31")

x21 <- ymd("2001-02-09") %--% mdy("3.23.2001")

ymd("2002-03-30") %within% y2001
## [1] FALSE
x21 %within% y2001
## [1] TRUE

Lubridate - Converting to durations and periods

Examples

int1 <- interval(ymd("2009-01-01"), ymd("2009-08-01"))
as.duration(int1)
## [1] "18316800s (~30.29 weeks)"
as.period(int1)
## [1] "7m 0d 0H 0M 0S"

Tidytext

How would you analyze the content of a book or a tweet..?

Tidytext

Julia Silge and David Robinson developed the package “tidytext”, which makes text mining easier and more effective.

Tidytext Flow: