--- title: "Introduction to almanac" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to almanac} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` This vignette is designed to introduce some of the common terminology used in almanac to get you up to speed on how to use the package. Along the way, we will see example usage of a number of the building blocks that will allow you to construct more complex recurrence objects. ```{r setup} library(almanac) ``` ## Recurrence Rules A _recurrence rule_ is a structured object that determines if a date should be counted as an _event_ or not. At the most basic level, the job of a recurrence rule is to search through a pre-specified range of dates and flag any event dates in that range. To build a recurrence rule, you start with a base recurrence frequency. There are 4 frequencies to choose from: - `daily()` - `weekly()` - `monthly()` - `yearly()` Take the `yearly()` frequency, for example. By default, this will declare that 1 value per year is an event. ```{r} on_yearly <- yearly() on_yearly ``` The return value of `yearly()` is a _rrule_ object, short for "recurrence rule". This base object is all you need to start doing something useful. For example, you can pass this rrule to `alma_search()` along with a `from` and `to` date, and it will return all of the events in that date range. ```{r} alma_search(from = "1990-01-01", to = "1995-12-31", on_yearly) ``` What if we want a yearly value, but we want it on January 5th every year, rather than on the 1st? `yearly()` has an important argument called `since` that controls two things: the start date of the recurrence rule, and information such as the month, or the day of the month to use if no other conditions have been specified to override those. The default of `since` is set to `1900-01-01`, but this is arbitrary (see `almanac_since()`). It is because of this default that in the above example with `alma_search()` we get values on January 1st. Let's change that. ```{r} on_yearly_jan_5 <- yearly(since = "1990-01-05") alma_search("1990-01-01", "1995-12-31", on_yearly_jan_5) ``` Now that the `since` date has been set to 1990, if we try and find yearly dates before 1990, they will not be included. ```{r} # Same result as above, because the 1988 and 1989 dates are not included. alma_search("1988-01-01", "1995-12-31", on_yearly_jan_5) ``` There is also an `until` argument to `yearly()` that controls the upper bound of the range to look in. This is arbitrarily set to `2100-01-01`, but can be expanded or contracted as required (see `almanac_until()`). ## Event Set I mentioned earlier that the job of a recurrence rule is to flag dates in a pre-specified range to be events or not. The dates that are flagged as events are known as the _event set_. In the previous example, we used `alma_search()` to extract a subset of dates from the event set that were between `from` and `to`. You can get the entire event set with `alma_events()`. Notice that this is bounded by our custom `since` date, and the default `until` upper bound. Otherwise we'd have an infinite event set, which is nice in theory but bad in practice. ```{r} alma_events(on_yearly_jan_5) ``` You can also check if an existing date is included in a recurrence rule's event set with `alma_in()`. ```{r} # Uses the 10th of the month, pulled from `since` on_monthly <- monthly(since = "1990-01-10") x <- as.Date("2000-01-08") + 0:5 x x_in_set <- alma_in(x, on_monthly) x_in_set x[x_in_set] ``` ### Caching almanac attempts to be smart by caching the event set of a recurrence rule the first time that it is queried. This means that the first usage of a recurrence rule is generally slower than repeated uses. ```{r} since <- "1990-01-01" on_weekly <- weekly(since = since) # The first time is "slow" system.time(alma_search(since, "2000-01-01", on_weekly)) # Repeated access is fast system.time(alma_search(since, "2000-01-01", on_weekly)) # The entire event set is cached, so even if you change the arguments, # the operation is still fast. system.time(alma_search(since, "1990-05-01", on_weekly)) ``` ## Recurrence Conditions So far we have worked with the base recurrence rules. Things get _much_ more interesting when we start adding extra conditions to these rules. Conditions are ways to _limit_ or _expand_ a given recurrence rule to hone in on recurring dates that you are particularly interested in. All condition functions in almanac start with `recur_*()`. For example, let's take a monthly rule, which defaults to give us 1 day per month, and _expand_ it to give us every 4th and 16th day of the month. ```{r} on_4th_and_16th <- monthly(since = "2000-01-01") %>% recur_on_day_of_month(c(4, 16)) alma_search("2000-01-01", "2000-06-01", on_4th_and_16th) ``` An important thing to note here is that even though our `since` date is on the first of the month, we are "overriding" that with the recurrence condition, so that information is not used. Recurrence rules can continually be added to further refine your rule. When you add a condition to a rule, you get another rule back. Let's try creating a rule for the recurring holiday, Labor Day. This recurs on the first Monday of September, yearly. To do this, we will: - Use a `yearly()` base since this happens 1 time per year. - Use `recur_on_month_of_year()` to hone in on September. - Use `recur_on_day_of_week()` to hone in on the first Monday of the month. ```{r} on_labor_day <- yearly() %>% recur_on_month_of_year("Sep") %>% recur_on_day_of_week("Monday", nth = 1) alma_search("2000-01-01", "2005-01-01", on_labor_day) ``` The `nth` argument of `recur_on_day_of_week()` is especially useful for selecting from the end of the month. If we wanted the _last_ Monday in September instead, we could do: ```{r} on_last_monday_in_sept <- yearly(since = "2000-01-01") %>% recur_on_month_of_year("Sep") %>% recur_on_day_of_week("Monday", nth = -1) alma_search("2000-01-01", "2005-01-01", on_last_monday_in_sept) ``` ## Recurrence Sets Recurrence rules are powerful tools on their own, but they aren't enough to solve every task. Say you want to construct a rule that includes both Christmas and Labor Day as events. It would be impossible to construct this kind of event set using a single rule, but if you could _bundle_ multiple rules together, one for Christmas and one for Labor Day, then it would be possible. An _rset_ is a bundle of recurrence schedules. A recurrence schedule, or rschedule, is an overarching term for both rrules and rsets. There are three types of rsets in almanac. Each create their event set by performing some kind of set operation on the event sets of the underlying rschedules that you added to the set. - `runion()` takes the union. - `rintersect()` takes the intersection. - `rsetdiff()` takes the set difference. The most useful rset is runion, as this allows you construct an event set that, for example, falls on multiple holidays and all weekends. The following creates an runion from rrules based on Christmas and Labor Day. ```{r} on_christmas <- yearly() %>% recur_on_month_of_year("Dec") %>% recur_on_day_of_month(25) christmas_or_labor_day <- runion( on_christmas, on_labor_day ) alma_search("2000-01-01", "2002-01-01", christmas_or_labor_day) christmas_or_labor_day_except_2000_labor_day <- rsetdiff( christmas_or_labor_day, rcustom("2000-09-04") ) alma_search("2000-01-01", "2002-01-01", christmas_or_labor_day_except_2000_labor_day) ``` A recurrence set is a critical data structure in almanac. It serves as a general container to dump all of your company's holiday and weekend recurrence rules.