Wednesday, August 28, 2013

IBNR with the Bornhuetter-Ferguson Method using the R ChainLadder package

This is the third of a trilogy of posts demonstrating how to implement three basic deterministic Property&Casualty/General/Non-Life insurance actuarial techniques using the ChainLadder package in R.

In simplest form, the Bornhuetter-Ferguson ("BF") Method estimates IBNR for an accident/policy/underwriting/origin year (tranch of exposure) as the product of an a-priori estimate of ultimate loss for that exposure and an estimate of the percent of that ultimate loss unknown/unreported/undeveloped at the time:

IBNR = a-prioriEstimatedUltimateLoss * PercentUnreported

Then a BF estimate of ultimate loss is the sum of reported loss known at the time plus IBNR:

EstimatedUltimateLoss = ReportedLoss + IBNR

A-priori estimates of ultimate loss may originate from widespread sources. We will demonstrate with the Loss Ratio Method (see previous post). Estimates of unreported percents are commonly borne from a loss development analysis of a reported loss triangle (see previous post), demonstrated here.

For an example we will use data from the Institute and Faculty of Actuaries Claims Reserving Manual, Sections F & G (see previous posts at links above for online sources). The R statements below following the carat (">") may be copied and pasted into an R session.
  • First, a-priori EstimatedUltimateLoss
> Premium = c(4486, 5024, 5680, 6590, 7482, 8502)
> ELR = .83 
> aprioriEstimatedUltimateLoss = Premium * ELR
  • Next, PercentUnreported
    • The reported loss triangle
> ReportedLossTriangle = matrix(c(
   2777, 3264, 3452, 3594, 3719, 3717,
   3252, 3804, 3973, 4231, 4319,   NA,
   3725, 4404, 4779, 4946,   NA,   NA,
   4521, 5422, 5676,   NA,   NA,   NA,
   5369, 6142,   NA,   NA,   NA,   NA,
   5818,   NA,   NA,   NA,   NA,   NA), nrow = 6, byrow = TRUE)
    • Fire up ChainLadder. Select the volume weighted average age-to-age factors. Append a unity tail under the assumption that the first year is already at ultimate. Cumulate (multiplicatively) in appropriate order. Pause to inspect the cumulative development factors.
> library(ChainLadder)
> selected = attr(ata(ReportedLossTriangle), "vwtd")
> selected = c(selected, 1.000)
> CDF = cumprod(rev(selected))
> round(CDF, 3)
[1] 1.000 0.999 1.027 1.074 1.137 1.333
      • Notice how reversing the order of the age-to-age (ATA) factors before cumulating enables the tail factor to line up with the first, most mature year and the last factor to line up with the last, least mature year. Also, the last observed ATA is less than 1.000 -- as is the second CDF -- so we should expect to see (slightly) negative IBNR for the second-most-mature accident year.
      • Now calculate the unreported percents using the familiar formula
    > PercentUnreported = 1 - 1 / CDF
    • IBNR
    > IBNR = aprioriEstimatedUltimateLoss * PercentUnreported
    > round(IBNR)
    [1]    0   -2  122  379  749 1764
    • For the value of reported loss known at the time we can use ChainLadder's getLatestCumulative function
    > ReportedLoss = getLatestCumulative(ReportedLossTriangle)
    • Finally
    > EstimatedUltimateLoss = ReportedLoss + IBNR

    It is possible to program the steps above into an R function. Is there a ChainLadder volunteer? Contributors welcome! Contact Markus Gesmann or one of the other authors found here or under "Members" here.

    -dmm
    ------------------------------------------------

    Wednesday, July 24, 2013

    Implementing CLFM with the ChainLadder Package

    One of the requirements of the popular stochastic reserving method known as the Mack Method (see for example Dr. Mack's original paper, Murphy's original paper, Barnett & Zehnwirth, and others) is that the actuary select one of the "standard" averages, such as the simple or volume-weighted averages of the observed link ratios. If an actuary's selection differs from one of those then -- strictly speaking -- the results of the papers above do not apply. A recent paper in the journal Variance demonstrated the existence of a broader family of Chain-Ladder Factor Models (CLFM) to which, under certain conditions, many more selected factors conform. If those conditions are satisfied, the selected factor may be considered the "best linear unbiased estimate" (BLUE) of a member of the family, and uncertainty estimates of the chain-ladder central estimates are directly applicable. This post will demonstrate how the ChainLadder package implements CLFM for estimating chain-ladder method uncertainty when link ratios are "selected." (Note: the CLFM's psi function  is not yet fully implemented in ChainLadder.)

    As with a previous post, our example will consider data from the Institute and Faculty of Actuaries Claims Reserving Manual, Section F, "Case Estimates & the Projection of Incurred Claims", p. 9:

    > ReportedLossTriangle <- matrix(c(
       2777, 3264, 3452, 3594, 3719, 3717,
       3252, 3804, 3973, 4231, 4319,   NA,
       3725, 4404, 4779, 4946,   NA,   NA,
       4521, 5422, 5676,   NA,   NA,   NA,
       5369, 6142,   NA,   NA,   NA,   NA,
       5818,   NA,   NA,   NA,   NA,   NA), nrow = 6, byrow = TRUE)
    > library(ChainLadder)

    The observed link ratios, along with the simple and volume-weighted averages, are
    > ata(ReportedLossTriangle)
                 Age
    Accident Year 12-24 24-36 36-48 48-60 60-72
             2007 1.175 1.058 1.041 1.035 0.999
             2008 1.170 1.044 1.065 1.021    NA
             2009 1.182 1.085 1.035    NA    NA
             2010 1.199 1.047    NA    NA    NA
             2011 1.144    NA    NA    NA    NA
             smpl 1.174 1.059 1.047 1.028 0.999
             vwtd 1.173 1.058 1.046 1.027 0.999

    Suppose we made the following selections:
    > selected <- c(1.175, 1.059, 1.045, 1.03, .999)
    Then we can find each link ratio's "consistent" member of the CLFM family (i.e., the model whose expected value equals the selected factor) with the 'CLFMdelta' function
    > CLFMdelta(ReportedLossTriangle, selected)
    12 24 36 48 60 
     3  2 -1  6  1 
    attr(,"foundSolution")
    [1] TRUE TRUE TRUE TRUE TRUE

    Some background: Each member of the CLFM family is distinguished by the exponent of the beginning value of loss that impacts the variance of the age-to-age projection. The exponent need not be an integer. The fact that the attribute "foundSolution" is all TRUE indicates that a member was successfully found for each element of "selected" relative to the ReportedLossTriangle. A "solution" is not guaranteed. For example, CLFMdelta cannot find a solution for a 24-36 selection of 1.06. Refer to the paper for the necessary conditions.

    To continue, save the "solution" (i.e., the index to the CLFM member model) in an R variable:
    > delta <- CLFMdelta(ReportedLossTriangle, selected)

    Now call the MackChainLadder function with the method's "alpha" parameter equal to 2 - delta. (Aside: The CLFM paper formulates the exponent a-la Murphy and Barnett&Zehnwirth; ChainLadder uses Mack's formulation. "CLFMalpha" would have been a more appropriate name relative to the CLFM paper, but ChainLadder's choice of notation had already been made.)
    > MackChainLadder(ReportedLossTriangle, alpha = 2 - delta, est.sigma = "Mack", mse.method = "Independence")
    MackChainLadder(Triangle = ReportedLossTriangle, alpha = 2 - 
        delta, est.sigma = "Mack", mse.method = "Independence")

         Latest Dev.To.Date Ultimate     IBNR Mack.S.E  CV(IBNR)
    2007  3,717       1.000    3,717     0.00 0.00e+00       NaN
    2008  4,319       1.001    4,317    -2.32 9.06e-21 -3.90e-21
    2009  4,946       0.971    5,092   145.61 8.33e+01  5.72e-01
    2010  5,676       0.930    6,106   429.87 1.59e+02  3.70e-01
    2011  6,142       0.878    6,994   851.72 2.59e+02  3.05e-01
    2012  5,818       0.747    7,784 1,966.45 3.64e+02  1.85e-01

                  Totals
    Latest:    30,618.00
    Dev:            0.90
    Ultimate:  34,009.32
    IBNR:       3,391.32
    Mack S.E.:    532.88
    CV(IBNR):       0.16

    In this case, the central estimates for Ultimate Loss and IBNR will be exactly equal to what the actuary would produce under the deterministic Chain-Ladder Method with those selected factors. Furthermore, the total standard error, 532.88, is immediately applicable to the total estimates of Ultimate and IBNR.

    Caveats:
    1. Of course, the standard error does not account for model error. Residuals should be analyzed for potential model misspecification. 
    2. The "psi function" of the CLFM approach has not yet been implemented in ChainLadder. This is a future enhancement currently being considered by the package developers. Contributors are welcome!
    - dmm

    ------------------------------------------------

    Saturday, June 22, 2013

    IBNR with the Loss Ratio Method using the ChainLadder package

    ChainLadder is a package for the R statistical environment that contains various functions for performing loss reserving for Property/Casualty/General Insurance. Mostly, the package's functions are intended to implement sophisticated stochastic models, but many simpler, deterministic methods are relatively easy to perform using helper functions in the package. This post will demonstrate how to calculate IBNR with the Loss Ratio Method.

    In simplest terms, the Loss Ratio Method estimates ultimate loss as the product of premium and an expected loss ratio (ELR):
    EstimatedUltimateLoss = Premium * ELR

    Then, IBNR is the difference between estimated ultimate loss and loss known to date (referred to here as "reported loss"):
    IBNR = EstimatedUltimateLoss - ReportedLoss

    For an example we will look at the data in the Institute and Faculty of Actuaries Claims Reserving Manual, Section G, "Methods Using Loss Ratio & Loss Ratio Projections", p. 4.

    > ReportedLoss <- c(3483, 3844, 3977, 3880, 3261, 1889) # the "current diagonal"
    > Premium <- c(4486, 5024, 5680, 6590, 7482, 8502)
    > ELR <- .83 
    > EstimatedUltimateLoss <- Premium * ELR
    > IBNR <- EstimatedUltimateLoss - ReportedLoss

    Voila. Simple and succinct in R.

    Note: the Claims Reserving Manual gives premium ("aP" in the manual) in reverse accident year order (most recent year first, oldest accident year last), but we enter the accident year premium and loss data in chronological order (oldest accident year first).

    In the example above we expect the same loss ratio for every year, so ELR is a scalar and R knows to replicate that value for every element in the Premium vector before performing the multiplication. If we think 83% is expected for the most recent accident year but is expected to be 2 points less per year before that, we could store that pattern in an ELR vector:

    > ELR <- .83 + (-5:0) * .02

    and the remaining two calculations still hold.

    For a final complication, if we were provided a triangle of reported losses, ChainLadder has a helper function to pull off the most recent diagonal automatically. Again, from the Claims Reserving Manual

    > ReportedLossTriangle <- matrix(c(
       1001, 1855, 2423, 2988, 3335, 3483,
       1113, 2103, 2774, 3422, 3844, NA,
       1265, 2433, 3233, 3977, NA, NA,
       1490, 2873, 3880, NA, NA, NA,
       1725, 3261, NA, NA, NA, NA,
       1889, NA, NA, NA, NA, NA), nrow = 6, byrow = TRUE)

    Note: Use NA's for future values that truly are "not available" yet. The shape of the matrix (6x6) is completely specified by "nrow=6" because 36 values are given. Finally, R stores data in column format and expects data to be entered accordingly, so byrow=TRUE is required here because the data is entered in more readable, row-wise order.

    Then

    > library(ChainLadder) # nothing above required the package
    > ReportedLoss <- getLatestCumulative(ReportedLossTriangle)

    yields the original values above and the remaining calculations work without change.

    This is the first post demonstrating how to carry out deterministic reserving methods with the ChainLadder package in R.

    -dmm

    Wednesday, June 12, 2013

    IBNR with the Chain-Ladder Method using the R ChainLadder package

    ChainLadder is a package for the R statistical environment that contains various functions for performing loss reserving for Property/Casualty/General/Non-Life Insurance. Mostly, the package's functions are intended to implement sophisticated stochastic models, but many simpler, deterministic methods are relatively easy to perform using helper functions in the package. This post will demonstrate how to calculate IBNR with the Chain-Ladder Method using the ChainLadder package.

    In simplest terms, the Chain-Ladder Method estimates ultimate loss as the product ("the development") of loss known to date (referred to here as "reported loss") and a loss development factor (LDF):

    EstimatedUltimateLoss = ReportedLoss * LDF            (1)

    Then IBNR is the difference between estimated ultimate loss and loss known to date:

    IBNR = EstimatedUltimateLoss - ReportedLoss

    Sources for loss development factors (LDFs) include a formulaic calculation from a loss "triangle" (loss data arranged in wide, longitudinal format), judgmental selections based partially on data in a triangle and partially on other information, and benchmark statistics.

    Aside: Algorithm (1) is more generally referred to as the Loss Development Method, particularly when the LDFs come from a source separate from the source of the ReportedLoss being "developed" (projected to ultimate value). When link ratios from a triangle are chained together to form LDFs, as is demonstrated in this post below, the "Chain-Ladder Method" merits the full meaning of its name.

    For an example we will look at the data in the Institute and Faculty of Actuaries Claims Reserving Manual, Section F, "Case Estimates & the Projection of Incurred Claims", p. 9:

    > ReportedLossTriangle <- matrix(c(
       2777, 3264, 3452, 3594, 3719, 3717,
       3252, 3804, 3973, 4231, 4319,   NA,
       3725, 4404, 4779, 4946,   NA,   NA,
       4521, 5422, 5676,   NA,   NA,   NA,
       5369, 6142,   NA,   NA,   NA,   NA,
       5818,   NA,   NA,   NA,   NA,   NA), nrow = 6, byrow = TRUE)

    As noted in the "Loss Ratio Method" post: use NA's for future values that truly are "not available" yet; the shape of the matrix (6x6) is completely specified by "nrow=6" because 36 values are given; and R stores data in column format and expects data to be entered accordingly, so byrow=TRUE is required because the data is entered in more readable, row-wise order. 

    Let's give "names" to the rows and columns to aid our comprehension of the triangle when displayed. We will assume
    • the exposure periods that generate the six rows of observations are the most recent six accident years
    • the observations in a row are taken at the end of the last day of consecutive accident years starting with 12/31/2007, and that the time between each observation and the beginning of its accident year is measured in months.
    > rownames(ReportedLossTriangle) <- 2007:2012
    > colnames(ReportedLossTriangle) <- 12 * (1:6)

    For a final bit of accouterments, let's name the row and column dimensions:

    > names(dimnames(ReportedLossTriangle)) <- c("Accident Year", "Age")

    Now the triangle displays informatively:

    > ReportedLossTriangle
                 Age
    Accident Year   12   24   36   48   60   72
             2007 2777 3264 3452 3594 3719 3717
             2008 3252 3804 3973 4231 4319   NA
             2009 3725 4404 4779 4946   NA   NA
             2010 4521 5422 5676   NA   NA   NA
             2011 5369 6142   NA   NA   NA   NA
             2012 5818   NA   NA   NA   NA   NA

    Chain-Ladder LDF's are estimated as the product of "link-ratios" (also know as "report-to-report"  RTR  development factors, or "age-to-age"  ATA  development factors) derived from the triangle. To calculate the triangle's link ratios, use ChainLadder's ata function.

    > library(ChainLadder) # nothing above yet required the package
    > ata(ReportedLossTriangle)
                 Age
    Accident Year 12-24 24-36 36-48 48-60 60-72
             2007 1.175 1.058 1.041 1.035 0.999
             2008 1.170 1.044 1.065 1.021    NA
             2009 1.182 1.085 1.035    NA    NA
             2010 1.199 1.047    NA    NA    NA
             2011 1.144    NA    NA    NA    NA
             smpl 1.174 1.059 1.047 1.028 0.999
             vwtd 1.173 1.058 1.046 1.027 0.999

    Note:
    1. The dimension names and row names are inherited from the triangle and the column names are automatically generated from the column names of the triangle
    2. Each column's simple average ("smpl") and volume-weighted average ("vwtd") are "attributes" of the matrix returned by the 'ata' function and are automatically displayed when shown at the console
    Selecting Factors
    To learn how to select the simple or volume-weighted average calculated by ata, it would be helpful to inspect the contents of the object it returns. Save the result to a variable, then use R's str function:
    > y <- ata(ReportedLossTriangle)
    > str(y)
     ata [1:5, 1:5] 1.18 1.17 1.18 1.2 1.14 ...
     - attr(*, "dimnames")=List of 2
      ..$ Accident Year: chr [1:5] "2007" "2008" "2009" "2010" ...
      ..$ Age          : chr [1:5] "12-24" "24-36" "36-48" "48-60" ...
     - attr(*, "class")= chr [1:2] "ata" "matrix"
     - attr(*, "smpl")= Named num [1:5] 1.174 1.059 1.047 1.028 0.999
      ..- attr(*, "names")= chr [1:5] "12-24" "24-36" "36-48" "48-60" ...
     - attr(*, "vwtd")= Named num [1:5] 1.173 1.058 1.046 1.027 0.999
      ..- attr(*, "names")= chr [1:5] "12-24" "24-36" "36-48" "48-60" ...

    Let's read this:
    • y is an 'ata' object with two dimensions, of length 5 in each dimension and starting with the first five values shown
    • y has a "dimnames" attribute, which means its two dimensions are themselves "named" -- "Accident Year" and "Age" -- have the values shown (and more ...)
    • y has three additinonal attributes: 'class', 'smpl' and 'vwtd' (NoteAttributes are R's way of attaching useful additional information to an object that can be used by downstream processes for special purposes)
      • classy has two 'classes': "ata" and "matrix". In ChainLadder the "ata" class allows such objects to display as above. R's general "matrix" class allows matrices to display as a two dimensional array and perform matrix operations, among other things.
      • smpl: simple averages of the development periods' link ratios
      • vwtd: volume-weighted averages of the development periods' link ratios
    To extract the contents of an attribute of an object, use the attr function with the name of the object and the name of its attribute. Let's select the set of simple average link-ratio factors and store those values in the variable rtr:
    > rtr <- attr(y, "smpl")
    > rtr
        12-24     24-36     36-48     48-60     60-72 
    1.1741319 1.0585053 1.0470062 1.0277895 0.9994622 
    
    Actuarially speaking, we know that the first link-ratio will be applied to the most recent accident year. However, in the previous post we saw that the getLatestCumulative function will place the most recent accident year's value last. Therefore reverse the order of the elements in rtr before accumulating their product into an LDF. Finally, if we assume that the oldest accident year is already at ultimate, then the tail factor is unity, which is the sixth factor prepended to rtr. Putting all this together, we estimate ultimate loss and IBNR.
    > rtr <- c(1.000, rev(rtr))
    > LDF <- cumprod(rev(rtr))
    > ReportedLoss <- getLatestCumulative(ReportedLossTriangle)
    > EstimatedUltimateLoss <- ReportedLoss * LDF
    > IBNR <- EstimatedUltimateLoss - ReportedLoss
    > IBNR
         2007      2008      2009      2010      2011      2012 
     647.2483 1048.7603 1489.9588 1915.1175 2067.9313 1958.8447 

    Of course, one need not select the simple or volume-weighted average but may round the values or determine an rtr vector using supporting information and actuarial judgment. Here is an example -- note
    • the factors are entered in the "usual order" -- least mature development period first
    • a 5% tail is added at the end
    • the rtr order is reversed before the cumulative LDF is calculated
    > rtr <- c(1.175, 1.06, 1.05, 1.03, 1, 1.05)
    > rtr <- c(1.175, 1.06, 1.05, 1.03, 1, 1.05)
    > LDF <- cumprod(rev(rtr))
    > EstimatedUltimateLoss <- ReportedLoss * LDF
    > IBNR <- EstimatedUltimateLoss - ReportedLoss
    > IBNR
         2007      2008      2009      2010      2011      2012 
     185.8500  215.9500  403.0990  769.5237 1251.1837 2410.7387



    -dmm