An Introduction to Actuarial Modelling With Heavylight¶
This page provides a guided walkthrough of how to develop and run an actuarial model using heavylight
.
Before you start¶
- Ensure you have Python 3.8 or newer installed: Download Python
- Install heavylight from pypi:
python -m pip install heavylight
orpip install heavylight
- If working interactively, install jupyter lab
Once installed, check libraries are installed, and check the version of heavylight:
import heavylight
import pandas as pd
import numpy as np
print(f"Version: {heavylight.__version__ = }")
Version: heavylight.__version__ = '1.0.10'
Overview¶
This document covers the following stages in development and execution of the model:
- define the model
- pass in data (or simulate)
- pass in assumptions
- run the model
- viewing the results
In practice, defining, loading data and loading assumptions need to be carried out in parallel, so we will define several iterations of the model (Model1
, Model2
...), introducing one concept at a time. You can skip to the end to see the final iteration of the model.
This guide is designed to give an introduction, so doesn't cover some of the more advanced techniques around performance optimisation (vectorisation using numpy/tensors) or table/model design.
Define the model¶
The model is defined as a subclass of heavylight.Model
, and each value or variable we want to calculate is defined as a method (a function belonging to the class).
In actuarial models we are usually interested in projecting through time, this corresponds to methods with one argument t
. When the model is run these are automatically calculated, for each time period specified up to proj_len
.
self
is always the first parameter in a python class method. This allows us to refer to the class instance, including any other data attributes or methods, without passing them into the method separately.
Want to know more about classes? The python class tutorial can help: https://docs.python.org/3/tutorial/classes.html
class Model1(heavylight.Model):
def t(self, t):
"""time"""
return t
def num_alive(self, t):
"""probability that life is alive at time t, given alive at time 0"""
if t == 0:
return 1
else:
return self.num_alive(t - 1) * 0.99
We can quickly test the model runs by creating an instance of it, and assign it to the variable model1
. When an model is instanced, it is run as well, with the variable proj_len
controlling how many time periods it is run for. As Python uses zero-indexing this runs from 0, i.e. proj_len = 5
will calculate for t = 0, 1, 2, 3, 4.
Once the model has run, we can view the results as a dataframe using the .df
property.
model1 = Model1(proj_len = 5)
model1.df
t | num_alive | |
---|---|---|
0 | 0 | 1.000000 |
1 | 1 | 0.990000 |
2 | 2 | 0.980100 |
3 | 3 | 0.970299 |
4 | 4 | 0.960596 |
Exploring the results, and calculating more values¶
We can see that num alive runs off as expected, 1, 0.99, 0.99^2, 0.99^3...
We can view the values for a variable using .values
:
model1.num_alive.values
[1, 0.99, 0.9801, 0.9702989999999999, 0.96059601]
Likewise, we can sum up all of them (in this case it's not too meaningful as we aren't looking at cashflows yet!)
model1.num_alive.sum()
4.90099501
We can obtain individual values by calling them with the appropriate parameter (t
in this case)
model1.num_alive(4)
0.96059601
We can view a dataframe just of the variables results. This isn't that useful when the function is only dependent on t, however if we have functions with multiple keys such as both t
and duration
, then we can view all the output.
model1.num_alive.df
t | num_alive | |
---|---|---|
0 | 0 | 1.000000 |
1 | 1 | 0.990000 |
2 | 2 | 0.980100 |
3 | 3 | 0.970299 |
4 | 4 | 0.960596 |
Running the projection to time 4 doesn't mean we can't obtain further values, it just means they aren't precomputed (and in extreme cases you may get a recursion limit or stack overflow error). We can query any value by calling it.
model1.num_alive(20)
0.8179069375972307
This computes all the intermediate values - we can check this in the dataframe.
Note: the variable t
hasn't been calculated for t > 4, this is because the value of t
has never been requested for t > 4, as the t(t)
method isn't called by num_alive
.
model1.df
t | num_alive | |
---|---|---|
0 | 0 | 1.000000 |
1 | 1 | 0.990000 |
2 | 2 | 0.980100 |
3 | 3 | 0.970299 |
4 | 4 | 0.960596 |
Passing data to the model¶
Our model isn't going to be that useful unless we can pass in data, for this model we are interested in the probability of being alive at each valuation point, the probability of dying then, and the expected claim cost arising. (Sum Assured x Number of Deaths)
We'll add in two new methods, num_deaths
to count the expected deaths occuring, and age
to calculate the current age.
num_deaths
is dependent on the number alive at the start of the period, and the mortality rate q_x
: as we've not defined a mortality table we'll fudge this for now as being a linear function of age $q_x(t) = 0.001 (x + t)$
For the age
variable, we want to get this from data at t = 0
from the field init_age
. There are lots of different ways to store and pass in data, for example in a dictionary, using a dataclass, or as individual variables. For this example we'll use a dictionary, which will be stored in the model as data
. We can then query the data field using self.data["init_age]
.
class Model2(heavylight.Model):
def t(self, t):
"""time"""
return t
def num_alive(self, t):
"""probability that life is alive at time t, given alive at time 0"""
if t == 0:
return 1
else:
return self.num_alive(t - 1) - self.num_deaths(t - 1) # deaths moved to a separate method
def num_deaths(self, t):
"""number of deaths occuring between time t-1 and t"""
if t < 0:
return 0
else:
q_x = 0.001 * self.age(t)
return self.num_alive(t) * q_x
def age(self, t):
"""age in years at time t"""
if t == 0:
return self.data["init_age"]
elif t % 12 == 0:
return self.age(t - 1) + 1
else:
return self.age(t - 1)
We pass the data in as pass of the model instance creation, using the key data
means it is accessible as self.data
within the model, in our case self.data['init_age']
will look up init_age in the dictionary and return 32
.
data2 = dict(init_age = 32)
model2 = Model2(data=data2, proj_len=10*12 + 1)
model2.df
t | num_alive | num_deaths | age | |
---|---|---|---|---|
0 | 0 | 1.000000 | 0.032000 | 32 |
1 | 1 | 0.968000 | 0.030976 | 32 |
2 | 2 | 0.937024 | 0.029985 | 32 |
3 | 3 | 0.907039 | 0.029025 | 32 |
4 | 4 | 0.878014 | 0.028096 | 32 |
... | ... | ... | ... | ... |
116 | 116 | 0.013636 | 0.000559 | 41 |
117 | 117 | 0.013077 | 0.000536 | 41 |
118 | 118 | 0.012541 | 0.000514 | 41 |
119 | 119 | 0.012027 | 0.000493 | 41 |
120 | 120 | 0.011534 | 0.000484 | 42 |
121 rows × 4 columns
We've got more data here to look at now, as .df
outputs a pandas dataframe, we can access all the normal dataframe features, for example .plot()
gives us access to the plotting interface.
model2.df.plot(x='t', y='num_alive');
Adding a table to store structured assumptions¶
Heavylight comes with the Table
class which is optional, and provides a high performance table capable of handling most actuarial tables, and loading in data from pandas dataframes, Excel, and CSV formats. See the main Tables documentation for the latest details.
Here we will create a mortality table, rather than loading it it, although in practice most tables will be pre-computed.
We start by creating a dataframe, with ages running from 16 to 130. The Table
class needs to know what type the key is, in this case we are using integers, and we want to ensure that the lookup doesn't go past the lower and upper values (so any values under 16 or over 130 will look up 16 or 130 respectively), so we are specifing type int_bound
. The overall key is therefore age|int_bound
.
df_mort = pd.DataFrame(
{'age|int_bound': range(16, 131)}
)
df_mort
age|int_bound | |
---|---|
0 | 16 |
1 | 17 |
2 | 18 |
3 | 19 |
4 | 20 |
... | ... |
110 | 126 |
111 | 127 |
112 | 128 |
113 | 129 |
114 | 130 |
115 rows × 1 columns
Our table also need values (mortality rates). Mortality rates are generally exponential, so we'll populate q_x
using an exponential function - this has just been picked for ease of use:
$q_x = 0.002e^{0.064(x-30)} + 0.001$
We also use np.clip
to ensure this falls in the range $[0, 1]$ as it is a probability.
# 0.02*2.64**(0.04 * (t + 30)) + 0.002
df_mort['q_x|float'] = 0.002 * np.exp(0.064 * (df_mort['age|int_bound'] - 30)) + 0.001
df_mort['q_x|float'] = np.clip(df_mort['q_x|float'], 0, 1) # ensure between 0 and 1
df_mort
age|int_bound | q_x|float | |
---|---|---|
0 | 16 | 0.001816 |
1 | 17 | 0.001870 |
2 | 18 | 0.001928 |
3 | 19 | 0.001989 |
4 | 20 | 0.002055 |
... | ... | ... |
110 | 126 | 0.932827 |
111 | 127 | 0.994414 |
112 | 128 | 1.000000 |
113 | 129 | 1.000000 |
114 | 130 | 1.000000 |
115 rows × 2 columns
Let's check how that looks with a plot:
df_mort.plot(x='age|int_bound', y='q_x|float');
This looks fine, so we can create a heavylight Table
instance using it.
mort_table = heavylight.Table(df_mort)
Tables are looked up based on keys, using the square brackets []
, so to get the rate for age 110:
mort_table[110]
0.3356707392422082
Incorporating the mortality table into the model¶
We could now create a basis dictionary (much like the data dictionary), however for this case we'll use a small class for the basis. This will give us the option to use .item
notation rather than dictionary item lookups ['item']
. Either approach works.
At the same time as adding the mortality table, let's at a flat discount rate, and precompute the monthly equivalent rate. Note that power in python is accomplished with the **
operator, e.g. if we wrote =2^3
in Excel, the python equivalent would be 2**3
.
Python operators are listed here: https://docs.python.org/3/library/operator.html#mapping-operators-to-functions
class Basis:
def __init__(self, mort_table, disc_rate_pa):
self.mort_table = mort_table
self.disc_rate_pa = disc_rate_pa
self.disc_rate_pm = (1 + disc_rate_pa) ** (1/12) - 1
We can create a class instance with our specific values, passing in the mortality table (mort_table
) and a flat discount rate of 4% per annum
basis3 = Basis(mort_table=mort_table, disc_rate_pa = 0.04)
basis3.mort_table[120]
0.6356966578357008
Doing a quick check that the monthly discount rate is calculated:
basis3.disc_rate_pm
0.0032737397821989145
Updating the model to look up the mortality table, the key change is the addition of q_x
which looks up self.basis.mort_table
.
class Model3(heavylight.Model):
def t(self, t):
"""time"""
return t
def num_alive(self, t):
"""probability that life is alive at time t, given alive at time 0"""
if t == 0:
return 1
else:
return self.num_alive(t - 1) - self.num_deaths(t - 1) # deaths moved to a separate method
def num_deaths(self, t):
"""number of deaths occuring between time t-1 and t"""
if t < 0:
return 0
else:
return self.num_alive(t) * self.q_x(t)
def q_x(self, t):
return self.basis.mort_table[self.age(t)] / 12 # somewhat crude to get monthly mortality
def age(self, t):
"""age in years at time t"""
if t == 0:
return self.data["init_age"]
elif t % 12 == 0:
return self.age(t - 1) + 1
else:
return self.age(t - 1)
Running this - picking a longer projection and an older age so we have higher q_x values:
data3 = dict(init_age = 65)
model3 = Model3(data=data3, basis=basis3, proj_len = 361)
model3.df
t | num_alive | num_deaths | q_x | age | |
---|---|---|---|---|---|
0 | 0 | 1.000000 | 0.001649 | 0.001649 | 65 |
1 | 1 | 0.998351 | 0.001646 | 0.001649 | 65 |
2 | 2 | 0.996705 | 0.001643 | 0.001649 | 65 |
3 | 3 | 0.995061 | 0.001641 | 0.001649 | 65 |
4 | 4 | 0.993421 | 0.001638 | 0.001649 | 65 |
... | ... | ... | ... | ... | ... |
356 | 356 | 0.192231 | 0.001942 | 0.010100 | 94 |
357 | 357 | 0.190289 | 0.001922 | 0.010100 | 94 |
358 | 358 | 0.188367 | 0.001902 | 0.010100 | 94 |
359 | 359 | 0.186465 | 0.001883 | 0.010100 | 94 |
360 | 360 | 0.184581 | 0.001986 | 0.010762 | 95 |
361 rows × 5 columns
We can plot age
against q_x
to check it is broadly exponential:
model3.df.plot(x='age', y='q_x');
And likewise, have a look at the pattern of deaths - starting around 0.02, and tailing off. The saw type pattern is because ages are integer, so the same rate is used for 12 valuation periods.
model3.df.plot(x='t', y='num_deaths');
Adding Cashflows - expected claim, premium and present values¶
The q_x/12 approximation is a little poor, so this version replaces it with an appropriately 12thly-compounding version.
There are a few different ways we can work out present values, we could either pull out cashflows from the model and use a separate npv()
function, create a discounting model, however for simplicity in this case we'll add a v
function. In line with general actuarial practice which uses $v^t$: https://en.wikipedia.org/wiki/Actuarial_present_value
pv_claim
is an intermediate function, if we take the sum of this, it is the equivalent of Excel SUMPRODUCT
on the v
column and the expected_claim
column.
class Model4(heavylight.Model):
basis: Basis # added here to allow code auto-completion
data: dict
def t(self, t):
"""time"""
return t
def num_alive(self, t):
"""probability that life is alive at time t, given alive at time 0"""
if t == 0:
return 1
else:
return self.num_alive(t - 1) - self.num_deaths(t - 1) # deaths moved to a separate method
def num_deaths(self, t):
"""number of deaths occuring between time t-1 and t"""
if t < 0:
return 0
else:
return self.num_alive(t) * self.q_x_m(t)
def q_x(self, t):
"""Annual mortality rate"""
return self.basis.mort_table[self.age(t)]
def q_x_m(self, t):
"""Monthly mortality rate"""
return 1 - (1 - self.q_x(t)) ** (1/12)
def age(self, t):
"""age in years at time t"""
if t == 0:
return self.data["init_age"]
elif t % 12 == 0:
return self.age(t - 1) + 1
else:
return self.age(t - 1)
def expected_claim(self, t):
return self.data["sum_assured"] * self.num_deaths(t)
def v(self, t):
"""Present value factor for time t, discounting back to time 0"""
if t == 0:
return 1.0
else:
return self.v(t - 1) / (1 + self.basis.disc_rate_pm)
def pv_claim(self, t):
"""present value of the expected claim occuring at time t"""
return self.expected_claim(t) * self.v(t)
We've added sum_assured
in the code as a data item, so we'll update our data dictionary with sum_assured
. The basis is unchanged from the basis object we created earlier, called basis3
.
data4 = dict(init_age = 65,
sum_assured = 100_000,
)
model4 = Model4(data=data4, basis=basis3, proj_len = 361)
model4.df
t | num_alive | num_deaths | q_x | q_x_m | age | expected_claim | v | pv_claim | |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1.000000 | 0.001664 | 0.019787 | 0.001664 | 65 | 166.403394 | 1.000000 | 166.403394 |
1 | 1 | 0.998336 | 0.001661 | 0.019787 | 0.001664 | 65 | 166.126493 | 0.996737 | 165.584413 |
2 | 2 | 0.996675 | 0.001659 | 0.019787 | 0.001664 | 65 | 165.850053 | 0.993485 | 164.769462 |
3 | 3 | 0.995016 | 0.001656 | 0.019787 | 0.001664 | 65 | 165.574073 | 0.990243 | 163.958523 |
4 | 4 | 0.993360 | 0.001653 | 0.019787 | 0.001664 | 65 | 165.298552 | 0.987012 | 163.151574 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
356 | 356 | 0.181641 | 0.001945 | 0.121199 | 0.010709 | 94 | 194.513192 | 0.312376 | 60.761245 |
357 | 357 | 0.179696 | 0.001924 | 0.121199 | 0.010709 | 94 | 192.430222 | 0.311357 | 59.914431 |
358 | 358 | 0.177772 | 0.001904 | 0.121199 | 0.010709 | 94 | 190.369557 | 0.310341 | 59.079418 |
359 | 359 | 0.175868 | 0.001883 | 0.121199 | 0.010709 | 94 | 188.330959 | 0.309328 | 58.256043 |
360 | 360 | 0.173985 | 0.001993 | 0.129143 | 0.011457 | 95 | 199.334525 | 0.308319 | 61.458555 |
361 rows × 9 columns
We can now calculate the present value of claims, using the .sum() method:
model4.pv_claim.sum()
47228.33160491008
Finally - let's run the model a few times in a loop to see the sensitivity of claims to age:
for age in [35, 45, 55, 65]:
proj_result = Model4(data=dict(init_age=age, sum_assured=100_000),basis=basis3, proj_len=361)
pv_claim = proj_result.pv_claim.sum()
print(f"Age: {age} → PV Claim: {pv_claim:0,.2f}")
Age: 35 → PV Claim: 12,358.30 Age: 45 → PV Claim: 20,323.02 Age: 55 → PV Claim: 32,279.74 Age: 65 → PV Claim: 47,228.33
From here the model result can be used downstream in other code, or exported to an file, e.g. using the pandas method to_excel
or to_csv
. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html
What next?¶
That completes this introduction, you could try adding some variables of your own to the model, for example:
- Adding an
expected_premium
function, which takesmonthly_premium
from data. - Adding a maximum policy term (at the moment it is controlled by
proj_len
). - Use a yield curve rather than a flat rate