RL Anonymity (with Python)¶
An experimental effort to use reinforcement learning techniques for data anonymization. The project repository is at RL anonymity (with Python).
Contents¶
Conceptual overview¶
The term data anonymization refers to techniques that can be applied to a given dataset, \(D\), so that it becomes difficult for a third party to identify or infer the existence of specific individuals in \(D\). Anonymization techniques typically result in some sort of distortion of the original dataset. This means that, in order to maintain some utility of the transformed dataset, the transformations applied should be constrained in some sense. In the end, it can be argued that data anonymization is an optimization problem, namely striking the right balance between data utility and privacy.
Reinforcement learning is a learning framework based on accumulated experience. In this paradigm, an agent learns by interacting with an environment without (to a large extent) any supervision. The following image describes, schematically, the reinforcement learning framework.

Reinforcement learning paradigm.¶
The agent chooses an action, \(A_t \in \mathbb{A}\), to perform out of a predefined set of actions \(\mathbb{A}\). The chosen action is executed by the environment instance, which returns to the agent a reward signal, \(R_{t+1}\), as well as the new state, \(S_{t + 1}\), that the environment is in. The overall goal of the agent is to maximize the expected total (discounted) reward, i.e.
\(\max_{\pi} E_{\pi}\left[\sum_{t=0}^{T}\gamma^{t} R_{t+1}\right]\)
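To make this loop concrete, the following minimal Python sketch illustrates the generic agent-environment interaction; the env and agent objects and their methods are placeholders for illustration and are not part of the project API.
# Illustrative agent-environment loop (placeholder objects, not the project API):
# at each step the agent picks an action, the environment executes it and
# returns the reward R_{t+1} and the next state S_{t+1}.
def run_episode(env, agent, n_steps: int) -> float:
    state = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action = agent.choose_action(state)            # A_t
        next_state, reward, done = env.step(action)    # S_{t+1}, R_{t+1}
        agent.update(state, action, reward, next_state)
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward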
The framework has been used successfully in many recent advances in control, robotics, games and elsewhere.
In this work we are interested in applying reinforcement learning techniques in order to train agents to optimally anonymize a given data set. In particular, we want to consider the following two scenarios:
A tabular data set is to be publicly released
A data set is behind a restrictive API that allows users to perform certain queries on the hidden data set.
For the first scenario, let’s assume that we have at our disposal two numbers \(DIST_{min}\) and \(DIST_{max}\). The former indicates the minimum total data set distortion that should be applied in order to satisfy some minimum safety criteria. The latter indicates the maximum total data set distortion that should be applied in order to satisfy some utility criteria. Note that the same idea can be applied to enforce constraints on how much a column should be distorted. Furthermore, let’s assume the most common transformations applied for data anonymization:
Generalization
Suppression
Permutation
Perturbation
Anatomization
We can conceive the above transformations as our action set \(\mathbb{A}\). We can now cast the data anonymity problem into a form suitable for reinforcement learning. Specifically, our goal, and the agent’s goal for that matter, is to obtain a policy \(\pi\) of transformations such that by following \(\pi\), the total data set distortion will be in the interval \([DIST_{min}, DIST_{max}]\). This is done by choosing actions/transformations from \(\mathbb{A}\). This is shown schematically in the figure below

Data anonymization using reinforcement learning.¶
Thus the environment in our case is an entity that encapsulates the original data set and controls the actions applied on it, as well as the reward signal \(R_{t+1}\) and the next state \(S_{t+1}\) to be presented to the agent.
Nevertheless, there are some caveats that we need to take into account. We summarize these below.
First, we need a reward policy. The way we assign rewards implicitly specifies the degree of supervision we allow. For instance, we could assign a reward every time a transformation is applied. This strategy allows for faster learning, but it leaves little room for the agent to come up with novel strategies. In contrast, returning a reward only at the end of the episode increases the training time but allows the agent to explore novel strategies. Related to the reward assignment is also the following issue: we need to reward the agent in a way that convinces it to explore transformations. This is important, as we don’t want the agent to simply exploit around the zero-distortion point. The second thing we need to take into account is that the metric we use to measure the data set distortion plays an important role. Thirdly, we need to hold in memory two copies of the data set: one copy to which no distortion is applied and one copy that we distort during an episode. We need this setting so that we are able to compute the column distortions. Fourthly, we need to establish the episode termination criteria, i.e. when we consider an episode to be complete. Finally, since a data set may contain strings, floating point numbers as well as integers, the computed distortions are normalized. This is needed in order to avoid having large column distortions, e.g. consider a salary column being distorted, and also to be able to sum all the column distortions in a meaningful way.
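As an illustration of the reward-assignment caveat above, the following sketch shows one plausible per-step reward scheme that favours keeping the total distortion inside \([DIST_{min}, DIST_{max}]\); the default constants mirror the configuration values used in the examples below, but the function is an assumption and not the environment's exact reward logic.
# Illustrative reward scheme (an assumption, not the project's exact logic):
# reward the agent when the total distortion lies inside the target interval
# and penalize it otherwise.
def step_reward(total_distortion: float,
                dist_min: float = 0.3, dist_max: float = 0.7,
                in_bounds_reward: float = 5.0,
                out_of_bounds_reward: float = -1.0) -> float:
    if dist_min <= total_distortion <= dist_max:
        return in_bounds_reward
    return out_of_bounds_reward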
Installation¶
The following packages are required:
You can install these as usual with pip:
pip install -r requirements.txt
Installation of the package is done via setuptools
python setup.py install
Run tests¶
There is a series of tests to verify the implementation. You can execute these by running the script execute_tests_with_coverage.sh.
Generate documentation¶
You will need Sphinx in order to generate the API documentation. Assuming that Sphinx is already installed on your machine execute the following commands (see also Sphinx tutorial).
sphinx-quickstart docs
sphinx-build -b html docs/source/ docs/build/html
Examples¶
Some examples can be found below
Q-learning on a three columns dataset¶
Overview¶
In this example, we use a tabular Q-learning algorithm to anonymize a data set with three columns. In particular, we discretize the total dataset distortion into bins. Another approach could be to discretize the distortion of each column into bins and create tuples of indices representing a state. We follow the latter approach in another example.
Q-learning¶
Q-learning is one of the early breakthroughs in the field of reinforcement learning [1]. It was first introduced in [2]. Q-learning is an off-policy algorithm where the learned state-action value function \(Q(s, \alpha)\) directly approximates the optimal state-action value function \(Q^*\). This is done independently of the policy \(\pi\) being followed [1].
The Q-learning algorithm is an iterative algorithm where we iterate over a number of episodes. At each episode, the algorithm steps through the environment for a user-specified number of steps; at each step it executes an action, which results in a new state. This is shown collectively in the image below

Q-learning algorithm. Image from [1].¶
At each episode step, the algorithm updates \(Q(s, \alpha)\) according to:
\(Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[R_{t+1} + \gamma \max_{a}Q(s_{t+1}, a) - Q(s_t, a_t)\right]\)
where \(\alpha\) is a user-defined learning factor and \(\gamma\) is the user-defined discount factor. A short Python sketch of this update is given after the input list below. The algorithm requires the following user-defined input:
Number of episodes
Number of steps per episode
\(\gamma\)
\(\alpha\)
An external policy function to decide which action to take (e.g. \(\epsilon\)-greedy)
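As promised, here is a minimal sketch of the tabular update above using a NumPy Q-table; the array shapes and the example call are illustrative assumptions rather than the project's QLearning class.
import numpy as np

def q_learning_update(q_table: np.ndarray, state: int, action: int,
                      reward: float, next_state: int,
                      alpha: float = 0.1, gamma: float = 0.99) -> None:
    """Apply the tabular Q-learning update in place."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])

# example: 10 distortion bins (states) and 5 transformations (actions)
q_table = np.zeros((10, 5))
q_learning_update(q_table, state=0, action=2, reward=5.0, next_state=3)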
Although with Q-learning \(Q(s, \alpha)\) directly approximates \(Q^*\) independently of the policy \(\pi\) being followed, the policy still has an effect in that it determines which state-action pairs are visited and updated. However, for correct convergence all that is required is that all pairs continue to be updated [1]. In fact, any method guaranteed to find optimal behavior in the general case must require it [1].
The algorithm above stores the expected value estimate for each state-action pair in a table. This means we cannot use it when we have continuous states or actions, which would lead to an array of infinite length. The total dataset distortion is assumed to be in the range \([0, 1]\), where the edge points mean no distortion and full distortion of the data set/column respectively. We discretize this range into bins and, for each value of the distortion, use the corresponding bin index as the state. Alternatively, we could discretize the distortion of each column into bins and create tuples of indices representing a state.
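The state indexing just described can be sketched as follows, assuming the total distortion lies in \([0, 1]\) and is mapped onto a fixed number of bins; the helper below is illustrative and not the project's environment code.
import numpy as np

def distortion_to_state(total_distortion: float, n_states: int = 10) -> int:
    """Map a total distortion in [0, 1] to a bin index used as the state."""
    bins = np.linspace(0.0, 1.0, n_states + 1)
    # np.digitize returns 1-based bin positions, hence the shift and clip
    return int(np.clip(np.digitize(total_distortion, bins) - 1, 0, n_states - 1))

assert distortion_to_state(0.0) == 0
assert distortion_to_state(0.35) == 3
assert distortion_to_state(1.0) == 9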
We preprocess the data set by normalizing the numeric columns. We will use the cosine normalized distance to measure the distortion of columns with string data. Similarly, we use the following \(L_2\)-based norm for calculating the distortion of numeric columns
\(dist(\mathbf{v}_1, \mathbf{v}_2) = \sqrt{\frac{\sum_{i=1}^{N}(v_{1,i} - v_{2,i})^2}{N}}\)
where \(N\) is the size of the vector. This way, due to the normalization of the numeric columns, the resulting distance will be in the range \([0,1]\).
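The following sketch computes the two column distortions described above with NumPy; it is a plain illustration of the formulas, not the project's distortion calculator module.
import numpy as np

def normalized_l2_distortion(original: np.ndarray, distorted: np.ndarray) -> float:
    """L2-based distortion for a numeric column normalized to [0, 1]."""
    diff = original - distorted
    return float(np.sqrt(np.sum(diff ** 2) / diff.size))

def cosine_distortion(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine-distance distortion; lies in [0, 1] for non-negative encodings."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0
    return float(1.0 - np.dot(u, v) / denom)

salary_before = np.array([0.2, 0.5, 0.9])
salary_after = np.array([0.1111, 0.5555, 0.7777])
print(normalized_l2_distortion(salary_before, salary_after))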
Code¶
The necessary imports
import numpy as np
import random
from src.trainers.trainer import Trainer, TrainerConfig
from src.algorithms.q_learning import QLearning, QLearnConfig
from src.spaces.action_space import ActionSpace
from src.spaces.actions import ActionIdentity, ActionStringGeneralize, ActionNumericBinGeneralize
from src.policies.epsilon_greedy_policy import EpsilonGreedyPolicy, EpsilonDecayOption
from src.utils.iteration_control import IterationControl
from src.examples.helpers.plot_utils import plot_running_avg
from src.datasets import ColumnType
from src.examples.helpers.load_three_columns_mock_dataset import load_discrete_env, \
get_ethinicity_hierarchy, get_salary_bins, load_mock_subjects
from src.spaces.env_type import DiscreteEnvType
from src.utils import INFO
Next establish a set of configuration parameters
# configuration params
EPS = 1.0
EPSILON_DECAY_OPTION = EpsilonDecayOption.CONSTANT_RATE # .INVERSE_STEP
EPSILON_DECAY_FACTOR = 0.01
GAMMA = 0.99
ALPHA = 0.1
N_EPISODES = 1001
N_ITRS_PER_EPISODE = 30
N_STATES = 10
# fix the rewards. Assume that any average distortion in
# (0.3, 0.7) suits us
MAX_DISTORTION = 0.7
MIN_DISTORTION = 0.3
OUT_OF_MAX_BOUND_REWARD = -1.0
OUT_OF_MIN_BOUND_REWARD = -1.0
IN_BOUNDS_REWARD = 5.0
OUTPUT_MSG_FREQUENCY = 100
N_ROUNDS_BELOW_MIN_DISTORTION = 10
SAVE_DISTORTED_SETS_DIR = "q_learning_three_columns_results/distorted_set"
PUNISH_FACTOR = 2.0
The driver code brings all the elements together
if __name__ == '__main__':
# set the seed for random engine
random.seed(42)
column_types = {"ethnicity": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"salary": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"diagnosis": ColumnType.INSENSITIVE_ATTRIBUTE}
action_space = ActionSpace(n=5)
# all the columns that are SENSITIVE_ATTRIBUTE will be kept as they are
# because currently we have no model
# also INSENSITIVE_ATTRIBUTE will be kept as is
action_space.add_many(ActionIdentity(column_name="salary"),
ActionIdentity(column_name="diagnosis"),
ActionIdentity(column_name="ethnicity"),
ActionStringGeneralize(column_name="ethnicity",
generalization_table=get_ethinicity_hierarchy()),
ActionNumericBinGeneralize(column_name="salary",
generalization_table=get_salary_bins(ds=load_mock_subjects(),
n_states=N_STATES)))
env = load_discrete_env(env_type=DiscreteEnvType.TOTAL_DISTORTION_STATE, n_states=N_STATES,
action_space=action_space,
min_distortion=MIN_DISTORTION, max_distortion=MAX_DISTORTION,
total_min_distortion=MIN_DISTORTION, total_max_distortion=MAX_DISTORTION,
punish_factor=PUNISH_FACTOR, column_types=column_types,
save_distoreted_sets_dir=SAVE_DISTORTED_SETS_DIR,
use_identifying_column_dist_in_total_dist=False,
use_identifying_column_dist_factor=-100,
gamma=GAMMA,
in_bounds_reward=IN_BOUNDS_REWARD,
out_of_min_bound_reward=OUT_OF_MIN_BOUND_REWARD,
out_of_max_bound_reward=OUT_OF_MAX_BOUND_REWARD,
n_rounds_below_min_distortion=N_ROUNDS_BELOW_MIN_DISTORTION)
# save the data before distortion so that we can
# later load it on ARX
env.save_current_dataset(episode_index=-1, save_index=False)
# configuration for the Q-learner
algo_config = QLearnConfig(gamma=GAMMA, alpha=ALPHA,
n_itrs_per_episode=N_ITRS_PER_EPISODE,
policy=EpsilonGreedyPolicy(eps=EPS, n_actions=env.n_actions,
decay_op=EPSILON_DECAY_OPTION,
epsilon_decay_factor=EPSILON_DECAY_FACTOR))
agent = QLearning(algo_config=algo_config)
trainer_config = TrainerConfig(n_episodes=N_EPISODES, output_msg_frequency=OUTPUT_MSG_FREQUENCY)
trainer = Trainer(env=env, agent=agent, configuration=trainer_config)
trainer.train()
# avg_rewards = trainer.avg_rewards()
avg_rewards = trainer.total_rewards
plot_running_avg(avg_rewards, steps=100,
xlabel="Episodes", ylabel="Reward",
title="Running reward average over 100 episodes")
avg_episode_dist = np.array(trainer.total_distortions)
print("{0} Max/Min distortion {1}/{2}".format(INFO, np.max(avg_episode_dist), np.min(avg_episode_dist)))
plot_running_avg(avg_episode_dist, steps=100,
xlabel="Episodes", ylabel="Distortion",
title="Running distortion average over 100 episodes")
print("=============================================")
print("{0} Generating distorted dataset".format(INFO))
# Let's play
env.reset()
stop_criterion = IterationControl(n_itrs=10, min_dist=MIN_DISTORTION, max_dist=MAX_DISTORTION)
agent.play(env=env, stop_criterion=stop_criterion)
env.save_current_dataset(episode_index=-2, save_index=False)
print("{0} Done....".format(INFO))
print("=============================================")
Results¶
The following images show the performance of the learning process

Running average reward.¶

Running average total distortion.¶
Although there is evidence of learning, it should be noted that this depends heavily on the transformations applied to the columns and the metrics used. Typically, some experimentation is needed in order to determine the right options.
The following is a snapshot of the distorted dataset produced by the agent
ethnicity,salary,diagnosis
British,0.3333333333333333,1
British,0.1111111111111111,0
British,0.5555555555555556,3
British,0.5555555555555556,3
British,0.1111111111111111,0
British,0.1111111111111111,1
British,0.1111111111111111,4
British,0.3333333333333333,3
British,0.1111111111111111,4
British,0.3333333333333333,0
Asian,0.1111111111111111,0
British,0.1111111111111111,0
British,0.1111111111111111,3
White,0.1111111111111111,0
British,0.1111111111111111,3
British,0.3333333333333333,4
Mixed,0.3333333333333333,4
British,0.7777777777777777,1
whilst the following is a snapshot of the dataset distorted by the ARX K-anonymity algorithm
NHSno,given_name,surname,gender,dob,ethnicity,education,salary,mutation_status,preventative_treatment,diagnosis
*,*,*,*,*,White British,*,0.3333333333333333,*,*,1
*,*,*,*,*,White British,*,0.1111111111111111,*,*,0
*,*,*,*,*,White British,*,0.1111111111111111,*,*,1
*,*,*,*,*,White British,*,0.3333333333333333,*,*,3
*,*,*,*,*,White British,*,0.1111111111111111,*,*,4
*,*,*,*,*,White British,*,0.3333333333333333,*,*,0
*,*,*,*,*,Bangladeshi,*,0.1111111111111111,*,*,0
*,*,*,*,*,White British,*,0.1111111111111111,*,*,0
*,*,*,*,*,White other,*,0.1111111111111111,*,*,0
*,*,*,*,*,White British,*,0.3333333333333333,*,*,4
*,*,*,*,*,White British,*,0.7777777777777777,*,*,1
*,*,*,*,*,White British,*,0.1111111111111111,*,*,2
*,*,*,*,*,White British,*,0.1111111111111111,*,*,2
*,*,*,*,*,White other,*,0.1111111111111111,*,*,2
*,*,*,*,*,White British,*,0.5555555555555556,*,*,0
*,*,*,*,*,White British,*,0.5555555555555556,*,*,4
*,*,*,*,*,White British,*,0.5555555555555556,*,*,0
*,*,*,*,*,White British,*,0.3333333333333333,*,*,0
Note that the K-anonymity algorithm removes some rows during the anonymization process, so there is no one-to-one correspondence between the two outputs. Nonetheless, the comparison shows qualitatively what the two algorithms produce.
References¶
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press.
Watkins, Learning from Delayed Rewards, Ph.D. thesis, King’s College, Cambridge, 1989.
Q-learning algorithm on mock data set¶
Overview¶
In the previous example, we applied Q-learning on a dataset consisting of three columns. Moreover, we used a one dimensional state space; we discretized the range \([0,1]\) into bins and used the resulting bin index as the state index. In this example, we will simply allow for more columns in the data set. Other than that, this example is the same as the previous one.
Code¶
The necessary imports
import random
import numpy as np
from src.examples.helpers.load_full_mock_dataset import load_discrete_env, get_ethinicity_hierarchy, \
get_gender_hierarchy, get_salary_bins, load_mock_subjects
from src.datasets import ColumnType
from src.spaces.env_type import DiscreteEnvType
from src.spaces.action_space import ActionSpace
from src.spaces.actions import ActionIdentity, ActionStringGeneralize, ActionNumericBinGeneralize
from src.algorithms.q_learning import QLearnConfig, QLearning
from src.policies.epsilon_greedy_policy import EpsilonGreedyPolicy, EpsilonDecayOption
from src.trainers.trainer import Trainer, TrainerConfig
from src.examples.helpers.plot_utils import plot_running_avg
from src.utils import INFO
Next establish a set of configuration parameters
# configuration params
N_STATES = 10
GAMMA = 0.99
ALPHA = 0.1
PUNISH_FACTOR = 2.0
MAX_DISTORTION = 0.7
MIN_DISTORTION = 0.4
SAVE_DISTORTED_SETS_DIR = "/home/alex/qi3/drl_anonymity/src/examples/q_learning_all_cols_results/distorted_set"
EPS = 1.0
EPSILON_DECAY_OPTION = EpsilonDecayOption.CONSTANT_RATE # .INVERSE_STEP
EPSILON_DECAY_FACTOR = 0.01
USE_IDENTIFYING_COLUMNS_DIST = True
IDENTIFY_COLUMN_DIST_FACTOR = 0.1
N_EPISODES = 1001
N_ITRS_PER_EPISODE = 30
OUT_OF_MAX_BOUND_REWARD = -1.0
OUT_OF_MIN_BOUND_REWARD = -1.0
IN_BOUNDS_REWARD = 5.0
OUTPUT_MSG_FREQUENCY = 100
N_ROUNDS_BELOW_MIN_DISTORTION = 10
The driver code brings all the elements together
if __name__ == '__main__':
# set the seed for random engine
random.seed(42)
# specify the column types. An identifying column
# will be removed from the anonymized data set
# An INSENSITIVE_ATTRIBUTE remains intact.
# A QUASI_IDENTIFYING_ATTRIBUTE is used in the anonymization
# A SENSITIVE_ATTRIBUTE currently remains intact
column_types = {"NHSno": ColumnType.IDENTIFYING_ATTRIBUTE,
"given_name": ColumnType.IDENTIFYING_ATTRIBUTE,
"surname": ColumnType.IDENTIFYING_ATTRIBUTE,
"gender": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"dob": ColumnType.SENSITIVE_ATTRIBUTE,
"ethnicity": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"education": ColumnType.SENSITIVE_ATTRIBUTE,
"salary": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"mutation_status": ColumnType.SENSITIVE_ATTRIBUTE,
"preventative_treatment": ColumnType.SENSITIVE_ATTRIBUTE,
"diagnosis": ColumnType.INSENSITIVE_ATTRIBUTE}
# define the action space
action_space = ActionSpace(n=10)
# all the columns that are SENSITIVE_ATTRIBUTE will be kept as they are
# because currently we have no model
# also INSENSITIVE_ATTRIBUTE will be kept as is
# in order to declare this we use an ActionIdentity
action_space.add_many(ActionIdentity(column_name="dob"),
ActionIdentity(column_name="education"),
ActionIdentity(column_name="salary"),
ActionIdentity(column_name="diagnosis"),
ActionIdentity(column_name="mutation_status"),
ActionIdentity(column_name="preventative_treatment"),
ActionIdentity(column_name="ethnicity"),
ActionStringGeneralize(column_name="ethnicity",
generalization_table=get_ethinicity_hierarchy()),
ActionStringGeneralize(column_name="gender",
generalization_table=get_gender_hierarchy()),
ActionNumericBinGeneralize(column_name="salary",
generalization_table=get_salary_bins(ds=load_mock_subjects(),
n_states=N_STATES))
)
action_space.shuffle()
env = load_discrete_env(env_type=DiscreteEnvType.TOTAL_DISTORTION_STATE,
n_states=N_STATES,
min_distortion=MIN_DISTORTION, max_distortion=MAX_DISTORTION,
total_min_distortion=MIN_DISTORTION, total_max_distortion=MAX_DISTORTION,
out_of_max_bound_reward=OUT_OF_MAX_BOUND_REWARD,
out_of_min_bound_reward=OUT_OF_MIN_BOUND_REWARD,
in_bounds_reward=IN_BOUNDS_REWARD,
punish_factor=PUNISH_FACTOR,
column_types=column_types,
action_space=action_space,
save_distoreted_sets_dir=SAVE_DISTORTED_SETS_DIR,
use_identifying_column_dist_in_total_dist=USE_IDENTIFYING_COLUMNS_DIST,
use_identifying_column_dist_factor=IDENTIFY_COLUMN_DIST_FACTOR,
gamma=GAMMA,
n_rounds_below_min_distortion=N_ROUNDS_BELOW_MIN_DISTORTION)
agent_config = QLearnConfig(n_itrs_per_episode=N_ITRS_PER_EPISODE, gamma=GAMMA,
alpha=ALPHA,
policy=EpsilonGreedyPolicy(eps=EPS, n_actions=env.n_actions,
decay_op=EPSILON_DECAY_OPTION,
epsilon_decay_factor=EPSILON_DECAY_FACTOR))
agent = QLearning(algo_config=agent_config)
trainer_config = TrainerConfig(n_episodes=N_EPISODES, output_msg_frequency=OUTPUT_MSG_FREQUENCY)
trainer = Trainer(env=env, agent=agent, configuration=trainer_config)
trainer.train()
avg_rewards = trainer.total_rewards
plot_running_avg(avg_rewards, steps=100,
xlabel="Episodes", ylabel="Reward",
title="Running reward average over 100 episodes")
avg_episode_dist = np.array(trainer.total_distortions)
print("{0} Max/Min distortion {1}/{2}".format(INFO, np.max(avg_episode_dist), np.min(avg_episode_dist)))
plot_running_avg(avg_episode_dist, steps=100,
xlabel="Episodes", ylabel="Distortion",
title="Running distortion average over 100 episodes")
Results¶
The following images show the performance of the learning process

Running average reward.¶

Running average total distortion.¶
References¶
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press.
Semi-gradient SARSA algorithm on mock data set¶
Overview¶
In this example, we use the episodic semi-gradient SARSA algorithm to anonymize a data set.
Semi-gradient SARSA algorithm¶
One of the major disadvantages of Q-learning, as we saw in the previous examples, is that we need to use a tabular representation of the state-action space. This poses limitations on how large the state space can be on current machines; for a data set with, say, 5 columns, where each is discretized using 10 bins, this creates a state space of the order of \(O(10^5)\). Although we won’t address this here, we want to introduce the idea of weighting the columns. This idea comes from the fact that not all columns necessarily carry the same information regarding anonymity and data set utility. Implicitly, we encode this belief by categorizing the columns as
ColumnType.IDENTIFYING_ATTRIBUTE
ColumnType.QUASI_IDENTIFYING_ATTRIBUTE
ColumnType.SENSITIVE_ATTRIBUTE
ColumnType.INSENSITIVE_ATTRIBUTE
Thus, in this example, instead of representing the state-action function \(q_{\pi}\) with a table as we did in Q-learning on a three columns dataset, we will assume a functional form for it. Specifically, we assume that the state-action function can be approximated by \(\hat{q} \approx q_{\pi}\) given by
\(\hat{q}(s, a, \mathbf{w}) = \mathbf{w}^T\mathbf{x}(s, a) = \sum_{i} w_i x_i(s, a)\)
where \(\mathbf{w}\) is the weights vector and \(\mathbf{x}(s, a)\) is called the feature vector representing state \(s\) when taking action \(a\) [1]. We will use Tile coding to construct \(\mathbf{x}(s, \alpha)\). Our goal now is to find the components of the weight vector. We can use stochastic gradient descent (or SGD) for this [1]. In this case, the update rule is [1]
\(\mathbf{w}_{t+1} = \mathbf{w}_t + \alpha\left[U_t - \hat{q}(S_t, A_t, \mathbf{w}_t)\right]\nabla_{\mathbf{w}} \hat{q}(S_t, A_t, \mathbf{w}_t)\)
where \(\alpha\) is the learning rate and \(U_t\), for one-step SARSA, is given by [1]:
\(U_t = R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, \mathbf{w}_t)\)
Since \(\hat{q}(s, a)\) is a linear function with respect to the weights, its gradient is simply the feature vector:
\(\nabla_{\mathbf{w}} \hat{q}(s, a, \mathbf{w}) = \mathbf{x}(s, a)\)
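A minimal NumPy sketch of this weight update, assuming a feature function that returns the tiled feature vectors \(\mathbf{x}(s, a)\); it illustrates the equations above and is not the project's SemiGradSARSA implementation.
import numpy as np

def sarsa_weight_update(w: np.ndarray, x_sa: np.ndarray, x_next_sa: np.ndarray,
                        reward: float, alpha: float = 0.1, gamma: float = 0.99,
                        done: bool = False) -> np.ndarray:
    """One semi-gradient SARSA step for a linear q-hat = w . x(s, a)."""
    q_sa = np.dot(w, x_sa)
    target = reward if done else reward + gamma * np.dot(w, x_next_sa)
    # for a linear approximator the gradient of q-hat is just x(s, a)
    return w + alpha * (target - q_sa) * x_sa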
The semi-gradient SARSA algorithm is shown below

Episodic semi-gradient SARSA algorithm. Image from [1].¶
Tile coding¶
Since we consider the distortions of all the columns in the data set, we are dealing with a multi-dimensional continuous space. In this case, we can use tile coding to construct \(\mathbf{x}(s, \alpha)\) [1].
Tile coding is a form of coarse coding for multi-dimensional continuous spaces [1]. In this method, the features are grouped into partitions of the state space. Each partition is called a tiling, and each element of the partition is called a tile [1]. The following figure shows a 2D state space partitioned into a uniform grid (left). If we only use this tiling, we would not have coarse coding but just a case of state aggregation.
In order to apply coarse coding, we use overlapping tilings. In this case, each tiling is offset by a fraction of a tile width [1]. A simple case with four tilings is shown on the right side of the following figure.

Multiple, overlapping grid-tilings on a limited two-dimensional space. These tilings are offset from one another by a uniform amount in each dimension. Image from [1].¶
One practical advantage of tile coding is that the overall number of features that are active at a given instance is the same for any state [1]. Exactly one feature is present in each tiling, so the total number of features present is always the same as the number of tilings [1]. This allows the learning parameter \(\eta\) to be set according to
\(\eta = \frac{1}{n}\)
where \(n\) is the number of tilings.
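The following is a small illustration of tile coding for the three column-distortion dimensions (gender, ethnicity, salary) used later: each tiling is offset by a fraction of a tile width and contributes exactly one active tile. It is a simplified stand-in for the project's TiledEnv, and all names in it are illustrative.
import numpy as np

def active_tiles(state: np.ndarray, n_layers: int = 5, n_bins: int = 10) -> list:
    """Return one active tile (layer, grid coordinates) per tiling for a state in [0, 1]^d."""
    tile_width = 1.0 / n_bins
    tiles = []
    for layer in range(n_layers):
        # each tiling is offset by a fraction of a tile width
        offset = layer * tile_width / n_layers
        coords = np.clip(np.floor((state + offset) / tile_width).astype(int), 0, n_bins - 1)
        tiles.append((layer, tuple(coords)))
    return tiles

# distortions of the gender, ethnicity and salary columns
print(active_tiles(np.array([0.13, 0.4, 0.55])))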
Code¶
The necessary imports
import random
import numpy as np
from src.algorithms.semi_gradient_sarsa import SemiGradSARSAConfig, SemiGradSARSA
from src.spaces.tiled_environment import TiledEnv, TiledEnvConfig, Layer
from src.spaces.action_space import ActionSpace
from src.spaces.actions import ActionIdentity, ActionStringGeneralize, ActionNumericBinGeneralize
from src.trainers.trainer import Trainer, TrainerConfig
from src.policies.epsilon_greedy_policy import EpsilonDecayOption
from src.algorithms.epsilon_greedy_q_estimator import EpsilonGreedyQEstimatorConfig, EpsilonGreedyQEstimator
from src.datasets import ColumnType
from src.spaces.env_type import DiscreteEnvType
from src.examples.helpers.load_full_mock_dataset import load_discrete_env, get_ethinicity_hierarchy, \
get_gender_hierarchy, get_salary_bins, load_mock_subjects
from src.examples.helpers.plot_utils import plot_running_avg
from src.utils import INFO
Next we set some constants
N_STATES = 10
N_LAYERS = 5
N_BINS = 10
N_EPISODES = 10001
OUTPUT_MSG_FREQUENCY = 100
GAMMA = 0.99
ALPHA = 0.1
N_ITRS_PER_EPISODE = 30
EPS = 1.0
EPSILON_DECAY_OPTION = EpsilonDecayOption.CONSTANT_RATE
EPSILON_DECAY_FACTOR = 0.01
MAX_DISTORTION = 0.7
MIN_DISTORTION = 0.4
OUT_OF_MAX_BOUND_REWARD = -1.0
OUT_OF_MIN_BOUND_REWARD = -1.0
IN_BOUNDS_REWARD = 5.0
N_ROUNDS_BELOW_MIN_DISTORTION = 10
SAVE_DISTORTED_SETS_DIR = "semi_grad_sarsa_all_columns/distorted_set"
PUNISH_FACTOR = 2.0
USE_IDENTIFYING_COLUMNS_DIST = True
IDENTIFY_COLUMN_DIST_FACTOR = 0.1
The driver code brings all elements together
if __name__ == '__main__':
# set the seed for random engine
random.seed(42)
# specify the column types. An identifying column
# will be removed from the anonymized data set
# An INSENSITIVE_ATTRIBUTE remains intact.
# A QUASI_IDENTIFYING_ATTRIBUTE is used in the anonymization
# A SENSITIVE_ATTRIBUTE currently remains intact
column_types = {"NHSno": ColumnType.IDENTIFYING_ATTRIBUTE,
"given_name": ColumnType.IDENTIFYING_ATTRIBUTE,
"surname": ColumnType.IDENTIFYING_ATTRIBUTE,
"gender": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"dob": ColumnType.SENSITIVE_ATTRIBUTE,
"ethnicity": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"education": ColumnType.SENSITIVE_ATTRIBUTE,
"salary": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"mutation_status": ColumnType.SENSITIVE_ATTRIBUTE,
"preventative_treatment": ColumnType.SENSITIVE_ATTRIBUTE,
"diagnosis": ColumnType.INSENSITIVE_ATTRIBUTE}
# define the action space
action_space = ActionSpace(n=10)
# all the columns that are SENSITIVE_ATTRIBUTE will be kept as they are
# because currently we have no model
# also INSENSITIVE_ATTRIBUTE will be kept as is
# in order to declare this we use an ActionIdentity
action_space.add_many(ActionIdentity(column_name="dob"),
ActionIdentity(column_name="education"),
ActionIdentity(column_name="salary"),
ActionIdentity(column_name="diagnosis"),
ActionIdentity(column_name="mutation_status"),
ActionIdentity(column_name="preventative_treatment"),
ActionIdentity(column_name="ethnicity"),
ActionStringGeneralize(column_name="ethnicity",
generalization_table=get_ethinicity_hierarchy()),
ActionStringGeneralize(column_name="gender",
generalization_table=get_gender_hierarchy()),
ActionNumericBinGeneralize(column_name="salary",
generalization_table=get_salary_bins(ds=load_mock_subjects(),
n_states=N_STATES)))
action_space.shuffle()
# load the discrete environment
env = load_discrete_env(env_type=DiscreteEnvType.MULTI_COLUMN_STATE, n_states=N_STATES,
min_distortion={"ethnicity": 0.133, "salary": 0.133, "gender": 0.133,
"dob": 0.0, "education": 0.0, "diagnosis": 0.0,
"mutation_status": 0.0, "preventative_treatment": 0.0,
"NHSno": 0.0, "given_name": 0.0, "surname": 0.0},
max_distortion={"ethnicity": 0.133, "salary": 0.133, "gender": 0.133,
"dob": 0.0, "education": 0.0, "diagnosis": 0.0,
"mutation_status": 0.0, "preventative_treatment": 0.0,
"NHSno": 0.1, "given_name": 0.1, "surname": 0.1},
total_min_distortion=MIN_DISTORTION, total_max_distortion=MAX_DISTORTION,
out_of_max_bound_reward=OUT_OF_MAX_BOUND_REWARD,
out_of_min_bound_reward=OUT_OF_MIN_BOUND_REWARD,
in_bounds_reward=IN_BOUNDS_REWARD,
punish_factor=PUNISH_FACTOR,
column_types=column_types,
action_space=action_space,
save_distoreted_sets_dir=SAVE_DISTORTED_SETS_DIR,
use_identifying_column_dist_in_total_dist=USE_IDENTIFYING_COLUMNS_DIST,
use_identifying_column_dist_factor=IDENTIFY_COLUMN_DIST_FACTOR,
gamma=GAMMA,
n_rounds_below_min_distortion=N_ROUNDS_BELOW_MIN_DISTORTION)
# the configuration for the Tiled environment
tiled_env_config = TiledEnvConfig(n_layers=N_LAYERS, n_bins=N_BINS,
env=env,
column_ranges={"gender": [0.0, 1.0],
"ethnicity": [0.0, 1.0],
"salary": [0.0, 1.0]})
# create the Tiled environment
tiled_env = TiledEnv(tiled_env_config)
tiled_env.create_tiles()
# agent configuration
agent_config = SemiGradSARSAConfig(gamma=GAMMA, alpha=ALPHA, n_itrs_per_episode=N_ITRS_PER_EPISODE,
policy=EpsilonGreedyQEstimator(EpsilonGreedyQEstimatorConfig(eps=EPS, n_actions=tiled_env.n_actions,
decay_op=EPSILON_DECAY_OPTION,
epsilon_decay_factor=EPSILON_DECAY_FACTOR,
env=tiled_env,
gamma=GAMMA,
alpha=ALPHA)))
# create the agent
agent = SemiGradSARSA(agent_config)
# create a trainer to train the SemiGradSARSA agent
trainer_config = TrainerConfig(n_episodes=N_EPISODES, output_msg_frequency=OUTPUT_MSG_FREQUENCY)
trainer = Trainer(env=tiled_env, agent=agent, configuration=trainer_config)
# train the agent
trainer.train()
# avg_rewards = trainer.avg_rewards()
avg_rewards = trainer.total_rewards
plot_running_avg(avg_rewards, steps=100,
xlabel="Episodes", ylabel="Reward",
title="Running reward average over 100 episodes")
avg_episode_dist = np.array(trainer.total_distortions)
print("{0} Max/Min distortion {1}/{2}".format(INFO, np.max(avg_episode_dist), np.min(avg_episode_dist)))
plot_running_avg(avg_episode_dist, steps=100,
xlabel="Episodes", ylabel="Distortion",
title="Running distortion average over 100 episodes")

Running average reward.¶

Running average total distortion.¶
The images above illustrate that there is clear evidence of learning, as was the case with Q-learning. However, the training time is considerably longer than with the simple Q-learning algorithm. Thus, with the current implementation of semi-gradient SARSA we do not have any clear advantage. Instead, it could be argued that we maintain the constraints related to Q-learning (this comes from the tiling approach we used) without any clear advantage.
References¶
Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press.
A2C algorithm on mock data set¶
Overview¶
Both the Q-learning algorithm we used in Q-learning on a three columns dataset and the SARSA algorithm in Semi-gradient SARSA on a three columns data set are value-based methods; that is, they directly estimate value functions, specifically the state-action function \(Q\). Knowing \(Q\), we can construct a policy to follow, for example a greedy policy that at a given state chooses the action that maximizes the state-action function, i.e. \(argmax_{\alpha}Q(s_t, \alpha)\). These methods are collectively referred to as value-based methods.
However, the true objective of reinforcement learning is to directly learn a policy \(\pi\). One class of algorithms in this direction is policy gradient algorithms, such as REINFORCE and Advantage Actor-Critic (A2C). A review of A2C methods can be found in [1].
A2C algorithm¶
Typically with policy gradient methods, and A2C in particular, we approximate the policy directly by a parameterized model. Thereafter, we train the model, i.e. learn its parameters, by taking samples from the environment. The main advantage of learning a parameterized policy is that it can be any learnable function, e.g. a linear model or a deep neural network.
The A2C algorithm is the synchronous version of A3C [2]. Both algorithms fall under the umbrella of actor-critic methods. In these methods, we estimate a parameterized policy (the actor) and a parameterized value function (the critic). The role of the policy or actor network is to indicate which action to take in a given state. In our implementation below, the policy network returns a probability distribution over the action space, specifically a tensor of probabilities. The role of the critic model is to evaluate how good the selected action is.
In our implementation we use a shared-weights model and use a single agent that interacts with multiple instances of the environment. In other words, we create a number of workers where each worker loads its own instance of the data set to anonymize.
The objective of the agent is to maximize the expected discounted return [2]:
\(J(\pi_{\theta}) = E_{\tau \sim \rho_{\theta}}\left[\sum_{t}\gamma^{t} R(s_t, \alpha_t)\right]\)
where \(\tau\) is the trajectory the agent observes with probability distribution \(\rho_{\theta}\), \(\gamma\) is the discount factor and \(R(s_t, \alpha_t)\) represents a reward function that is unknown to the agent. We can rewrite the expression above as
Let’s condense the involved notation by using \(G(\tau)\) to denote the sum in the expression above i.e.
The probability distribution \(\rho_{\theta}\) should be a function of the followed policy \(\pi_{\theta}\), as the policy dictates which actions are taken. Indeed, we can write [2]
where \(P(s_{t+1}| s_t, a_t)\) denotes the state transition probabilities. Policy gradient methods use the gradient of \(J(\pi_{\theta})\) in order to make progress. It turns out (see for example [2, 3]) that we can write
The equation above forms the essence of policy gradient methods. However, we cannot fully evaluate the integral above as we don’t know the transition probabilities. We can eliminate the term that involves the gradient \(\nabla_{\theta}\rho_{\theta}\) by using the expression for \(\rho_{\theta}\)
From the expression above only the term \(\pi_{\theta}(a_t, s_t)\) involves \(\theta\). Thus,
We will use the expression above as well as batches of trajectories in order to calculate the integral above. In particular, we will use the following expression
where \(N\) is the size of the batch. There are various expressions for \(G(\tau)\) (see e.g. [4]). Below, we review some of them. The first expression is given by
and this is the expression used by the REINFORCE algorithm [2]. However, this is a full Monte Carlo estimate and when \(N\) is small the gradient estimation may exhibit high variance. In such cases learning may not be stable. Another expression we could employ is known as the reward-to-go term [2]:
Another idea is to use a baseline in order to further reduce the gradient variance [2]. One such approach is to use the so-called advantage function \(A(s_t, \alpha_t)\), defined as [2]
\(A(s_t, a_t) = Q(s_t, a_t) - V(s_t)\)
The advantage function measures how much the agent is better off by taking action \(a_t\) when in state \(s_t\) as opposed to following the existing policy. Let’s see how we can estimate the advantage function.
Estimate \(A(s_t, a_t)\)¶
The advantage function involves both the state-action value function \(Q_{\pi}(s_t, a_t)\) as well as the value function \(V_{\pi}(s_t)\). Given a model that somehow estimates \(V_{\pi}(s_t)\), we can estimate \(Q_{\pi}(s_t, a_t)\) from
or
Resulting in
GAE¶
The advantage actor-critic model we use in this section involves a more general form of advantage estimation known as Generalized Advantage Estimation or GAE. This is a method for estimating targets for the advantage function [3]. Specifically, we use the following expression for the advantage function [4]
\(A^{GAE}(s_t, a_t) = \sum_{l=0}^{\infty}(\gamma\lambda)^{l}\delta_{t+l}, \quad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)\)
When \(\lambda=0\), this expression reduces to the expression for \(A(s_t, a_t)\) [4].
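A compact sketch of the GAE computation above with NumPy; the reward and value arrays are assumed to come from a rollout, and the function is illustrative rather than the project's A2C implementation.
import numpy as np

def gae_advantages(rewards: np.ndarray, values: np.ndarray,
                   gamma: float = 0.99, lam: float = 0.95) -> np.ndarray:
    """Generalized Advantage Estimation over a single rollout.

    values must have one more entry than rewards: the bootstrap value
    of the state that follows the last reward.
    """
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

rewards = np.array([5.0, 5.0, -1.0])
values = np.array([0.5, 0.6, 0.4, 0.0])  # includes the bootstrap value
print(gae_advantages(rewards, values))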
A2C model¶
As we already mentioned, in actor-critic methods there are two models: the actor and the critic. The role of the policy or actor model is to indicate which action to take in a given state. There are two main architectures for actor-critic methods: completely isolated actor and critic models, or weight-sharing models [2]. In the former, the two models share no common aspects. The advantage of such an approach is that it is usually more stable. The second architecture allows the two models to share some layers and differentiate in the last ones. Although this second option requires careful tuning of the hyperparameters, it has the advantage of cross learning and the use of common feature extraction capabilities [2].
In this example, we will follow the second architecture. Moreover, to speed up training, we will use a multi-process environment that gathers samples from multiple environments at once.
The loss function we minimize is a weighted sum of the loss functions of the two participating models, i.e.
\(L = w_{policy} L_{policy} + w_{value} L_{value}\)
where
\(L_{policy} = -\frac{1}{N}\sum_{i=1}^{N} \log\pi_{\theta}(a_i | s_i) A(s_i, a_i), \qquad L_{value} = MSE(y_i, V(s_i))\)
where \(MSE\) is the mean square error function and \(y_i\) are the state-value targets, i.e. the discounted returns computed from the collected rewards.
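A hedged PyTorch sketch of such a weighted actor-critic loss; the weight names mirror the policy_loss_weight and value_loss_weight fields of A2CConfig, but the exact loss composition of the project's A2C class may differ.
import torch
import torch.nn.functional as F

def a2c_loss(log_probs: torch.Tensor, advantages: torch.Tensor,
             values: torch.Tensor, value_targets: torch.Tensor,
             policy_loss_weight: float = 1.0,
             value_loss_weight: float = 1.0) -> torch.Tensor:
    """Weighted sum of the policy (actor) loss and the value (critic) MSE loss."""
    # the advantages act as constants for the policy gradient term
    policy_loss = -(log_probs * advantages.detach()).mean()
    value_loss = F.mse_loss(values, value_targets)
    return policy_loss_weight * policy_loss + value_loss_weight * value_loss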
Code¶
import random
from pathlib import Path
import numpy as np
import torch
from src.algorithms.a2c import A2C, A2CConfig
from src.networks.a2c_networks import A2CNetSimpleLinear
from src.examples.helpers.load_full_mock_dataset import load_discrete_env, get_ethinicity_hierarchy, \
get_gender_hierarchy, get_salary_bins, load_mock_subjects
from src.datasets import ColumnType
from src.spaces.env_type import DiscreteEnvType
from src.spaces.action_space import ActionSpace
from src.spaces.actions import ActionIdentity, ActionStringGeneralize, ActionNumericBinGeneralize
from src.utils.iteration_control import IterationControl
from src.examples.helpers.plot_utils import plot_running_avg
from src.spaces.multiprocess_env import MultiprocessEnv
from src.trainers.pytorch_trainer import PyTorchTrainer, PyTorchTrainerConfig
from src.maths.optimizer_type import OptimizerType
from src.maths.pytorch_optimizer_config import PyTorchOptimizerConfig
from src.utils import INFO
N_STATES = 10
N_ITRS_PER_EPISODE = 400
ACTION_SPACE_SIZE = 10
N_WORKERS = 3
N_EPISODES = 1001
GAMMA = 0.99
ALPHA = 0.1
PUNISH_FACTOR = 2.0
MAX_DISTORTION = 0.7
MIN_DISTORTION = 0.4
SAVE_DISTORTED_SETS_DIR = "/home/alex/qi3/drl_anonymity/src/examples/a2c_all_cols_multi_state_results/distorted_set"
USE_IDENTIFYING_COLUMNS_DIST = True
IDENTIFY_COLUMN_DIST_FACTOR = 0.1
OUT_OF_MAX_BOUND_REWARD = -1.0
OUT_OF_MIN_BOUND_REWARD = -1.0
IN_BOUNDS_REWARD = 5.0
OUTPUT_MSG_FREQUENCY = 100
N_ROUNDS_BELOW_MIN_DISTORTION = 10
N_COLUMNS = 11
def env_loader(kwargs):
column_types = {"NHSno": ColumnType.IDENTIFYING_ATTRIBUTE,
"given_name": ColumnType.IDENTIFYING_ATTRIBUTE,
"surname": ColumnType.IDENTIFYING_ATTRIBUTE,
"gender": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"dob": ColumnType.SENSITIVE_ATTRIBUTE,
"ethnicity": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"education": ColumnType.SENSITIVE_ATTRIBUTE,
"salary": ColumnType.QUASI_IDENTIFYING_ATTRIBUTE,
"mutation_status": ColumnType.SENSITIVE_ATTRIBUTE,
"preventative_treatment": ColumnType.SENSITIVE_ATTRIBUTE,
"diagnosis": ColumnType.INSENSITIVE_ATTRIBUTE}
# define the action space
action_space = ActionSpace(n=ACTION_SPACE_SIZE)
# all the columns that are SENSITIVE_ATTRIBUTE will be kept as they are
# because currently we have no model
# also INSENSITIVE_ATTRIBUTE will be kept as is
# in order to declare this we use an ActionIdentity
action_space.add_many(ActionIdentity(column_name="dob"),
ActionIdentity(column_name="education"),
ActionIdentity(column_name="salary"),
ActionIdentity(column_name="diagnosis"),
ActionIdentity(column_name="mutation_status"),
ActionIdentity(column_name="preventative_treatment"),
ActionIdentity(column_name="ethnicity"),
ActionStringGeneralize(column_name="ethnicity",
generalization_table=get_ethinicity_hierarchy()),
ActionStringGeneralize(column_name="gender",
generalization_table=get_gender_hierarchy()),
ActionNumericBinGeneralize(column_name="salary",
generalization_table=get_salary_bins(ds=load_mock_subjects(),
n_states=N_STATES)))
# shuffle the action space
# using different seeds
action_space.shuffle(seed=kwargs["rank"] + 1)
env = load_discrete_env(env_type=DiscreteEnvType.MULTI_COLUMN_STATE, n_states=N_STATES,
min_distortion={"ethnicity": 0.133, "salary": 0.133, "gender": 0.133,
"dob": 0.0, "education": 0.0, "diagnosis": 0.0,
"mutation_status": 0.0, "preventative_treatment": 0.0,
"NHSno": 0.0, "given_name": 0.0, "surname": 0.0},
max_distortion={"ethnicity": 0.133, "salary": 0.133, "gender": 0.133,
"dob": 0.0, "education": 0.0, "diagnosis": 0.0,
"mutation_status": 0.0, "preventative_treatment": 0.0,
"NHSno": 0.1, "given_name": 0.1, "surname": 0.1},
total_min_distortion=MIN_DISTORTION, total_max_distortion=MAX_DISTORTION,
out_of_max_bound_reward=OUT_OF_MAX_BOUND_REWARD,
out_of_min_bound_reward=OUT_OF_MIN_BOUND_REWARD,
in_bounds_reward=IN_BOUNDS_REWARD,
punish_factor=PUNISH_FACTOR,
column_types=column_types,
action_space=action_space,
save_distoreted_sets_dir=SAVE_DISTORTED_SETS_DIR,
use_identifying_column_dist_in_total_dist=USE_IDENTIFYING_COLUMNS_DIST,
use_identifying_column_dist_factor=IDENTIFY_COLUMN_DIST_FACTOR,
gamma=GAMMA,
n_rounds_below_min_distortion=N_ROUNDS_BELOW_MIN_DISTORTION)
# we want to get the distances as states
# not bin indices
env.config.state_as_distances = True
return env
def action_sampler(logits: torch.Tensor) -> torch.distributions.Distribution:
action_dist = torch.distributions.Categorical(logits=logits)
return action_dist
if __name__ == '__main__':
# set the seed for random engine
random.seed(42)
# set the seed for PyTorch
torch.manual_seed(42)
# this the A2C network
net = A2CNetSimpleLinear(n_columns=N_COLUMNS, n_actions=ACTION_SPACE_SIZE)
# agent configuration
a2c_config = A2CConfig(action_sampler=action_sampler, n_iterations_per_episode=N_ITRS_PER_EPISODE,
a2cnet=net, save_model_path=Path("./a2c_three_columns_output/"),
n_workers=N_WORKERS,
optimizer_config=PyTorchOptimizerConfig(optimizer_type=OptimizerType.ADAM,
optimizer_learning_rate=ALPHA))
# create the agent
agent = A2C(a2c_config)
# create a trainer to train the Qlearning agent
configuration = PyTorchTrainerConfig(n_episodes=N_EPISODES)
# set up the arguments
env = MultiprocessEnv(env_builder=env_loader, env_args={}, n_workers=N_WORKERS)
try:
env.make(agent=agent)
trainer = PyTorchTrainer(env=env, agent=agent, config=configuration)
# train the agent
trainer.train()
avg_rewards = trainer.total_rewards
plot_running_avg(avg_rewards, steps=100,
xlabel="Episodes", ylabel="Reward",
title="Running reward average over 100 episodes")
avg_episode_dist = np.array(trainer.total_distortions)
print("{0} Max/Min distortion {1}/{2}".format(INFO, np.max(avg_episode_dist), np.min(avg_episode_dist)))
plot_running_avg(avg_episode_dist, steps=100,
xlabel="Episodes", ylabel="Distortion",
title="Running distortion average over 100 episodes")
# play the agent on the environment.
# call the environment builder to create
# an instance of the environment
discrte_env = env.env_builder()
stop_criterion = IterationControl(n_itrs=10, min_dist=MIN_DISTORTION, max_dist=MAX_DISTORTION)
agent.play(env=discrte_env, criteria=stop_criterion)
except Exception as e:
print("An excpetion was thrown...{0}".format(str(e)))
finally:
env.close()
Results¶
The following images show the performance of the learning process

Running average reward.¶

Running average total distortion.¶
References¶
Ivo Grondman, Lucian Busoniu, Gabriel A. D. Lopes, Robert Babuska, A survey of Actor-Critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, 2012.
Enes Bilgin, Mastering reinforcement learning with python. Packt Publishing.
Miguel Morales, Grokking deep reinforcement learning. Manning Publications.
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, Pieter Abbeel, High-Dimensional Continuous Control Using Generalized Advantage Estimation, Last download 26/04/2022.
API¶
epsilon_greedy_q_estimator¶
Module epsilon_greedy_q_estimator. Implements a q-estimator by assuming linear function approximation
- class epsilon_greedy_q_estimator.EpsilonGreedyQEstimatorConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None, gamma: float = 1.0, alpha: float = 1.0, env: Optional[Env] = None)¶
Configuration class for EpsilonGreedyQEstimator
- class epsilon_greedy_q_estimator.EpsilonGreedyQEstimator(config: EpsilonGreedyQEstimatorConfig)¶
Q-function estimator using an epsilon-greedy policy for action selection
- __init__(config: EpsilonGreedyQEstimatorConfig)¶
Constructor. Initialize the estimator with a given configuration
- Parameters
config (The instance configuration) –
- initialize() None ¶
Initialize the underlying weights
- Return type
None
- on_state(state: State) Action ¶
Returns the action on the given state
- Parameters
state (The state observed) –
- Return type
An environment specific Action type
- q_hat_value(state_action_vec: StateActionVec) float ¶
Returns the \(\hat{q}\) approximate value for the given state-action vector
- Parameters
state_action_vec (The state-action tiled vector) –
- Return type
float
a2c¶
- a2c.calculate_discounted_returns(rewards: array, discounts: array, n_workers: int = 1) array ¶
Calculate the discounted returns from the episode rewards
- Parameters
rewards (The list of rewards) –
discounts (The discount factor) –
n_workers (The number of workers) –
- a2c.create_discounts_array(end: int, base: float, start=0, endpoint=False)¶
Create an array of floating point numbers in [start, end) with the given base
- Parameters
end –
base –
start –
endpoint –
- class a2c.A2CConfig(gamma: float = 0.99, tau: float = 0.1, beta: Optional[float] = None, policy_loss_weight: float = 1.0, value_loss_weight: float = 1.0, max_grad_norm: float = 1.0, n_iterations_per_episode: int = 100, n_workers: int = 1, batch_size: int = 0, normalize_advantages: bool = True, device: str = 'cpu', action_sampler: Optional[Callable] = None, a2cnet: Optional[Module] = None, save_model_path: Optional[Path] = None, optimizer_config: Optional[PyTorchOptimizerConfig] = None)¶
Configuration for A2C algorithm
- class a2c._ActResult(logprobs: torch.Tensor, values: torch.Tensor, actions: torch.Tensor, entropies: torch.Tensor)¶
- class a2c.A2C(config: A2CConfig)¶
-
- _do_train(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode. In fact this method simply plays the environment to collect batches
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- classmethod from_path(config: A2CConfig, path: Path)¶
Load the A2C model parameters from the given path
- Parameters
config (The configuration of the algorithm) –
path (The path to load the parameters) –
- Return type
An instance of A2C class
- on_episode(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- parameters() Any ¶
The parameters of the underlying model
- Return type
An array with the model parameters
q_learning¶
Simple Q-learning algorithm
- class q_learning.QLearnConfig(gamma: float = 1.0, alpha: float = 0.1, n_itrs_per_episode: int = 100, policy: Optional[Policy] = None)¶
Configuration for Q-learning
- class q_learning.QLearning(algo_config: QLearnConfig)¶
Q-learning algorithm implementation
- __init__(algo_config: QLearnConfig)¶
Constructor. Construct an instance of the algorithm by passing the configuration parameters
- Parameters
algo_config (The configuration parameters) –
- _do_train(env: Env, episode_idx: int, **option) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- _update_q_table(state: int, action: int, n_actions: int, reward: float, next_state: Optional[int] = None) None ¶
Update the tabular state-action function
- Parameters
state (State observed) –
action (The action taken) –
n_actions (Number of actions in the data set) –
reward (The reward observed) –
next_state (The next state observed) –
- Return type
None
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None ¶
Execute any actions the algorithm needs after the episode ends
- Parameters
env (The environment that training occurs) –
episode_idx (The episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_training(env: Env, **options) None ¶
Any actions before training begins
- Parameters
env (The environment that training occurs) –
options (Any options passed by the client code) –
- Return type
None
- on_episode(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- play(env: Env, stop_criterion: Criterion) None ¶
Play the agent on the environment. This should produce a distorted dataset
- Parameters
env (The environment to) –
stop_criterion (The criteria to use to stop) –
- Return type
None
semi_gradient_sarsa¶
Module semi_gradient_sarsa. Implements episodic semi-gradient SARSA for estimating the state-action value function. The implementation follows the algorithm on page 244 of the book by Sutton and Barto: Reinforcement Learning: An Introduction, 2nd Edition, 2020
- class semi_gradient_sarsa.SemiGradSARSAConfig(gamma: float = 1.0, alpha: float = 0.1, n_itrs_per_episode: int = 100, policy: Optional[Policy] = None)¶
Configuration class for semi-gradient SARSA algorithm
- class semi_gradient_sarsa.SemiGradSARSA(config: SemiGradSARSAConfig)¶
SemiGradSARSA class. Implements the semi-gradient SARSA algorithm as described
- __init__(config: SemiGradSARSAConfig) None ¶
- _do_train(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- _init() None ¶
Any initializations needed before starting the training
- Return type
None
- _validate() None ¶
Validate the state of the agent. Is called before any training begins to check that the starting state is sane
- Return type
None
- _weights_update(env: Env, state: State, action: Action, reward: float, next_state: State, next_action: Action, t: float = 1.0) None ¶
Update the weights due to the fact that the episode is finished
- Parameters
env (The environment instance that the training takes place) –
state (The current state) –
action (The action we took at state) –
reward (The reward observed when taking the given action when at the given state) –
next_state (The observed new state) –
next_action (The action to be executed in next_state) –
- Return type
None
- _weights_update_episode_done(env: Env, state: State, action: Action, reward: float, t: float = 1.0) None ¶
Update the weights of the underlying Q-estimator
- Parameters
state (The current state it is assumed to be a raw state) –
reward (The reward observed when taking the given action when at the given state) –
action (The action we took at the state) –
- Return type
None
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None ¶
Any actions after the training episode ends
- Parameters
env (The training environment) –
episode_idx (The training episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_episode_begins(env: Env, episode_idx: int, **options) None ¶
Any actions to perform before the episode begins
- Parameters
env (The instance of the training environment) –
episode_idx (The training episode index) –
options (Any keyword options passed by the client code) –
- Return type
None
- actions_before_training(env: Env, **options) None ¶
Specify any actions necessary before training begins
- Parameters
env (The environment to train on) –
options (Any key-value options passed by the client) –
- Return type
None
- on_episode(env: Env, episode_idx: int, **options) EpisodeInfo ¶
Train the algorithm on the episode
- Parameters
env (The environment to train on) –
episode_idx (The index of the training episode) –
options (Any keyword based options passed by the client code) –
- Return type
An instance of EpisodeInfo
- play(env: Env, stop_criterion: Criterion) None ¶
Play the agent on the environment. This should produce a distorted dataset
- Parameters
env (The environment to) –
stop_criterion (The criteria to use to stop) –
- Return type
None
column_type¶
Module column_type specifies an enumeration of the column types. This is similar to the ARX software. See the ARX documentation at: https://arx.deidentifier.org/wp-content/uploads/javadoc/current/api/org/deidentifier/arx/AttributeType.html
- class column_type.ColumnType(value)¶
An enumeration.
datasets_loaders¶
dataset_wrapper¶
exceptions¶
- class exceptions.Error(message)¶
General error class to handle generic errors
- __init__(message) None ¶
- __str__()¶
Return str(self).
- class exceptions.IncompatibleVectorSizesException(size1: int, size2: int)¶
- __init__(size1: int, size2: int) None ¶
- __str__()¶
Return str(self).
- class exceptions.InvalidDataTypeException(param_name: str, param_type: Any, param_types: str)¶
- __init__(param_name: str, param_type: Any, param_types: str)¶
- __str__()¶
Return str(self).
- class exceptions.InvalidParamValue(param_name: str, param_value: str)¶
- __init__(param_name: str, param_value: str)¶
- __str__()¶
Return str(self).
optimizer_type¶
Module optimizer_type. Specifies an enumeration for various PyTorch optimizers
- class optimizer_type.OptimizerType(value)¶
An enumeration.
pytorch_optimizer_builder¶
Module pytorch_optimizer_builder. Specifies a simple factory for building PyTorch optimizers
- pytorch_optimizer_builder.pytorch_optimizer_builder(opt_type: OptimizerType, model_params: Any, **options) Optimizer ¶
Factory method for building PyTorch optimizers
- Parameters
opt_type (The type of the optimizer) –
model_params (Model parameters to optimize on) –
options (Options for the optimizer) –
- Return type
A concrete instance of the optim.Optimizer class
loss_functions¶
Module loss_functions. Implements basic loss functions geared towards using PyTorch
- loss_functions.mse(returns: Tensor, values: Tensor) Tensor ¶
Mean square error loss function
- Parameters
returns (Values 1) –
values (Values 2) –
- Return type
A torch tensor representing the MSE loss
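For illustration, a hypothetical call to this helper might look as follows; the import path is an assumption mirroring the other src.maths modules, and the tensors are made up for the example.
import torch
# assumed import path, mirroring the other src.maths modules
from src.maths.loss_functions import mse

returns = torch.tensor([1.0, 2.0, 3.0])
values = torch.tensor([0.8, 2.2, 2.9])
loss = mse(returns, values)  # a scalar torch tensor with the MSE loss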
distortion_calculator¶
numeric_distance_type¶
Enumeration helper for quick and uniform access of the various distance metrics
- class numeric_distance_type.NumericDistanceType(value)¶
Enumeration of the various distance types
numeric_distance_calculator¶
pytorch_optimizer_config¶
Module pytorch_optimizer_configuration. Specifies a data class for configuring PyTorch optimizers
- class pytorch_optimizer_config.PyTorchOptimizerConfig(optimizer_type: OptimizerType = OptimizerType.ADAM, optimizer_learning_rate: float = 0.01, optimizer_betas: tuple = (0.9, 0.999), optimizer_weight_decay: float = 0, optimizer_amsgrad: bool = False)¶
Configuration class for the optimizer
string_distance_calculator¶
a2c_networks¶
Module a2c_networks. Specifies various networks for A2C algorithm
- class a2c_networks.A2CNetSimpleLinear(n_columns: int, n_actions: int)¶
A2CNetSimpleLinear. Specifies a network architecture consisting of three linear layers
- __init__(n_columns: int, n_actions: int)¶
Constructor.
- Parameters
n_columns (Number of columns) –
n_actions (Number of actions) –
- forward(x: Tensor) tuple ¶
Pass the state from the network
- Parameters
x (The torch tensor that represents the state) –
- Return type
The actor and the critic values
processes_manager¶
Module processes_manager. Utilities for managing processes
epsilon_greedy_policy¶
Module epsilon_greedy_policy. Implements epsilon-greedy policy with various decay options
- class epsilon_greedy_policy.EpsilonDecayOption(value)¶
Options for reducing epsilon
- class epsilon_greedy_policy.EpsilonGreedyConfig(eps: float = 1.0, n_actions: int = 1, decay_op: EpsilonDecayOption = EpsilonDecayOption.NONE, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)¶
Configuration class for EpsilonGreedyPolicy
- class epsilon_greedy_policy.EpsilonGreedyPolicy(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)¶
Epsilon-greedy policy implementation
- __call__(q_table: QTable, state: State) int ¶
Execute the policy
- Parameters
q_table (The q-table to use) –
state (The state observed) –
- Return type
An integer representing the action index
- __init__(eps: float, n_actions: int, decay_op: EpsilonDecayOption, max_eps: float = 1.0, min_eps: float = 0.001, epsilon_decay_factor: float = 0.01, user_defined_decrease_method: Optional[UserDefinedDecreaseMethod] = None)¶
Constructor. Initialize a policy with the given options
- Parameters
eps (The initial epsilon) –
n_actions (How many actions the environment assumes) –
decay_op (How to decay epsilon) –
max_eps (The maximum epsilon) –
min_eps (The minimum epsilon) –
epsilon_decay_factor (A decay factor used when decay_op = CONSTANT_RATE) –
user_defined_decrease_method (A user defined callable to decay epsilon) –
- __str__() str ¶
Returns the name of the policy
- Return type
A string representing the name of the policy
- actions_after_episode(episode_idx: int, **options) None ¶
Any actions the policy should execute after the episode ends
- Parameters
episode_idx (The episode index) –
options (Any options passed by the client code) –
- Return type
None
- classmethod from_config(config: EpsilonGreedyConfig)¶
Construct a policy from the given configuration
- Parameters
config (The configuration to use) –
- Return type
An instance of EpsilonGreedyPolicy class
- on_state(state: State) int ¶
Returns the optimal action on the current state
- Parameters
state (The state observed) –
- Return type
An integer representing the action index
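A minimal sketch of configuring and constructing the policy. The field values are illustrative; the q_table and state objects come from the surrounding training loop, so the final call is only indicated in a comment.

from epsilon_greedy_policy import (EpsilonDecayOption, EpsilonGreedyConfig,
                                   EpsilonGreedyPolicy)

config = EpsilonGreedyConfig(eps=1.0, n_actions=5,
                             decay_op=EpsilonDecayOption.NONE,
                             min_eps=0.01, epsilon_decay_factor=0.01)
policy = EpsilonGreedyPolicy.from_config(config)

# action_idx = policy(q_table, state)   # returns an integer action index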
preprocess_utils¶
actions¶
The actions module. This module includes various actions to be applied by the implemented RL agents
- class actions.ActionType(value)¶
Defines the type of an Action
- class actions.ActionBase(column_name: str, action_type: ActionType)¶
Base class for actions
- __init__(column_name: str, action_type: ActionType) None ¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
action_type (The type of the action) –
- abstract act(**ops) Any ¶
Perform the action
- Parameters
ops (The data to distort) –
- Return type
Typically the action returns the distorted subset of the data
- class actions.ActionIdentity(column_name: str)¶
Implements the identity action. Use this action to signal that no distortion should be applied.
- __init__(column_name: str) None ¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
- act(**ops) Any ¶
Perform the action
- Parameters
ops (The data to distort) –
- Return type
The distorted column
- class actions.ActionNumericBinGeneralize(column_name: str, generalization_table: Hierarchy)¶
Generalization Action for numeric columns using bins
- __init__(column_name: str, generalization_table: Hierarchy)¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
generalization_table (The bins to use) –
- act(**ops) Any ¶
Perform the action
- Parameters
ops (The data to distort) –
- Return type
Typically the action returns the distorted subset of the data
- class actions.ActionNumericStepGeneralize(column_name: str, step: float)¶
- __init__(column_name: str, step: float)¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
step (The step to use for the generalization) –
- act(**ops)¶
Perform the action
- class actions.ActionRestore(column_name: str, restore_values: Hierarchy)¶
Implements the restore action
- __init__(column_name: str, restore_values: Hierarchy)¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
restore_values (The values to use to restore the column) –
- act(**ops) Any ¶
Perform the action
- class actions.ActionStringGeneralize(column_name: str, generalization_table: Hierarchy)¶
Implements the generalization action. The generalization_table must implement the __getitem__ function
- __init__(column_name: str, generalization_table: Hierarchy) None ¶
Constructor
- Parameters
column_name (The column name this action is acting on) –
generalization_table (The hierarchy for the generalization) –
- act(**ops) Any ¶
Performs the action
- Parameters
ops (The data to distort) –
- Return type
The distorted data
- add(key: Any, value: Any) None ¶
Add a new item in the underlying hierarchy
- Parameters
key (The key to use for the new item) –
value (The value of the new item) –
- Return type
None
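A minimal sketch of building a string generalization action backed by a SerialHierarchy (documented further below). The column name and the hierarchy contents are illustrative, and the keyword used to pass the column data to act() is an assumption.

from actions import ActionStringGeneralize
from serial_hierarchy import SerialHierarchy

# map each concrete value to its generalized replacement (illustrative values)
hierarchy = SerialHierarchy(values={"Harlem": "New York", "Bronx": "New York"})

action = ActionStringGeneralize(column_name="city",
                                generalization_table=hierarchy)

# the environment applies the action on the column data:
# distorted = action.act(data=city_column)   # keyword name assumed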
- class actions.ActionSuppress(column_name: str, suppress_table: Hierarchy)¶
Implements the suppress action
- __init__(column_name: str, suppress_table: Hierarchy)¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
suppress_table (The table to use for the suppression) –
- act(**ops) None ¶
Perform the action
- Return type
None
- class actions.ActionTransform(column_name: str, transform_value: Any)¶
Implements the transform action
- __init__(column_name: str, transform_value: Any)¶
Constructor
- Parameters
column_name (The name of the column this is acting on) –
transform_value (The value to transform the column to) –
- act(**ops) Any ¶
Perform the action
action_space¶
Module action_space. Specifies a wrapper to the discrete actions in the actions.py module
- class action_space.ActionSpace(n: int)¶
ActionSpace class models a discrete action space of size n
state¶
The state module. Specifies a wrapper to a state such that it exposes column distortions and the bin index of the overall distortion.
- class state.StateIterator(values: List)¶
StateIterator class. Helper class to iterate over the columns of a State object
- __init__(values: List)¶
- __len__()¶
Returns the total number of items in the iterator
- property at: Any¶
Returns the value of the iterator at the current position without incrementing the position of the iterator
- property finished: bool¶
Returns true if the iterator is exhausted
- class state.State¶
Helper to represent a State
- __contains__(column_name: str) bool ¶
Returns true if column_name is in the column_distortions keys
- Parameters
column_name (The column name to query) –
- Returns
A boolean indicating if column_name is in the column_distortions
keys or not.
- __getitem__(name: str) float ¶
Get the distortion corresponding to the name-th column
- Parameters
name (The name of the column) –
- Return type
The column distortion
- __init__()¶
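A small sketch of querying a State object. Populating column_distortions by hand is shown only for illustration; normally the environment maintains it.

from state import State

state = State()
state.column_distortions = {"salary": 0.2, "city": 0.0}   # assumed to be directly assignable

"salary" in state    # True, via __contains__
state["salary"]      # 0.2, via __getitem__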
discrete_state_environment¶
RL Environment API taken from https://github.com/deepmind/dm_env/blob/master/dm_env/_environment.py
Classes
- Configuration for the discrete environment
- The DiscreteStateEnvironment class
tiled_environment¶
time_step¶
Module time_step. Specifies a wrapper for representing a step in the environment
- time_step.copy_time_step(time_step: TimeStep, **copy_options) TimeStep ¶
Helper to copy a TimeStep namedtuple, either partly or in whole. If copy_options is None or empty, it returns a deep copy of the given time step
- Parameters
time_step (The time step to copy) –
copy_options (Members to be copied) –
- Return type
An instance of the TimeStep namedtuple
- class time_step.StepType(value)¶
Defines the status of a TimeStep within a sequence.
- class time_step.TimeStep(step_type, info, reward, discount, observation)¶
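A minimal sketch of copying a TimeStep while overriding a single field. StepType.FIRST is assumed to exist, following the dm_env convention this API mirrors, and the reward keyword is assumed to be a valid copy option.

from time_step import StepType, TimeStep, copy_time_step

step = TimeStep(step_type=StepType.FIRST, info={}, reward=0.0,
                discount=1.0, observation=None)

new_step = copy_time_step(step, reward=1.0)   # remaining fields are taken from the original step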
multiprocess_env¶
Module multiprocess_env. Specifies a vectorized environment where each instance of the environment runs independently. The implementation of the environment is adapted from the book Grokking Deep Reinforcement Learning by Manning Publications
- class multiprocess_env.MultiprocessEnv(env_builder: Callable, env_args: dict, n_workers: int)¶
MultiprocessEnv class
- __init__(env_builder: Callable, env_args: dict, n_workers: int)¶
- __len__() int ¶
The number of workers handled by this instance
- _broadcast_msg(msg)¶
Broadcast the message to all workers
- Parameters
msg –
- _send_msg(msg: Any, rank: int)¶
Send the message to the process with the given rank
- Parameters
msg (The message to send) –
rank (The rank of the process to send the message to) –
- make(agent: Agent)¶
Create the workers
- work(rank, env_builder: Callable, env_args: dict, agent: Agent, pipe_end) None ¶
The worker function
- Parameters
rank (The rank of the worker) –
env_builder (The callable that builds the worker environment) –
env_args (The callable arguments) –
pipe_end (The worker's end of the communication pipe) –
- Return type
None
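A minimal sketch of setting up the vectorized environment. The env_builder callable, its arguments and the agent are placeholders supplied by the client code.

from multiprocess_env import MultiprocessEnv

vec_env = MultiprocessEnv(env_builder=build_environment,   # placeholder callable
                          env_args={}, n_workers=4)
vec_env.make(agent)    # spawn the worker processes

len(vec_env)           # 4, the number of workers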
replay_buffer¶
- class replay_buffer.ReplayBuffer(buffer_size: int)¶
The ReplayBuffer class. Models a fixed-size replay buffer. The buffer is represented by a deque from Python’s built-in collections library, which is essentially a list with a maximum size: if a new element is added while the buffer is full, the oldest item is removed and the new item is appended, so new experiences replace the oldest ones. The experiences themselves are tuples of (state1, reward, action, state2, done) that are appended to the deque and represented via the named tuple ExperienceTuple
- __getitem__(name_attr: str) List ¶
Return the full batch of the name_attr attribute
- Parameters
name_attr (The name of the attribute to collect the batch values for) –
- Return type
A list
- __init__(buffer_size: int)¶
Constructor
- Parameters
buffer_size (The maximum capacity of the buffer) –
- __len__() int ¶
Return the current size of the internal memory.
- add(state: Any, action: Any, reward: Any, next_state: Any, done: Any, info: dict = {}) None ¶
Add a new experience tuple in the buffer
- Parameters
state (The current state) –
action (The action taken) –
reward (The reward observed) –
next_state (The next state observed) –
done (Whether the episode is done) –
info (Any other info needed) –
- Return type
None
- get_item_as_torch_tensor(name_attr: str) Tensor ¶
Returns a torch.Tensor representation of the named item
- Parameters
name_attr (The name of the attribute) –
- Return type
An instance of torch.Tensor
- reinitialize() None ¶
Reinitialize the internal buffer
- Return type
None
- sample(batch_size: int) List[ExperienceTuple] ¶
Randomly sample a batch of experiences from memory.
- Parameters
batch_size (The batch size we want to sample) –
- Return type
A list of ExperienceTuple
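A small usage sketch of the replay buffer; the stored values are illustrative.

from replay_buffer import ReplayBuffer

buffer = ReplayBuffer(buffer_size=1000)
buffer.add(state=0, action=1, reward=0.5, next_state=1, done=False)

if len(buffer) >= 1:
    batch = buffer.sample(batch_size=1)                   # a list of ExperienceTuple
    rewards = buffer.get_item_as_torch_tensor("reward")   # torch.Tensor of the stored rewards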
trainer¶
Module trainer. Specifies a utility class for training serial reinforcement learning algorithms
- class trainer.TrainerConfig(n_episodes: int = 1, output_msg_frequency: int = - 1)¶
- class trainer.Trainer(env: Env, agent: Agent, configuration: TrainerConfig)¶
- __init__(env: Env, agent: Agent, configuration: TrainerConfig) None ¶
Constructor. Initialize a trainer by passing the training environment instance, the agent to train and a configuration dictionary
- Parameters
env (The environment to train the agent) –
agent (The agent to train) –
configuration (Configuration parameters for the trainer) –
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None ¶
Any actions after the training episode ends
- Parameters
env (The environment to train on) –
episode_idx (The training episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_episode_begins(env: Env, episode_idx: int, **options) None ¶
Perform any actions necessary before the training begins
- Parameters
env (The environment to train on) –
episode_idx (The training episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_training() None ¶
Any actions to perform before training begins
- Return type
None
- avg_distortion() array ¶
Returns the average distortion per episode
- avg_rewards() array ¶
Returns the average reward per episode
- train() None ¶
Train the agent on the given environment
- Return type
None
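A minimal sketch of running the serial trainer. The env and agent instances are placeholders built elsewhere with the classes documented in this reference.

from trainer import Trainer, TrainerConfig

config = TrainerConfig(n_episodes=100, output_msg_frequency=10)
trainer = Trainer(env=env, agent=agent, configuration=config)   # env and agent are placeholders

trainer.train()                      # run the training loop
avg_rewards = trainer.avg_rewards()  # average reward per episode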
pytorch_trainer¶
Module pytorch_trainer. Specifies a trainer for PyTorch-based models.
- pytorch_trainer.worker(worker_idx: int, worker_model: Module, params: dir)¶
Executes the process work
- Parameters
worker_idx (The id of the worker) –
worker_model (The model the worker is using) –
params (Parameters needed) –
- class pytorch_trainer.PyTorchTrainerConfig(n_procs: int = 1, n_episodes: int = 100)¶
Configuration for PyTorchTrainer
- class pytorch_trainer.PyTorchTrainer(env: Env, agent: Agent, config: PyTorchTrainerConfig)¶
The PyTorchTrainer class. Trainer for multiprocessing with PyTorch
- __init__(env: Env, agent: Agent, config: PyTorchTrainerConfig) None ¶
Constructor. Initialize a trainer by passing the training environment instance, the agent to train and a configuration dictionary
- Parameters
env (The environment to train the agent) –
agent (The agent to train) –
config (Configuration parameters for the trainer) –
- actions_after_episode_ends(env: Env, episode_idx: int, **options) None ¶
Any actions after the training episode ends
- Parameters
env (The environment to train on) –
episode_idx (The training episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_episode_begins(env: Env, episode_idx: int, **options) None ¶
Perform any actions necessary before the training begins
- Parameters
env (The environment to train on) –
episode_idx (The training episode index) –
options (Any options passed by the client code) –
- Return type
None
- actions_before_training() None ¶
Any actions to perform before training begins
- Return type
None
- avg_distortion() array ¶
Returns the average distortion per episode
- avg_rewards() array ¶
Returns the average reward per episode
iteration_control¶
Module iteration_control. Utility to control iteration
function_wraps¶
- function_wraps.time_func(fn: Callable)¶
Execute the given callable and measure the time it takes to execute
- Parameters
fn (Callable to execute) –
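A small sketch assuming time_func is meant to be used as a decorator, which its signature suggests.

from function_wraps import time_func

@time_func
def expensive_computation():
    return sum(i * i for i in range(1_000_000))

expensive_computation()   # executes the function and measures the elapsed time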
episode_info¶
Module episode_info. Specifies the dataclass EpisodeInfo, used as the return item of the agent's on_episode() function to wrap the results of a finished episode
- class episode_info.EpisodeInfo(episode_itrs: int = 0, episode_score: float = 0.0, total_distortion: float = 0.0, total_execution_time: float = 0.0, info: dict = <factory>)¶
mixins¶
Module mixins. Various mixin classes used to simplify the code
- class mixins.WithHierarchyTable¶
- __init__() None ¶
- add_hierarchy(key: str, hierarchy: Hierarchy) None ¶
Add a hierarchy for the given key
- Parameters
key (The key to attach the hierarchy to) –
hierarchy (The hierarchy to attach) –
- Return type
None
- finished() bool ¶
Returns true if the action has exhausted all its transforms
- reset_iterators()¶
Reinitialize the iterators in the table
- class mixins.WithQTableMixinBase(table: Optional[QTable] = None)¶
Base class to impose the concept of Q-table
- __init__(table: Optional[QTable] = None)¶
- class mixins.WithQTableMixin(table: Optional[QTable] = None)¶
Helper class to associate a q_table with an algorithm
- __init__(table: Optional[QTable] = None)¶
Constructor
- Parameters
table (The Q-table representing the Q-function) –
- class mixins.WithMaxActionMixin(table: Optional[QTable] = None)¶
The class WithMaxActionMixin.
- __init__(table: Optional[QTable] = None)¶
Constructor
- Parameters
table (The Q-table representing the Q-function) –
- max_action(state: Any, n_actions: int) int ¶
Return the action index that has the maximum value at the given state
- Parameters
state (The state index) –
n_actions (The total number of actions allowed) –
- Return type
The action index that corresponds to the maximum value
- class mixins.WithEstimatorMixin¶
reward_manager¶
Module reward_manager. Specifies a class that handles the rewards awarded by the environment.
- class reward_manager.RewardManager(bounds: tuple, out_of_max_bound_reward: float, out_of_min_bound_reward: float, in_bounds_reward: float, punish_factor: float, min_distortions: Any, max_distortions: Any)¶
The RewardManager class
- __init__(bounds: tuple, out_of_max_bound_reward: float, out_of_min_bound_reward: float, in_bounds_reward: float, punish_factor: float, min_distortions: Any, max_distortions: Any) None ¶
- get_reward_for_state(total_distortion: float, current_state: State, next_state: State, min_dist_bins: Any, **options) float ¶
Returns a user-specified reward signal depending on the state and the options given
- Parameters
total_distortion (The total distortion of the data set) –
current_state (The current state) –
next_state (The next state) –
min_dist_bins –
options (Any options passed by the client code) –
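A minimal sketch of constructing a reward manager; the numeric values are illustrative and the state-related arguments of get_reward_for_state come from the environment, so the call is only indicated in a comment.

from reward_manager import RewardManager

manager = RewardManager(bounds=(0.2, 0.8),
                        out_of_max_bound_reward=-1.0,
                        out_of_min_bound_reward=-1.0,
                        in_bounds_reward=1.0,
                        punish_factor=2.0,
                        min_distortions=0.2,
                        max_distortions=0.8)

# reward = manager.get_reward_for_state(total_distortion=0.5,
#                                       current_state=current_state,
#                                       next_state=next_state,
#                                       min_dist_bins=min_dist_bins)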
serial_hierarchy¶
Module serial_hierarchy. A SerialHierarchy represents a hierarchy of transformations that are applied one after the other
- class serial_hierarchy.SerialHierarchy(values: dict)¶
A SerialHierarchy represents a hierarchy of transformations that are applied one after the other. Applications should explicitly provide the ensuing transformations. For example, if a data field has the value ‘foo’, the supplied values define the sequence of transformations that ‘foo’ goes through.
- __getitem__(item)¶
Returns the item-th item
- __init__(values: dict) None ¶
Constructor. Initialize the hierarchy by passing the dictionary of the ensuing transformations.
- __len__()¶
Returns the size of the hierarchy
- __setitem__(key, value)¶
Set the key-th item to the given value. If the key-th item has already been set, the existing value is overridden.
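A small usage sketch; the mapping is illustrative and dictionary-style access is assumed from the __getitem__/__setitem__ methods above.

from serial_hierarchy import SerialHierarchy

hierarchy = SerialHierarchy(values={"foo": "fo*", "fo*": "f**"})

len(hierarchy)            # 2
hierarchy["foo"]          # 'fo*' (assuming key-based access)
hierarchy["bar"] = "ba*"  # add or override an entry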