Introduction
Exploring your prediction space by making a portion of your recommendations at random on an ongoing basis has a number of benefits: it produces a clean set of data for further modelling, mitigates fixation for Dynamic Interaction algorithms and provides a performance baseline. This type of exploration is known as \(\epsilon\) exploration and is configured by setting an \(\epsilon\) parameter, which is the proportion of recommendations that are made at random.
There are currently three approaches for implementing \(\epsilon\) exploration in the ecosystem.Ai platform:
- Project level \(\epsilon\) which randomises the order of the items recommended in the post scoring logic.
- Model level \(\epsilon\) for the Ecosystem Rewards algorithm which randomises the arm reward generated by the Ecosystem Rewards algorithm before passing the results to the post scoring logic.
- The \(\epsilon\)-greedy Dynamic Interactions algorithm which presents randomised offers to a proportion of customers and presents the best performing offers to the rest of the customers.
Below we outline the implementation of each of these approaches to highlight the advantages and disadvantages of each.
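All three approaches share the same underlying allocation mechanic: a uniform random number is compared against \(\epsilon\) to decide whether a given API call explores. The following minimal sketch illustrates the idea; it is not platform code and the function name is purely illustrative:

```python
import random

def allocate_call(epsilon):
    """Return True if this API call should be used for exploration.

    A uniform random number is compared to epsilon: values below
    epsilon mark the call for random (exploration) recommendations.
    """
    return random.random() < epsilon

# With epsilon = 0.1, roughly 10% of calls are allocated to exploration.
epsilon = 0.1
calls = 100_000
explored = sum(allocate_call(epsilon) for _ in range(calls))
print(round(explored / calls, 2))
```

Because the comparison is made per call, the realised exploration proportion fluctuates around \(\epsilon\) rather than being exact for any given batch of calls.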
Project level epsilon
Configuring epsilon
Project level \(\epsilon\) is configured in the Deployment settings. The \(\epsilon\) parameter is a number between 0 and 1 which specifies the proportion of recommendations that should be made at random. For example, if \(\epsilon\) is set to 0.1, then 10% of the recommendations will be made at random. To set the \(\epsilon\) parameter in the workbench, enable New Knowledge in the Deployment and set the epsilon value in the accordion that appears at the bottom of the screen. To set the \(\epsilon\) parameter in the python package, pass the epsilon parameter to the define_deployment_multi_armed_bandit function. The following truncated example shows how to set the \(\epsilon\) parameter in the python package:
from prediction.apis import deployment_management as dm
from prediction.apis import ecosystem_generation_engine as ge
#Configure epsilon
new_knowledge = dm.define_deployment_multi_armed_bandit(epsilon=0.1)
#Create a deployment using the configured value of epsilon
deployment_step = dm.create_deployment(
    auth,
    project_id=project_id,
    deployment_id=deployment_id,
    version=version,
    plugin_post_score_class="PlatformDynamicEngagement.java",
    plugin_pre_score_class="PreScoreDynamic.java",
    scoring_engine_path_dev=runtime_path,
    parameter_access=parameter_access,
    multi_armed_bandit=new_knowledge,
    setup_offer_matrix=offer_matrix,
)
#Push the deployment and print the resulting properties file
push_result = ge.process_push(auth,deployment_step)
if "ErrorMessage" in push_result:
    print(push_result["ErrorMessage"])
else:
    print(push_result["properties"])
The properties file that is generated by pushing the configured Deployment will contain the following entry:
predictor.epsilon=0.1
This value will be used by the runtime to allocate API calls for the exploration approach. These changes will require a push or /refresh of the deployment to take effect.
How epsilon is used
The \(\epsilon\) parameter is used to allocate API calls for exploration during the scoring process. As part of the scoring process a random number is generated and compared to the \(\epsilon\) parameter. If the random number is less than \(\epsilon\), then the API call is allocated for exploration. If an item is allocated for exploration, the explore integer in the params JSONObject is set to 1; otherwise it is set to 0. If the getTopScores method is then used in the post scoring logic to generate the object that is returned, this value of explore will automatically determine whether the top scores are selected randomly or not. The following code snippet shows how this is done:
/**
 * Select the top scores, either ranked or at random depending on the
 * explore flag set during scoring.
 *
 * @param params        request parameters, including the explore flag
 * @param predictResult scored results produced by the prediction step
 * @return the predictResult with final_result reduced to the selected items
 */
private static JSONObject getTopScores(JSONObject params, JSONObject predictResult) {
    int resultCount = 1;
    if (params.has("resultcount")) resultCount = params.getInt("resultcount");
    if (predictResult.getJSONArray("final_result").length() <= resultCount)
        resultCount = predictResult.getJSONArray("final_result").length();
    /* depending on epsilon and mab settings */
    if (params.getInt("explore") == 0) {
        predictResult.put("final_result", getSelectedPredictResult(predictResult, resultCount));
        predictResult.put("explore", 0);
    } else {
        predictResult.put("final_result", getSelectedPredictResultRandom(predictResult, resultCount));
        predictResult.put("explore", 1);
    }
    return predictResult;
}
The explore integer can also be extracted from the params JSONObject earlier in the post scoring logic and used to construct additional logic if required.
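The branching in getTopScores can be mirrored in Python to make the behaviour concrete. This is an illustrative sketch only; the dictionary shapes and key names ("score", "offer") are assumptions modelled on the Java snippet, not the platform's actual data structures:

```python
import random

def get_top_scores(params, predict_result):
    """Mirror of the post-scoring branch: ranked results when
    explore == 0, a random selection when explore == 1."""
    final_result = predict_result["final_result"]
    result_count = min(params.get("resultcount", 1), len(final_result))
    if params.get("explore", 0) == 0:
        # Exploit: keep the highest scoring items.
        selected = sorted(final_result, key=lambda r: r["score"], reverse=True)[:result_count]
        predict_result["explore"] = 0
    else:
        # Explore: select items uniformly at random, ignoring scores.
        selected = random.sample(final_result, result_count)
        predict_result["explore"] = 1
    predict_result["final_result"] = selected
    return predict_result

items = [{"offer": o, "score": s} for o, s in [("a", 0.2), ("b", 0.9), ("c", 0.5)]]
top = get_top_scores({"resultcount": 1, "explore": 0}, {"final_result": items})
print(top["final_result"])  # the highest scoring item, offer "b"
```

With explore set to 1 the same call would instead return one of the three items uniformly at random.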
Model level epsilon for Ecosystem Rewards
Configuring epsilon
Model level \(\epsilon\) can be set for the Ecosystem Rewards algorithm as part of the Dynamic Interaction set up in the engagement tab. \(\epsilon\) can be set in the Advanced Settings accordion. Once \(\epsilon\) is set and the configuration has been saved, the value of \(\epsilon\) will be set in the randomisation object of the configuration stored in the dynamic_engagement collection in the ecosystem_meta database. This change will require a push or /refresh of the deployment to take effect. The following truncated example shows how to set the \(\epsilon\) parameter in the python package:
from prediction.apis import online_learning_management as ol
from prediction.apis import deployment_management as dm
from prediction.apis import ecosystem_generation_engine as ge
#Configure epsilon
online_learning_uuid = ol.create_online_learning(
    auth,
    name=deployment_id,
    description=dynamic_interaction_description,
    feature_store_collection=ol_feature_store_collection,
    feature_store_database=ol_feature_store_database,
    options_store_database=options_db,
    options_store_collection=options_collection,
    randomisation_success_reward=0.5,
    randomisation_fail_reward=0.05,
    randomisation_processing_count=200,
    randomisation_processing_window=604800000,
    randomisation_epsilon=0.1,
    contextual_variables_offer_key="offer"
)
#Configure the deployment to use the Dynamic Interaction configuration
new_knowledge = dm.define_deployment_multi_armed_bandit(epsilon=0, dynamic_interaction_uuid=online_learning_uuid)
#Create a deployment using the configured value of epsilon
deployment_step = dm.create_deployment(
    auth,
    project_id=project_id,
    deployment_id=deployment_id,
    version=version,
    plugin_post_score_class="PlatformDynamicEngagement.java",
    plugin_pre_score_class="PreScoreDynamic.java",
    scoring_engine_path_dev=runtime_path,
    parameter_access=parameter_access,
    multi_armed_bandit=new_knowledge,
    setup_offer_matrix=offer_matrix,
)
#Push the deployment and print the resulting properties file
push_result = ge.process_push(auth,deployment_step)
if "ErrorMessage" in push_result:
    print(push_result["ErrorMessage"])
else:
    print(push_result["properties"])
How epsilon is used
The \(\epsilon\) parameter is used to allocate API calls for exploration during the scoring process. As part of the scoring process a random number is generated and compared to the \(\epsilon\) parameter. If the random number is less than the \(\epsilon\) parameter, then the API call is allocated for exploration. If an item is allocated for exploration, then the arm_reward score is set to a number sampled from a uniform distribution between 0 and 1 instead of being sampled from the Beta distribution defined by the \(\alpha\) and \(\beta\) parameters.
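The difference between the two sampling modes can be sketched as follows. This is a simplified illustration of the scoring idea, not the Ecosystem Rewards implementation:

```python
import random

def arm_reward(alpha, beta, explore):
    """Score an arm: a Beta(alpha, beta) draw when exploiting,
    a Uniform(0, 1) draw when the call is allocated to exploration."""
    if explore:
        return random.uniform(0.0, 1.0)
    return random.betavariate(alpha, beta)

# A well performing arm (alpha >> beta) usually scores high when exploiting...
exploit_scores = [arm_reward(90, 10, explore=False) for _ in range(1000)]
# ...while exploration ignores the learned parameters entirely.
explore_scores = [arm_reward(90, 10, explore=True) for _ in range(1000)]
print(sum(exploit_scores) / 1000, sum(explore_scores) / 1000)
```

Because the uniform draw discards everything the algorithm has learned about the arm, even a consistently poor arm has the same chance as any other of being ranked first on an exploration call.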
In contrast to the project level \(\epsilon\) approach, the model level \(\epsilon\) approach applies the exploration before the post scoring logic so the scope for further processing and additional logic is limited.
Epsilon-greedy Dynamic Interactions
The \(\epsilon\)-greedy Dynamic Interactions algorithm uses \(\epsilon\) exploration as the core prediction approach rather than as an addition to another approach as in the previous two approaches. The details of the algorithm are described in the Dynamic Interactions section.
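For orientation, the core selection step of a generic \(\epsilon\)-greedy algorithm can be sketched as below. The offer names and take-up rates are invented for illustration and the sketch omits the reward-updating machinery described in the Dynamic Interactions section:

```python
import random

def epsilon_greedy(offer_stats, epsilon):
    """Core epsilon-greedy selection: a random offer with probability
    epsilon, otherwise the offer with the best observed take-up rate."""
    offers = list(offer_stats)
    if random.random() < epsilon:
        return random.choice(offers)                      # explore
    return max(offers, key=lambda o: offer_stats[o])      # exploit

# Hypothetical per-offer success rates.
stats = {"offer_a": 0.12, "offer_b": 0.31, "offer_c": 0.05}
picks = [epsilon_greedy(stats, epsilon=0.1) for _ in range(10_000)]
frac = picks.count("offer_b") / len(picks)
print(frac)  # roughly 0.9 + 0.1/3, since exploration can also pick offer_b
```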