Exploration Using Epsilon

Introduction

Exploring your prediction space by making a portion of your recommendations at random on an ongoing basis has a number of benefits. These include providing a clean set of data for further modelling, mitigating fixation in Dynamic Interaction algorithms, and providing a performance baseline. This type of exploration is known as \(\epsilon\) exploration and is configured by setting an \(\epsilon\) parameter, which is the proportion of recommendations that are made at random.

There are currently three approaches for implementing \(\epsilon\) exploration in the ecosystem.Ai platform:

  1. Project level \(\epsilon\) which randomises the order of the items recommended in the post scoring logic.
  2. Model level \(\epsilon\) for the Ecosystem Rewards algorithm which randomises the arm reward generated by the Ecosystem Rewards algorithm before passing the results to the post scoring logic.
  3. The \(\epsilon\)-greedy Dynamic Interactions algorithm which presents randomised offers to a proportion of customers and presents the best performing offers to the rest of the customers.

Below we outline the implementation of each of these approaches to highlight the advantages and disadvantages of each.

Project level epsilon

Note
  • There is a known issue with Project level epsilon when using the Ecosystem Rewards Dynamic Interaction algorithm with runtime versions 0.9.5.0 and earlier. In these versions of the runtime, the explore value in params is not set correctly when using the Ecosystem Rewards algorithm.
  • To work around this issue, explore should be set in the post scoring logic using the project level epsilon value.

Configuring epsilon

Project level \(\epsilon\) is configured in the Deployment settings. The \(\epsilon\) parameter is a number between 0 and 1 which specifies the proportion of recommendations that should be made at random. For example, if \(\epsilon\) is set to 0.1, then 10% of the recommendations will be made at random. To set the \(\epsilon\) parameter in the workbench, enable New Knowledge in the Deployment and set the epsilon value in the accordion that appears at the bottom of the screen. To set the \(\epsilon\) parameter in the Python package, set the epsilon parameter in the define_deployment_multi_armed_bandit function. The following truncated example shows how to set the \(\epsilon\) parameter in the Python package:

from prediction.apis import deployment_management as dm
from prediction.apis import ecosystem_generation_engine as ge
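# Truncated example: auth, project_id, deployment_id, version, runtime_path,
# parameter_access and offer_matrix are assumed to be defined earlier.
# Configure epsilon
new_knowledge = dm.define_deployment_multi_armed_bandit(epsilon=0.1)
# Create a deployment using the configured value of epsilon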
deployment_step = dm.create_deployment(
    auth,
    project_id=project_id,
    deployment_id=deployment_id,
    version=version,
    plugin_post_score_class="PlatformDynamicEngagement.java",
    plugin_pre_score_class="PreScoreDynamic.java",
    scoring_engine_path_dev=runtime_path,
    parameter_access=parameter_access,
    multi_armed_bandit=new_knowledge,
    setup_offer_matrix=offer_matrix,
)
# Push the deployment and print the resulting properties file
push_result = ge.process_push(auth, deployment_step)
if "ErrorMessage" in push_result:
    print(push_result["ErrorMessage"])
else:
    print(push_result["properties"])

The properties file that is generated by pushing the configured Deployment will contain the following entry:

predictor.epsilon=0.1

This value will be used by the runtime to allocate API calls for the exploration approach. These changes will require a push or /refresh of the deployment to take effect.
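One way to confirm that a push picked up the configured value is to look for the predictor.epsilon entry in the generated properties. The sketch below assumes the properties are available as plain key=value text, one entry per line. read_epsilon is a hypothetical helper written for illustration and is not part of the Python package.

```python
def read_epsilon(properties_text, key="predictor.epsilon"):
    """Return the epsilon value from key=value properties text.

    Hypothetical helper: assumes one key=value entry per line.
    """
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and properties-style comments
        if not line or line.startswith(("#", "!")):
            continue
        name, separator, value = line.partition("=")
        if separator and name.strip() == key:
            epsilon = float(value.strip())
            # Epsilon is a proportion, so it must lie between 0 and 1
            if not 0.0 <= epsilon <= 1.0:
                raise ValueError(f"{key} must be between 0 and 1, got {epsilon}")
            return epsilon
    raise KeyError(f"{key} not found in properties")


# Example with a hand-written properties fragment
example_properties = "# example fragment\npredictor.epsilon=0.1\n"
print(read_epsilon(example_properties))
```

Raising an error when the entry is missing or out of range makes a misconfigured deployment visible before any API calls are scored.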

How epsilon is used

The \(\epsilon\) parameter is used to allocate API calls for exploration during the scoring process. As part of the scoring process, a random number is generated and compared to the \(\epsilon\) parameter. If the random number is less than \(\epsilon\), the API call is allocated for exploration. If the API call is allocated for exploration, the explore integer in the params JSONObject is set to 1; otherwise it is set to 0. If the getTopScores method is then used in the post scoring logic to generate the object that is returned, this value of explore is automatically used to determine whether the top scores are selected randomly or not. The following code snippet shows how this is done:

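	/**
	 * Selects up to resultcount items from final_result, at random when explore is not 0.
	 *
	 * @param params request parameters; must contain the integer "explore" and may contain "resultcount"
	 * @param predictResult scoring result containing the "final_result" JSONArray
	 * @return predictResult with "final_result" replaced by the selected items and "explore" set to 0 or 1
	 */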
	private static JSONObject getTopScores(JSONObject params, JSONObject predictResult) {
		int resultCount = 1;
		if (params.has("resultcount")) resultCount = params.getInt("resultcount");
		if (predictResult.getJSONArray("final_result").length() <= resultCount)
			resultCount = predictResult.getJSONArray("final_result").length();
 
		/* depending on epsilon and mab settings */
		if (params.getInt("explore") == 0) {
			predictResult.put("final_result", getSelectedPredictResult(predictResult, resultCount));
			predictResult.put("explore", 0);
		} else {
			predictResult.put("final_result", getSelectedPredictResultRandom(predictResult, resultCount));
			predictResult.put("explore", 1);
		}
		return predictResult;
	}

The explore integer can also be extracted from the params JSONObject earlier in the post scoring logic and used to construct additional logic if required.
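The allocation and selection steps described in this section can be sketched in Python as follows. This is an illustration only, not the runtime's implementation: allocate_explore and get_top_scores are hypothetical functions, the "score" field on each item is an assumed name, and non-exploring calls are assumed to return the highest scoring items.

```python
import random


def allocate_explore(epsilon, rng):
    """Return 1 if this API call is allocated for exploration, else 0."""
    # A uniform random number in [0, 1) is compared to epsilon
    return 1 if rng.random() < epsilon else 0


def get_top_scores(params, predict_result, rng):
    """Select items by score, or at random when explore is not 0."""
    items = predict_result["final_result"]
    # Never ask for more items than are available
    result_count = min(params.get("resultcount", 1), len(items))
    if params["explore"] == 0:
        # Assumption: "score" is a hypothetical field holding each item's score
        ranked = sorted(items, key=lambda item: item["score"], reverse=True)
        selected = ranked[:result_count]
    else:
        # Exploration: choose items at random, ignoring their scores
        selected = rng.sample(items, result_count)
    return {**predict_result, "final_result": selected, "explore": params["explore"]}


rng = random.Random(42)
scored = {"final_result": [
    {"offer": "a", "score": 0.2},
    {"offer": "b", "score": 0.9},
    {"offer": "c", "score": 0.5},
]}
params = {"resultcount": 2, "explore": allocate_explore(0.1, rng)}
print(get_top_scores(params, scored, rng))
```

Because the explore flag is set before selection, any additional logic in the post scoring step can read the same flag and treat exploring and non-exploring calls differently.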

Model level epsilon for Ecosystem Rewards

Configuring epsilon

Model level \(\epsilon\) can be set for the Ecosystem Rewards algorithm as part of the Dynamic Interaction setup in the engagement tab, in the Advanced Setting accordion. Once \(\epsilon\) is set and the configuration has been saved, the value of \(\epsilon\) should be set in the randomisation object of the configuration stored in the dynamic_engagement collection in the ecosystem_meta database. This change will require a push or /refresh of the deployment to take effect. The following truncated example shows how to set the \(\epsilon\) parameter in the Python package:

from prediction.apis import online_learning_management as ol
from prediction.apis import deployment_management as dm
from prediction.apis import ecosystem_generation_engine as ge
# Configure epsilon in the Dynamic Interaction configuration
online_learning_uuid = ol.create_online_learning(
        auth,
        name=deployment_id,
        description=dynamic_interaction_description,
        feature_store_collection=ol_feature_store_collection,
        feature_store_database=ol_feature_store_database,
        options_store_database=options_db,
        options_store_collection=options_collection,
        randomisation_success_reward=0.5,
        randomisation_fail_reward=0.05,
        randomisation_processing_count=200,
        randomisation_processing_window=604800000,
        randomisation_epsilon=0.1,
        contextual_variables_offer_key="offer",
)
# Configure the deployment to use the Dynamic Interaction configuration
new_knowledge = dm.define_deployment_multi_armed_bandit(epsilon=0, dynamic_interaction_uuid=online_learning_uuid)
# Create a deployment that uses the Dynamic Interaction configuration
deployment_step = dm.create_deployment(
    auth,
    project_id=project_id,
    deployment_id=deployment_id,
    version=version,
    plugin_post_score_class="PlatformDynamicEngagement.java",
    plugin_pre_score_class="PreScoreDynamic.java",
    scoring_engine_path_dev=runtime_path,
    parameter_access=parameter_access,
    multi_armed_bandit=new_knowledge,
    setup_offer_matrix=offer_matrix,
)
# Push the deployment and print the resulting properties file
push_result = ge.process_push(auth, deployment_step)
if "ErrorMessage" in push_result:
    print(push_result["ErrorMessage"])
else:
    print(push_result["properties"])

How epsilon is used

The \(\epsilon\) parameter is used to allocate API calls for exploration during the scoring process. As part of the scoring process, a random number is generated and compared to the \(\epsilon\) parameter. If the random number is less than \(\epsilon\), the API call is allocated for exploration. If the API call is allocated for exploration, the arm_reward score is set to a number sampled from a uniform distribution between 0 and 1 instead of being sampled from the Beta distribution defined by the \(\alpha\) and \(\beta\) parameters.

In contrast to the project level \(\epsilon\) approach, the model level \(\epsilon\) approach applies the exploration before the post scoring logic, so the scope for further processing and additional logic is limited.
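The replacement of the arm reward described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Ecosystem Rewards implementation: score_arms is a hypothetical function, the arms are represented as a plain dictionary of \(\alpha\) and \(\beta\) values, and the exploration decision is assumed to be made once per API call and applied to every arm.

```python
import random


def score_arms(arms, epsilon, rng):
    """Return (arm_rewards, explore) for one API call.

    arms maps a hypothetical offer name to its (alpha, beta) parameters.
    """
    # One random number per API call decides whether the call explores
    explore = rng.random() < epsilon
    arm_rewards = {}
    for offer, (alpha, beta) in arms.items():
        if explore:
            # Exploration: uniform sample between 0 and 1
            arm_rewards[offer] = rng.uniform(0.0, 1.0)
        else:
            # Otherwise: sample from the Beta distribution for this arm
            arm_rewards[offer] = rng.betavariate(alpha, beta)
    return arm_rewards, explore


rng = random.Random(7)
arms = {"offer_a": (20.0, 5.0), "offer_b": (3.0, 12.0)}
rewards, explore = score_arms(arms, epsilon=0.1, rng=rng)
print(explore, rewards)
```

Since the randomised arm rewards are produced before the post scoring logic runs, that logic receives them in the same form as Beta-sampled rewards.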

Epsilon-greedy Dynamic Interactions

The \(\epsilon\)-greedy Dynamic Interactions algorithm uses \(\epsilon\) exploration as the core prediction approach, rather than as an addition to another approach as in the two approaches above. The details of the algorithm are described in the Dynamic Interactions section.
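As a rough illustration of the general idea, the sketch below shows a textbook \(\epsilon\)-greedy choice in Python. It is not the platform's algorithm, whose details are in the Dynamic Interactions section: epsilon_greedy_offer is a hypothetical function, and "best performing" is assumed here to mean the highest observed success rate.

```python
import random


def epsilon_greedy_offer(success_rates, epsilon, rng):
    """Return (offer, explored) for one customer.

    success_rates maps a hypothetical offer name to its observed success rate.
    """
    if rng.random() < epsilon:
        # Explore: present a randomly chosen offer
        return rng.choice(sorted(success_rates)), True
    # Exploit: present the offer with the highest observed success rate
    return max(success_rates, key=success_rates.get), False


rng = random.Random(3)
observed = {"offer_a": 0.04, "offer_b": 0.11, "offer_c": 0.07}
choices = [epsilon_greedy_offer(observed, 0.1, rng) for _ in range(1000)]
explored_share = sum(1 for _, explored in choices if explored) / len(choices)
print(f"share of customers shown a randomised offer: {explored_share:.3f}")
```

With \(\epsilon\) set to 0.1, roughly one customer in ten is shown a randomised offer and the rest are shown the current best performer.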
