Coverage-Aware Thompson
An extension of Thompson Sampling that boosts under-exposed offers so niche and long-tail items get a fair chance alongside popular ones. It adjusts priors by exposure, draws from Beta posteriors, and applies an inverse-popularity factor; optional \(\epsilon\) mixing adds uniform exploration across the catalog.
Algorithm
Config value: "approach": "behaviorAlgos", "sub_approach": "coverageAwareThompson"
Prior adjustment (exposure-shaping of the Beta prior):
\(\beta_{\text{prior}} = 1.0 + \text{exposure}^{\gamma}\)
Thompson draw:
\(\theta \sim \mathrm{Beta}(\alpha, \beta)\)
Score (inverse popularity and optional multipliers):
\(\text{score} = \theta \cdot \frac{1}{(\text{exposure} + 1)^{\gamma}} \cdot \text{variableMultiplier}\)
Optional \(\epsilon\) exploration (after normalization):
\(\text{finalScore} = (1 - \epsilon) \cdot \text{normalized} + \frac{\epsilon}{|\text{offers}|}\)
Higher \(\epsilon\) allocates more mass uniformly across arms, improving coverage at the cost of short-term exploitation.
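Putting the formulas above together, here is a minimal Python sketch of the scoring pipeline. The stats layout (`successes`, `failures`, `exposure` per offer), the way success/failure counts enter the posterior, and the omission of `variableMultiplier` (assumed 1) are illustrative assumptions, not the platform's internal representation:

```python
import random

def coverage_aware_thompson_scores(offers, gamma=1.0, epsilon=0.0, rng=None):
    """Sketch of Coverage-Aware Thompson scoring.

    `offers` maps an offer id to a dict with hypothetical `successes`,
    `failures`, and `exposure` counts.
    """
    rng = rng or random.Random()
    raw = {}
    for offer_id, stats in offers.items():
        # Exposure-shaped Beta prior: alpha starts at 1.0,
        # beta_prior = 1.0 + exposure^gamma, then observed counts are added.
        alpha = 1.0 + stats["successes"]
        beta = 1.0 + stats["exposure"] ** gamma + stats["failures"]
        theta = rng.betavariate(alpha, beta)               # Thompson draw
        inv_pop = 1.0 / (stats["exposure"] + 1) ** gamma   # inverse-popularity factor
        raw[offer_id] = theta * inv_pop                    # variableMultiplier assumed 1
    # Normalize, then mix in uniform epsilon exploration across the catalog.
    total = sum(raw.values()) or 1.0
    n = len(offers)
    return {oid: (1 - epsilon) * (s / total) + epsilon / n for oid, s in raw.items()}
```

With `epsilon = 1.0` every offer receives exactly `1/n`, and with `epsilon = 0.0` the normalized Thompson scores pass through unchanged, matching the mixing formula above.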
Parameters
- gamma (\(\gamma\)): Popularity-penalty exponent; controls how strongly under-exposed offers are boosted. Default: 1.0.
- epsilon (\(\epsilon\)): Additional uniform exploration over the offer set. Default: 0.0. Typical values for extra coverage: 0.05–0.1.
- Processing Window: Time window in milliseconds for historical data.
- Historical Count: Max records to process per update cycle.
Cold Start
Recommendations are always returned. Coverage-Aware Thompson has the strongest cold-start handling among behavioral algorithms:
- No history: The RollingBehavior layer assigns uniform random scores to every offer. The algorithm itself also handles this well: the defaults \(\alpha = 1.0\), \(\beta = 1.0\), and \(\text{exposure} = 0\) yield \(\mathrm{Beta}(1,1)\), i.e. uniform sampling.
- Inverse popularity: the factor \((\text{exposure}+1)^{-\gamma}\) is maximized when exposure is zero, so unseen offers receive the strongest relative boost.
- As data accumulates, the Beta posteriors sharpen and the inverse-popularity term naturally balances popular vs. niche offers.
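To make the cold-start boost concrete, a quick check of the inverse-popularity factor \((\text{exposure}+1)^{-\gamma}\) at a few exposure counts (using the default \(\gamma = 1.0\)):

```python
# Inverse-popularity factor (exposure + 1)^(-gamma) for sample exposure counts.
gamma = 1.0
factors = {exposure: 1.0 / (exposure + 1) ** gamma for exposure in (0, 9, 99)}
# An unseen offer (exposure = 0) gets the maximum factor, 1.0;
# an offer shown 99 times is damped to 0.01, a 100x relative boost for the unseen one.
```

Raising \(\gamma\) above 1.0 steepens this damping, widening the gap between unseen and heavily exposed offers.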
The scored options are then sorted by arm_reward and handed to the configured dynamic post-score class, which controls the final offer selection and response formatting.
Coverage-Aware Thompson is the best cold-start choice when catalog fairness and tail coverage matter. Its Beta(1,1) prior and inverse-popularity boost give unseen offers maximum exposure from day one. The post-score class determines the final presentation.
When To Use
- Catalog coverage or fairness requirements (every offer should eventually get trials)
- Promoting the long tail of niche offers
- Mitigating popularity bias where raw counts dominate rankings
When NOT To Use
- When you want to converge quickly to a single best offer with minimal exploration
- When catalog coverage is not a business concern and raw performance ranking is enough
Example
from prediction.apis import deployment_management as dm
from prediction.apis import online_learning_management as ol
from prediction import jwt_access
auth = jwt_access.Authenticate("http://localhost:3001/api", ecosystem_username, ecosystem_password)
deployment_id = "demo-coverage-aware-thompson"
online_learning_uuid = ol.create_online_learning(
auth,
algorithm="ecosystem_rewards",
name=deployment_id,
description="Coverage-Aware Thompson configuration",
feature_store_collection="set_up_features",
feature_store_database="my_mongo_database",
options_store_database="my_mongo_database",
options_store_collection="demo-deployment_options",
randomisation_processing_count=5000,
randomisation_processing_window=604800000,
contextual_variables_offer_key="offer",
create_options_index=True,
create_covering_index=True
)
online_learning = dm.define_deployment_multi_armed_bandit(epsilon=0.05, dynamic_interaction_uuid=online_learning_uuid)
parameter_access = dm.define_deployment_parameter_access(
auth,
lookup_key="customer_id",
lookup_type="string",
database="my_mongo_database",
table_collection="customer_feature_store",
datasource="mongodb"
)
deployment_step = dm.create_deployment(
auth,
project_id="demo-project",
deployment_id=deployment_id,
description="Coverage-Aware Thompson demo deployment",
version="001",
plugin_post_score_class="PlatformDynamicEngagement.java",
plugin_pre_score_class="PreScoreDynamic.java",
scoring_engine_path_dev="http://localhost:8091",
mongo_connect=f"mongodb://{mongo_user}:{mongo_password}@localhost:54445/?authSource=admin",
parameter_access=parameter_access,
multi_armed_bandit=online_learning
)

Set approach to behaviorAlgos and sub_approach to coverageAwareThompson in the randomisation object. Configure gamma and epsilon there to tune inverse-popularity strength and uniform exploration; the Python define_deployment_multi_armed_bandit(epsilon=...) API controls deployment-level epsilon separately, so keep both layers consistent with your intent.
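As a rough illustration, the randomisation object might carry settings along these lines. Only the approach and sub_approach values are confirmed above; the remaining field names (gamma, epsilon, and the processing fields) are assumptions inferred from the parameter list and the create_online_learning arguments, so verify them against your platform's schema:

```json
{
  "randomisation": {
    "approach": "behaviorAlgos",
    "sub_approach": "coverageAwareThompson",
    "gamma": 1.0,
    "epsilon": 0.05,
    "processing_count": 5000,
    "processing_window": 604800000
  }
}
```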