
Algorithms Overview

The ecosystem.Ai runtime includes a comprehensive library of dynamic interaction algorithms for real-time offer scoring and recommendation. Each algorithm implements a different strategy for learning which offers, actions, or content to present to users.

All algorithms share the same operational architecture: a background rolling process periodically updates offer statistics in the options store, and real-time scoring reads those pre-computed statistics to rank offers for each API request.

Architecture

Algorithm selection is driven by the randomisation object stored in the dynamic recommender configuration document in MongoDB:

```json
{
  "randomisation": {
    "approach": "binaryThompson",
    "sub_approach": "",
    "epsilon": 0.0,
    "success_reward": 1.0,
    "fail_reward": 1.0,
    "processing_window": 86400000,
    "processing_count": 5000,
    "decay_gamma": 1.0,
    "interaction_count": 0
  }
}
```

The approach field selects the top-level algorithm. When approach is behaviorAlgos, the sub_approach field selects the specific behavioral economics algorithm.

Algorithm Routing

| approach Value | Algorithm | Rolling Processor |
| --- | --- | --- |
| binaryThompson | Ecosystem Rewards (Thompson Sampling) | RollingEcosystemRewards |
| epsilonGreedy | Epsilon Greedy | RollingEcosystemRewards |
| naiveBayes | Bayesian Probabilistic | RollingNaiveBayes |
| QLearning | Q-Learning | RollingQLearning |
| Network | Network Analysis (PageRank) | RollingNetwork |
| behaviorAlgos | Behavioral Economics (see sub_approach) | RollingBehavior |
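The routing above can be sketched as a simple lookup. The class names come from the table; the dispatch function itself is a hypothetical illustration, not the runtime's actual code:

```python
# Hypothetical sketch of approach -> rolling-processor routing.
ROUTING = {
    "binaryThompson": "RollingEcosystemRewards",
    "epsilonGreedy": "RollingEcosystemRewards",
    "naiveBayes": "RollingNaiveBayes",
    "QLearning": "RollingQLearning",
    "Network": "RollingNetwork",
    "behaviorAlgos": "RollingBehavior",
}

def select_processor(randomisation: dict) -> str:
    """Return the rolling-processor name for a randomisation config."""
    approach = randomisation.get("approach", "")
    if approach not in ROUTING:
        raise ValueError(f"Unknown approach: {approach!r}")
    return ROUTING[approach]
```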

Behavioral Sub-Approaches

When approach = "behaviorAlgos", the sub_approach field selects one of:

| sub_approach Value | Algorithm |
| --- | --- |
| lossAversion | Loss Aversion |
| riskAversion | Risk Aversion |
| prospectTheory | Prospect Theory |
| sentimentalEquilibrium | Sentimental Equilibrium |
| coverageAwareThompson | Coverage-Aware Thompson |
| longTailBoostMF | Long-Tail Boost MF |
| generative | Generative Model |

Exploration

All algorithms share a deployment-level epsilon exploration mechanism. On each API request, the runtime rolls a random number against the configured epsilon. When exploration is triggered (explore = 1), the algorithm is bypassed entirely and all offers receive uniform random scores. This applies to every algorithm without exception.
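The gate described above can be sketched as follows. Function and parameter names are illustrative, not the runtime's API:

```python
import random

def score_request(offers, epsilon, algorithm_score):
    """Sketch of the deployment-level epsilon gate: with probability
    epsilon the algorithm is bypassed and every offer receives a
    uniform random score (explore = 1)."""
    explore = 1 if random.random() < epsilon else 0
    if explore:
        scores = {offer: random.random() for offer in offers}
    else:
        scores = {offer: algorithm_score(offer) for offer in offers}
    return scores, explore
```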

In addition, some algorithms have their own built-in exploration that operates during normal (non-explore) scoring:

| Algorithm | Deployment Epsilon | Algorithm-Level Exploration |
| --- | --- | --- |
| Ecosystem Rewards (Thompson) | Yes | Automatic via Beta distribution overlap |
| Epsilon Greedy | Yes | Epsilon IS the algorithm (fixed-rate random arm selection) |
| Loss Aversion | Yes | UCB exploration term boosts under-sampled offers |
| Risk Aversion | Yes | None — relies on deployment epsilon for exploration |
| Prospect Theory | Yes | Adaptive drift + built-in epsilon-greedy mixing |
| Sentimental Equilibrium | Yes | N/A (aggregate equilibrium, not per-offer) |
| Coverage-Aware Thompson | Yes | Thompson sampling + inverse-popularity boost + optional epsilon mixing |
| Long-Tail Boost MF | Yes | Inverse-popularity reweighting (diversity, not explicit exploration) |
| Network Analysis (PageRank) | Yes | None — relies on deployment epsilon for exploration |
| Q-Learning | Yes | Epsilon-greedy policy within Q-table |
| Bayesian Probabilistic | Yes | Uniform sampling for missing offers (if configured) |
| Generative Model | Yes | LLM temperature controls stochasticity |

Algorithm Comparison

| Algorithm | Convergence Speed | Computational Cost | Stochastic | Context-Aware | Best Use Case |
| --- | --- | --- | --- | --- | --- |
| Ecosystem Rewards (Thompson) | Medium | Low | Yes | No | General-purpose recommendations |
| Epsilon Greedy | Fast (exploit) | Very Low | Partial | No | Simple A/B testing, low complexity |
| Loss Aversion | Medium-Fast | Low | No | Yes | High-cost-of-rejection scenarios |
| Risk Aversion | Fast | Very Low | No | No | Consistency-valued domains |
| Prospect Theory | Medium | Low | Partial | No | Marketing with psychological modeling |
| Sentimental Equilibrium | Single-shot | Low | No | No | Engagement intensity optimization |
| Coverage-Aware Thompson | Slow | Medium | Yes | Yes | Fairness / catalog coverage |
| Long-Tail Boost MF | Medium | High (ALS) | No | Yes | Collaborative filtering with diversity |
| Network Analysis (PageRank) | Fast | Medium | No | No | Inter-offer relationships |
| Q-Learning | Slow | High | Partial | Yes (state) | Sequential decision-making |
| Bayesian Probabilistic | Very Fast | Low | No | Yes | Feature-rich contexts |
| Generative Model | N/A | Very High (API) | Yes | Yes | Complex reasoning, experimental |

Cold-Start Behavior

Recommendations are always returned, regardless of algorithm or data availability. The RollingBehavior process (and its equivalent Rolling classes) produces a scored options array in every case, which is passed to the post-score class:

  • No history at all: Every offer in the options store receives a uniform random score.
  • Partial history: Offers scored by the algorithm use their computed scores; unscored offers receive a random fallback score.
  • Explore triggered: When deployment-level epsilon triggers exploration, all offers receive random scores regardless of available history.
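The fallback rules above can be sketched as follows. This is an assumed illustration of the platform-level behavior, not the actual post-score pipeline:

```python
import random

def build_scored_options(options, algorithm_scores, explore=0):
    """Always return a fully scored options array: algorithm scores
    where available, random fallbacks elsewhere, and all-random
    scores when deployment-level exploration triggered (explore = 1)."""
    scored = []
    for option in options:
        if explore or option not in algorithm_scores:
            score = random.random()  # uniform random fallback
        else:
            score = algorithm_scores[option]
        scored.append({"option": option, "score": score})
    return scored
```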

The post-score class then controls the final offer selection, eligibility filtering, and response formatting. The table below documents what each algorithm contributes on top of this platform-level behavior.

Cold-Start Summary

| Algorithm | Algorithm-Level Cold Start Behavior | Prior / Seed Mechanism | Quality of Early Recommendations |
| --- | --- | --- | --- |
| Ecosystem Rewards (Thompson) | Beta(1,1) samples uniformly in [0,1]. All arms get equal random chance. | Configurable alpha_zero and beta_zero per arm. Default Beta(1,1) is uninformative. | Good. Uniform exploration by design. |
| Epsilon Greedy | All arms start with arm_reward = 0 and tie. Exploit phase picks randomly among ties. | None beyond deployment epsilon. | Moderate. Set epsilon >= 0.1 for better early coverage. |
| Loss Aversion | New offers get smoothing alpha = 1.5. UCB gives under-sampled offers a boost. | Smoothing alpha = 1.5 + UCB exploration term. | Moderate. UCB helps under-sampled offers surface. |
| Risk Aversion | Does not score offers without history. Platform random fallback applies. | None. Relies on platform-level random scoring. | Random until data accumulates. Algorithm begins influencing after sufficient history. |
| Prospect Theory | Seeds all known offers with baseDriftRate (default 0.05). Adaptive drift + epsilon-greedy mixing. | baseDriftRate seed + drift + epsilon. | Good. Seeding gives all offers non-zero scores from day one. |
| Sentimental Equilibrium | Computes aggregate engagement equilibrium, not per-offer scores. | Model parameters serve as the “prior”. | N/A. Does not rank individual offers. |
| Coverage-Aware Thompson | Beta(1,1) + exposure=0 gives maximum inverse-popularity boost for unseen offers. | Beta(1,1) prior + max coverage boost. | Excellent. Best cold-start of all behavioral algorithms. |
| Long-Tail Boost MF | Random latent vectors (~0.01 Gaussian noise) for unseen users/items. Near-zero scores. | Random Gaussian initialization. | Random until data accumulates. Needs substantial interaction volume. |
| Network Analysis (PageRank) | Cannot build a graph without co-occurrence data. Platform random fallback applies. | Personalization based on acceptance weight (once data exists). | Random until data accumulates. Needs co-occurrence data. |
| Q-Learning | Q-table is empty. First iteration uses random state/action. | initial_q = 0 + epsilon-greedy policy. | Random until data accumulates. Needs sequential interactions. |
| Bayesian Probabilistic | Random score for all offers, or uniform sampling for missing offers if configured. | Laplace smoothing (alpha = 1.0) for unseen feature combinations. | Good. Laplace handles unseen combinations well. |
| Generative Model | Sends minimal context to LLM. Output depends on LLM behavior and prompt. | Prompt and system instructions serve as the “prior”. | Variable. Depends on LLM quality with minimal context. |

| Scenario | Recommended Algorithm | Why |
| --- | --- | --- |
| Brand new deployment, zero data | Ecosystem Rewards (Thompson) | Beta(1,1) prior gives uniform exploration. Converges naturally as data arrives. |
| New deployment, need guaranteed coverage | Coverage-Aware Thompson | Maximum boost for unseen items. Thompson + inverse-popularity + optional epsilon. |
| Adding new offers to existing catalog | Ecosystem Rewards or Coverage-Aware Thompson | New offers get default priors and are naturally explored. |
| New customer segment, have general history | Prospect Theory | Seeds all offers from product data. Adaptive drift ensures variety. |
| Need immediate results, no tolerance for randomness | Epsilon Greedy with high epsilon (0.3-0.5) | Simple, predictable. Reduce epsilon over time. |
| Fastest algorithm-level convergence | Risk Aversion, Long-Tail Boost MF, PageRank | These converge quickly once data is available, but rely on platform random scoring during cold start. |

Configuring Priors for Thompson Sampling

The alpha_zero and beta_zero fields in the options store control the Thompson Sampling prior. Different priors can be set per offer to encode domain knowledge:

| Prior | alpha_zero | beta_zero | Meaning |
| --- | --- | --- | --- |
| Uninformative | 1.0 | 1.0 | No prior belief. Uniform sampling. Default. |
| Optimistic | 2.0 | 1.0 | Assume the offer is probably good. Explore less. |
| Pessimistic | 1.0 | 2.0 | Assume the offer is probably bad. Explore more. |
| Strong prior (popular offer) | 10.0 | 5.0 | Equivalent to 10 successes and 5 failures. Stable from start. |
| Weak but positive | 1.5 | 1.0 | Slight optimism. Good for offers you believe will perform above average. |
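The effect of these priors can be seen by sampling directly from the Beta distribution. This sketch uses Python's standard library; the runtime's own sampling code may differ:

```python
import random

def thompson_sample(alpha_zero, beta_zero, successes=0.0, failures=0.0):
    """Draw one arm score from the Beta posterior. With no data the
    draw comes from the prior alone, so alpha_zero / beta_zero
    directly shape cold-start behavior."""
    return random.betavariate(alpha_zero + successes, beta_zero + failures)

# The mean of Beta(a, b) is a / (a + b): Beta(1, 1) averages 0.5 with
# maximum spread, while the strong prior Beta(10, 5) concentrates
# samples near 10 / 15 ≈ 0.67.
uninformative = [thompson_sample(1.0, 1.0) for _ in range(10_000)]
strong_prior = [thompson_sample(10.0, 5.0) for _ in range(10_000)]
```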

Rolling Process and Background Learning

Every dynamic interaction algorithm follows the same two-phase architecture:

  1. Background learning (write path): A scheduled process periodically reads logging data (offers presented) and response data (offers accepted), computes updated statistics for each offer arm, and writes the results to the options store in MongoDB.

  2. Real-time scoring (read path): When an API request arrives, the runtime reads the pre-computed arm statistics from the options store, applies the explore/exploit logic for the selected algorithm, and returns ranked offers. No database writes occur during scoring.

This separation means learning is decoupled from serving. The options store acts as the bridge.
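An in-memory sketch of this split, with a toy class standing in for the MongoDB options store. All names here are illustrative, not the platform's classes:

```python
import random

class ToyOptionsStore:
    """Stand-in for the options store bridging the two phases."""

    def __init__(self):
        self.arms = {}  # offer -> pre-computed Beta parameters

    def process(self, presented, accepted):
        """Write path: aggregate logging/response data into arm stats."""
        for offer in set(presented):
            logs = presented.count(offer)
            responses = accepted.count(offer)
            self.arms[offer] = {"alpha": 1.0 + responses,
                                "beta": 1.0 + (logs - responses)}

    def score(self):
        """Read path: rank offers from stored stats only; no writes."""
        return sorted(
            self.arms,
            key=lambda o: random.betavariate(self.arms[o]["alpha"],
                                             self.arms[o]["beta"]),
            reverse=True)
```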

Scheduling

Background learning is triggered in two ways:

Automatic Scheduler (recommended for production): The MultiCampaignScheduler runs on a Spring @Scheduled fixed-delay loop controlled by the monitoring.delay property (in seconds, default 60).

On-Demand via /learning API: Useful for forcing an immediate update after bulk data loads, testing, or manual intervention.

```shell
curl -X POST http://localhost:8091/learning
```

Options Store Update Cycle

Each background learning cycle performs these steps for every option:

  1. Read current options from the options store
  2. Aggregate logging data (presentations) — filtered by time window, count limit, decay
  3. Aggregate response data (acceptances) within the same window
  4. Compute arm_reward using the algorithm-specific reward strategy
  5. Write updated options back to the options store via upsert
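Steps 2 and 3 can be sketched as follows. The geometric-decay interpretation (newest record weighted 1.0, each older record divided by decay_gamma) is an assumption based on the parameter descriptions:

```python
def aggregate_events(events, now_ms, window_ms, count_limit, gamma):
    """Return the decay-weighted event count for one option.

    events: list of {"ts": epoch_millis} records, in any order.
    window_ms / count_limit of 0 mean "no limit"."""
    recent = [e for e in events
              if window_ms == 0 or now_ms - e["ts"] <= window_ms]
    recent.sort(key=lambda e: e["ts"], reverse=True)  # newest first
    if count_limit:
        recent = recent[:count_limit]
    # Geometric decay: with gamma = 1.0 every record weighs 1.0;
    # with gamma > 1.0 older records count progressively less.
    return sum(gamma ** -i for i in range(len(recent)))
```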

Process vs Score

| Aspect | process() (Background) | score() (Real-Time) |
| --- | --- | --- |
| When called | Scheduler or /learning API | Every API scoring request |
| Purpose | Update arm statistics | Rank offers for a customer |
| Reads from | Logging + response collections | Options store only |
| Writes to | Options store + time series | Nothing (read-only) |
| Latency | Seconds to minutes | Milliseconds |
| Explore/exploit | Not applied | Applied |

For more detail on the options store structure, see Options Store. For the processing pipeline configuration, see Process.


Scenario Guide

Scenario 1: New Product Launch (Cold Start)

Problem: New catalog of offers with no interaction history.

Recommended: binaryThompson with default priors

```json
{
  "approach": "binaryThompson",
  "epsilon": 0.0,
  "success_reward": 1.0,
  "fail_reward": 1.0,
  "processing_window": 86400000,
  "processing_count": 1000,
  "decay_gamma": 1.0
}
```

Scenario 2: Mature Catalog with Popularity Bias

Problem: Top offers get disproportionate impressions. Long-tail offers are never shown.

Recommended: behaviorAlgos with coverageAwareThompson

```json
{
  "approach": "behaviorAlgos",
  "sub_approach": "coverageAwareThompson",
  "epsilon": 0.05,
  "processing_count": 5000,
  "decay_gamma": 1.0
}
```

Scenario 3: High Cost of Rejection

Problem: Showing irrelevant offers damages customer trust (financial products, insurance).

Recommended: behaviorAlgos with lossAversion

```json
{
  "approach": "behaviorAlgos",
  "sub_approach": "lossAversion",
  "processing_count": 5000,
  "processing_window": 604800000,
  "decay_gamma": 1.0
}
```

Scenario 4: Conservative / Regulated Industry

Problem: Consistency and predictability matter more than maximizing acceptance rate.

Recommended: behaviorAlgos with riskAversion

```json
{
  "approach": "behaviorAlgos",
  "sub_approach": "riskAversion",
  "processing_count": 5000,
  "processing_window": 2592000000,
  "decay_gamma": 1.0
}
```

Scenario 5: Simple A/B Testing

Problem: Basic performance tracking with minimal complexity.

Recommended: epsilonGreedy with epsilon between 0.05 and 0.20

```json
{
  "approach": "epsilonGreedy",
  "epsilon": 0.1,
  "processing_count": 1000,
  "processing_window": 86400000,
  "decay_gamma": 1.0
}
```

Scenario 6: Sequential Customer Journey

Problem: Optimal next offer depends on what the customer has already seen/accepted.

Recommended: QLearning

```json
{
  "approach": "QLearning",
  "learning_rate": 0.25,
  "discount_factor": 0.75,
  "random_action": 0.2,
  "max_reward": 10,
  "processing_count": 5000,
  "training_data_source": "logging"
}
```
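These parameters map onto the standard Q-learning update: learning_rate is the step size, discount_factor weights future reward, and random_action is the epsilon-greedy rate. A minimal sketch, not the runtime's implementation:

```python
import random

def q_update(q_table, state, action, reward, next_state,
             learning_rate=0.25, discount_factor=0.75):
    """Standard update: Q(s,a) += lr * (r + gamma * max Q(s',.) - Q(s,a))."""
    best_next = max(q_table.get(next_state, {}).values(), default=0.0)
    current = q_table.setdefault(state, {}).get(action, 0.0)  # initial_q = 0
    q_table[state][action] = current + learning_rate * (
        reward + discount_factor * best_next - current)

def choose_offer(q_table, state, offers, random_action=0.2):
    """Epsilon-greedy policy over the Q-table."""
    if random.random() < random_action:
        return random.choice(offers)
    values = q_table.get(state, {})
    return max(offers, key=lambda o: values.get(o, 0.0))
```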

Scenario 7: Marketing Campaign with Behavioral Nudging

Problem: Recommendations should be informed by how humans actually make decisions.

Recommended: behaviorAlgos with prospectTheory

```json
{
  "approach": "behaviorAlgos",
  "sub_approach": "prospectTheory",
  "epsilon": 0.1,
  "processing_count": 5000,
  "processing_window": 604800000,
  "decay_gamma": 1.0
}
```

Scenario 8: Engagement Fatigue Analysis

Problem: Determine optimal frequency and intensity of customer engagement.

Recommended: behaviorAlgos with sentimentalEquilibrium

```json
{
  "approach": "behaviorAlgos",
  "sub_approach": "sentimentalEquilibrium",
  "processing_count": 5000,
  "decay_gamma": 1.0
}
```

Configuration Reference

Common Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| approach | string | Top-level algorithm: binaryThompson, epsilonGreedy, behaviorAlgos, Network, QLearning, naiveBayes |
| sub_approach | string | Behavioral sub-algorithm (only when approach=behaviorAlgos) |
| epsilon | double | Exploration probability. Range: 0.0 to 1.0. Typical: 0.05-0.20 |
| success_reward | double | Weight for successful interactions when updating alpha. Default: 1.0 |
| fail_reward | double | Weight for failed interactions when updating beta. Default: 1.0 |
| processing_window | long | Time window in milliseconds. 0 = no limit. E.g. 86400000 (24h), 604800000 (7d) |
| processing_count | int | Max records per update cycle. 0 = no limit. Typical: 1000-10000 |
| decay_gamma | double | Geometric decay for older interactions. 1.0 = no decay |
| interaction_count | int | Cap on interactions per customer. 0 = no cap |

Processing Window Examples

| Use Case | processing_window | processing_count | decay_gamma | Effect |
| --- | --- | --- | --- | --- |
| Last 24 hours only | 86400000 | 0 | 1.0 | Fresh data, responsive to changes |
| Last 7 days, max 5000 | 604800000 | 5000 | 1.0 | Balanced freshness and volume |
| Last 30 days with decay | 2592000000 | 0 | 1.5 | Long history, recent data weighted more |
| Last 1000 interactions | 0 | 1000 | 1.0 | Fixed sample size, most recent |
| All time, no limit | 0 | 0 | 1.0 | Maximum data, slowest adaptation |

Reward Strategy Factory

The RewardStrategyFactory maps the approach to a reward computation:

| Approach | Strategy | Computation |
| --- | --- | --- |
| binaryThompson | ThompsonReward | Beta(alpha + success * responses, beta + fail * (logs - responses)).sample() |
| epsilonGreedy | EpsilonGreedyReward | response_count / logging_count |
| Network | NetworkReward | Returns 1.0 (PageRank handles scoring) |
| behaviorAlgos | BehaviorReward | Returns 1.0 (behavioral algorithms handle scoring) |
| QLearning | QLearningReward | Returns 1.0 (Q-table handles scoring) |
| naiveBayes | NaiveBayesReward | Returns 1.0 (Naive Bayes handles scoring) |
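The two non-trivial computations can be sketched as follows. Python's random.betavariate stands in for whatever sampler the platform uses, and the zero-division guard is an added assumption:

```python
import random

def thompson_reward(alpha_zero, beta_zero, success_reward, fail_reward,
                    logging_count, response_count):
    """ThompsonReward: sample Beta(alpha + success * responses,
    beta + fail * (logs - responses))."""
    alpha = alpha_zero + success_reward * response_count
    beta = beta_zero + fail_reward * (logging_count - response_count)
    return random.betavariate(alpha, beta)

def epsilon_greedy_reward(logging_count, response_count):
    """EpsilonGreedyReward: plain acceptance rate."""
    return response_count / logging_count if logging_count else 0.0
```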

For custom formulas, see Custom Reward Functions.
