Skip to Content
DocsModulesTwo-TowerArchitecture & Theory

Architecture & Theory

The dual-encoder idea

A two-tower model learns two functions that project different entities into a single shared vector space:

  • the user tower \(f_u(\cdot)\) maps a customer’s features \(x_u\) to a vector \(u = f_u(x_u) \in \mathbb{R}^p\),
  • the item tower \(f_i(\cdot)\) maps an offer’s features \(x_i\) to a vector \(v = f_i(x_i) \in \mathbb{R}^p\).

The model is trained so that customers and the offers they engage with land close together in that space. Compatibility is then just a similarity between two vectors.

The similarity score

ecosystem.Ai scores a customer/offer pair with the dot product of the L2-normalized embeddings:

\[\hat{u} = \frac{u}{\lVert u \rVert_2}, \qquad \hat{v} = \frac{v}{\lVert v \rVert_2}, \qquad \text{score}(u, v) = \hat{u} \cdot \hat{v}\]

Because both vectors are unit length, the dot product equals the cosine similarity, bounded in \([-1, 1]\). Higher means more compatible.

L2-normalization makes scores comparable across customers and offers (vector magnitude no longer affects ranking, only direction), and it makes the Java runtime’s cosine and the training-time dot product agree.

The two towers in ecosystem.Ai

The built-in workbench implementation trains both towers as H2O Deep Learning networks and reads the first hidden layer as the embedding.

TowerInputs (features)TargetEmbedding source
Usercustomer_id, price, rank, scoreaccepted (0/1)deepfeatures(layer=0)
Itemoffer, price, rank, scoreaccepted (0/1)deepfeatures(layer=0)

Each tower is a classifier of “did the customer accept?”, and the hidden-layer activations become the embedding. With hidden=[embedding_dim], the single hidden layer is the p-dimensional vector.

Retrieval versus ranking

Two-tower models shine at retrieval: with item vectors precomputed, finding the best offers for a customer is a nearest-neighbour search over vectors — cheap even across very large catalogues.

PropertyTwo-tower (retrieval)Cross-feature ranker
User/item interactionlate (dot product only)early (joint features)
Item vectors precomputableyesno
Cost per candidateone dot producta full model score
Typical useshortlist thousands → hundredsre-rank a small shortlist

In ecosystem.Ai the same dot-product is used directly for ranking the offer matrix at request time, because the offer matrix is already a curated candidate set. For very large catalogues you would precompute item vectors and use an approximate nearest-neighbour (ANN) index in front of the runtime.

Engine independence

Nothing about the scoring math depends on how the towers were trained. The runtime consumes embedding vectors:

  • H2O Deep Learning towers (workbench) → deepfeatures(layer=0) + L2 norm.
  • PyTorch towers (notebooks sidecar) → the tower’s output vector.

Both produce p-dimensional vectors that the runtime compares with cosine / dot product. See Model Training and PyTorch Serving.

Last updated on