Technical Background of Our EUOS25 Challenge Win

In the EUOS25 challenge, held in February 2026, our team (Team microsomes) won the fluorescence prediction track. This article explains the multimodal strategy and the sequential stacking architecture that were key to the win.

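Challenge Overview

The task was to predict the optical properties of compounds in a library of roughly 100,000 compounds. The contest consisted of the following two independent tracks:

Transmittance prediction
Fluorescence prediction

With a model design that emphasizes physical causality, we achieved the highest accuracy in fluorescence prediction.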

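Key to Winning: Multimodal Strategy

Rather than describing each compound with a single representation, we adopted a “multimodal” approach that combines three different perspectives: physical chemistry, graph structure, and descriptors.

Precision QM Strategy (Quantum Chemistry)
- Ran precise quantum chemical calculations that combine MOPAC (six Hamiltonians) with a MACE-xTB workflow.
- Computed electronic and thermodynamic properties (e.g., the HOMO-LUMO gap) in a DMSO solvent environment, giving the model physical evidence directly tied to optical properties.
Massive Informatics (Large-Scale Descriptors)
- In addition to more than 1,800 Mordred descriptor dimensions, we built our own extension of PathCounts (up to order 50) to capture long-range correlations.
- Integrated Conjugation Features that evaluate the quality of $\pi$-conjugated systems, building a feature space specialized for fluorescence prediction.
GNN Feature Generation (Graph Neural Network)
- Used a custom GINE-Net and integrated the embeddings extracted from the graph structure as features.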

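Architecture: Sequential Stacking

Noting that fluorescence (F) physically depends on transmittance (T), we built a “Sequential Stacking” architecture that mimics this causal relationship.

Tier 1: Generate initial predictions (OOF predictions) of transmittance and fluorescence from all features.
Tier 2 (Chaining):
- The transmittance predictions from the previous tier are dynamically injected as a new input to the fluorescence prediction model (Dynamic Injection).
- Building the physical dependency of the phenomenon into the model structure itself improved accuracy beyond what purely statistical correlation achieved.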

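Rigorous Validation and Evidence

To prevent overfitting, we carried out a rigorous performance evaluation with 4x4 Nested Cross-Validation. Analysis with SHAP values showed that the quantum chemical descriptors and the GNN features work complementarily.

The details of this work are being prepared for submission to the journal SLAS Technology. [Status: In preparation]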

Technical Review: EUOS25 Challenge Win


In February 2026, our team (Team microsomes) won the Fluorescence Prediction Track of the 2nd Joint Machine Learning Challenge. This post details the multimodal strategy and sequential stacking architecture that led to our success in predicting molecular optical properties.

Challenge Overview

The objective was to predict the optical properties of a diverse library comprising approximately 100,000 compounds. The challenge consisted of two distinct tracks:

Transmittance Prediction
Fluorescence Prediction

By designing our models around physical causality, we achieved the highest accuracy in the Fluorescence Prediction track.

Key to Winning: Multimodal Strategy

Instead of relying on a single molecular representation, we adopted a multimodal approach that integrates physicochemical, structural, and descriptor-based perspectives:

Precision QM Strategy (Quantum Chemistry)
- Ran precise quantum chemical calculations using six MOPAC Hamiltonians and a MACE-xTB workflow.
- Calculated electronic and thermodynamic properties (e.g., the HOMO-LUMO gap) in a simulated DMSO solvent environment, giving the model physical evidence directly tied to optical properties.
Massive Informatics
- Used 1,800+ Mordred descriptors, augmented with custom PathCounts (up to order 50) to capture long-range correlations.
- Integrated specialized Conjugation Features that evaluate the quality of $\pi$-conjugated systems, constructing a feature space specialized for fluorescence prediction.
GNN Feature Generation
- Leveraged a custom GINE-Net to extract and integrate structural embeddings.
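
To make the PathCounts idea above concrete, here is a minimal sketch of what an order-n path count measures: the number of simple paths (no atom visited twice) that span n bonds in the molecular graph. It is an illustration only, not Mordred's implementation and not the code used in the challenge. The `count_paths` helper and the adjacency-dict input format are assumptions made for this example, and the brute-force search would be far too slow at order 50 on fused ring systems.

```python
def count_paths(adjacency, order):
    """Count simple paths spanning `order` bonds in an undirected graph.

    `adjacency` maps each atom index to the indices of its bonded neighbours
    (a hypothetical input format chosen for this sketch).
    """
    if order < 1:
        raise ValueError("order must be at least 1")

    def extend(node, visited, remaining):
        # Number of ways to walk `remaining` more bonds without revisiting an atom.
        if remaining == 0:
            return 1
        return sum(
            extend(nxt, visited | {nxt}, remaining - 1)
            for nxt in adjacency[node]
            if nxt not in visited
        )

    directed = sum(extend(start, frozenset([start]), order) for start in adjacency)
    # Every undirected path was found once from each of its two ends.
    return directed // 2


# Heavy-atom skeletons: a four-atom chain and a three-membered ring.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ring = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

print([count_paths(chain, n) for n in (1, 2, 3)])  # [3, 2, 1]
print([count_paths(ring, n) for n in (1, 2, 3)])   # [3, 3, 0]
```

Higher orders describe how far connectivity extends across the molecule, which is why long path counts can act as a proxy for extended frameworks.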

Architecture: Sequential Stacking

We developed a “Sequential Stacking” architecture that mimics the physical dependency of fluorescence (F) on transmittance (T):

Tier 1: Generated Out-Of-Fold (OOF) predictions for both T and F from the full feature set.
Tier 2 (Chaining):
- The T predictions from Tier 1 were injected as an additional input to the F prediction model.
- Building the physical dependency of F on T into the model structure itself improved accuracy beyond what purely statistical correlation achieved.
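
The two tiers above can be sketched in a few lines of standard-library Python. This is a toy illustration under stated assumptions, not the challenge pipeline: the k-nearest-neighbour base learner, the helper names (`kfold`, `knn_predict`, `oof_predictions`, `sequential_stacking`), the reuse of the same folds in both tiers, and the synthetic data are all hypothetical choices made to keep the example self-contained.

```python
import random
from statistics import fmean


def kfold(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(X_train, y_train, x, k=3):
    """Toy base learner: mean target of the k nearest training rows."""
    nearest = sorted(
        range(len(X_train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
    )[:k]
    return fmean([y_train[i] for i in nearest])


def oof_predictions(X, y, folds):
    """Predict every row with a model that never saw that row in training."""
    oof = [0.0] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for i in held_out:
            oof[i] = knn_predict(X_train, y_train, X[i])
    return oof


def sequential_stacking(X, y_t, y_f, n_folds=4, seed=0):
    folds = kfold(len(X), n_folds, seed)
    # Tier 1: OOF predictions for transmittance (T) and fluorescence (F).
    t_oof = oof_predictions(X, y_t, folds)
    f_tier1 = oof_predictions(X, y_f, folds)
    # Tier 2: append the T prediction as one extra input column for the F model.
    X_chained = [list(row) + [t] for row, t in zip(X, t_oof)]
    f_tier2 = oof_predictions(X_chained, y_f, folds)
    return t_oof, f_tier1, f_tier2


# Synthetic data in which the "fluorescence" target depends on "transmittance".
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(40)]
y_t = [a + b for a, b in X]
y_f = [t * t for t in y_t]
t_oof, f_tier1, f_tier2 = sequential_stacking(X, y_t, y_f)
print(len(t_oof), len(f_tier2))  # 40 40
```

Using out-of-fold T predictions, rather than predictions from a model fitted on all rows, keeps the Tier 2 input free of each row's own T label.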

Rigorous Validation

To guard against overfitting, we evaluated performance with 4x4 Nested Cross-Validation. SHAP analysis showed that the QM-based descriptors and the GNN embeddings contributed complementary information.
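
For readers unfamiliar with nested cross-validation, the sketch below shows the general pattern: an inner loop selects a hyperparameter, and an outer loop scores that choice on data the inner loop never touched. It is a generic illustration, not our evaluation code. Reading "4x4" as four outer and four inner folds is an assumption, as are the toy k-nearest-neighbour model, the `k_grid` values, and the helper names.

```python
import random
from statistics import fmean


def split_folds(indices, n_folds, seed):
    """Shuffle the given indices and deal them into n_folds disjoint folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


def knn_mse(X, y, train, test, k):
    """Mean squared error of a toy k-nearest-neighbour regressor on `test`."""
    errors = []
    for i in test:
        nearest = sorted(
            train,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(X[j], X[i])),
        )[:k]
        prediction = fmean([y[j] for j in nearest])
        errors.append((prediction - y[i]) ** 2)
    return fmean(errors)


def nested_cv(X, y, k_grid=(1, 3, 5), n_outer=4, n_inner=4, seed=0):
    results = []
    outer_folds = split_folds(range(len(X)), n_outer, seed)
    for fold_id, outer_test in enumerate(outer_folds):
        held = set(outer_test)
        outer_train = [i for i in range(len(X)) if i not in held]
        inner_folds = split_folds(outer_train, n_inner, seed + fold_id + 1)

        def inner_error(k):
            # Average validation error over the inner folds only.
            scores = []
            for inner_val in inner_folds:
                val = set(inner_val)
                inner_train = [i for i in outer_train if i not in val]
                scores.append(knn_mse(X, y, inner_train, inner_val, k))
            return fmean(scores)

        best_k = min(k_grid, key=inner_error)
        # The outer test fold is scored once, after k has been fixed.
        results.append((best_k, knn_mse(X, y, outer_train, outer_test, best_k)))
    return results


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(48)]
y = [a - b for a, b in X]
for best_k, error in nested_cv(X, y):
    print(best_k, round(error, 4))
```

Because the outer test fold plays no part in choosing `k`, the outer errors are a less optimistic estimate than the inner validation scores used for selection.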

Full technical details and source code are being prepared for submission to SLAS Technology. [Status: In preparation]