1. UniDiffuser
1.1 Architecture
1.2 Optimization
- Block:
- in-block: local features
- mid-block: transition between the in-blocks and out-blocks; prevents rapid gradient decay in the network
- out-block: global features, fed by skip connections
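The in/mid/out block layout above can be sketched as a stack of skip connections: each in-block saves its output, and each out-block consumes the most recently saved one. This is a minimal sketch, not the UniDiffuser implementation. The names `run_blocks`, `add_skip`, and `add_one` are hypothetical, and element-wise addition is a stand-in assumption for whatever merge the real network uses to combine a skip with the current activations.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def add_skip(x: Vector, skip: Vector) -> Vector:
    """Stand-in merge (assumption): element-wise sum of activations and skip."""
    return [a + b for a, b in zip(x, skip)]


def run_blocks(
    x: Vector,
    in_blocks: List[Block],
    mid_block: Block,
    out_blocks: List[Block],
    combine: Callable[[Vector, Vector], Vector] = add_skip,
) -> Vector:
    """Run in-blocks, the mid-block, then out-blocks with long skip connections."""
    if len(in_blocks) != len(out_blocks):
        raise ValueError("each out-block needs exactly one matching in-block")

    skips: List[Vector] = []
    for block in in_blocks:
        x = block(x)
        skips.append(x)  # save the shallow activations for the matching out-block

    x = mid_block(x)

    for block in out_blocks:
        # Last saved skip first: the deepest in-block pairs with the first out-block.
        x = block(combine(x, skips.pop()))
    return x


# Toy block (hypothetical) that adds 1 to every element.
add_one: Block = lambda v: [a + 1.0 for a in v]

result = run_blocks([0.0], [add_one, add_one], add_one, [add_one, add_one])
print(result)  # [8.0]
```

Because every out-block receives a shallow activation directly, gradients have a short path back to the early blocks, which is the motivation noted for the mid-block and skip connections.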
- Regularization
- pos-dropout
- residual dropout
- attn dropout
- proj dropout
- label smoothing: soften the one-hot targets
- stochastic depth: randomly skip blocks during training
- weight decay
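Two of the regularizers listed above, label smoothing and stochastic depth, are easy to show in a few lines. The following is a minimal sketch under stated assumptions: the helper names `smooth_labels` and `stochastic_depth` are hypothetical, the smoothing rule is the common `(1 - eps) * one_hot + eps / K` form, and the kept residual branch is rescaled by `1 / (1 - drop_prob)` during training, which is one widely used convention rather than something these notes specify.

```python
import random
from typing import Callable, List

Vector = List[float]


def smooth_labels(target: int, num_classes: int, eps: float = 0.1) -> Vector:
    """Return (1 - eps) * one_hot(target) + eps / num_classes."""
    off = eps / num_classes
    return [off + (1.0 - eps) if k == target else off for k in range(num_classes)]


def stochastic_depth(
    x: Vector,
    block: Callable[[Vector], Vector],
    drop_prob: float,
    training: bool,
    rng: random.Random,
) -> Vector:
    """Residual block whose branch is skipped with probability drop_prob in training."""
    if training and rng.random() < drop_prob:
        return list(x)  # block skipped: only the identity path remains
    branch = block(x)
    # Rescale the kept branch in training so its expected value matches inference.
    scale = 1.0 / (1.0 - drop_prob) if training else 1.0
    return [a + scale * b for a, b in zip(x, branch)]


targets = smooth_labels(target=1, num_classes=4, eps=0.1)
double = lambda v: [2.0 * a for a in v]  # toy residual branch (hypothetical)
out = stochastic_depth([1.0], double, drop_prob=0.2, training=True, rng=random.Random(0))
```

At inference time (`training=False`) the block is always applied and no rescaling happens.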
- Generalization
- residual connection
- layer norm
- global token
- data augmentation (crop, flip)
- model ensemble (combine the predictions of multiple models)
- transfer learning
- sufficient capacity (scale up model capacity and training steps)
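The crop and flip augmentation mentioned in the list above can be sketched on a plain nested-list image. This is an illustrative sketch only: the names `hflip`, `random_crop`, and `augment` are hypothetical, the image is assumed to be a rectangular list of rows, and the 0.5 flip probability is an assumed default.

```python
import random
from typing import List

Image = List[List[float]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut a size x size window at a uniformly random position."""
    height, width = len(img), len(img[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, height - size)   # randint is inclusive on both ends
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in img[top:top + size]]


def augment(img: Image, size: int, rng: random.Random, flip_prob: float = 0.5) -> Image:
    """Random crop followed by a horizontal flip with probability flip_prob."""
    out = random_crop(img, size, rng)
    if rng.random() < flip_prob:
        out = hflip(out)
    return out


image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
sample = augment(image, size=2, rng=random.Random(0))
```

Each call yields a different view of the same image, so the model sees more variation without any new data being collected.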
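1.3 Multimodal alignment in UniDiffuser
Although cross attention is not used, some image-text interaction effectively happens at the encoder stage:
- GPT-2 uses the img embedding as a prefix to reconstruct the text
- the CLIP image encoder produces a semantic representation of the image
Decoder stage:
- multi-head self-attention: image and text tokens interact inside the network
- consider adding cross attention for alignment, followed by fine-tuning
- consider adding a prior preservation regularizer
- consider optimizing the CLIP text encoder by adding a modifier token
- consider adding LoRA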
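The LoRA idea from the list above can be sketched as a frozen linear layer plus a trainable low-rank update, `y = W x + (alpha / r) * B (A x)`. This is a minimal sketch, not a recipe for UniDiffuser itself: the class name `LoRALinear`, the helper `matvec`, and the initialization scale 0.01 are assumptions for illustration. `B` starts at zero, so the adapted layer initially matches the frozen one.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) with a trainable rank-r update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, rng: random.Random):
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight           # frozen: never updated during fine-tuning
        self.scale = alpha / rank
        # A gets small random values and B starts at zero, so B @ A is zero at init.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))  # low-rank path: d_in -> r -> d_out
        return [b + self.scale * u for b, u in zip(base, update)]


layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0, rng=random.Random(0))
print(layer([1.0, 1.0]))  # [3.0, 7.0], identical to the frozen layer at initialization
```

Only `A` and `B` would be trained, which is `r * (d_in + d_out)` parameters instead of `d_in * d_out`, so the pretrained weights stay intact while the alignment is adjusted.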