Category: Idea | Yiming Shi's blog


Yiming Shi

Post Categories

Vision Transformer (ViT) -> Towards a Modality-Agnostic Transformer?

This post revisits the underlying principles of the Vision Transformer (ViT) and explores how the Transformer architecture could be extended to other modalities. In "Weight2Weight" tasks, for instance, prior approaches have often simply flattened weights into one-dimensional tensors without using positional encodings.

2024-09-14 Idea

Diffusion Transformer Weight2Weight
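
To make the contrast in the excerpt above concrete, here is a minimal NumPy sketch of the two ways a weight matrix might be fed to a Transformer: flattened into one long vector, or split into ViT-style tiles with a positional encoding attached to each tile. This is an illustration under stated assumptions, not the method from the post: the function names (`flatten_weights`, `tokenize_weights`, `positional_encoding_2d`), the tile size, and the choice of a fixed sinusoidal encoding are all hypothetical, and the encoding is added directly to the raw tiles, skipping the learned linear projection that ViT applies first.

```python
import numpy as np

def flatten_weights(w):
    """Baseline: collapse a weight matrix into a 1-D vector (row/column structure is lost)."""
    return w.reshape(-1)

def tokenize_weights(w, patch=4):
    """ViT-style: cut a 2-D weight matrix into non-overlapping patch x patch tiles.

    Returns one flattened token per tile, plus the (rows, cols) shape of the tile grid.
    """
    rows, cols = w.shape
    assert rows % patch == 0 and cols % patch == 0, "matrix must divide evenly into tiles"
    gr, gc = rows // patch, cols // patch
    tiles = w.reshape(gr, patch, gc, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(gr * gc, patch * patch), (gr, gc)

def sincos_1d(positions, dim):
    """Fixed sinusoidal encoding of a 1-D array of positions, shape (len(positions), dim)."""
    assert dim % 2 == 0
    freqs = 1.0 / (10000 ** (np.arange(dim // 2) / (dim // 2)))
    angles = positions[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def positional_encoding_2d(grid, dim):
    """Concatenate a row encoding and a column encoding for every tile in the grid."""
    gr, gc = grid
    assert dim % 4 == 0
    r, c = np.meshgrid(np.arange(gr), np.arange(gc), indexing="ij")
    row_enc = sincos_1d(r.reshape(-1).astype(float), dim // 2)
    col_enc = sincos_1d(c.reshape(-1).astype(float), dim // 2)
    return np.concatenate([row_enc, col_enc], axis=1)

# Toy 8x8 "weight matrix" (assumed shape, for illustration only).
w = np.arange(64, dtype=np.float64).reshape(8, 8)

flat = flatten_weights(w)                    # shape (64,): no notion of position
tokens, grid = tokenize_weights(w, patch=4)  # shape (4, 16): one token per 4x4 tile
tokens_with_pos = tokens + positional_encoding_2d(grid, tokens.shape[1])
```

In the flattened form, two entries that were vertical neighbours in the matrix end up a full row apart with nothing to indicate they were adjacent. In the tiled form, each token carries an explicit row and column signal, which is the kind of structure the excerpt suggests prior Weight2Weight approaches leave unused.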