Recent Reading Log

Short, opinionated notes on papers I have read recently. Each entry highlights the core idea and what I found useful.

Introduces the Transformer architecture, which uses self-attention for sequence modeling and removes recurrence and convolution entirely.
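The core operation is scaled dot-product attention, softmax(QKᵀ/√d_k)V. Here is a minimal single-head NumPy sketch as a reminder to myself; the toy shapes and names are mine, not code from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (n_q, n_k) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)      # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the keys
    return weights @ V                                # weighted sum of the values

# Toy example: 3 query positions attending over 4 key/value positions.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)    # (3, 8)
```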
Residual connections allow very deep networks to train by mitigating vanishing gradients.
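The idea fits in a few lines. Below is a deliberately simplified fully-connected sketch (the paper's blocks use convolutions with batch norm); the point is the identity shortcut added back before the nonlinearity, which lets gradients pass through the addition unchanged:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """y = relu(x + F(x)), where F is a small two-layer transform."""
    h = relu(x @ W1)        # first transform
    f = h @ W2              # second transform, same width as x
    return relu(x + f)      # identity shortcut, then the nonlinearity

# Toy example: a width-16 block applied to a batch of 2 inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 16))
W1 = rng.normal(size=(16, 16)) * 0.1
W2 = rng.normal(size=(16, 16)) * 0.1
print(residual_block(x, W1, W2).shape)   # (2, 16)
```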
Normalizes layer inputs to reduce internal covariate shift and stabilize training.
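This is the batch-normalization recipe: normalize each feature over the mini-batch, then apply a learned scale and shift. A training-time forward-pass sketch (inference would use running statistics rather than batch statistics):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # standardized activations
    return gamma * x_hat + beta               # learnable scale and shift

# Toy example: a batch of 4 activations with 3 features each.
x = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [0.0, 1.0, 2.0],
              [3.0, 3.0, 3.0]])
gamma = np.ones(3)
beta = np.zeros(3)
out = batch_norm(x, gamma, beta)
print(out.mean(axis=0))   # ~0 per feature
print(out.std(axis=0))    # ~1 per feature
```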