“to learn dependency beyond a fixed length without disrupting temporal coherence”

[1901.02860] Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
http://jhavelikes.tumblr.com/post/184177841820
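
The quote refers to Transformer-XL's segment-level recurrence: hidden states from the previous segment are cached and reused (with gradients stopped) as extra attention context for the current segment, so dependencies can span segment boundaries. Below is a minimal PyTorch sketch of that idea, not the paper's implementation; it uses standard `nn.MultiheadAttention` and omits the relative positional encodings the paper pairs with recurrence, and all names are illustrative.

```python
import torch
import torch.nn as nn
from typing import Optional

class RecurrentSegmentAttention(nn.Module):
    """Illustrative attention layer with segment-level recurrence:
    states from the previous segment are cached (gradient-stopped)
    and prepended as extra context for the current segment."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, memory: Optional[torch.Tensor]):
        # Extended context = cached previous-segment states + current input.
        # detach() stops gradients flowing into the old segment (the paper's SG(.)).
        context = x if memory is None else torch.cat([memory.detach(), x], dim=1)
        out, _ = self.attn(query=x, key=context, value=context, need_weights=False)
        new_memory = x  # cache current states as context for the next segment
        return out, new_memory

# Usage: process a long sequence one fixed-length segment at a time,
# carrying the memory across segments.
layer = RecurrentSegmentAttention(d_model=64, n_heads=4)
memory = None
long_seq = torch.randn(1, 512, 64)          # (batch, time, d_model)
for segment in long_seq.split(128, dim=1):  # fixed-length segments
    out, memory = layer(segment, memory)
```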
