xLSTM, The One To Overcome Transformers?

In a recent session of our internal journal club, Jonathan Decker presented the paper “xLSTM: Extended Long Short-Term Memory” by Beck et al. (2024). If you want to explore xLSTM yourself, Jonathan has prepared a small example on our NHR cluster Grete that you can use as a starting point for your own experiments. The commands below were executed on glogin9 using the new NHR software stack.

# On a login node (e.g. glogin9), switch to the new NHR software stack
export PREFERRED_SOFTWARE_STACK=nhr-lmod
source /sw/etc/profile/profile.sh

# Clone the repository and create the conda environment on scratch
git clone https://github.com/NX-AI/xlstm
cd xlstm
module load miniconda3
conda env create -p /scratch-grete/usr/$USER/.conda/envs/xlstm -f environment_pt220cu121.yaml
source activate /scratch-grete/usr/$USER/.conda/envs/xlstm
pip install numpy==1.26.4  # pin NumPy to a 1.x release
module load cuda

# Request an interactive job with one A100 GPU; the remaining commands
# run inside that interactive session
srun -p grete --pty -n 1 -c 64 -t 1:00:00 -G A100:1 bash
export PYTHONPATH=$(pwd)
python experiments/main.py --config experiments/parity_xlstm10.yaml
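The config above trains a small xLSTM on a parity task: given a binary sequence, the model must predict whether the number of ones is even or odd, a classic probe of long-range memory. As a rough illustration of what such training data looks like (a toy stand-in, not the repository's actual data pipeline; the function name is ours), consider:

```python
import random

def make_parity_example(length: int) -> tuple[list[int], int]:
    """Return a random bit sequence and its parity label (1 = odd number of ones)."""
    bits = [random.randint(0, 1) for _ in range(length)]
    label = sum(bits) % 2
    return bits, label

if __name__ == "__main__":
    random.seed(0)
    seq, label = make_parity_example(10)
    print(seq, label)
```

The task is trivial for a model that can carry one bit of state across the whole sequence, which is exactly what recurrent architectures like xLSTM are built to do.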
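For longer runs, the interactive srun call can be turned into a batch job. A minimal sketch of such a job script, mirroring the partition, GPU type, and paths from the interactive example (adjust the repository path to wherever you cloned it):

```shell
#!/bin/bash
#SBATCH -p grete
#SBATCH -n 1
#SBATCH -c 64
#SBATCH -t 1:00:00
#SBATCH -G A100:1

# Same environment setup as in the interactive example
export PREFERRED_SOFTWARE_STACK=nhr-lmod
source /sw/etc/profile/profile.sh
module load miniconda3 cuda
source activate /scratch-grete/usr/$USER/.conda/envs/xlstm

cd "$HOME/xlstm"  # adjust if you cloned the repository elsewhere
export PYTHONPATH=$(pwd)
python experiments/main.py --config experiments/parity_xlstm10.yaml
```

Submit it with `sbatch` and check progress with `squeue --me`.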

Author

Jonathan Decker | Hauke Kirchner
