{"id":23973,"date":"2024-07-30T10:09:09","date_gmt":"2024-07-30T08:09:09","guid":{"rendered":"https:\/\/info.gwdg.de\/news\/?p=23973"},"modified":"2024-07-30T10:09:21","modified_gmt":"2024-07-30T08:09:21","slug":"xlstm-the-one-to-overcome-transformers","status":"publish","type":"post","link":"https:\/\/info.gwdg.de\/news\/xlstm-the-one-to-overcome-transformers\/","title":{"rendered":"xLSTM, The One To Overcome Transformers?"},"content":{"rendered":"<p dir=\"auto\" data-sourcepos=\"5:1-5:581\">In the current iteration of our internal journal club, Jonathan Decker presented the paper <a href=\"https:\/\/arxiv.org\/abs\/2405.04517\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" data-sourcepos=\"5:92-5:200\" class=\"external\">&#8222;xLSTM: Extended Long Short-Term Memory&#8220; by Beck et al. published in 2024<\/a>. If you are interested in exploring xLSTM on our cluster, Jonathan has prepared a small example on our NHR cluster Grete that you can use as a starting point for your own experiments. 
The code below was executed on <a href=\"https:\/\/docs.hpc.gwdg.de\/usage_guide\/slurm\/gpu_usage\/index.html\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" data-sourcepos=\"5:417-5:490\" class=\"external\">glogin9<\/a> using the <a href=\"https:\/\/docs.hpc.gwdg.de\/software\/nhr_lmod\/index.html\" target=\"_blank\" rel=\"nofollow noreferrer noopener\" data-sourcepos=\"5:502-5:580\" class=\"external\">new NHR software stack<\/a>.<\/p>\n<pre class=\"code highlight\" lang=\"shell\"><span class=\"nb\">export <\/span><span class=\"nv\">PREFERRED_SOFTWARE_STACK<\/span><span class=\"o\">=<\/span>nhr-lmod\r\n<span class=\"nb\">source<\/span> \/sw\/etc\/profile\/profile.sh\r\n\r\ngit clone https:\/\/github.com\/NX-AI\/xlstm\r\n<span class=\"nb\">cd <\/span>xlstm\r\nmodule load miniconda3\r\nconda <span class=\"nb\">env <\/span>create <span class=\"nt\">-p<\/span> \/scratch-grete\/usr\/<span class=\"nv\">$USER<\/span>\/.conda\/envs\/xlstm <span class=\"nt\">-f<\/span> environment_pt220cu121.yaml \r\n<span class=\"nb\">source <\/span>activate \/scratch-grete\/usr\/<span class=\"nv\">$USER<\/span>\/.conda\/envs\/xlstm\r\npip <span class=\"nb\">install <\/span><span class=\"nv\">numpy<\/span><span class=\"o\">==<\/span>1.26.4\r\nmodule load cuda\r\n\r\nsrun <span class=\"nt\">-p<\/span> grete <span class=\"nt\">--pty<\/span> <span class=\"nt\">-n<\/span> 1 <span class=\"nt\">-c<\/span> 64 <span class=\"nt\">-t<\/span> 1:00:00 <span class=\"nt\">-G<\/span> A100:1 bash\r\n<span class=\"nb\">export <\/span><span class=\"nv\">PYTHONPATH<\/span><span class=\"o\">=<\/span><span class=\"si\">$(<\/span><span class=\"nb\">pwd<\/span><span class=\"si\">)<\/span>\r\npython experiments\/main.py <span class=\"nt\">--config<\/span> experiments\/parity_xlstm10.yaml\r\n<\/pre>\n<h3 lang=\"shell\">Author<\/h3>\n<p lang=\"shell\"><a href=\"mailto:jonathan.decker@uni-goettingen.de\" data-sourcepos=\"3:9-3:67\">Jonathan Decker<\/a> | <a href=\"mailto:hauke.kirchner@gwdg.de\" 
data-sourcepos=\"3:71-3:117\">Hauke Kirchner<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the current iteration of our internal journal club, Jonathan Decker presented the paper &#8222;xLSTM: Extended Long Short-Term Memory&#8220; by Beck et al. published in 2024. If you are interested in exploring xLSTM on our cluster, Jonathan has prepared a small example on our NHR cluster Grete that you can use as a starting point &#8230; <a title=\"xLSTM, The One To Overcome Transformers?\" class=\"read-more\" href=\"https:\/\/info.gwdg.de\/news\/xlstm-the-one-to-overcome-transformers\/\" aria-label=\"More information about xLSTM, The One To Overcome Transformers?\">Read more<\/a><\/p>\n","protected":false},"author":166,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[118,133],"tags":[],"class_list":["post-23973","post","type-post","status-publish","format-standard","hentry","category-gwdg-nachrichten-2","category-kuenstliche-intelligenz"],"_links":{"self":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts\/23973","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/users\/166"}],"replies":[{"embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/comments?post=23973"}],"version-history":[{"count":3,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts\/23973\/revisions"}],"predecessor-version":[{"id":23977,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/posts\/23973\/revisions\/23977"}],"wp:attachment":[{"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/media?parent=23973"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/categ
ories?post=23973"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/info.gwdg.de\/news\/wp-json\/wp\/v2\/tags?post=23973"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}