Friday, September 17, 2021

Announcing PyMarlin, a new PyTorch extension library for agile deep learning experimentation

Bill Baer, Senior Product Manager, MSAI 


We are excited to announce the release of PyMarlin! Today, AI is often bound by the limitations of infrastructure, the effectiveness of machine learning models, and the ease of development. PyMarlin is a step toward breaking through these barriers.


About PyMarlin 

PyMarlin is a lightweight PyTorch extension library for agile experimentation. It was designed to simplify the end-to-end deep learning experimentation lifecycle, regardless of the compute environment.


With PyMarlin, you can quickly prototype a new AI scenario in a development environment and then scale it to multiple processes or a multi-node Azure ML cluster with no code changes, which helps accelerate AI innovation.
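
To make the "no code changes" idea concrete, here is a minimal sketch of one common way a training script can stay environment-agnostic: it reads the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that PyTorch's distributed launchers set, and falls back to single-process defaults when they are absent. This is not PyMarlin's implementation; `RunContext` and `detect_run_context` are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunContext:
    """Hypothetical container describing where this process sits in a job."""
    rank: int        # global index of this process
    local_rank: int  # index of this process on its own node
    world_size: int  # total number of processes in the job

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1

    @property
    def is_main_process(self) -> bool:
        return self.rank == 0


def detect_run_context(env=None) -> RunContext:
    """Build a RunContext from launcher-provided environment variables.

    When the variables are missing (a plain `python train.py` run on a
    dev box), the defaults describe a single-process job.
    """
    env = os.environ if env is None else env
    return RunContext(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


if __name__ == "__main__":
    ctx = detect_run_context()
    mode = "distributed" if ctx.is_distributed else "single-process"
    print(f"rank {ctx.rank} of {ctx.world_size} ({mode})")
```

Because the script only inspects its environment, the same file can be started directly on a laptop or by a multi-process launcher on a cluster, and only the launch command differs.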


With PyMarlin, we are making it simpler for developers and data scientists to use deep learning capabilities at scale in their work.


Some of the key features you will find in PyMarlin include: 

  • Data preprocessing module that lets preprocessing recipes scale from a single CPU to multiple CPUs and multiple nodes.
  • Infra-agnostic design: native Azure ML integration means the same code that runs on a local dev box can also run directly on any VM or Azure ML cluster.
  • Trainer backend abstraction with support for single-process (CPU/GPU), Distributed Data Parallel, and mixed-precision (AMP, Apex) training. Microsoft also offers the ONNX Runtime (ORT) and DeepSpeed libraries for getting the best distributed training throughput. Check out this summarization scenario that demonstrates this: PyMarlin/ORT. We will soon offer ORT+DS as a native trainer backend that you can use directly in your scenarios.
  • Out-of-the-box plugins for typical NLP tasks such as sequence classification, named entity recognition, and Seq2Seq text generation.
  • Utility modules for model checkpointing, stats collection, and TensorBoard event logging, all of which can be customized for your scenario.
  • Custom argument parser that stores the default values of all scenario-related arguments in a YAML config file and merges in user-supplied arguments at runtime.
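
The last item, defaults kept in a config file with user-supplied arguments merged in at runtime, can be illustrated with a short sketch. It is not PyMarlin's parser or its API: `merge_overrides`, the `--section.key` flag style, and the sample keys are assumptions made for this example, and a plain nested dict stands in for the parsed YAML file so that the sketch needs only the standard library.

```python
import argparse
import copy


def _to_bool(text):
    """Parse common textual spellings of a boolean flag value."""
    return text.lower() in ("1", "true", "yes")


def merge_overrides(defaults, argv):
    """Return a copy of `defaults` with '--section.key value' overrides applied.

    `defaults` is a two-level dict, as would be produced by loading a
    YAML config file. Each default's type decides how an override is parsed.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for section, params in defaults.items():
        for key, value in params.items():
            # bool("False") is True, so booleans need their own converter.
            caster = _to_bool if isinstance(value, bool) else type(value)
            parser.add_argument(f"--{section}.{key}", type=caster, default=None)

    parsed = vars(parser.parse_args(argv))
    merged = copy.deepcopy(defaults)  # leave the caller's defaults untouched
    for dotted_name, value in parsed.items():
        if value is not None:  # None means the user did not pass this flag
            section, key = dotted_name.split(".", 1)
            merged[section][key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical defaults, standing in for the contents of a YAML file.
    defaults = {
        "trainer": {"lr": 0.001, "epochs": 3, "amp": False},
        "data": {"path": "train.tsv"},
    }
    print(merge_overrides(defaults, ["--trainer.lr", "0.01", "--trainer.amp", "true"]))
```

Keeping every default in one file means an experiment can be reproduced from the config plus the handful of flags that were changed on the command line.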


In addition, all core modules are thoroughly unit tested. To learn more about PyMarlin’s core architecture, see: Hello from PyMarlin


Why PyMarlin? 

PyMarlin was developed within Microsoft Search, Assistant, and Intelligence (MSAI) in collaboration with Azure Machine Learning at Microsoft. MSAI brings together multiple areas of research to innovate in the products that millions of people use every day. We power features in Outlook, Teams, Word, SharePoint, Bing, Windows, and others. We work in areas including machine learning, information retrieval, data mining, natural language processing, and human-computer interaction, bringing together research and engineering to deliver impactful user experiences.


PyMarlin’s primary design goal was to keep the library code easy to read and customize. Where PyMarlin shines is that a data scientist who knows only PyTorch should be able to understand the entire library code within an hour. We are incredibly excited about this release because everyone has a role to play in AI transformation, and we believe PyMarlin is one more step toward that goal.


To learn more about PyMarlin and how to use it, visit the GitHub repository: microsoft/PyMarlin.


Keep up with the latest developments in AI at Scale at The AI Blog. 
