From Jupyter notebooks to Sagemaker Pipelines
Tools to support your data science team’s creative high as they transition from notebooks to pipelines
I enjoy working with our data scientists. I’ve made it my goal to keep their model development velocity high. This means respecting their creative process and providing them with tools (Jupyter widgets, data automation, labeling automation, etc.) to keep them in the “creative zone”. Unfortunately, they have to break out of this creative zone as they transition from Jupyter notebooks to pipelines.
Sagemaker Pipelines is a Sagemaker workflow tool for building and managing end-to-end ML pipelines. It’s great for creating reproducible model training and evaluation. But in contrast to the creative process, pipelines operate in a rigid environment.
In this post, we’ll explore tools to help facilitate the handover from notebooks to Sagemaker Pipelines.
If your data scientists are starting from Jupyter notebooks and are expected to turn them into pipelines themselves via Sagemaker Pipelines (or at least prepare the code for it), the tools below can help with the transition.
sagemaker-run-notebook
This setup adds the ability for your Sagemaker Processing containers to execute Jupyter notebooks (.ipynb). Normally, processing containers will only run scripts (.sh, .py, etc.). So while an executable-notebook configuration is not common (or is it?), it comes with useful benefits when transitioning to Sagemaker Pipelines (a rough sketch of the setup follows the list below):
- Transition to a shared data repository — While it’s easier to run notebook experiments when we just need to reference local files, it becomes a problem when we move our code into pipeline scripts that reference files outside our own little notebook world (File not found!).
By enabling the notebook to run as a processing job (see diagram above), you can tease out local file reference errors when the notebook is…
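To make the idea concrete, here is a minimal sketch of running a notebook as a Sagemaker Processing job with the standard SageMaker Python SDK. This is not the sagemaker-run-notebook internals, just the general shape of the setup; the image URI, role ARN, bucket paths, and the run_notebook.py driver script (which would call papermill to execute the .ipynb) are all placeholder assumptions.

```python
# A minimal sketch, assuming you have: an ECR image with papermill installed,
# a Sagemaker execution role, and an S3 bucket. All names below are placeholders.
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

processor = ScriptProcessor(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/notebook-runner:latest",
    command=["python3"],
    role="arn:aws:iam::<account>:role/SagemakerExecutionRole",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# run_notebook.py is a hypothetical driver script that calls
# papermill.execute_notebook() on the staged .ipynb and writes the
# executed copy to /opt/ml/processing/output, which Sagemaker uploads to S3.
processor.run(
    code="run_notebook.py",
    inputs=[
        ProcessingInput(
            source="s3://my-bucket/notebooks/train.ipynb",
            destination="/opt/ml/processing/input",
        )
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",
            destination="s3://my-bucket/notebook-runs/",
        )
    ],
)
```

Because the notebook now runs in an isolated container with only the S3 inputs you declare, any sneaky reference to a file that never left your notebook instance fails fast, which is exactly the kind of error you want to surface before wiring the step into a pipeline.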