HDFS Contents Manager for Jupyter Notebooks

I implemented a contents manager for Jupyter notebooks that uses HDFS as its storage backend. There are two versions, which differ mainly in the library used to read from and write to HDFS.

  • The first version uses HDFS3, which is based on libhdfs3, a native C/C++ library for interacting with the Hadoop Distributed File System (HDFS).
  • The second version uses Pydoop, which is based on the official libhdfs, a JNI-based C API for HDFS.
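
To illustrate why the two versions can differ only in the HDFS library, here is a minimal sketch of the general idea: a contents manager that delegates all file access to a small filesystem object. This is not the project's actual code. The names `InMemoryFS` and `SketchContentsManager` and the `/notebooks` root are hypothetical, the in-memory class merely stands in for an HDFS client (it does not mirror the HDFS3 or Pydoop API), and only a subset of the Jupyter contents model fields is shown.

```python
import json
import posixpath


class InMemoryFS:
    """Hypothetical stand-in for an HDFS client (not the HDFS3 or Pydoop API)."""

    def __init__(self):
        self._files = {}

    def exists(self, path):
        return path in self._files

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


class SketchContentsManager:
    """Illustrative subset of a contents manager: save and get notebooks."""

    def __init__(self, fs, root="/notebooks"):
        self.fs = fs      # any object offering exists/read/write
        self.root = root  # assumed directory that holds the notebooks

    def _fs_path(self, api_path):
        # Map an API path such as "demo.ipynb" to a path under the root.
        return posixpath.join(self.root, api_path.strip("/"))

    def save(self, model, path):
        if model["type"] != "notebook":
            raise ValueError("this sketch only handles notebooks")
        data = json.dumps(model["content"]).encode("utf-8")
        self.fs.write(self._fs_path(path), data)
        return self.get(path, content=False)

    def get(self, path, content=True):
        fs_path = self._fs_path(path)
        if not self.fs.exists(fs_path):
            raise FileNotFoundError(path)
        model = {
            "name": posixpath.basename(fs_path),
            "path": path.strip("/"),
            "type": "notebook",
            "format": None,
            "content": None,
        }
        if content:
            model["format"] = "json"
            model["content"] = json.loads(self.fs.read(fs_path).decode("utf-8"))
        return model


# Usage: store an empty notebook and read it back.
manager = SketchContentsManager(InMemoryFS())
notebook = {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 2}
saved = manager.save({"type": "notebook", "content": notebook}, "demo.ipynb")
loaded = manager.get("demo.ipynb")
```

Under this design, switching HDFS libraries would only mean supplying a different object with the same three methods, while the notebook handling stays unchanged.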

The HDFS Contents Manager is used to add Jupyter support to the Hops Big Data platform. Check out the HDFS Contents Manager poster at the SICS Open House 2017 event for more details.

Source Code
