
How to import data for use by Apache Zeppelin

Apache Zeppelin is a web-based notebook that enables interactive data analytics. It allows you to make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.

Until the Apache Zeppelin component of Data Scientist Workbench offers a simple way to import data files (e.g. .CSV), you can use either of these two options:

1. Use 'wget' through %sh in Zeppelin
The data file must be uploaded somewhere on the internet and be reachable through an HTTP or HTTPS URL. You can then download it with the "wget" command in Zeppelin, as follows:

%sh
wget -O <file_name> <url_file>


<file_name> = the local name under which 'wget' saves the downloaded file (set with the -O option)
<url_file> = the URL of the data file

For example:

%sh
wget -O bank.csv https://s3.amazonaws.com/apache-zeppelin/tutorial/bank/bank.csv


Files downloaded with 'wget' are placed in the current directory, so you can access them from Zeppelin cells using the path "./<file_name>", e.g. "./bank.csv".
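As an optional check after the download, the sketch below confirms that a file exists and is not empty, then prints its first few lines. The helper name check_data_file is hypothetical and is not part of Zeppelin or wget. It assumes only the standard "ls" and "head" utilities. In a Zeppelin paragraph, the code would follow a first line of %sh.

```shell
# Hypothetical helper: confirm that a data file exists and is non-empty,
# then show its size and first five lines.
check_data_file() {
    file="$1"
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$file" ]; then
        echo "missing or empty: $file" >&2
        return 1
    fi
    ls -l "$file"
    head -n 5 "$file"
}

# Usage after the wget step (assumes bank.csv was saved to the current directory):
#   check_data_file ./bank.csv
```

A failed or interrupted download often leaves a missing or zero-byte file, which is why the check tests for a non-empty file rather than for existence alone.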

2. Drag and drop files into an IPython notebook
The components of Data Scientist Workbench share the "/resources" folder on the system. The IPython/Jupyter notebooks in Data Scientist Workbench let you drag data files from your desktop and drop them onto a notebook. A file added this way is available under the "/resources" path, which you can then reference in your Zeppelin notebooks.

For example:

1) Drag and drop a data file, such as 'bank.csv', into an IPython notebook.

2) (Optional) Check that the file exists by running a command in Zeppelin, such as:

     %sh
     ls  -l  /resources/bank.csv


Once the file is verified, you can use "/resources/bank.csv" in your Zeppelin notebooks.


