However, Data Scientist Workbench is intended to be much more than just an IPython (Jupyter) Notebook environment. It is envisioned to cater to the diverse needs of data science enthusiasts and professionals, and aims to provide additional capabilities such as:
- Built-in Connectivity to Multiple Data Sources: importing or loading data from various data sources
- Integrated Data Sourcing and Refining Tools: sourcing, preparing, filtering, refining, and transforming data
- Choice of Analytics Interface: from notebooks to RStudio
- Choice of Execution Engines: local/remote Spark, Hadoop, etc.
- In-Cloud or On-Premise: run the workbench on the cloud or locally on Mac, Windows, or Linux workstations
1. Built-in Connectivity to Multiple Data Sources
DSWB comes with the ability to easily connect to and import data from a variety of data sources, whether relational or non-relational, residing in the cloud or on-premises. It also comes with built-in connectivity to BigSQL data sources (HDFS, HBase), the Cloudant NoSQL data store, dashDB data warehouses, SQL Database, and DB2 databases, so you don't have to worry about installing and configuring special drivers or connectors. More importantly, DSWB allows you to bring data from all sorts of data sources into Spark DataFrames and work with it using Spark SQL. So, one can combine data from dashDB, HDFS, and say MySQL in a Spark DataFrame and manipulate it together with Spark SQL.
2. Integrated Data Sourcing and Refining
More than just algorithm development, 80% of the work that data scientists and engineers do revolves around data sourcing, preparation, and management. While some of these tasks can be done by programming in a notebook in a language like Python, for many users notebooks are just not a great fit. We are integrating comprehensive data sourcing and preparation tools for non-programmers into DSWB.
3. Choice of Analytics Interface
DSWB is more than just notebooks. Notebooks have become very popular, but IDEs are also growing in popularity. In particular, RStudio has a strong hold on data scientists who work with R. You can expect to see RStudio as an option integrated right into DSWB. If you like R, you will be able to develop in either RStudio or in a notebook. And RStudio is just one example.
4. Choice of Execution Engines
When you create analytics in DSWB you can choose where they are executed, whether directly in the notebook engine or in an external execution engine that is better suited for the task at hand. First and foremost, Spark is built right in! This means data scientists looking to perform data exploration using Spark don't have to install anything special or worry about setting up a Spark cluster. We are also working on the ability to push execution of algorithms to additional engines. For example, once you have performed initial exploration on a data sample using the local Spark environment, you have the option to perform the production job execution on a remote Spark cluster and work with a much larger data set. We are also working on giving you the ability to push execution of R code to fast R execution engines such as BigR and dashDB.
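DSWB handles this switch for you, but the underlying mechanics can be sketched with standard Spark master URLs: the same job targets the built-in local engine or a remote cluster purely through configuration. The script names, host, and memory setting below are illustrative placeholders, not DSWB endpoints:

```shell
# Explore a data sample locally, using the built-in Spark (all cores on this machine):
spark-submit --master "local[*]" explore_sample.py

# Promote the same logic to a remote standalone cluster for the full data set:
spark-submit --master spark://spark-master.example.com:7077 \
             --executor-memory 8G \
             production_job.py
```

Because only the `--master` URL (and resource options) change, code developed against the local sample carries over to the production cluster unchanged.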
5. In-Cloud or On-Premise
A cloud service is a great option to get started with data science, with nothing to set up, install, or maintain. However, we recognize that some of our customers must run on-premises, and cloud is not an option for all of their analytics needs. DSWB is available for deployment behind the firewall, completely under the control of the customer. It is also lightweight enough to be downloaded and run on your Mac or Windows laptop, allowing you to develop algorithms and models even when you are not connected.
To sum up, our intent with the Data Scientist Workbench is to look holistically at the job of the data scientist and to address all of those needs rather than just provide IPython notebooks. We will continue to evolve DSWB to ease the challenges faced by data scientists and endeavour to make their jobs easier.