
Access Wikipedia Clickstream dataset in Jupyter Scala notebook

Sample Wikipedia datasets are available within the Data Scientist Workbench (DSWB) Spark Sandbox.

Wikipedia Clickstream


This dataset gives monthly visitor counts for every page on English Wikipedia. The counts are broken down by source, such as a specific search engine or another Wikipedia article.
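To make the "counts broken down by source" idea concrete, here is a minimal sketch in plain Scala that can be pasted into a notebook cell. It assumes each row is a tab-separated triple of source, article, and count; the actual column names and order depend on the dataset release, so check the notebook before relying on this layout. The `Click` case class, the `parseLine` helper, and the sample rows are hypothetical and made up for illustration; they are not real counts from the dataset.

```scala
import scala.util.Try

// Hypothetical row shape (an assumption): source, article, visit count.
final case class Click(source: String, article: String, count: Long)

// Parse one tab-separated line; malformed rows yield None.
def parseLine(line: String): Option[Click] =
  line.split("\t") match {
    case Array(source, article, n) =>
      Try(n.toLong).toOption.map(c => Click(source, article, c))
    case _ => None
  }

// Made-up sample rows for illustration only.
val sampleLines = Seq(
  "other-google\tScala_(programming_language)\t120",
  "Java_(programming_language)\tScala_(programming_language)\t45",
  "other-google\tApache_Spark\t80",
  "malformed row"
)

// Keep only the rows that parsed successfully.
val clicks = sampleLines.flatMap(parseLine)

// For each article, sum the visit counts per source.
val bySource: Map[String, Map[String, Long]] =
  clicks.groupBy(_.article).map { case (article, rows) =>
    article -> rows.groupBy(_.source).map { case (src, rs) =>
      src -> rs.map(_.count).sum
    }
  }

bySource.foreach { case (article, sources) =>
  println(s"$article: $sources")
}
```

The sketch uses ordinary Scala collections so it runs without a Spark context; on the full dataset, the same group-and-sum logic would more likely be expressed with Spark inside the sandbox.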

Access the dataset through the following Jupyter Scala notebook:

https://share.datascientistworkbench.com/#/api/v1/workbench/10.114.214.147/shares/nSFFMxEmmNgi3PF/Wikipedia%20Clickstream.ipynb
Copy the URL above and import it into your Data Scientist Workbench by pasting it into the appropriate box.



Submit an idea if you'd like a different data set to be made available.
