Sven Löffler
16 August 2017
Digitization

Cloudera Data Science Workbench

Cloudera has announced the general availability of the Data Science Workbench to accelerate data science and machine learning in the enterprise. The workbench is a self-service tool that helps data scientists build, scale, and deploy machine learning and advanced analytics solutions using the most powerful technologies available today.
The Cloudera Data Science Workbench (CDSW), announced in beta at Strata+Hadoop World San Jose 2017, is accessed via a web browser and lets data scientists use their favorite open source languages and libraries, including R, Python, and Scala, directly in secure environments. The workbench also integrates several deep learning frameworks, including BigDL. This helps data scientists apply deep learning libraries and techniques on CPUs, without additional hardware investments.
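As a rough illustration of what such a browser-based session might look like, the following sketch shows a PySpark 2 job that queries a Hive table on the secure cluster. The database, table and column names are placeholders, not part of the product.

from pyspark.sql import SparkSession

# Start a Spark 2 session inside the workbench session; Hive support lets
# the job read tables registered in the cluster's metastore.
spark = (SparkSession.builder
         .appName("cdsw-example")
         .enableHiveSupport()
         .getOrCreate())

# Query a hypothetical Hive table directly from the browser-based session.
df = spark.sql("SELECT customer_id, revenue FROM sales.transactions LIMIT 100")
df.show()

spark.stop()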

Benefits of the Cloudera Data Science Workbench

With CDSW, data scientists can:

  • Use R, Python or Scala on the cluster from a web browser, with no desktop footprint.
  • Install any library or framework within isolated project environments.
  • Directly access data in secure clusters with Spark and Impala.
  • Share insights with their team for reproducible, collaborative research.
  • Automate and monitor data pipelines using built-in job scheduling.
Meanwhile, IT professionals can:

  • Give their data science team the freedom to work how they want, when they want.
  • Stay compliant with out-of-the-box support for full platform security, especially Kerberos.
  • Run on-premises or in the cloud, wherever data is managed.

Architecture of the Cloudera Data Science Workbench

These benefits are achieved through the underlying architecture of the CDSW. The workbench runs on one or more dedicated gateway hosts of a CDH cluster. Cloudera Manager ensures that these hosts have the libraries and configuration necessary to access the CDH cluster securely, without additional configuration. Using Docker containers, data scientists can run isolated user workloads with their preferred tools and libraries. Isolated CPU and memory also ensure reliable and scalable execution in a multi-tenant setting. Each Docker container provides a virtualized gateway for secure access to Cloudera Hadoop services such as HDFS, Spark 2, Hive and Impala (incubating). The workbench is divided into master and worker nodes. Each installation starts with a master node, which keeps track of all critical persistent and stateful data. Worker nodes can be added or removed to adjust the total capacity. To schedule all these containers transparently across multiple nodes, the CDSW uses Kubernetes, a container orchestration system.
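As an illustration of what accessing one of these services from inside a session container could look like, the following sketch uses the impyla Python package to query Impala. Hostname, port and table name are placeholders, and a Kerberos-secured cluster would require additional authentication parameters.

from impala.dbapi import connect

# Connect to a (placeholder) Impala daemon on the standard port 21050.
conn = connect(host="impala-daemon.example.com", port=21050)
cur = conn.cursor()

# Run a simple aggregate against a hypothetical table and print the result.
cur.execute("SELECT COUNT(*) FROM default.web_logs")
print(cur.fetchall())

cur.close()
conn.close()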

T-Systems Data Science Workstation

The CDSW will be available in combination with the Cloudera Hadoop distribution on the Open Telekom Cloud, Microsoft Azure and the T-Systems bare metal offering.
In addition to the CDSW, T-Systems offers a Data Science Workstation. In contrast to the CDSW, which is made for production environments, the Data Science Workstation covers all functionality needed during the development and testing of Big Data use cases or proofs of concept. The Workstation is made for small data volumes and can be used standalone or in combination with other PaaS services from the T-Systems AppAgile container repository. It is also possible to deploy the Workstation on the T-Systems vCloud, the Open Telekom Cloud, Microsoft Azure and, in the future, on bare metal.
The Workstation includes all relevant technologies and tools from the Hadoop ecosystem, such as HDFS, MapReduce 2, Hue, Hive and Spark, with support for Python, R and Scala. These tools are available in their latest versions from the respective Apache projects, which is a major advantage compared to the Hadoop distributions.
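A typical proof of concept on the Workstation might be no more than a small, local Spark job. The following sketch aggregates a small CSV file; the file path and column names are invented for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Run Spark locally; the Workstation targets small data volumes.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("poc-example")
         .getOrCreate())

# Read a small, hypothetical CSV file and aggregate revenue per country.
orders = spark.read.csv("/tmp/orders.csv", header=True, inferSchema=True)
summary = orders.groupBy("country").agg(F.sum("amount").alias("total_amount"))
summary.show()

spark.stop()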
By offering both the CDSW and the Data Science Workstation, T-Systems can serve customers who are just starting to investigate Big Data and Analytics as well as customers who need a production-ready data science environment.

Happy Data,
Sven Löffler

