
Choosing a Machine Learning Platform That's Right for You

2018-08-14

Choosing a machine learning (ML) platform can be a cause of considerable stress. It's a major commitment you will have to live with for years, and an unwise decision might even affect your job prospects. Fortunately, asking yourself a few fundamental questions can simplify the decision considerably.

Are you primarily a Windows Shop?


In recent years Microsoft has made a great effort to provide machine learning tools for both the Windows and Linux platforms. However, the preponderance of folks using Microsoft tools are running them on a Windows OS. If you run Microsoft tools on Linux, there are likely to be fewer resources to draw upon if you need help.

Microsoft provides a number of different but closely related machine learning platforms. Though not strictly speaking machine learning platforms, both Microsoft R Client and Microsoft R Server can provide advantages. Both include the valuable RevoScaleR package. In R Client, this enables the use of two cores for many problems. In R Server, the package can use many cores and can process large data volumes in chunks, but, of course, you must pay for these advantages. With the introduction of SQL Server 2017, Microsoft added Python to the capabilities of R Server and changed the name to Machine Learning Server to reflect these new capabilities.
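To make "processing large data volumes in chunks" concrete, here is a minimal sketch of the general idea in plain Python. It is not RevoScaleR code, and the function names `read_in_chunks` and `chunked_mean` are hypothetical, chosen only for illustration. The point is that a statistic such as a mean can be accumulated one chunk at a time, so the full data set never has to fit in memory.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size values (hypothetical helper)."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data to average")
    return total / count
```

In practice `values` would be a stream read from disk or a database rather than an in-memory sequence. A chunk-oriented engine applies the same accumulate-and-combine pattern to far more elaborate computations.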

What is Your Preferred Scripting Language?


We say "scripting language" because machine learning tools themselves are very likely to be written in C or C++ for purposes of performance. Will you be assembling these components into working systems using C# or F#? Python? R? Perhaps Java or Scala?

If your preferred language is a .NET language like C# or F#, you will likely want to focus on CNTK.

Python


Python has become the de facto standard language for machine learning scripts. Virtually all ML platforms provide direct API support for Python.

R


R remains popular among people who require the techniques of what you might call "classic" statistics. Fortunately, while most of today's machine learning platforms do not provide a direct interface for R, workers who prefer R are not out of luck. Keras provides a powerful and easy R API for many platforms.

Keras and the Blessings of Interoperability


Keras is not a machine learning platform itself, but rather a high-level API for a number of platforms. At present, the Keras API is available for CNTK, TensorFlow, and MXNet. R programmers who would like to use any of these platforms need only install Keras. The Keras R package will then permit ML development from RStudio or whatever IDE the R analyst might prefer.
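The interoperability described above can be sketched in Python as follows. This assumes a multi-backend Keras release of the kind available when this article was written, where the `KERAS_BACKEND` environment variable selects the engine. The helper names `select_backend` and `build_classifier`, the layer sizes, and the backend strings are illustrative assumptions, and newer Keras releases support a different set of backends.

```python
import os

# Backends named in this article (assumed strings); other Keras
# releases may accept a different set.
ARTICLE_BACKENDS = {"tensorflow", "cntk", "mxnet"}


def select_backend(name: str) -> str:
    """Set KERAS_BACKEND; this must run before keras is first imported."""
    name = name.lower()
    if name not in ARTICLE_BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    os.environ["KERAS_BACKEND"] = name
    return name


def build_classifier(n_features: int, n_classes: int):
    """Build a small dense network; the model code does not mention the backend."""
    # Imported lazily so that select_backend() can run first.
    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(32, activation="relu", input_shape=(n_features,)))
    model.add(Dense(n_classes, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A typical use would be to call `select_backend("tensorflow")` once at startup and then `build_classifier(n_features=20, n_classes=3)`. Switching engines then means changing only the first call, provided the chosen backend is installed.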

What sort of data will you be working with?

Data sets the hardware requirements


The architecture of a machine learning system is often determined by the data itself. People interested in image recognition often face difficulties with the computational demands of complex algorithms. They are limited by CPU cycles. In contrast, workers performing sentiment analysis of tweets or newsfeeds may be limited by the sheer volume of data to be analyzed. Folks in this latter category may well benefit from Apache Spark.

Apache Spark


Apache Spark is so often used in conjunction with Hadoop that many people think it must be used with Hadoop. This is not the case. Whether standing alone or integrated with Hadoop, Spark is useful anywhere large volumes of data to be analyzed are spread across distributed servers. The default installation of Spark includes, among other things, an engine for executing distributed R and a machine learning library.
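The pattern behind analyzing data spread across servers can be illustrated with a small plain-Python sketch. This is not the Spark API. The function names `count_partition` and `merge_counts` are hypothetical, and the "partitions" here are ordinary lists standing in for data held on different machines. Each partition is summarized independently, and the partial results are then merged.

```python
from collections import Counter
from typing import Iterable


def count_partition(lines: Iterable[str]) -> Counter:
    """Map step: count words within one partition of the data."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: combine the per-partition results into one total."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two lists standing in for data stored on two different servers.
partitions = [
    ["Spark is fast", "spark scales"],
    ["Hadoop is optional"],
]
word_counts = merge_counts(count_partition(p) for p in partitions)
```

Because each partition is counted without reference to the others, the map step can run wherever the data already lives. Only the small per-partition summaries need to travel over the network.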

Down for the Long Haul


Selecting a machine learning platform involves considerations that go beyond the purely technical. Will the platform continue to evolve as machine learning does? Are there books, blogs, and tutorials? Are there places to turn for help if needed?

TensorFlow from Google seems to be enjoying a great surge in popularity. While there are many aspects of TensorFlow where I think Google has fallen a little short, there are many valuable resources for learning TensorFlow. Other vendors have provided TensorFlow support within their own products. For example, Intel's Myriad series of vision processing chips provides support for TensorFlow execution graphs (as well as support for Caffe).

Conclusion


Considering the high quality of many of today's machine learning platforms, it would be difficult to make a choice that was truly poor. In my opinion, however, Microsoft's ML Server and CNTK are the ones to lean towards on a Windows platform. On Linux, the availability of code examples, pre-trained models, and documentation would push me towards TensorFlow. In both cases, I would supplement the functionality of these platforms with Keras.





Written by Dan Buskirk

“The pleasures of the table belong to all ages.” Actually, Brillat-Savarin was talking about the dinner table, but the quote applies equally well to Dan’s other big interest, tables of data. Dan has worked with Microsoft Excel since the Dark Ages and has utilized SQL Server since Windows NT first became available to developers as a beta (it was 32 bits! wow!). Since then, Dan has helped corporations and government agencies gather, store, and analyze data and has also taught and mentored their teams using the Microsoft Business Intelligence Stack to impose order on chaos. Dan has taught in Learning Tree’s SQL Server & Microsoft Office curriculums for over 14 years. In addition to his professional data and analysis work, Dan is a proponent of functional programming techniques in general, especially Microsoft’s new .NET functional language F#. Dan enjoys speaking at .NET and F# user groups on these topics.
