Data Analytics for Connected Cars
Hillol Kargupta, President, Agnik
Need for Deep Data Analytics in Connected Cars
Data analytics driven by advanced statistics, machine learning, and data mining algorithms has many applications in the connected car environment. It promises a broad horizontal market in which analytics can demonstrate its capabilities in areas such as insurance, warranty management, predictive maintenance, and personalized infotainment services. Here are two specific use cases in more detail:
1. Driver Scoring for Insurance Actuaries: Typically, driver scoring in a connected car environment means measuring different aspects of driving (e.g. miles driven, harsh driving events, where and when you drive) and correlating them with observable business outcomes (e.g. insurance claims loss data) in order to build a predictive model. This model computes a score based on the driving behavior. Among other things, this usually involves sampling the accelerometer sensor data. For example, you may want to sample it at 100Hz (i.e. 100 samples per second) and analyze the samples to detect driving events. If you are supporting a connected car application on a smartphone-only in-vehicle platform, then you also need to pay serious attention to battery power consumption.
2. Vehicle Performance Monitoring: Monitoring and managing vehicle performance efficiency require advanced predictive modeling and associative link analysis, among other techniques. For example, the fuel consumption of a car depends on the vehicle’s condition and on how it is driven. Reporting fuel economy is easy, and most modern cars do that. However, identifying the root cause behind poor fuel economy offers strong ROI and requires advanced analytics-based modeling of vehicle sub-systems. Analytics-driven connected car technology opens up possibilities like blending fuel consumption data with many other types of data, such as driver behavior. One can build predictive models and classifiers from such information and provide tips for reducing fuel cost for single or multiple vehicles.
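To make the driver-event idea from the first use case concrete, here is a minimal Python sketch of detecting harsh braking in a 100Hz accelerometer trace. The threshold of -0.4 g, the 0.25-second minimum duration, and the function names are illustrative assumptions, not values from any insurer's model; a real system would calibrate them against claims data and would also need to handle sensor orientation and noise.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 100      # sampling rate used as the example in the text
HARSH_BRAKE_G = -0.4      # assumed threshold in g (hypothetical, needs calibration)
MIN_DURATION_S = 0.25     # assumed minimum event duration (hypothetical)


def detect_harsh_braking(
    longitudinal_g: List[float],
    sample_rate_hz: int = SAMPLE_RATE_HZ,
    threshold_g: float = HARSH_BRAKE_G,
    min_duration_s: float = MIN_DURATION_S,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for every run of samples at or below
    threshold_g that lasts at least min_duration_s."""
    min_samples = int(min_duration_s * sample_rate_hz)
    events = []
    run_start = None
    for i, accel in enumerate(longitudinal_g):
        if accel <= threshold_g:
            if run_start is None:
                run_start = i          # a new candidate event begins
        else:
            if run_start is not None and i - run_start >= min_samples:
                events.append((run_start / sample_rate_hz, i / sample_rate_hz))
            run_start = None
    # Close a run that continues to the end of the trace.
    n = len(longitudinal_g)
    if run_start is not None and n - run_start >= min_samples:
        events.append((run_start / sample_rate_hz, n / sample_rate_hz))
    return events


def events_per_100_miles(num_events: int, miles: float) -> float:
    """Normalize an event count by exposure, a typical scoring feature."""
    return 100.0 * num_events / miles if miles > 0 else 0.0
```

Features such as `events_per_100_miles` could then be correlated with outcome data to build the predictive scoring model described above.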
Next, we discuss some of the technical challenges that show up in such applications.
Data Volume, Communication Bandwidth, and Distributed Computing
First, let us note that we are dealing with a relatively significant volume of data compared to the wireless network bandwidth. The accelerometer data alone, sampled at 100Hz, may generate about 5MB/hour. We have a few choices: (1) process the data onboard, (2) send it to the server over the wireless network, or (3) do a combination of both. Your choice should depend on the answers to questions such as: (1) Can your wireless data plan support this? (2) How quickly do you need to analyze the data? (3) What kind of computing power does your in-vehicle device have?
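The 5MB/hour figure can be sanity-checked with simple arithmetic. The sketch below assumes a three-axis accelerometer and 4 bytes per raw value with no timestamps or compression; those encoding details are assumptions for illustration, as are the function names.

```python
def accel_bytes_per_hour(sample_rate_hz: int = 100,
                         axes: int = 3,
                         bytes_per_value: int = 4) -> int:
    """Raw accelerometer payload per hour of driving, in bytes.
    Assumes fixed-width values and ignores timestamps and framing."""
    return sample_rate_hz * axes * bytes_per_value * 3600


def monthly_megabytes(bytes_per_hour: int,
                      driving_hours_per_day: float,
                      days: int = 30) -> float:
    """Monthly upload volume in MB (1 MB = 1,000,000 bytes)
    if every raw sample were sent to the server."""
    return bytes_per_hour * driving_hours_per_day * days / 1_000_000


per_hour = accel_bytes_per_hour()          # 4,320,000 bytes, about 4.3 MB
per_month = monthly_megabytes(per_hour, driving_hours_per_day=1.5)
```

Under these assumptions the raw stream is about 4.3MB/hour, in line with the rough figure above once timestamps and framing are added. At an assumed 1.5 hours of driving per day, that is close to 200MB per month per vehicle from one sensor, which is why the data plan question matters.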
Lessons to take home:
a. Computing Framework: Consider a more general definition of cloud for your connected car cloud, one that includes the in-vehicle compute nodes that are closer to the end users. Where you do the number crunching for the analytics, and how much of it, should depend on the answers to the questions listed above. This is the model of computing studied extensively in the distributed data mining literature over the last decade. It is also the main thesis of so-called Fog/Edge computing.
b. Distributed Data Analytic Algorithms: Consider truly distributed algorithms for computing analytics. Analytics can often be viewed as a mapping function. Computing that function in a distributed environment, whether in-cloud, onboard, or a blend of both, requires algorithms that are decomposable. Otherwise, the analytics computation is unlikely to scale. A quick look at the distributed data mining literature shows that many predictive and statistical modeling algorithms are difficult to decompose. So a bandwagon approach based on the traditional distributed file management systems used in common commercial clouds may not automatically scale unless you pay attention to the algorithms.
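As a small illustration of what "decomposable" means, the Python sketch below computes a mean and variance from per-vehicle summaries rather than from the raw samples. Each in-vehicle node sends only three numbers, and the server merges them. The `Summary` class is a hypothetical helper written for this example, and the sum-of-squares formula is chosen for brevity; it can lose precision on large values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Summary:
    """Sufficient statistics for mean and variance: count, sum, sum of squares."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "Summary":
        """Computed onboard, next to the raw data."""
        s = cls()
        for v in values:
            s.n += 1
            s.total += v
            s.total_sq += v * v
        return s

    def merge(self, other: "Summary") -> "Summary":
        """Computed in the cloud; needs only the summaries, not the samples."""
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq)

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def variance(self) -> float:
        """Population variance, E[x^2] - (E[x])^2."""
        return self.total_sq / self.n - self.mean ** 2


vehicle_a = Summary.from_values([1.0, 2.0, 3.0])   # stays on vehicle A
vehicle_b = Summary.from_values([4.0, 5.0])        # stays on vehicle B
fleet = vehicle_a.merge(vehicle_b)                 # only 3 numbers per vehicle travel
```

Mean and variance decompose this cleanly. An exact median, by contrast, cannot in general be recovered from fixed-size per-node summaries like these, which is a simple example of the kind of algorithm that needs more thought before it is distributed.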
Connected car applications often collect privacy-sensitive data (e.g. location data). Are consumers willing to share their 24/7 location data with an insurance carrier? The jury is still out. However, some states are imposing strict regulations about which kinds of data cannot be shared with insurance carriers. Many countries, particularly in Europe, have strict privacy policies. Analytics may play a key role in addressing these privacy concerns. The field of distributed data analytics has explored ways to protect privacy while computing analytics from the data. For example, products are emerging in the connected car space that blend pattern-preserving “cryptography” with data analytics, allowing data to be analyzed directly in its “encrypted” form without decrypting it first. Using this technology, insurance carriers may, for example, score you and compare your driving with that of others without having to know where you were last night at 2 am.
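The idea of analyzing transformed data without reversing the transformation can be illustrated with a toy Python example. The sketch below applies a secret rotation to two-dimensional driver feature vectors; because rotations preserve Euclidean distances, a server can still compare drivers with each other without seeing the original values. This is an assumption-laden illustration, not a description of any product's method, and a plain rotation is not secure encryption on its own. The feature values and function names are hypothetical.

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]


def make_secret_rotation(seed: int) -> Tuple[float, float]:
    """Derive a rotation (cos, sin) from a secret seed held by the data owner."""
    angle = random.Random(seed).uniform(0.0, 2.0 * math.pi)
    return math.cos(angle), math.sin(angle)


def transform(point: Point, rotation: Tuple[float, float]) -> Point:
    """Rotate a 2-D feature vector; the original values are no longer visible."""
    c, s = rotation
    x, y = point
    return (c * x - s * y, s * x + c * y)


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Hypothetical features: (harsh events per 100 miles, fraction of night driving)
driver_a = (3.0, 0.2)
driver_b = (1.0, 0.6)

secret = make_secret_rotation(seed=7)      # never shared with the server
masked_a = transform(driver_a, secret)
masked_b = transform(driver_b, secret)

# The server sees only masked_a and masked_b, yet the distance between the
# two drivers matches the distance computed on the original data.
same_pattern = math.isclose(distance(driver_a, driver_b),
                            distance(masked_a, masked_b))
```

Distance-based comparisons such as clustering or nearest-neighbor scoring depend only on these preserved distances, which is the sense in which the pattern survives the transformation.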