I have been looking at this problem for a few years now.
The IoT industry often speaks of handling both high volumes and high throughputs of data.
However, at present, I find that there are not many unique use cases for IoT streaming analytics.
The ‘unique’ and ‘currently’ qualifiers are important, i.e. identifying use cases that need the analytics implemented primarily in the stream (and not applied to data at rest).
Do you know of companies/vendors actually implementing this in production today?
On one hand, every Spark/Hadoop vendor could claim so.
But are customers actually deploying these models in-stream?
Just because you use Spark or Hadoop does not mean your use case involves streaming analytics.
There is a cost involved, and there also needs to be a unique use case that provides benefits.
For example, one could argue that fraud detection needs streaming analytics for large financial institutions.
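To make the fraud detection example concrete, here is a minimal sketch of what "analytics in the stream" could mean: each transaction is scored as it arrives, against a small rolling window of recent activity, rather than being landed in a warehouse first. The window size, threshold factor, and account/amount fields are all illustrative assumptions, not taken from any real deployment.

```python
from collections import deque

WINDOW = 5    # recent transactions kept per account (illustrative)
FACTOR = 3.0  # flag if amount > FACTOR * recent average (illustrative)

def make_detector():
    """Return a scoring function that flags anomalous amounts in-stream."""
    history = {}  # account id -> deque of recent amounts

    def score(account, amount):
        recent = history.setdefault(account, deque(maxlen=WINDOW))
        # Only score once we have a full window of recent behaviour.
        suspicious = (
            len(recent) == WINDOW
            and amount > FACTOR * (sum(recent) / len(recent))
        )
        recent.append(amount)
        return suspicious

    return score

if __name__ == "__main__":
    score = make_detector()
    # A stream of mostly normal amounts, ending with an outlier.
    for account, amount in [("acct1", a) for a in (20, 25, 22, 18, 21, 500)]:
        if score(account, amount):
            print(f"ALERT {account}: {amount}")  # fires for the 500 event
```

The point of the sketch is the shape of the computation: state is small and per-key, and the decision happens at event time, which is what distinguishes this from batch analytics over data at rest.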
One example is HPE Vertica and Kafka:
Vertica provides a high-performance mechanism for streaming data both to and from third party message buses. Because Vertica can both receive and send data to a streaming message bus, you can use it as part of an automated analytics workflow. Vertica can retrieve data from the message bus, perform analytics on it, and then send the results back to the message bus for consumption by other applications.
So, the object of the analytics is the ‘message bus’, i.e. data in motion.
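The consume → analyze → publish-back workflow described above can be sketched as follows. A real deployment would use actual Kafka topics (via a Kafka client, or Vertica's message-bus integration as in the quote); here two in-memory queues stand in for the inbound and outbound topics so the flow is self-contained, and the sensor schema and threshold are invented for illustration.

```python
import queue

raw_topic = queue.Queue()      # stand-in for the inbound message-bus topic
results_topic = queue.Queue()  # stand-in for the outbound topic

def analyze(reading):
    # Toy analytic applied to data in motion: classify a raw sensor
    # reading; a real system would run a richer model or query here.
    return {"device": reading["device"],
            "status": "hot" if reading["temp_c"] > 80 else "ok"}

def run_once():
    # Consume one message from the bus, analyze it, and publish the
    # result back for consumption by downstream applications.
    reading = raw_topic.get()
    results_topic.put(analyze(reading))

if __name__ == "__main__":
    raw_topic.put({"device": "sensor-1", "temp_c": 95})
    run_once()
    print(results_topic.get())  # {'device': 'sensor-1', 'status': 'hot'}
```

The design choice worth noting is that the results go back onto the bus rather than into a table: other consumers (alerting, dashboards, actuators) subscribe to the output topic, which is what makes the bus itself the object of analytics.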
So, my questions are:
a) Do you see other such use cases for IoT in production?
b) Using which platforms/techniques?
c) I am not convinced that all Spark use cases are actually streaming – but I am happy to explore.
d) Confluent/Kafka could also be an option (outside of the Vertica example).
e) As could Databricks for streaming IoT applications (as opposed to Spark itself).
f) There are also newer options like TimescaleDB.
I am curious to see if anyone is implementing these techniques at scale and in production.
Image source: Wikipedia. By Eleassar – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=21827540