Streaming Platforms at London Summit - We Were There
Lukáš Matějka, Head of the Brno Development Branch and Head of Lundegaard’s Data Streaming Platform activity, shares his observations from the Kafka Summit conference.
On 23rd April 2018, Kafka Summit was held in London for the first time. In recent years it had taken place only in the United States, in San Francisco and New York; thanks to its great success worldwide, this year’s conference also came to London, and I was able to attend in person. The conference was hosted by Confluent, the company that sponsors development of the key open-source technology Apache Kafka.
At the very beginning, Jay Kreps (co-founder and CEO of Confluent) pointed to the re-emergence of event-driven architectures as the style that best supports today’s digital business. A digital business is essentially nothing more than a set of events and (ideally) instant responses to them, which is exactly the paradigm that event-driven systems embody. Such systems are not new, but thanks to new distributed technologies (and the possibilities of horizontal scaling), they are starting to be used at large scale.
More and more companies are aware of the need to serve their clients at an automated, large scale, and they are evolving accordingly. Apache Kafka, as a distributed streaming platform, is a key technology for such a transition, playing an irreplaceable part in the gradual evolution of enterprise architecture.
The Death and Rebirth of the Event Driven Architecture, Jay Kreps, CEO, Confluent
One of the stars of the conference was Martin Fowler of ThoughtWorks, one of the major influencers in software development. He talked about the undisputed benefits of event-based systems once a system becomes "big" and therefore distributed. Such systems better reflect the real world, which is in constant flux, and they allow digital businesses to be more agile by responding quickly to changes. Martin Fowler also drew attention to their possible dangers: shifting responsibility for data to consumers may eventually make it impossible to change anything at the data-flow level. His colleague Toby Clemson shared a practical example and lessons from the evolution of a greenfield system, which they gradually moved from a simple monolithic application to a distributed event-based system.
Enabling Experimentation Using Event-based Systems, Martin Fowler, Chief Scientist and Toby Clemson, Lead Consultant, ThoughtWorks
Several other talks covered the use of related technologies, in particular the streaming engine Kafka Streams. A streaming engine can process, aggregate, and enrich continuous data streams in a distributed environment. For example, when a client changes the preferences in their profile, relevant content is displayed accordingly: the stream of preference-change events and the stream of content are continuously merged, so the displayed content always reflects the current preferences.
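The merge described above can be sketched in a few lines of plain Python. This is only a conceptual illustration, not the Kafka Streams API (Kafka Streams is a Java library); the `Event` type and `personalize` function are hypothetical names chosen for the example:

```python
# Illustrative sketch: merge a stream of preference-change events with a
# stream of content events, delivering content only when it matches the
# user's latest preference. Kafka Streams would model this as a
# KStream-KTable join; here we just replay one interleaved event list.

from dataclasses import dataclass

@dataclass
class Event:
    user: str
    kind: str      # "preference" or "content"
    payload: str   # a topic such as "sport" or "fashion"

def personalize(events):
    prefs = {}       # materialized state: user -> latest preference
    delivered = []
    for e in events:
        if e.kind == "preference":
            prefs[e.user] = e.payload              # update on each change event
        elif e.kind == "content" and prefs.get(e.user) == e.payload:
            delivered.append((e.user, e.payload))  # content matches preference
    return delivered

events = [
    Event("alice", "preference", "sport"),
    Event("alice", "content", "fashion"),    # filtered out
    Event("alice", "content", "sport"),      # delivered
    Event("alice", "preference", "fashion"), # preference changes mid-stream
    Event("alice", "content", "fashion"),    # delivered after the change
]
print(personalize(events))  # [('alice', 'sport'), ('alice', 'fashion')]
```

Because the preference is itself just another event in the stream, the output automatically follows every change, which is the core of the event-driven approach described above.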
Andreas Lundsten of Forefront Consulting Group presented a case study: an AML (anti-money-laundering) solution for a bank built on Kafka Streams. The primary requirement was the ability to handle large volumes of transactions in near real time and to check them in order to prevent fraud (achieved by applying sets of existing rules), while also providing a basis for advanced analytics and machine learning (a general search for other potential patterns of fraud).
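Applying "sets of existing rules" to a transaction stream can be sketched as follows. The specific rules, thresholds, and the `screen` helper are hypothetical, invented for illustration; the talk did not disclose the bank's actual rules:

```python
# Minimal sketch of rule-based fraud screening over a transaction stream.
# Each rule is a predicate; a transaction is flagged if any rule trips.

LIMIT = 10_000                         # hypothetical amount threshold

def amount_rule(tx):
    return tx["amount"] > LIMIT

def country_rule(tx):
    return tx["country"] in {"XX", "YY"}   # hypothetical watchlist

RULES = [amount_rule, country_rule]

def screen(transactions):
    """Return the ids of transactions that trip at least one rule."""
    return [tx["id"] for tx in transactions if any(r(tx) for r in RULES)]

txs = [
    {"id": 1, "amount": 500,    "country": "CZ"},
    {"id": 2, "amount": 25_000, "country": "CZ"},   # trips amount_rule
    {"id": 3, "amount": 100,    "country": "XX"},   # trips country_rule
]
print(screen(txs))  # [2, 3]
```

Keeping each rule as an independent predicate makes the rule set easy to extend, and the same flagged-transaction stream can later feed the machine-learning stage mentioned in the talk.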
Consultant Peter Hannan from 6point6 helps companies in the banking sector switch to event-based systems using Kafka. He presented a detailed case study using Kafka Streams, including the specific parameter settings used in a production environment, and illustrated how easy it is to build aggregated distributed jobs with Kafka Streams. At the same time, he pointed out that although creating such a job is relatively easy, it makes up only about 20% of the overall effort; the rest goes into testing, deployment, and operations, where tool support is still lacking.
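To show why the aggregation itself is the easy 20%, here is the "hello world" of stream aggregation, a running count per key, in plain Python. Kafka Streams expresses the same idea as `groupByKey().count()`; this sketch only mirrors the concept and is not his case study's actual job:

```python
# Sketch of a per-key running count over a stream of (key, value) records.
# Each incoming record updates state and emits the new count downstream.

from collections import Counter

def count_by_key(records):
    counts = Counter()          # incrementally updated aggregation state
    for key, _value in records:
        counts[key] += 1
        yield key, counts[key]  # emit the running count for this key

stream = [("eur", 10), ("usd", 5), ("eur", 7)]
print(list(count_by_key(stream)))
# [('eur', 1), ('usd', 1), ('eur', 2)]
```

The few lines above are the whole "job"; the hard part Peter Hannan highlighted is everything around it: testing such stateful logic, deploying it, and operating it in production.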
Timor Timuri and Richard Bras shared their experience at ING, where Kafka has become a global technology standard. They started four years ago by improving their anti-fraud system. Today there are more than 50 use cases, and the vast majority of ING teams use the central Kafka installation to independently develop data-intensive applications.
Nina Hanzlíková from Zalando also gave a very interesting presentation. Zalando sells fashion and takes the nature of the fashion world seriously: it collects fashion data, which can change very quickly, analyses it automatically, and on that basis offers specific goods and trends. Zalando follows a strict agile methodology that lets teams choose any technology they like, while making them responsible for it, including its reliable operation. Nina Hanzlíková described what her small team quickly managed to build with Kafka and Kafka Streams, which compromises they had to make, and some of the pitfalls of deploying to the AWS cloud with EBS volumes.
One of the main presentations, by Neha Narkhede (co-founder and CTO of Confluent), covered event-driven enterprises. It described the typical way companies adopt event streaming as they gradually shift towards a so-called event-centric approach:
• The first stage is the realization that event streaming is needed, including getting acquainted with the basic principles of Apache Kafka and related technologies. This stage usually culminates in a first pilot project.
• The second stage brings the first production deployment in several minor use cases. In most cases, existing (primary) data sources are merged onto the platform; more advanced adopters have some applications or systems generate new events directly.
• The third stage is about breaking monoliths apart into smaller, organizationally and technologically autonomous units, and moving the primary data source onto the platform.
• The fourth stage deals with deploying and using a globally distributed streaming platform.
• The last stage is the creation of a so-called central nervous system: all data is managed and processed over a single platform. This is the Event-Driven Enterprise (e.g. Netflix, Uber, Royal Bank of Canada).
The Present and Future of the Streaming Platform, Neha Narkhede, CTO, Confluent.
Stefan Bauer from Audi introduced a project with the ambition to collect all data from cars, analyse it, and offer more comfort to users. In this way Audi wants to fulfil its vision of “giving back” to users the roughly one hour of average daily time drivers spend in their cars. The car itself is perceived as a mere IoT device that generates 4 TB of data per day from its many sensors (a set of cameras, impact sensors, sonic sensors, etc.). These data will be linked to information on the current traffic situation, road conditions, weather, other cars, the places and durations of battery recharging, and so on. Further advanced analysis and prediction will allow cars to “see around the corner” and adjust permitted speed limits so that traffic flows more smoothly, and more. Stefan Bauer emphasized how important personalization and knowledge of the current user context are for Audi, an importance that will certainly grow in the future (compare Amazon, where 30% of sales are now instantly personalized). The project uses Apache Kafka to collect the large amounts of data coming from cars, and will be launched this year together with the new Audi.
Fast Cars in a Streaming World: Reimagining Transportation at AUDI, Stefan Bauer, Head of Development Data Analytics, AUDI Electronics Venture, AUDI.
Casper Konig from Gracenote gave a talk on powering the Olympic Games through Kafka Streams, showing how Apache Kafka and Kafka Streams were used so that, for the 14 days of the Olympics, various websites could embed widgets displaying the progress of the Games in real time: up-to-date results, who was on the track, how many seconds behind they were, live comparisons of competitors, and so on. To do this, it was necessary to compile and use a lot of real-time data, including official data from the various Olympic venues. The project was also interesting in that the infrastructure was created for only 14 days, during which the widgets were used by millions of people, which made it quite difficult to test beforehand.
Finally, it is worth mentioning the talk by Mikhail Khasin of Sberbank on distributed computing as a key player in core-banking platforms. Mikhail Khasin has been working on Sberbank’s transformation; a few years back the bank globally processed a mere 10–15 transactions per second across its branches. In the digital age, Sberbank wants to match the volumes today’s largest digital players handle (such as Alibaba, with over 200 thousand payment transactions per second). Mikhail maintains that banks have to become technology providers with knowledge of the banking environment, or else technology companies will, among other things, “transform” into banks. Technologically, this certainly won’t happen through vertical scaling as in the past; it requires a fully distributed, horizontally scaled environment that can automate hundreds of thousands of transactions per second. In automated data analysis, Sberbank focuses in the first phase on automatic fraud detection and personalized offers. In subsequent phases, they want to address key trends such as counselling, smart chatbot interaction, IoT, and automating the efficiency of the operation itself. Apache Kafka currently serves as the carrier of all events, with a plan to extend it with a streaming engine as the basis for further automated, advanced analytics.
In a world of vast amounts of data, processing them quickly and presenting them clearly in relevant contexts is key. At Lundegaard, we have been working on this topic using technologies such as Apache Kafka. One example is our Lifetrace product, a technical monitoring system for web portals.