#KafkaLondon Inaugural Meetup

So last night I went to the first @ApacheKafka meetup in London. I'd actually mentioned on Twitter that it was surprising there wasn't one! The organisers did an excellent job, and the meetup sold out quickly – just like a lot of the other uber-cool tech meetups in London.

You can find details of who spoke here: http://www.meetup.com/Apache-Kafka-London/events/226160410/

We were hosted at Connected Homes (British Gas), owners of Hive and various other IoT devices. They explained how even with 10k customers you soon hit problems scaling efficiently – i.e. without throwing money at the problem! They are now at 250k customers, with an average of 9 devices per house.

Paul Makkar (a former particle physicist) then presented on "Why Streaming".
Half the audience seemed to be newbs, and half experts; about a third had written Kafka code.
He mentioned how you may choose to throw away a degree of the data. This ties into a comment at another data science meetup a few months ago, where the speaker pointed out that if you have terabytes of sensor data covering 10 component failures, then actually you don't have big data, do you – you ONLY have 10 failures. Interesting.
Kafka is basically the redo log from a relational DB. Good analogy!
Logs survive for a configurable retention time.
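As a concrete (if slightly anachronistic – the Java AdminClient arrived in later releases) sketch of that retention setting: the topic name, partition/replica counts and the 7-day value below are all made up.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class RetentionDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // Messages older than 7 days become eligible for deletion;
            // the broker drops whole log segments once they expire.
            NewTopic topic = new NewTopic("sensor-readings", 3, (short) 2)
                    .configs(Collections.singletonMap("retention.ms", "604800000"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```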
Ticker tape comparison
Mentions of stagnant data lakes; Kafka can feed into these, so is it a data river? Hmm.

Flavio from Confluent – @fpjunqueira
You use brokers so you can easily handle consumption at different rates,
and independent failures.
A broker might crash, so replicate it!
Timing! Use a sequencer: one leader, several followers.
Multiple web servers.
3 streams. 'Topics' can be replicated and partitioned.
Can partition a massive topic over several replica sets.
Key-based partitioning, round-robin, or custom – the usual stuff here (see the sketch below).
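For the "custom" case, a minimal sketch of the Java client's Partitioner interface – the class name and keying scheme are hypothetical, and it assumes non-null keys:

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

// Hypothetical partitioner: hash the key bytes so that every message
// for a given key lands on the same partition (preserving per-key order).
public class DevicePartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionCountForTopic(topic);
        // A real implementation would also handle null keys.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override
    public void close() {}

    @Override
    public void configure(Map<String, ?> configs) {}
}
```

You'd wire it in with the producer's partitioner.class config property.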
Each consumer stores its offset and is responsible for it, though the new consumer API is better at this.
More consumers means faster processing (up to one consumer per partition).
Offset persistence is important to avoid dupes (it's not really a dupe – it's just that you've consumed it twice).
You can now store offsets in Kafka itself!
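Here's a minimal sketch of that with the modern Java consumer – group and topic names are invented, and auto-commit is switched off so the commit is explicit:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetCommitDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "hive-readings");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Take charge of offsets ourselves rather than auto-committing.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
                // Persist our position in Kafka's internal __consumer_offsets
                // topic; crash before this and we re-read (not duplicate) records.
                consumer.commitSync();
            }
        }
    }
}
```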
ZooKeeper is used for replica management.
Log compaction – this is clever. For a KV-style topic it keeps just the latest value for each key (see the sketch below).
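A compacted topic is just a config choice – a tiny sketch reusing the same hypothetical AdminClient setup as the retention example above:

```java
import java.util.Collections;
import org.apache.kafka.clients.admin.NewTopic;

public class CompactedTopicDemo {
    // With cleanup.policy=compact the broker keeps (at least) the latest value
    // per key instead of deleting purely by age - ideal for a KV-style
    // "current state" topic such as per-device settings.
    static NewTopic deviceSettingsTopic() {
        return new NewTopic("device-settings", 3, (short) 2)
                .configs(Collections.singletonMap("cleanup.policy", "compact"));
    }
}
```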
End-to-end compression.
Originally from LinkedIn.
Now an ASF top-level project.
Current release: 0.8.2.2.
Confluent adds a platform around it.
Hiring, of course!
Lots of good questions!

Ben Stopford – Practical Kafka
Showed some pretty easy-to-write code.
Use callbacks.
Tuned for latency over throughput, but tunable, so you can choose what you want (see the producer sketch below).
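Something like this (a hedged sketch, not Ben's actual code – broker address, topic and tuning values are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerCallbackDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringSerializer");
        // The latency/throughput knobs: linger.ms=0 favours latency;
        // raising it lets the producer batch more records per request.
        props.put(ProducerConfig.LINGER_MS_CONFIG, "0");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
        // End-to-end compression, as mentioned in Flavio's talk.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("demo-topic", "key", "hello kafka");
            // The callback fires when the send is acknowledged (or fails),
            // so the send itself stays asynchronous.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("sent to %s-%d @ offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}
```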
ISR – "in-sync replica".
Demo – will it work?
Running as root!
The new consumer approach is polling. Eh? The OLD tech was streaming. This seems bizarre – did I get that right?
The new API is better at tracking offsets (the API handles it for you).
Failover is nice and dynamic – the demo showed this (see the sketch below).
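The dynamic failover falls out of consumer-group rebalancing; a small sketch (plugging into the consumer example above, names hypothetical) of watching partitions move between consumers when one dies or a new one joins:

```java
import java.util.Collection;
import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class RebalanceDemo {
    // Swap this in for the plain subscribe() call in the earlier consumer sketch.
    static void subscribeWithListener(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(List.of("demo-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Called before a rebalance: a good place to commit offsets.
                System.out.println("revoked: " + partitions);
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Called once partitions are handed out again, e.g. on failover.
                System.out.println("assigned: " + partitions);
            }
        });
    }
}
```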
