August | 2017 | Advanced Analytics

One to One or Many to One

Pull Only: A consumer must call the poll method to get data; the broker does not push it.

ZooKeeper: Used by Kafka for cluster coordination

Topic: Consumers pull data by subscribing to topics

Topic: A named set of logs; a group of partitions together

Partition: A single log within a topic. You can have one or more partitions in a topic. When you pull data from a partition, the messages are in order.
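A minimal in-memory sketch of the topic and partition idea above. This is a toy model, not the Kafka client API; the `Topic` class, its methods, and the `orders` topic name are hypothetical and chosen only for illustration.

```python
class Topic:
    """Toy model: a topic is a named group of append-only partition logs."""

    def __init__(self, name, num_partitions=1):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append to one partition and return the message's offset in that log."""
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition, offset=0):
        """Return messages from `offset` onward, in the order they were appended."""
        return self.partitions[partition][offset:]


orders = Topic("orders", num_partitions=2)
orders.append(0, "a1")
orders.append(1, "b1")
orders.append(0, "a2")
```

Reading partition 0 returns its messages in append order, independent of what was written to partition 1.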

An objection: only topics. In JMS, you have topics for publish/subscribe and queues for point-to-point. In Kafka, you only have topics; the consumer group is used to model a queue. If two consumers are in the same consumer group, then for a given partition of a topic, only one of them will receive each message, not both.
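The consumer group rule can be sketched with a small toy model. The `deliver` function, the consumer names, and the group names are hypothetical; picking a group member by partition number is an assumption made for illustration, and real Kafka assigns partitions through its group coordinator.

```python
from collections import defaultdict


def deliver(partition, consumers):
    """Toy model: pick the receivers of a message from one partition.

    `consumers` is a list of (consumer_name, group_id) pairs. Each group gets
    the message exactly once, via one member chosen by partition number.
    """
    groups = defaultdict(list)
    for name, group_id in consumers:
        groups[group_id].append(name)
    receivers = []
    for group_id in sorted(groups):
        members = sorted(groups[group_id])
        receivers.append(members[partition % len(members)])
    return receivers


# Queue-like: both consumers share one group, so only one receives.
queue_like = deliver(0, [("c1", "billing"), ("c2", "billing")])
# Topic-like (pub/sub): each consumer has its own group, so both receive.
pubsub_like = deliver(0, [("c1", "billing"), ("c2", "audit")])
```

The same topic behaves queue-like or topic-like depending only on how the consumers choose their group ids.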

Why are partitions cool?

You can make a Kafka topic behave like a JMS topic or like a JMS queue.

Trade-off between throughput and ordering

With two consumers, one can treat the data like a JMS topic while the other treats it like a JMS queue.

One to one case. One producer and one consumer. If you care about the order, create a single partition. If you don’t care about the order, you can create multiple partitions.
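A rough sketch of the throughput versus ordering trade-off. The `spread_round_robin` helper is hypothetical, and round-robin placement is an assumption made for illustration, not a statement about how a given producer picks partitions.

```python
def spread_round_robin(messages, num_partitions):
    """Toy model: distribute messages across partitions in round-robin order."""
    partitions = [[] for _ in range(num_partitions)]
    for i, message in enumerate(messages):
        partitions[i % num_partitions].append(message)
    return partitions


events = ["e1", "e2", "e3", "e4", "e5"]
single = spread_round_robin(events, 1)  # one log, total order preserved
multi = spread_round_robin(events, 2)   # two logs, order kept only within each
```

With one partition the consumer sees the events exactly as produced. With two, each partition is still ordered, but nothing says whether `e2` is read before or after `e3`.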

One producer to two consumers: one consumer is the main and the other is the backup. When the first consumer dies, the other one takes over.
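The main and backup setup can be sketched as a partition reassignment inside one consumer group. The `assign` function and the consumer names are hypothetical; this toy model only illustrates that a partition has exactly one owner in the group at a time.

```python
def assign(partitions, live_consumers):
    """Toy model: map each partition to exactly one live consumer in the group."""
    members = sorted(live_consumers)
    if not members:
        return {}
    return {p: members[i % len(members)] for i, p in enumerate(sorted(partitions))}


# Single partition, two consumers in the same group: one works, one is idle.
before = assign([0], {"consumer-a", "consumer-b"})
# consumer-a dies; the group rebalances and consumer-b takes over.
after = assign([0], {"consumer-b"})
```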

Pub/Sub: Each consumer uses its own consumer group.

Easy Install

kafka.apache.org/quickstart

It includes Zookeeper already.

The best doc: Kafka consumer Javadoc

—

Database has auto-commit

Oracle is always in a transaction.

Kafka behaves differently from database auto-commit.

Message Log

You can replay
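Replay works because reading a log does not remove anything from it. A minimal sketch, assuming a hypothetical `LogReader` class whose `poll` and `seek` names only loosely echo the real consumer:

```python
class LogReader:
    """Toy model: a reader that tracks its own position in an append-only log."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def poll(self, max_records=10):
        """Return the next batch of records and advance the position."""
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        """Move the position; records are never deleted by reading, so this replays them."""
        self.position = offset


log = ["m0", "m1", "m2"]
reader = LogReader(log)
first_pass = reader.poll()
reader.seek(1)          # rewind to offset 1
replayed = reader.poll()
```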

“Simple consumer” is actually complex.

Last Committed Offset: from the consumer's perspective, the offset the consumer commits after it has processed the data. With auto-commit (the default), the client commits periodically (every 5 seconds by default, set by auto.commit.interval.ms) and you do not need to commit yourself, but you can lose messages.
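Why commit timing matters can be shown with a toy simulation. The `run_with_crash` function is hypothetical and does not use the Kafka client; it assumes the risky case is an offset being committed before the message has actually been processed, which is one way a background commit can lose data.

```python
def run_with_crash(log, commit_before_processing, crash_at):
    """Toy model: consume `log`, crash once at offset `crash_at`, then restart.

    Returns the list of messages that were fully processed. `committed` is the
    last committed offset, i.e. where a restarted consumer resumes.
    """
    processed = []
    committed = 0
    crashed = False
    while committed < len(log):
        offset = committed
        if commit_before_processing:
            committed = offset + 1      # offset saved before the work is done
        if offset == crash_at and not crashed:
            crashed = True              # simulated crash: work on this message is lost
            continue                    # the restart resumes from `committed`
        processed.append(log[offset])
        if not commit_before_processing:
            committed = offset + 1      # offset saved only after the work is done
    return processed


log = ["m0", "m1", "m2"]
lossy = run_with_crash(log, commit_before_processing=True, crash_at=1)
safe = run_with_crash(log, commit_before_processing=False, crash_at=1)
```

In the first run the restarted consumer skips `m1` because its offset was already committed. In the second run it resumes at `m1` and nothing is lost.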

High Watermark: the offset up to which messages have been replicated to all in-sync replicas; consumers can read only up to this point.
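A rough sketch of the high watermark idea, assuming it is the smallest log end offset among the in-sync replicas. The `high_watermark` function and the replica names are hypothetical and only illustrate the arithmetic.

```python
def high_watermark(log_end_offsets):
    """Toy model: the smallest log end offset among the in-sync replicas,
    i.e. the point that every one of them has reached."""
    return min(log_end_offsets.values())


# Hypothetical replica names mapped to the next offset each would write.
replicas = {"leader": 10, "follower-1": 8, "follower-2": 9}
hw = high_watermark(replicas)
visible_offsets = list(range(hw))  # offsets a consumer could read in this model
```

Here the leader holds offsets 0 to 9, but only offsets 0 to 7 are on every replica, so only those are visible in the model.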

Always use UTF-8. Otherwise, you limit the languages you can support. JSON is a text-oriented format.
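A small sketch of the UTF-8 advice, using only the Python standard library. The `encode_message` and `decode_message` helpers and the sample payload are hypothetical; the point is that JSON text becomes bytes through an explicit UTF-8 step on both sides.

```python
import json


def encode_message(payload):
    """Serialize a dict to JSON text, then to UTF-8 bytes for the wire."""
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


def decode_message(raw):
    """Decode UTF-8 bytes back to a dict."""
    return json.loads(raw.decode("utf-8"))


original = {"city": "東京", "greeting": "héllo"}
raw = encode_message(original)
round_tripped = decode_message(raw)
```

A narrower encoding such as Latin-1 cannot represent the Japanese text at all, which is the language limitation the note warns about.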

Talend can have issues with commits. If you have a long-running thread, you may run into them.

The Scala client is old. Use the Java client or the C client, librdkafka.

Streaming API

Kafka

Amazon Kinesis