When reading a large volume of data from a source system, avoid reading it line by line or loading all of it into a single string.
Jackson's Streaming API is the way to go.
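Jackson's Streaming API is a Java library, so the sketch below is only a Python stand-in for the same idea: read the input in small chunks and hand each value on as soon as it is complete, instead of building one large string. The function name `iter_json_values`, the chunk size, and the assumption that the input is a sequence of whitespace-separated JSON objects or arrays are illustrative and not from the notes.

```python
import io
import json


def iter_json_values(stream, chunk_size=4096):
    """Yield top-level JSON objects/arrays from a text stream, one at a time.

    Only the not-yet-parsed tail of the input is kept in memory.
    Assumption: top-level values are objects or arrays, which are
    self-delimiting; bare numbers could be cut at a chunk boundary.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer += chunk
        while True:
            buffer = buffer.lstrip()
            if not buffer:
                break
            try:
                value, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # the value is still incomplete; read more input
            yield value
            buffer = buffer[end:]
        if not chunk:  # end of stream
            if buffer.strip():
                raise ValueError("truncated or invalid JSON at end of stream")
            return
```

A caller would loop over `iter_json_values(open_stream)` and process each record as it arrives. For a single very large value this sketch re-parses the buffer on every chunk, which a real token-level streaming parser avoids.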
One to One or Many to One
Pull only: a consumer has to call a pull method (poll() in the Java consumer) to get the data
ZooKeeper: used by Kafka
Topic: consumers pull data by topic
Topic: a named log; a group of partitions together
Partition: A single log within a topic. You can have one or more partitions in a topic. When you pull data from a partition, the messages are in order.
One objection: only topics. In JMS, you have topics for publish/subscribe and queues for point-to-point. In Kafka, you only have topics; the consumer group is used to model a queue. If two consumers are in the same consumer group, then for a given topic and partition only one of them will receive a message, not both.
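The consumer group rule can be sketched with a toy model. This is not the Kafka client: the round-robin assignment, the function names, and the group and consumer names are all assumptions made for illustration (Kafka's real assignment strategies are configurable). It only shows that, within one group, a partition has exactly one owner, while every group gets its own copy.

```python
def assign_partitions(partitions, members):
    """Toy round-robin: map each partition to exactly one group member."""
    return {p: members[i % len(members)] for i, p in enumerate(partitions)}


def receivers_for(partition, partitions, groups):
    """For a message in `partition`, return the one receiver in each group.

    groups: {group_name: [consumer_name, ...]}
    """
    return {
        group: assign_partitions(partitions, members)[partition]
        for group, members in groups.items()
    }


partitions = [0, 1]
groups = {
    "billing": ["b1", "b2"],  # two consumers share the work (queue-like)
    "audit": ["a1"],          # a separate group sees every message (topic-like)
}
print(receivers_for(0, partitions, groups))  # {'billing': 'b1', 'audit': 'a1'}
print(receivers_for(1, partitions, groups))  # {'billing': 'b2', 'audit': 'a1'}
```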
Why are partitions cool?
You can make a Kafka topic behave like a JMS topic or like a JMS queue.
Trade-off between throughput and ordering
With two consumers, one can treat it like a topic in JMS and the other can treat it like a queue in JMS.
One to one case. One producer and one consumer. If you care about the order, create a single partition. If you don’t care about the order, you can create multiple partitions.
One producer to two consumers: one consumer is the main and the other is the backup. When the first consumer dies, the backup takes over.
Pub/Sub: Each consumer uses its own consumer group.
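The throughput versus ordering trade-off in the cases above can be illustrated with a toy model. It is a sketch, not the Kafka producer: `partition_for` uses CRC32 as a stand-in hash (an assumption; Kafka's own partitioner is different), and the function names are hypothetical. With one partition the whole stream keeps its order; with several, order is only kept per partition, so messages that share a key stay ordered relative to each other.

```python
import zlib


def partition_for(key, num_partitions):
    """Pick a partition from the key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(messages, num_partitions):
    """Append each (key, value) message to the log of its partition."""
    logs = [[] for _ in range(num_partitions)]
    for key, value in messages:
        logs[partition_for(key, num_partitions)].append((key, value))
    return logs


messages = [("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2), ("a", 3)]

# One partition: a single log, total order preserved.
single = produce(messages, 1)

# Three partitions: more parallelism, but only per-partition order.
spread = produce(messages, 3)
```

Reading `spread` partition by partition gives the values of each key in the order they were produced, but nothing is guaranteed about how messages from different partitions interleave.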
Easy Install
kafka.apache.org/quickstart
It includes Zookeeper already.
The best doc: Kafka consumer Javadoc
—
Database has auto-commit
Oracle is always in a transaction.
Kafka behaves differently from database auto-commit.
Message Log
You can replay
“Simple consumer” is actually complex.
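Last committed offset: seen from the consumer's perspective. After the consumer processes the data, it sends a commit. With auto-commit (the default), offsets are committed periodically (auto.commit.interval.ms, 5 seconds by default in the Java consumer) and you do not need to commit yourself, but you can lose messages.
High watermark: the offset up to which messages have been replicated to all in-sync replicas; consumers can only read up to it.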
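Why the commit timing matters can be shown with a toy model. This is a sketch under stated assumptions, not the Kafka client: `ToyConsumer` and `run_with_crash` are hypothetical names, and `commit_first=True` stands for any situation where the offset is committed before processing has finished (for example a periodic commit firing while a batch is still being worked on). After a crash, the consumer restarts from the last committed offset.

```python
class ToyConsumer:
    """Toy model of one consumer reading one partition log."""

    def __init__(self, log, committed_offset=0):
        self.log = log
        self.position = committed_offset   # next offset to fetch
        self.committed = committed_offset  # what survives a restart

    def poll(self, max_records=2):
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def commit(self):
        self.committed = self.position


def run_with_crash(log, commit_first, crash_on):
    """Consume `log`, crash once just before processing `crash_on`,
    restart from the last committed offset, return what was processed."""
    processed = []
    consumer = ToyConsumer(log)
    crashed = False
    while True:
        batch = consumer.poll()
        if not batch:
            return processed
        if commit_first:
            consumer.commit()  # committed before the work is done
        try:
            for msg in batch:
                if msg == crash_on and not crashed:
                    crashed = True
                    raise RuntimeError("consumer died")
                processed.append(msg)
        except RuntimeError:
            consumer = ToyConsumer(log, consumer.committed)  # restart
            continue
        if not commit_first:
            consumer.commit()  # committed after the work is done


log = ["m0", "m1", "m2", "m3"]
print(run_with_crash(log, commit_first=True, crash_on="m1"))
# ['m0', 'm2', 'm3']  -> m1 is lost
print(run_with_crash(log, commit_first=False, crash_on="m1"))
# ['m0', 'm0', 'm1', 'm2', 'm3']  -> m0 is replayed, nothing is lost
```

Committing after processing trades lost messages for possible duplicates, so the processing step should tolerate seeing the same message twice.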
Always use UTF-8. Otherwise, you limit the languages you can support. JSON is a text-oriented format.
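A small sketch of the UTF-8 point, assuming messages are JSON records sent as bytes. The helper names `to_wire` and `from_wire` and the sample record are made up for illustration.

```python
import json


def to_wire(record):
    """Serialize a record to JSON text, then to UTF-8 bytes."""
    return json.dumps(record, ensure_ascii=False).encode("utf-8")


def from_wire(payload):
    """Decode UTF-8 bytes back into a record."""
    return json.loads(payload.decode("utf-8"))


record = {"city": "東京", "name": "Zoë"}
assert from_wire(to_wire(record)) == record  # UTF-8 round-trips any language

try:
    "東京".encode("latin-1")  # a narrower encoding cannot represent this text
except UnicodeEncodeError:
    print("latin-1 cannot encode this value")
```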
Talend can have issues with commits. If you have a long-running thread, you may run into problems.
The Scala client is old. Use the Java client or the C client, librdkafka.
Similar to Kafka