How Kafka Guarantees That Messages Are Not Lost - CFANZ Programming Community

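To solve the problem of lost messages you have to look at the whole pipeline: only when no stage of the pipeline loses data can you truly guarantee that no message is lost.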

First, the producer

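There are plenty of tutorials online on how a producer avoids losing messages, and they mostly come down to configuring the ACK level and adding retries. For very important messages you could try retrying indefinitely, though I don't think that is a good solution. My own view is that a message that still fails after a fixed number of retries should simply be persisted somewhere. If you do that, you need to consider whether the persistence store can take the load when message volume is high: a typical MySQL database handles on the order of thousands of TPS (this depends on the machine configuration; the performance tests published by Tencent Cloud are a useful reference).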
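The "retry a bounded number of times, then persist" idea can be sketched as follows. This is a minimal illustration, not a real Kafka client: `send` is a hypothetical callable that raises on failure, the `failed_messages` table is invented for the example, and SQLite stands in for MySQL so the snippet runs on its own. The keys in `PRODUCER_CONFIG` are standard Kafka producer setting names, but the values are only illustrative.

```python
import sqlite3
import time

# Standard producer setting names; the values are illustrative assumptions.
PRODUCER_CONFIG = {
    "acks": "all",               # wait for all in-sync replicas to acknowledge
    "retries": 5,                # client-level retries for transient errors
    "enable.idempotence": True,  # avoid duplicates introduced by retries
}


def open_store(path=":memory:"):
    """Create the fallback table (SQLite here as a stand-in for MySQL)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS failed_messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, payload BLOB)"
    )
    return conn


def send_with_fallback(send, store, topic, payload, max_attempts=3, backoff_s=0.0):
    """Try `send` up to max_attempts times; persist the message if all fail.

    `send` is a hypothetical callable that raises on failure. Returns True if
    the message was delivered, False if it was parked in the fallback store.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(topic, payload)
            return True
        except Exception:
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # simple linear backoff
    # All attempts failed: park the message so a later job can replay it.
    store.execute(
        "INSERT INTO failed_messages (topic, payload) VALUES (?, ?)",
        (topic, payload),
    )
    store.commit()
    return False
```

Each failed message costs one write to the store, so a long broker outage at high throughput sends the full message rate to the database. That is where the TPS ceiling mentioned above starts to matter.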

Second, the broker

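For how the broker guarantees data reliability, you can consult books such as 《深入理解Kafka:核心设计与实践原理》. The usual cause of data loss on the broker is a crash that happens before the data has been flushed to disk.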
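The loss window between accepting a write and flushing it can be illustrated with a toy model. This is only a sketch of the general idea, not Kafka's storage code; the `ToyLog` class and its method names are invented for the example.

```python
class ToyLog:
    """Toy model of a log whose writes sit in memory until flushed."""

    def __init__(self):
        self.page_cache = []  # accepted, but not yet on disk
        self.disk = []        # survives a crash

    def append(self, message):
        """Accept a write; it is only buffered at this point."""
        self.page_cache.append(message)

    def flush(self):
        """Move everything buffered so far onto durable storage."""
        self.disk.extend(self.page_cache)
        self.page_cache.clear()

    def crash(self):
        """Simulate a sudden crash: unflushed writes vanish. Returns what was lost."""
        lost = list(self.page_cache)
        self.page_cache.clear()
        return lost
```

In this model, anything appended after the last `flush()` is gone after `crash()`. On a real broker, settings such as `log.flush.interval.messages` and `log.flush.interval.ms` control how often data is forced to disk. Replication, through the producer `acks` setting together with `min.insync.replicas`, is the usual way to survive the crash of a single broker.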

Another question: why do consumers pull data instead of having the broker push it?

The official documentation explains:

There are pros and cons to both approaches. However, a push-based system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred. The goal is generally for the consumer to be able to consume at the maximum possible rate; unfortunately, in a push system this means the consumer tends to be overwhelmed when its rate of consumption falls below the rate of production (a denial of service attack, in essence). A pull-based system has the nicer property that the consumer simply falls behind and catches up when it can. This can be mitigated with some kind of backoff protocol by which the consumer can indicate it is overwhelmed, but getting the rate of transfer to fully utilize (but never over-utilize) the consumer is trickier than it seems. Previous attempts at building systems in this fashion led us to go with a more traditional pull model.

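In short: with push, the broker sets the transfer rate, so a consumer that is slower than the producers gets overwhelmed, which amounts to a denial-of-service attack. With pull, a slow consumer merely falls behind and catches up when it can. A push system can add a backoff protocol that lets the consumer signal it is overloaded, but tuning the rate so that the consumer is fully used and never overloaded is harder than it looks.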

Another advantage of a pull-based system is that it lends itself to aggressive batching of data sent to the consumer. A push-based system must choose to either send a request immediately or accumulate more data and then send it later without knowledge of whether the downstream consumer will be able to immediately process it. If tuned for low latency, this will result in sending a single message at a time only for the transfer to end up being buffered anyway, which is wasteful. A pull-based design fixes this as the consumer always pulls all available messages after its current position in the log (or up to some configurable max size). So one gets optimal batching without introducing unnecessary latency.

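In short: pull also makes aggressive batching easy. A push system has to either send each message immediately or accumulate data and send it later, without knowing whether the consumer can process it right away. Tuned for low latency, it ends up sending one message at a time that gets buffered downstream anyway, which is wasteful. A pulling consumer simply asks for every message after its current position in the log, up to a configurable maximum size, so it gets good batching without added latency.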
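The "pull everything after the current position, up to a maximum size" behavior can be sketched with a plain list standing in for the log. The function `fetch` and its parameters are hypothetical names for this illustration, not a Kafka client API.

```python
def fetch(log, offset, max_bytes):
    """Return the messages from `offset` onward that fit within `max_bytes`.

    At least one message is returned whenever any is available, so a single
    oversized message cannot stall the consumer (a choice made for this toy).
    """
    batch, size = [], 0
    for message in log[offset:]:
        if batch and size + len(message) > max_bytes:
            break
        batch.append(message)
        size += len(message)
    return batch


def consume_all(log, max_bytes):
    """Pull repeatedly until the consumer has caught up; returns the batches."""
    position, batches = 0, []
    while position < len(log):
        batch = fetch(log, position, max_bytes)
        batches.append(batch)
        position += len(batch)  # the consumer tracks its own position
    return batches
```

Because the consumer decides when to call `fetch`, the batch size adapts by itself: a consumer that has fallen behind receives full batches, while one that is caught up receives whatever has just arrived.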

The deficiency of a naive pull-based system is that if the broker has no data the consumer may end up polling in a tight loop, effectively busy-waiting for data to arrive. To avoid this we have parameters in our pull request that allow the consumer request to block in a "long poll" waiting until data arrives (and optionally waiting until a given number of bytes is available to ensure large transfer sizes).

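In short: the weakness of a naive pull system is that when the broker has no data, the consumer ends up polling in a tight loop, busy-waiting for data to arrive. Kafka avoids this with parameters in the fetch request that let it block in a "long poll" until data arrives, and optionally until a given number of bytes is available, so that transfers stay large.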
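A long poll can be sketched with a condition variable: the fetch blocks until enough bytes are available or a timeout expires, instead of returning empty and being retried in a loop. `ToyBroker` and its parameters are invented for this example. `min_bytes` and `max_wait_s` are loosely modeled on the consumer settings `fetch.min.bytes` and `fetch.max.wait.ms`, but this is not how the Kafka broker is implemented.

```python
import threading


class ToyBroker:
    """In-memory log whose fetch blocks until data arrives or a timeout expires."""

    def __init__(self):
        self._log = []
        self._cond = threading.Condition()

    def produce(self, message):
        """Append a message and wake up any fetch that is waiting."""
        with self._cond:
            self._log.append(message)
            self._cond.notify_all()

    def fetch(self, offset, min_bytes=1, max_wait_s=0.5):
        """Wait until at least min_bytes exist after `offset`, or max_wait_s passes.

        Returns whatever is available at that point, possibly an empty list.
        """
        def enough():
            return sum(len(m) for m in self._log[offset:]) >= min_bytes

        with self._cond:
            # Sleeps without spinning; produce() wakes it via notify_all().
            self._cond.wait_for(enough, timeout=max_wait_s)
            return list(self._log[offset:])
```

A consumer loop built on this `fetch` makes one blocking call per batch. While the log is idle it sleeps inside the call for up to `max_wait_s` rather than spinning.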