Welcome to Wunderkafka's documentation!
The power of librdkafka for
humanspythons
Wunderkafka provides handful facade for C-powered consumer/producer. It's built on top of the confluent-kafka.
Rationale
What we are about?
- Cloudera installation with its own schema registry
- Apache Avro™ is used
- Installation requires features which are fully supported by librdkafka, but not bundled in confluent-kafka python wheel
- Constant need to use producers and consumer, but without one-screen boilerplate
- Frequent need to consume not purely events, but fairly recent events
- Frequent need to handle a large number of events
So, that's it.
If you suffer from the same problems, you may don't need to reinvent your own wheel, you can try ours.
What about other projects?
Corresponding to ASF wiki there are plenty of python clients.
- confluent-kafka is a de-facto standard, but doesn't work out-of-the-box for us, as mentioned above
- Kafka Python is awesome, but not as performant as confluent-kafka
- pykafka here and here looks unmaintained: has been archived
- pykafkap has only producer and looks unmaintained: no updates since 2014
- brod is not maintained in favor of Kafka Python.
What's next?
For now, it's a homebrew, so it lacks some of the features which may be useful outside of our use-cases.
ToDo:
- add configurations for multiple versions of librdkafka
- check against confluent installation
- add
async
/await
syntax - parallelize (de)serialization on CPU
- add distributed lock on producers
- add on-the-fly model derivation to consumer
- ???
Errata
Kerberos Thread
Version librdkafka < 1.7.0
Introduction
Prior to version 1.7.0, updating kinit via librdkafka could result in a lockup. Because of this, when using a version less than 1.7.0, by default used our thread to update kerberos tickets.
Standard behavior
By default, 1 thread is raised to update all tickets and the timeout for all updates is 60 seconds.
To set timeout manually you need to
from wunderkafka.hotfixes.watchdog import KrbWatchDog
if __name__ == '__main__':
krb = KrbWatchDog()
krb.krb_timeout = 10 # complete
krb.krb_timeout = 'something' # raised TypeError
Bottleneck
Updating of kerberos tickets is done sequentially one by one.