Skip to content

Welcome to Wunderkafka's documentation!


The power of librdkafka for humans pythons

Wunderkafka provides handful facade for C-powered consumer/producer. It's built on top of the confluent-kafka.

Rationale

Das ist wunderbar!

What we are about?

  • Cloudera installation with its own schema registry
  • Apache Avro™ is used
  • Installation requires features which are fully supported by librdkafka, but not bundled in confluent-kafka python wheel
  • Constant need to use producers and consumer, but without one-screen boilerplate
  • Frequent need to consume not purely events, but fairly recent events
  • Frequent need to handle a large number of events

So, that's it.

If you suffer from the same problems, you may don't need to reinvent your own wheel, you can try ours.

What about other projects?

Corresponding to ASF wiki there are plenty of python clients.

  • confluent-kafka is a de-facto standard, but doesn't work out-of-the-box for us, as mentioned above
  • Kafka Python is awesome, but not as performant as confluent-kafka
  • pykafka here and here looks unmaintained: has been archived
  • pykafkap has only producer and looks unmaintained: no updates since 2014
  • brod is not maintained in favor of Kafka Python.

What's next?

For now, it's a homebrew, so it lacks some of the features which may be useful outside of our use-cases.

ToDo:

  • add configurations for multiple versions of librdkafka
  • check against confluent installation
  • add async/await syntax
  • parallelize (de)serialization on CPU
  • add distributed lock on producers
  • add on-the-fly model derivation to consumer
  • ???

Errata

Kerberos Thread

Version librdkafka < 1.7.0

Introduction

Prior to version 1.7.0, updating kinit via librdkafka could result in a lockup. Because of this, when using a version less than 1.7.0, by default used our thread to update kerberos tickets.

Standard behavior

By default, 1 thread is raised to update all tickets and the timeout for all updates is 60 seconds.

To set timeout manually you need to

from wunderkafka.hotfixes.watchdog import KrbWatchDog

if __name__ == '__main__':
    krb = KrbWatchDog()
    krb.krb_timeout = 10  # complete
    krb.krb_timeout = 'something'  # raised TypeError

Bottleneck

Updating of kerberos tickets is done sequentially one by one.