So, let’s install Rabbit and get going. The server is incredibly small, about 4 MB. Wow, really?
It doesn’t install out of the box on Ubuntu Precise, so you have to fudge around with Erlang as described here, but after that, away you go. The server starts up and, whoosh, you have a queue. (Well, actually you have nothing, but you can start creating queues!)
So, let’s populate the queue with some data. On the PDI marketplace there is a plugin called “IC AMQP”. Install that and you should be able to both consume from and populate a queue. Unfortunately I couldn’t get it to populate, so I did that part in a User Defined Java Class step instead. It’s pretty easy, and I can populate the queue (single threaded) at a rate of 25,000 messages per second (200-byte messages). This pushes my CPU pretty high, so it is probably as far as my 4-year-old laptop is going to go. Oh, and you need to download the Rabbit client jars into PDI to do this, being careful not to duplicate one of the commons libraries. (PDI has a newer one, so stick with that.)
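For anyone wanting to reproduce a number like that, here is a minimal sketch of how a publish rate could be measured. It is not the code from my step: the PublishRate class and its measure method are made up for illustration, and the sink is a stand-in that only counts bytes. In a real test the sink would wrap the Rabbit Java client, for example a call to channel.basicPublish("", queueName, null, body) on an open channel, where queueName is whatever queue you declared.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper, for illustration only.
class PublishRate {

    // Sends `count` messages of `size` bytes to `sink` and returns messages per second.
    static double measure(Consumer<byte[]> sink, int count, int size) {
        byte[] body = new byte[size];
        Arrays.fill(body, (byte) 'x');          // dummy payload, e.g. 200 bytes
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            sink.accept(body);                  // a real sink would publish to Rabbit here
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return count * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        long[] bytesSent = {0};
        // Stand-in sink: just counts bytes, so this measures loop overhead only.
        double rate = measure(b -> bytesSent[0] += b.length, 100_000, 200);
        System.out.printf("%.0f msgs/s, %d bytes sent%n", rate, bytesSent[0]);
    }
}
```

With the counting sink the figure is meaningless as a benchmark; the point is only the shape of the timing loop, which stays the same once the sink publishes for real.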
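I was a bit surprised not to find any command line tools for creating queues and configuring routes/topics and such things in advance. It looks like I simply missed them: rabbitmqctl ships with the server, and the management plugin comes with rabbitmqadmin, which can declare queues, exchanges and bindings up front.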
On the consumption side I was able to read the data using the IC plugin, but it was a bit more sluggish. By running multiple copies of the step I was able to read at 7,000 messages per second. I suspect this relates to the implementation of the IC AMQP step rather than anything to do with Rabbit.
Additionally, the IC step doesn’t allow streaming from a queue. Once the queue is emptied, or once you hit a record limit, it exits. We’ll ultimately need an option that’ll sit there and listen forever. (That’s pretty simple to code, so it looks like a tiny modification or rewrite of that step.)
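To make the “listen forever” idea concrete, here is a rough sketch of the loop shape. It deliberately avoids the Rabbit client: a java.util.concurrent.BlockingQueue stands in for the broker, and the StreamingReader class is a made-up name, not part of the IC step or PDI. The point is only the difference in behaviour: an empty queue means “wait”, not “exit”, and the loop ends only when something calls stop().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical reader; the BlockingQueue is a stand-in for a Rabbit queue.
class StreamingReader {
    private final BlockingQueue<String> source;
    private final AtomicBoolean running = new AtomicBoolean(true);

    StreamingReader(BlockingQueue<String> source) {
        this.source = source;
    }

    void stop() {
        running.set(false);
    }

    // Keeps handing messages to `handler` until stop() is called.
    void listen(Consumer<String> handler) throws InterruptedException {
        while (running.get()) {
            // Wait briefly for a message; null just means "nothing yet", so loop again.
            String msg = source.poll(100, TimeUnit.MILLISECONDS);
            if (msg != null) {
                handler.accept(msg);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        StreamingReader reader = new StreamingReader(queue);
        List<String> seen = Collections.synchronizedList(new ArrayList<>());

        Thread worker = new Thread(() -> {
            try {
                reader.listen(seen::add);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        queue.put("first");
        Thread.sleep(300);      // queue is empty for a while; the reader keeps waiting
        queue.put("second");
        Thread.sleep(300);

        reader.stop();
        worker.join();
        System.out.println(seen);
    }
}
```

Against a real broker the poll would be replaced by the client’s own consume mechanism, but the stop flag and the “empty is not the end” rule would carry over.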
What next? I’m curious to see whether the message size affects the population/consumption rate. That’ll be important when scaling this up. Potentially we can then use an AWS auto scaling group to scale out the PDI servers if they can’t consume from the queue fast enough to keep up.
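As a back-of-envelope check on the numbers so far, the arithmetic can be sketched like this. The ScalingMath class is made up for illustration, and it leans on two big assumptions that haven’t been tested: that 7,000 messages per second is what one PDI server can consume, and that consumption scales linearly as servers are added.

```java
// Hypothetical helper for rough capacity sums; not a measured model.
class ScalingMath {

    // Raw payload throughput in bytes per second.
    static long bytesPerSecond(long messagesPerSecond, int messageBytes) {
        return messagesPerSecond * messageBytes;
    }

    // Servers needed to keep up, rounding up, ASSUMING linear scaling.
    static long consumersNeeded(long produceRate, long consumeRatePerServer) {
        return (produceRate + consumeRatePerServer - 1) / consumeRatePerServer;
    }

    public static void main(String[] args) {
        // 25,000 msgs/s * 200 bytes = 5,000,000 bytes/s of payload
        System.out.println(bytesPerSecond(25_000, 200) + " bytes/s");
        // ceil(25,000 / 7,000) = 4 consumers to match one producer
        System.out.println(consumersNeeded(25_000, 7_000) + " consumers");
    }
}
```

So at these rates a single producer would need roughly four consumers of that speed, and payload volume is only about 5 MB per second, which is why the effect of message size is the next thing worth measuring.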
And beyond that? Well, clearly exactly the same approach can be used with Kafka. It should be pretty easy to build an input/output step. Something for a rainy(er) day, I think!