Thursday, December 18, 2014

Choosing a Transfer Protocol



Once our app architecture is designed, we have to define a communication interface between the clients and the server. This interface specifies how client and server interact in an unambiguous way. The dialog has to be deterministic in order to allow a reliable implementation.
You can see the detailed flow of commands in the following diagram.




If we want to create a new type of client, we just need to focus on implementing this interface contract. The technology underneath our implementation doesn't matter. At the lowest level, the interface contract simply tells us which stream of bytes has to be sent in order to achieve some functionality.
Going one step down, the contract has to rely on a network protocol. Since we don't want to reinvent the wheel, and we want to exchange data across the Internet quickly (minimizing latency), there are only two realistic options: TCP or UDP.
For those unfamiliar with them, both protocols are capable of sending and receiving packets of data between two endpoints. TCP sits one step above UDP in the OSI stack, since it guarantees that packets arrive at the other side, and in the same order they were sent. Of course, these capabilities are not free: TCP does this bookkeeping internally, out of sight, so choosing between the two protocols is never easy.

Let’s suppose that we choose TCP because of these out-of-the-box features; later we will revisit this decision. By choosing TCP, we know the network will behave like a data pipeline: data will arrive at our peer, and in the order we wanted.
In order to tune TCP to deal with high latencies, you have to make some decisions:
  • Deactivate Nagle's algorithm. By doing this, packets are sent as soon as they are ordered. It might seem obvious, but it's not: Nagle's algorithm tries to fill every packet before sending it, in order to minimize global congestion. This works fine when you want to exchange big amounts of data, but it hurts you severely when you want to reduce latency. In our tests we saved an average of 200-300 ms by deactivating this algorithm. Easy to do but hard to discover; in the end, one of the best decisions we made.
  • Related to the previous point, we have to make sure buffers are flushed. This is not just a TCP protocol issue: one level up, Java streams also try to gather data together and send it only once a buffer is full. Once again, this works against latency. We can't afford this behaviour, so flush everything: always send and flush.
  • Custom serialization when needed. Java serialization is easy to use, but it is a big black box: you know you are sending your classes across the net, but you don't really know how it works underneath. You just implement the Serializable interface and it works! But, again, we want to minimize the data we transfer. This is not a latency issue but a matter of the size of the information we send: Java serialization puts the whole hierarchy of a class on the wire, recursively. You can imagine the amount of data that generates; a sniffer will show it to you in detail. How to improve? Implement the Externalizable interface and override the writeExternal and readExternal methods so that you send and read only what you really need to rebuild your class at the other end of the connection. It is not difficult to do, and it can reduce the amount of data transferred enormously.
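The first two points above can be sketched in a few lines of Java. This is a minimal example, not the app's actual code: the message format and class name are invented, and a loopback connection stands in for a real server so the snippet is self-contained.

```java
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class LatencyTuning {

    // Send one message over a loopback TCP connection with the two
    // latency tweaks applied, and return what the peer received.
    static String roundTrip(String msg) throws IOException {
        try (ServerSocket server = new ServerSocket(0); // ephemeral port
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket peer = server.accept()) {

            // 1) Deactivate Nagle's algorithm: each write goes on the
            //    wire immediately instead of waiting to fill a segment.
            client.setTcpNoDelay(true);

            // 2) Always flush, so the buffered stream doesn't hold the
            //    message back either.
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(client.getOutputStream()));
            out.writeUTF(msg);
            out.flush();

            DataInputStream in = new DataInputStream(peer.getInputStream());
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("MOVE 3 5"));
    }
}
```

Note that `setTcpNoDelay(true)` is the standard way to switch Nagle's algorithm off in Java; the flush has to be done explicitly after every logical message.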

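The third point, custom serialization through Externalizable, could look like the following sketch. The `Move` class and its fields are hypothetical; the technique is what matters: only the two ints cross the wire, instead of whatever default serialization would emit for the full object graph.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

// Hypothetical network message: a player move with two coordinates.
public class Move implements Externalizable {
    private int x, y;

    public Move() {}              // Externalizable requires a public no-arg constructor
    public Move(int x, int y) { this.x = x; this.y = y; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(x);          // write exactly the fields the peer needs
        out.writeInt(y);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        x = in.readInt();         // read them back in the same order
        y = in.readInt();
    }

    public int x() { return x; }
    public int y() { return y; }
}
```

The order of reads in readExternal must mirror the order of writes in writeExternal, since nothing else on the wire identifies the fields.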