Wednesday, March 11, 2015

Accessing databases through multiple threads



Most applications require store information. The options and scenarios are diverse. Let's focus on the scenario in which multiple remote clients need to store centralized information. No matter if the remote client is a standalone application, website or mobile app

   In general, it is not advisable to let the remote clients communicate with the database directly. It is always better to create a middleware to manage the persistence of the entire system. Not only for safety reasons but because we have a single point of entry to the database so that we can enhance and manage performance in a unified way.

So customers send and receive data from the server and he will be responsible for managing the persistence of this information. In small databases  with few connections, not much to worry about. With notions of SQL you can make a small server with good performance.

But when you are facing scenarios involving large numbers of users or large transactions, you must use more advanced techniques. There are two major problems to deal:

  •      Large number of users:

      This is more a question of architecture, but it is always good practice create a new thread for each new connection to a remote client. Of course, you have to limit the number of threads depending on your environment if you want to ensure quality to  everyone connected.
You cannot share the same connection with all the threads or rather, you can do but as database providers implement sessions  in a synchronized way, the commands will not be executed until their turn. You will be within a FIFO (first in, first out) queue.

The idea is to create a new database session for each external connection ( owned by the new thread). Each remote client will get or keep your data in parallel as if he just connected to the database.

So we're done ... isn’t it?

Unfortunately, this is not possible or at least not always. You can’t open a new  session for each incoming connection (and therefore the same number of threads) because sessions database are costly both in terms of resources and licenses (depending on how each provider licensing).

 The solution to this problem is to create a pool of database connections. Each thread will request a free connection pool when needed and released when he finishes what he has to do. Thus, each thread will not own a connection. Actually there is no sense to maintain a connection to the database without using. In short, each thread will request a connection when needed, will do the job, and then release the connection.

…but  it can still happen that when needed, there are no available  sessions  to the database in the pool. In this case, the thread waits until another thread release a database session. In this scenario, the user will likely experience some delay. If this happens frequently, you should ask if you can increase the number of sessions in the pool, or if we have reached the maximum capacity, purchasing more resources to enable your server to handle this load.

In relation to pool connections, there are some providers (providers including databases) that provide implementations that can be used via API. We have used some of them and they work quite well. You also have the option of making your own implementation. It will cost you some time, but on the other hand, you will have more control over what you do and you might particularize and optimize it as it suits us.

If you choose to do it yourself, a good idea is to open the database sessions when the server starts because  the creation of a database connection has a high cost. Therefore, if the connection is opened at start up, there will be no perception of the cost of establishing the connection for remote clients


  •   Long transactions or lot of queries / commands involved.

      There are other scenarios or mixed scenarios in which the issue is that you have to run a lot of queries for the same user request. The perception of the users is that your request takes longer than expected. How to deal with this?

Depends deeply on the scenario in which we find ourselves. In fact, to solve this problem requires insight into the performance of your application. We can use a similar strategy to that we used on the last point.

We will create a specific thread for each group of linked commands. Take an example, if you are reading data from a client and their orders, you can think of parallel load data from customer and every order, because, no apparent dependence. So, again, we will create different threads, requesting a connection to the pool, and run these commands in parallel. As a result of this approach, users will wait  only the slowest group of commands that are executed in parallel.

If a branch is heavier than the rest, we can choose to return part of the data while the heavier parts are still running. If you do this, the remote client will need to know in order to inform the user about it. At the end of the heavier parts, send the information to the client so that you can complete the user request.

This approach has some risks that you need to know:

    1.   You will  deal with some threads in parallel, so you are in charge of joining together when they finish their duties. Be careful because you have to ensure that the information you are accessing is already loaded. Use semaphores or other mutual exclusion mechanism to control access to these areas.  
    2. Beware of deadlocks that can cause connection pool usage. Release the session as long as you do not need it. For example, if you create a new thread from another thread already has a session, the latter may end up locked (depending on the size of the pool and context). You could be waiting for her son before releasing your session while he is also waiting for an  available session. If this behavior is widespread we could have a global lock the entire application. This scenario may seem unlikely, but possible oop often suffer this problem. To address this, as mentioned, let's release the database session when we don't need anymore. But be careful, when dealing with hierarchies of objects and multithreading is not as straightforward ensure this behavior. A simple solution is to restrict parallel to the objects that have no dependency downwards. In a graph of tree , we would only  put in parallel the leaves because they have no dependencies.
Combining this two techniques you can get a good performance on your intensive database access applications.



No comments:

Post a Comment