Problem Report written to help others understand the port fix. No patches are included here; check CVS (or the appropriate version control system) for the net/dctc port, looking for this PR number. The fix is in version 0.83.2 of the port. Explanation begins below.

dctc is a Direct Connect(TM) client. Among its advanced features are bandwidth throttling and multi-part file download from multiple hubs. It employs a combination of multi-threaded and multi-process programming models to achieve its goals. It uses one process for each Direct Connect(TM) hub it connects to. These processes use semaphores to ensure that they are correctly synchronized, so that the summed bandwidth usage of all processes does not surpass the user-chosen bandwidth limits.

Within each process there are threads. One thread communicates with the hub while another thread manages bandwidth throttling. The latter periodically checks the overall bandwidth usage of the concurrent processes and throttles its own process accordingly.

PROBLEM:

1. Semaphore versus multi-threading

dctc utilizes semaphores within each process to manage throttling cooperation among concurrent processes. Furthermore, each process has concurrent threads which manage throttling on a per-process basis. Whenever a thread within the process accesses a semaphore to verify the current bandwidth usage of all other processes, it may block. This might lead to a deadlock scenario.

For instance, suppose there is only one process running. It still has to both update and check bandwidth usage. Moreover, it has to check whether other processes exist so that it can cooperate with them. However, if one thread checks the semaphore and another thread from the same program then tries to obtain the semaphore, the second thread might block. This is especially true with upload bandwidth limiting.

This is of importance since we are working with multiple threads. I have mentioned 2 threading models. Recall that a blocking call blocks the whole process.
Consequently, the semaphore will block a single thread in model 1. In model 2, however, it will effectively block ALL threads of the program, since it blocks the process containing them. Therefore, blocking calls should be avoided in multi-threaded programs: there is no guarantee that a blocking call will block only the calling thread rather than ALL threads. This affects all BSD implementations. Of course, blocking calls can still be used if the programmer plans carefully for this.

Whenever an upload would begin, dctc would block in a semwait state, requiring a kill(1) command invocation.

2. Hide absolute option not working

Hide absolute is a dctc option to hide the leading / in an absolute directory reference when returning search results. Also, it prefixes all search results with character . When enabled, it also triggers removal of the leading / when building the available file database. However, when checking whether a file requested for upload is available with

    int file_in_db(char *filename, int *virtual);

inside src/dc_manage.c, file names should be processed to remove the leading / if hide absolute is enabled. Since this does not happen, no upload request works except for the available file list (dcflist).

Fix:

The fix is twofold. First, we need to prevent the client from blocking. Then, we need the client to correctly implement the hide absolute option so that it can properly process upload requests.

1. Prevent blocking

Since we are having problems with semaphores blocking ALL threads, we could tell them not to block and then write them as a busy wait construct with a small time interval between retries. Investigating the src/sema.c source code, the only blocking calls are semop(2) with -1 as the operation parameter. Consequently, we will both add IPC_NOWAIT to the flag parameter in all of those and rewrite them as busy wait constructs. Nevertheless, this does not solve the contention problem.
Semaphores are built for protection of a shared resource; thus, we should add a thread-appropriate mutual exclusion mechanism. The most appropriate seems to be mutexes. Also, whenever we busy wait, we will call

    void pthread_yield(void);

from pthread(3), increasing the chance that a concurrent thread releases the semaphore before our next try. See Example 1-1.

Example 1-1. Replacing a blocking semaphore operation with a non-blocking busy wait one

Replace

    void get_slice(int semid, SPD_SEMA semnum)
    {
        while(1)
        {
            struct sembuf local={0,-1,0};   /* slave sema */

            local.sem_num=semnum;
            if(semop(semid,&local,1)==0)
            {
                /* we have what we want */
                return;
            }
        }
    }

with the hopefully portable

    #include <sys/param.h>

    /* interval between busy wait tries measured in microseconds */
    #define MUTEX_BUSY_WAIT_TIME 5000

    void get_slice(int semid, SPD_SEMA semnum)
    {
    #if !(defined(BSD) && (BSD >= 199103))
        struct sembuf local={0,-1,0};               /* slave sema */
    #else
        struct sembuf local={0,-1,0|IPC_NOWAIT};    /* slave sema */ /* (1) */
    #endif

        local.sem_num=semnum;
        (void) lp_mutex_lock_(semaphore_mutex);     /* (2) */
        while(1) {
            switch (semop(semid,&local,1)) {
                case 0: (void) lp_mutex_unlock_(semaphore_mutex);   /* (3) */
                        /* we have what we want */
                        return;
                        break;
                case -1: switch(errno) {            /* (4) */
                            case EAGAIN: /* triggers busy wait */
                            case EINTR:  /* system call interrupted by a signal, try again */
                                pthread_yield();    /* (5) */
                                usleep(MUTEX_BUSY_WAIT_TIME); /* busy wait with a small time out */ /* (6) */
                                continue;
                                break;
                         }
            }
        }
    }

(1) Have a non-blocking semaphore
(2) Add mutex protection around the shared semaphore. Acquire the mutex lock before trying the semaphore
(3) Add mutex protection around the shared semaphore. Release the mutex lock when we are done with the semaphore
(4) If the semaphore operation fails AND it signals that it would have blocked if it could, we will have to try the semaphore again
(5) Before trying again, we yield the processor by letting another thread run.
    Another thread might release the required semaphore
(6) To avoid hogging the processor with multiple retries, we will wait a time interval before retrying

1.1 Fix

  i.   Copy my lp_mutex.c BSD licensed mutex handling routines to subdirectory src
  ii.  Apply patch patch-configure.in against configure.in to enable detection of header file sys/param.h, which is used to detect whether the current system is BSD based
  iii. Apply patch patch-src::Makefile.in against src/Makefile.in to connect lp_mutex.c to the build
  iv.  Apply patch patch-src::sema.c against src/sema.c, adding hopefully more portable semaphore code

2. Correct hide absolute option

First, we need to pass the correct upload requests to the

    int file_in_db(char *filename, int *virtual);

routine that checks whether the files exist. Then, we have to make sure that this checking routine understands the requests. We will move the hide absolute option handling routines before the check routine. Then, we will add specific handling to the check routine.

2.1 Fix

  i.   Apply patch patch-src::dc_manage.c against src/dc_manage.c so that the checking routine receives a correct file request
  ii.  Apply patch patch-src::mydb.c against src/mydb.c so that the checking routine understands the file requested

3. PENDING problems:

  * The upload bandwidth limitation option has to be enabled for upload to work at all. If it is not enabled, clients are dropped after transferring a few KB
  * When connected to multiple hubs, the client uses a lot of processing power. Checking system load, it is well over 100%. This happened before the non-blocking semaphore busy wait fix as well

How-To-Repeat:

  1. Install net/dctc version 0. on one of the affected platforms. Or, build it against a user space thread implementation on one of the unaffected ones.
  2. Connect to a Direct Connect(TM) hub
  3. Ask someone to try fetching either your available file list or any file for that matter
  4. Client freezes in semwait state
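
The hide absolute handling described under "2. Correct hide absolute option" can be sketched roughly as below. This is only a minimal illustration and not the code from patch-src::dc_manage.c or patch-src::mydb.c: the names normalize_request and request_matches and the hide_absolute flag parameter are hypothetical, and the sketch assumes the file database stores names without the leading / when the option is enabled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the name to look up in the file database.
 * With hide absolute enabled, the database is assumed to hold names
 * without the leading '/', so the same character is skipped here. */
static const char *normalize_request(const char *filename, int hide_absolute)
{
    assert(filename != NULL);
    if (hide_absolute && filename[0] == '/')
        return filename + 1;
    return filename;
}

/* Hypothetical check: does an upload request match a database entry? */
static int request_matches(const char *request, const char *db_entry,
                           int hide_absolute)
{
    return strcmp(normalize_request(request, hide_absolute), db_entry) == 0;
}
```

With the option enabled, a request for /pub/file.iso matches a database entry stored as pub/file.iso; with the option disabled, the request is compared unchanged.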

State Changed
From-To: open->closed

Fix committed in port version 0.83.2 update

Hi there,

I successfully built the latest version 0.83.2 of the port including the patches, but regardless of whether I enable upload bandwidth throttling in the client via the "-u" command line switch or not, I still get the same behaviour as before - I can start uploads, but after a couple of seconds, dctc's process state goes from "RUN" to "poll" and stays there, and the client trying to download from me gets disconnected.

The machine is a K6-2 350 w/ 128MB running 4.6.2-RELEASE; all connections are on a 100 Mbit network.

Any idea what might still be going wrong?

Bye,
Daniel