Bug 41323 - net/dctc freezes in semwait state if anyone tries an upload
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-ports (Nobody)
Reported: 2002-08-04 18:50 UTC by Mario Sergio Fujikawa Ferreira
Modified: 2002-08-26 22:30 UTC (History)
Description Mario Sergio Fujikawa Ferreira freebsd_committer freebsd_triage 2002-08-04 18:50:02 UTC
	Problem Report written to help others understand the port
fix. No patches are attached here; check CVS (or the appropriate
version control system) for the net/dctc port and look for this PR
number. The fix is in version 0.83.2 of the port.

	Explanation begins below


dctc is a Direct Connect(TM) client. Amongst its advanced features
are both bandwidth throttling and multi-part file downloads from
multiple hubs. It employs a combination of multi-threaded and
multi-process programming models to achieve its goals.

It uses one process for each Direct Connect(TM) hub it connects to.
These processes use semaphores to ensure that they are correctly
synchronized, so that the summed bandwidth usage of all processes
does not exceed the user-chosen bandwidth limits.

Within each process, there are threads. One thread communicates
with the hub while another thread manages bandwidth throttling. It
periodically checks the overall bandwidth usage of all concurrent
processes and throttles its own process accordingly.
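
To make the idea of a shared bandwidth budget concrete, here is a minimal model of it. It is only an illustration: the struct, the function names and the fixed slice size are assumptions, and a plain counter stands in for the semaphore that the real processes share. Nothing here is taken from the dctc sources.

```c
/* Illustrative model only: a per-period budget of fixed-size slices
 * shared by every sender. A plain struct stands in for the shared
 * semaphore; all names here are hypothetical. */
struct slice_budget {
    int slices_left;   /* slices still available in this period */
};

/* Start a new period that allows limit_bytes in total, handed out
 * in units of slice_bytes. A non-positive slice size yields no budget. */
void budget_refill(struct slice_budget *b, int limit_bytes, int slice_bytes)
{
    b->slices_left = (slice_bytes > 0) ? limit_bytes / slice_bytes : 0;
}

/* Take one slice. Returns 1 if the caller may send one slice now,
 * 0 if the budget for this period is exhausted and it must wait. */
int budget_take(struct slice_budget *b)
{
    if (b->slices_left <= 0)
        return 0;
    b->slices_left--;
    return 1;
}
```

With a counting semaphore, the "must wait" case is exactly where a caller blocks, which is the situation the PROBLEM section below is about.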

PROBLEM:

1. Semaphore versus multi-threading

dctc utilizes semaphores within each process to manage throttling
cooperation among concurrent processes. Furthermore, each process
has concurrent threads which manage throttling on a per process
basis. Whenever a thread within the process accesses a semaphore
to verify the current bandwidth usage of all other processes, it
may block. This might lead to a deadlock scenario.

For instance, suppose there is only one process running. It still
has to both update and check bandwidth usage. Moreover, it has to
check whether other processes exist so that it can cooperate with
them. Now suppose one thread takes the semaphore and then another
thread of the same program tries to obtain it: the second thread
might block. This is especially true with upload bandwidth limiting.

This matters because we are working with multiple threads, and
there are 2 threading models to consider: kernel-scheduled threads
(model 1) and threads implemented in user space inside a single
process (model 2). Recall that, in model 2, a blocking system call
blocks the whole process. Consequently, the semaphore will block only
the calling thread in model 1. In model 2, however, it will
effectively block ALL threads of the program, since it blocks the
process that contains them.

Therefore, blocking calls should be avoided in multi-threaded
scenarios since it is not guaranteed that using a blocking call
will not block ALL threads instead of only the calling thread. This
affects all BSD implementations. Of course, blocking calls can still
be used if the programmer plans carefully for this.

Whenever an upload would begin, dctc would block in a semwait
state requiring a kill(1) command invocation.

2. Hide absolute option not working

Hide absolute is a dctc option to hide the leading / in a
directory absolute reference when returning search results. Also,
it prefixes all search results with character . Besides, it also
triggers removal of leading / when building the available file
database if enabled.

However, when checking if a file requested for upload is available
with

int file_in_db(char *filename, int *virtual);

inside src/dc_manage.c, file names should be processed to remove the
leading / if hide absolute is enabled. Since this does not happen,
no upload request works except for the available file list
(dcflist).

Fix: 

The fix is twofold. First, we need to prevent the client from blocking.
Then, we need the client to correctly implement the hide absolute
option so that it can properly process upload requests.

1. Prevent blocking

Since semaphores are blocking ALL threads, we can make the semaphore operations non-blocking and rewrite them as busy-wait constructs with a small time interval between retries.

Investigating the src/sema.c source code, the only blocking calls are semop(2) calls with -1 as the operation value (sem_op). Consequently, we will add IPC_NOWAIT to the flags (sem_flg) of each of them and rewrite them as busy-wait constructs.

Nevertheless, this does not solve the contention problem. Semaphores are built for protection of a shared resource; thus, we should add a thread appropriate mutual exclusion mechanism. The most appropriate seems to be mutexes.

Also, whenever we busy wait, we will call

void pthread_yield(void);

from pthread(3), increasing the chance that a concurrent thread releases the semaphore before our next try.

See Example 1-1.

Example 1-1. Replacing a blocking semaphore operation with a non-blocking busy wait one

Replace

       void get_slice(int semid, SPD_SEMA semnum)
       {
               while(1)
               {
                       struct sembuf local={0,-1,0};           /* slave sema */

                       local.sem_num=semnum;
                       if(semop(semid,&local,1)==0)
                       {
                               /* we have what we want */
                               return;
                       }
               }
       }


with hopefully portable

       #include <sys/param.h>

       /* interval between busy wait tries measured in microseconds */
       #define MUTEX_BUSY_WAIT_TIME    5000

       void get_slice(int semid, SPD_SEMA semnum)
       {
       #if !(defined(BSD) && (BSD >= 199103))
               struct sembuf local={0,-1,0};                   /* slave sema */
       #else
               struct sembuf local={0,-1,0|IPC_NOWAIT};        /* slave sema */        /* (1) */
       #endif
               local.sem_num=semnum;

               (void) lp_mutex_lock_(semaphore_mutex);                                 /* (2) */
               while(1)
               {
                       switch (semop(semid,&local,1)) {
                               case 0:
                                       (void) lp_mutex_unlock_(semaphore_mutex);       /* (3) */
                                       /* we have what we want */
                                       return;
                                       break;
                               case -1:
                                       switch(errno) {                                 /* (4) */
                                               case EAGAIN:    /* triggers busy wait */
                                               case EINTR:     /* interrupted by system call, try again */
                                                       pthread_yield();                /* (5) */
                                                       usleep(MUTEX_BUSY_WAIT_TIME);   /* busy wait with a small time out */   /* (6) */
                                                       continue;
                                                       break;
                                       }
                       }
               }
       }
          



(1) Have a non-blocking semaphore

(2) Add a mutex protection around the shared semaphore. Acquire
    mutex lock before trying semaphore

(3) Add a mutex protection around the shared semaphore. Release
    mutex lock when we are done with the semaphore

(4) If both the semaphore fails AND it signals that it would have
    blocked if it could, we will have to try the semaphore again

(5) Before trying again, we yield the processor by letting another
    thread run. Another thread might release the required semaphore

(6) To avoid hogging the processor with multiple retries, we will
    wait a time interval before retrying

1.1 Fix

i. Copy my lp_mutex.c BSD licensed mutex handling routines to
   subdirectory src

ii. Apply patch patch-configure.in against configure.in to enable
    detection of header file sys/param.h which is used to detect
    if current system is BSD based

iii. Apply patch patch-src::Makefile.in against src/Makefile.in to
     connect lp_mutex.c to the build

iv. Apply patch patch-src::sema.c against src/sema.c hopefully
    adding more portable semaphore code

2. Correct hide absolute option

First, we need to pass the correct upload requests to

int file_in_db(char *filename, int *virtual);

routine that checks if the files exist. Then, we have to make sure
that this checking routine understands the requests.

We will move the hide absolute option handling before the check
routine. Then, we will add specific handling to the check
routine.
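
The idea can be sketched as follows. This is an illustration under assumptions, not the patch itself: both function names are hypothetical, an in-memory array stands in for the file database, and the sketch assumes that the database already stores names without the leading / when hide absolute is enabled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: when hide_absolute is non-zero, skip one
 * leading '/' of path. No copy is made; the result points into the
 * caller's string. */
const char *strip_leading_slash(const char *path, int hide_absolute)
{
    if (path != NULL && hide_absolute && path[0] == '/')
        return path + 1;
    return path;
}

/* Hypothetical stand-in for the database check: names[] holds the
 * shared file names as stored (already without the leading '/' when
 * hide absolute is enabled). The request is normalized the same way
 * before comparing. Returns 1 if found, 0 otherwise. */
int name_in_list(const char *const *names, size_t n,
                 const char *request, int hide_absolute)
{
    const char *key = strip_leading_slash(request, hide_absolute);
    size_t i;

    if (key == NULL)
        return 0;
    for (i = 0; i < n; i++) {
        if (strcmp(names[i], key) == 0)
            return 1;
    }
    return 0;
}
```

The point of the sketch is that the request and the stored name must go through the same normalization; otherwise a request for an absolute path never matches an entry stored without its leading /.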

2.1 Fix

i. Apply patch patch-src::dc_manage.c against src/dc_manage.c so
   that the checking routine receives a correct file request

ii. Apply patch patch-src::mydb.c against src/mydb.c so that the
    checking routine understands the file requested

3. PENDING problems:

* Upload bandwidth limitation option has to be enabled for upload
  to work at all. If it is not enabled, clients are dropped after
  transferring a few kb

* When connected to multiple hubs, the client uses a lot of processing
  power. Checking system load, it is way over 100%. This happened
  before the non-blocking semaphore busy wait fix as well
How-To-Repeat: 
1. Install net/dctc version 0.  in one of the affected platforms.
Or, build it against a user space thread implementation in one of
the unaffected ones.

2. Connect to a Direct Connect(TM) hub

3. Ask someone to try fetching either your available file list or any file for that matter

4. Client freezes in semwait state
Comment 1 Mario Sergio Fujikawa Ferreira freebsd_committer freebsd_triage 2002-08-04 19:03:52 UTC
State Changed
From-To: open->closed

Fix committed in port version 0.83.2 update
Comment 2 razzfazz 2002-08-26 22:28:49 UTC
Hi there,

I successfully built the latest version 0.83.2 of the port including the
patches, but regardless of whether I do enable upload bandwidth throttling
in the client via the "-u" command line switch or not, I still get the same
behaviour as before - I can start uploads, but after a couple of seconds,
dctc's process state goes from "RUN" to "poll" and stays there, and the
client trying to download from me gets disconnected.

The machine is a K6-2 350 w/ 128MB running 4.6.2-RELEASE, all connections
are on a 100mbit network.

Any idea what might still be going wrong?

Bye,
Daniel