[Openswan dev] Openswan OCF : ----- USE_BATCH mode + ERESTART

Thu Dec 7 13:20:12 EST 2006

Hi David,
Thanks for the response. I didn't try out your patch but I have some
comments on your patch. 

This is for crypto_dispatch:
==============================
@@ -674,6 +674,7 @@
 		 * driver unless the driver is currently blocked.
 		 */
 		if (cap && !cap->cc_qblocked) {
+			crypto_drivers[hid].cc_qblocked = 1;
 			CRYPTO_Q_UNLOCK();
 			result = crypto_invoke(crp, 0);
 			CRYPTO_Q_LOCK();
@@ -687,11 +688,11 @@
 				 * order is preserved but this can place
them
 				 * behind batch'd ops.
 				 */
-				crypto_drivers[hid].cc_qblocked = 1;
 				list_add_tail(&crp->crp_list, &crp_q);
 				cryptostats.cs_blocks++;
 				result = 0;
-			}
+			} else
+				crypto_drivers[hid].cc_qblocked = 0;
 		} else {
 			/*
 			 * The driver is blocked, just queue the op
until

My comments:
-------------
The code segment shown above is executed _only_ when USE_BATCH=0. In
this mode we don't invoke the crypto_proc thread. For every packet that
comes in crypto_invoke is called. What you are suggesting here is as
follows:

*one packet comes, we determine that the Q is not blocked, why we are
marking it as blocked before calling crypto_invoke, because
crypto_invoke may have passed through without getting ERESTART from the
driver. 

*moreover since we have marked that Q is blocked, let's assume the
driver didn't assert ERESTART, then for the next packet that comes in we
are not calling the crypto_invoke() and just adding the request to the
list. But in this non batch mode we are not invoking crypto_proc thread
to process the request in the Q. This indicates for the second packet we
have received we have actually done nothing, because we marked the Q as
blocked. I have tested the non-batch mode without your patch and things
seem to work fine, that is to say, we block the Q only when driver
signals ERESTART. Do you think my understanding is correct or am I
missing something obvious?


This is for crypto_kdispatch:
==============================

@@ -738,6 +739,7 @@
 	CRYPTO_Q_LOCK();
 	cap = crypto_checkdriver(krp->krp_hid);
 	if (cap && !cap->cc_kqblocked) {
+		crypto_drivers[krp->krp_hid].cc_kqblocked = 1;
 		CRYPTO_Q_UNLOCK();
 		result = crypto_kinvoke(krp, 0);
 		CRYPTO_Q_LOCK();
@@ -751,10 +753,10 @@
 			 * at the front.  This should be ok; putting
 			 * it at the end does not work.
 			 */
-			crypto_drivers[krp->krp_hid].cc_kqblocked = 1;
 			list_add_tail(&krp->krp_list, &crp_kq);
 			cryptostats.cs_kblocks++;
-		}
+		} else
+			crypto_drivers[krp->krp_hid].cc_kqblocked = 0;
 	} else {
 		/*
 		 * The driver is blocked, just queue the op until


My comments:
-------------
Same like the above. 


This is for crypto_proc:
==============================

@@ -1103,6 +1105,7 @@
 		}
 		if (submit != NULL) {
 			list_del(&submit->crp_list);
+
crypto_drivers[CRYPTO_SESID2HID(submit->crp_sid)].cc_qblocked = 1;
 			CRYPTO_Q_UNLOCK();
 			result = crypto_invoke(submit, hint);
 			CRYPTO_Q_LOCK();
@@ -1117,10 +1120,10 @@
 				 * it at the end does not work.
 				 */
 				/* XXX validate sid again? */
-
crypto_drivers[CRYPTO_SESID2HID(submit->crp_sid)].cc_qblocked = 1;
 				list_add(&submit->crp_list, &crp_q);
 				cryptostats.cs_blocks++;
-			}
+			} else
+
crypto_drivers[CRYPTO_SESID2HID(submit->crp_sid)].cc_qblocked=0;
 		}
 
 		/* As above, but for key ops */
@@ -1139,6 +1142,7 @@
 		}
 		if (krp != NULL) {
 			list_del(&krp->krp_list);
+			crypto_drivers[krp->krp_hid].cc_kqblocked = 1;
 			CRYPTO_Q_UNLOCK();
 			result = crypto_kinvoke(krp, 0);
 			CRYPTO_Q_LOCK();
@@ -1153,10 +1157,10 @@
 				 * it at the end does not work.
 				 */
 				/* XXX validate sid again? */
-
crypto_drivers[krp->krp_hid].cc_kqblocked = 1;
 				list_add(&krp->krp_list, &crp_kq);
 				cryptostats.cs_kblocks++;
-			}
+			} else
+
crypto_drivers[krp->krp_hid].cc_kqblocked = 0;
 		}
 
 		if (submit == NULL && krp == NULL) { 

My comments:
-------------
I guess in _only_ in the batch mode this thread wakes up and does work.
I will test with your patch and let you know the results. 




I also believe that in addition to this change we have to use the
crypto_unblock from the driver?

-ahsan.

-----Original Message-----
From: David McCullough [mailto:david_mccullough at au.securecomputing.com] 
Sent: Thursday, December 07, 2006 5:22 AM
To: Kabir Ahsan-r9aahw
Cc: dev at openswan.org
Subject: Re: [Openswan dev] Openswan OCF : ----- USE_BATCH mode +
ERESTART


Jivin Kabir Ahsan-r9aahw lays it down ...
> Hi All
> I am trying to get the USE_BATCH (=1) mode to work for my crypto
driver.
> I guess I have found some problem, that is, crypto driver signals 
> ERESTART as a function call return (line 1317 below), then the crypto 
> thread sets the cc_qblocked to 1 (line 1333 below).

Ok I have fixed this since the last release.  A patch is attached.
Here is my description:

	Fix a problem where a driver would return ERESTART (full) but
	then unblock itself before the upper layer had marked it as
blocked. This
	caused the code to get stuck in crypto_proc and process no more
	requests. 

Sorry for the slow response,  I'll blame being on vacation :-)

Cheers,
Davidm

> problem where a driver would return ERESTART (full) but then unblock 
> itself before the upper layer had marked it as blocked. This caused
> the code to get stuck in crypto_proc and process no more requests.   

>   1303                 if (submit != NULL) {
>    1304                         /* AK added */
>    1305                         //if (!( gKeepTrackOfCryptoThread %
> 100)) {
>    1306                         //      printk("6," );
>    1307                         //}
>    1308                         list_del(&submit->crp_list);
>    1309                         CRYPTO_Q_UNLOCK();
>    1310
>    1311                         //trace_set_L1(c,30); /* 30 is the id
> for OCF code */
>    1312                         //trace_set_L1(f,7); /* trace point */
>    1313                         //trace_log_L1(trace_var(c),
> trace_var(f), 200); /* log */
>    1314
>    1315                         result = crypto_invoke(submit, hint);
>    1316                         CRYPTO_Q_LOCK();
>    1317                         if (result == ERESTART) {
>    1318
>    1319                                 /* AK added */
>    1320                                 //if (!(
> gKeepTrackOfCryptoThread % 100)) {
>    1321                                 //      printk("7," );
>    1322                                 //}
>    1323                                 /*
>    1324                                  * The driver ran out of
> resources, mark the
>    1325                                  * driver ``blocked'' for
> cryptop's and put
>    1326                                  * the request back in the
> queue.  It would
>    1327                                  * best to put the request
back
> where we got
>    1328                                  * it but that's hard so for
now
> we put it
>    1329                                  * at the front.  This should
be
> ok; putting
>    1330                                  * it at the end does not
work.
>    1331                                  */
>    1332                                 /* XXX validate sid again? */
>    1333
> crypto_drivers[CRYPTO_SESID2HID(submit->crp_sid)].cc_qblocked = 1 ;
>    1334                                 list_add(&submit->crp_list,
> &crp_q);
>    1335                                 cryptostats.cs_blocks++;
>    1336                         }
>    1337                 }
> 
>  
> Because the crypto hardware is not able to cope up with the request 
> now it makes sense for the crypto thread to go to sleep and later be 
> awakened by the crypto driver. But I guess this thread doesn't go to 
> sleep because the wait_on_event condition argument is not turning out 
> to be false. Here is the code segment. Line 1414 should put the thread

> to sleep but it does not because the submit queue is not empty, when 
> crypto driver returned ERESTART.
>  
>    1375                 if (submit == NULL && krp == NULL) {
>    1376                         /* AK added */
>    1377                         //if (!( gKeepTrackOfCryptoThread %
> 100)) {
>    1378                         //      printk("8," );
>    1379                         //}
>    1380                         /*
>    1381                          * Nothing more to be processed.
Sleep
> until we're
>    1382                          * woken because there are more ops to
> process.
>    1383                          * This happens either by submission
or
> by a driver
>    1384                          * becoming unblocked and notifying us
> through
>    1385                          * crypto_unblock.  Note that when we
> wakeup we
>    1386                          * start processing each queue again
> from the
>    1387                          * front. It's not clear that it's
> important to
>    1388                          * preserve this ordering since ops
may
> finish
>    1389                          * out of order if dispatched to
> different devices
>    1390                          * and some become blocked while
others
> do not.
>    1391                          */
>    1392                         //printk("1 : sleeping \n");
>    1393                         dprintk("%s - sleeping\n",
> __FUNCTION__);
>    1394                         CRYPTO_Q_UNLOCK();
>    1395
>    1414
> wait_event_interruptible(cryptoproc_wait,
>    1415                                         cryptoproc == (pid_t)
-1
> ||
>    1416                                         !list_empty(&crp_q) ||
>    1417                                         !list_empty(&crp_kq));
>    1418                         ...
> 
>  
>  
> >From the crypto driver routine I call crypto_unblock and believe the
> purpose of this is to wake up the crypto thread. In the first place 
> this thread didn't go to sleep. Am I missing something? How this 
> scheme is supposed to work? Didn't anyone run into ERESTART condition 
> and was able to get out this condition gracefully? This ERESTART 
> problem will show up more when the injected traffic to the VPN testbed

> is very high. For instance, in the lab I was blasting 100% of line 
> rate (gigabit) with packet size of 1024B.
>  
> Please shed some light.
>  
> Ahsan. 

> _______________________________________________
> Dev mailing list
> Dev at openswan.org
> http://lists.openswan.org/mailman/listinfo/dev


-- 
David McCullough,  david_mccullough at securecomputing.com,   Ph:+61
734352815
Secure Computing - SnapGear  http://www.uCdot.org
http://www.cyberguard.com