Sam’s Network Simulation Cradle Blog

28 Jan 2004

Changing HZ

Filed under: Network Simulation Cradle — sammydre @ 3:02 pm

Maybe changing HZ in simulation from 100 to 1000 affected throughput or something, I don’t know. However, it made no difference with respect to duplicate acks: there is still a whole load of extra ones. Something else is causing this problem. At least I’ve now modified the code enough that I can change HZ easily; it involves changing the value in kern/subr_param.c.
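For reference, a small sketch of why HZ could matter at all: kernel timers are counted in whole ticks, so an interval is effectively rounded to a multiple of 1/HZ seconds. The helper names here are hypothetical and this is not the FreeBSD code, only the arithmetic.

```python
import math


def tick_length_ms(hz: int) -> float:
    """Length of one clock tick in milliseconds for a given HZ."""
    return 1000 / hz


def ms_to_ticks(interval_ms: float, hz: int) -> int:
    """Round an interval up to a whole number of ticks (hypothetical helper)."""
    return math.ceil(interval_ms * hz / 1000)


def quantised_ms(interval_ms: float, hz: int) -> float:
    """The interval a tick-driven timer would actually wait."""
    return ms_to_ticks(interval_ms, hz) * tick_length_ms(hz)


if __name__ == "__main__":
    # Compare the two HZ values mentioned in the post for a 25 ms interval.
    for hz in (100, 1000):
        print(hz, tick_length_ms(hz), ms_to_ticks(25, hz), quantised_ms(25, hz))
```

At HZ=100 a tick is 10 ms, so a 25 ms interval becomes 3 ticks; at HZ=1000 the same interval is 25 ticks and loses nothing to rounding.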

27 Jan 2004

Out of ideas

Filed under: Network Simulation Cradle — sammydre @ 4:35 pm

Again. The kernel hacking produced some results that prove more than one packet gets processed at once in emulation. It does show how much ticks has gone up between packets, though, and this seems quite high!

At any rate, this method of investigation again doesn’t show anything conclusive or particularly useful insofar as solving the current “buggy” behaviour.

Kernel compiled

Filed under: Network Simulation Cradle — sammydre @ 11:10 am

And some interesting results to do with softticks vs. ticks. The debugging info wasn’t quite enough for a running kernel, so I made one more small addition. I think the amount that is printed out probably affects the performance of the box quite a bit, which isn’t good. Not really any way around that, I guess…

Kernel hacking

Filed under: Network Simulation Cradle — sammydre @ 9:27 am

Put debug statements into swi_net in netisr.c. Tested on simulation, now compiling on emulation.

23 Jan 2004

Another Friday

Filed under: Network Simulation Cradle — sammydre @ 6:15 pm

This is getting annoying.

I traced through a lot of code and came out with the fact that simulation and emulation are going through exactly the same code path. Why are they getting different results?

I printed out how simulation and emulation read data from the socket buffers. There is certainly a difference here, but I have NO idea why it is happening: all the theory and everything I have looked at so far tells me that they should be the same. Bloody hell.

22 Jan 2004

Just some notes

Filed under: Network Simulation Cradle — sammydre @ 4:41 pm

Packets are dumped to ERF at two points: when received, before stack entry, and when sent out through the “ethernet driver”. Note that each node has its own ERF dump (ERF being the format that is converted to PCAP later). Packets are decoded to stderr at the time they are received. I added a short message when packets are sent, to aid tracking of packets.

The first problems we see occur at time 0.42 seconds, with ACKs id=8 and id=9 sent by stack_id=2. At times 0.410 and 0.4282 we get the 1500 byte packets which trigger the first ACK. The second ACK is triggered after the socket buffer is emptied by reads. This is because the window can be updated by 2*MSS, which is one of the criteria for sending a packet. I still don’t see why this doesn’t also apply to emulation. More investigation needed (boy, this is taking a long time).
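The 2*MSS criterion can be sketched roughly as follows. This is a simplified model of the classic BSD-style window update check in the output path, not a copy of the FreeBSD source: the function and parameter names are made up for illustration, and window scaling and the other send conditions are ignored.

```python
def should_send_window_update(free_space: int, advertised_left: int,
                              mss: int, rcv_buf_size: int) -> bool:
    """Decide whether a pure window update is worth sending.

    free_space      -- bytes currently free in the receive socket buffer
    advertised_left -- window already advertised that the peer has not used yet
    mss             -- maximum segment size
    rcv_buf_size    -- total size of the receive socket buffer
    """
    adv = free_space - advertised_left  # how much the window could grow
    if adv >= 2 * mss:                  # the 2*MSS rule from the notes above
        return True
    if 2 * adv >= rcv_buf_size:         # or at least half the buffer opened up
        return True
    return False


if __name__ == "__main__":
    # Illustrative numbers only: a read that frees one segment vs. two.
    print(should_send_window_update(1448, 0, 1448, 65536))
    print(should_send_window_update(2896, 0, 1448, 65536))
```

Under this model, an application read that frees two or more segments' worth of buffer is enough to trigger an extra ACK that carries nothing new except a larger window.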

Hardcore debugging

Filed under: Network Simulation Cradle — sammydre @ 8:47 am

Normal debugging, even with the help of conditional breakpoints, didn’t help. I identified the specific situation in which the funny duplicate ack behaviour first occurs; this happens early on in slow start. I need to look at the specific code path leading to and producing the behaviour.

Yesterday I noted some interesting things in RFCs. RFC1122 states that TCP must not generate an ACK before all the segments on the receive queue are processed. To my knowledge FreeBSD does not implement this functionality. I can’t be sure, but it certainly looks that way.

Need to:

  1. check when the packets are printed out: when they are received? sent?
  2. see how this relates to when they are dumped (therefore when/where they appear on the tcptrace graphs)

20 Jan 2004

Duplicate ack/window evidence

Filed under: Network Simulation Cradle — sammydre @ 11:48 am

1] time: 1.632286   1] IPv4 HL:5 TOS:00000000 TL:52 ID:103 Off:16384 TTL:64 TCP
Sum:46857 192.168.1.2 -> 192.168.1.1
Src port:6060 Dst port:49152 Seq:1584143507 Ack:4265607931 Off:8  ACK
Win:25340 Sum:14876 Urg:0
Options: No-Op No-Op Timestamp(0000013c0000012d) 
1] time: 1.632563   1] IPv4 HL:5 TOS:00000000 TL:52 ID:104 Off:16384 TTL:64 TCP
Sum:46856 192.168.1.2 -> 192.168.1.1
Src port:6060 Dst port:49152 Seq:1584143507 Ack:4265609379 Off:8  ACK
Win:26152 Sum:12616 Urg:0
Options: No-Op No-Op Timestamp(0000013c0000012d) 
1] time: 1.632840   1] IPv4 HL:5 TOS:00000000 TL:52 ID:105 Off:16384 TTL:64 TCP
Sum:46855 192.168.1.2 -> 192.168.1.1
Src port:6060 Dst port:49152 Seq:1584143507 Ack:4265609379 Off:8  ACK
Win:27688 Sum:11080 Urg:0
Options: No-Op No-Op Timestamp(0000013c0000012d) 
1] time: 1.633118   1] IPv4 HL:5 TOS:00000000 TL:52 ID:106 Off:16384 TTL:64 TCP
Sum:46854 192.168.1.2 -> 192.168.1.1
Src port:6060 Dst port:49152 Seq:1584143507 Ack:4265609379 Off:8  ACK
Win:29224 Sum:9544 Urg:0

We see a few packets sent at the same time (look at the TCP Timestamp option) with the same ack and sequence numbers but with different values in the window field. Why on earth would you get this?

Can’t figure it out. This is still a complete mystery to me. Why this would change in one set of packets sent at once is beyond me. As far as I can tell this behaviour has nothing to do with loss or anything. It just… Happens.

I have looked into delayed acks a little. They don’t seem to be used often. I’m not sure if this is desired or not. Maybe I should read the RFC/Stevens about how delayed acks are really supposed to work.

Gdb init

Filed under: Network Simulation Cradle — sammydre @ 11:19 am

The file .gdbinit allows you to execute commands whenever gdb is loaded. If you put one in your home dir, it is loaded all the time. If it is in another dir, it is executed only when gdb is started from that directory. It was very useful for me to put a .gdbinit file in my ns/ directory.

Window update duplicate acks

Filed under: Network Simulation Cradle — sammydre @ 9:46 am

The most obvious difference now is all the little duplicate acks generated in simulation which update the receive window. These don’t appear in emulation. I wonder why this is?

19 Jan 2004

Congestion avoidance? RTT? Something.

Filed under: Network Simulation Cradle — sammydre @ 4:32 pm

Some more time-sequence graphs. This time, slow start is OK but the differences come after this:

There seem to be duplicate acks all over the show in simulation. I wonder what is up with that. I looked at an example and it had different values in the “window” field of the TCP header. It was getting larger: 31856, then 33304. What is up with that? Surely it shouldn’t be changing at all between duplicate acks? Need to think about this.

Slow start

Filed under: Network Simulation Cradle — sammydre @ 3:49 pm

OK, the slow start problem was mostly due to “local” addresses in simulation. I thought the way it checked local addresses was different. It turns out that if the FreeBSD network stack finds it is sending to an address on the same subnet as any of its interfaces, it uses a different slow start flight size: 4 by default. There is no easy way to change IP addresses or subnets in simulation at the moment, so I’ve changed the code to set that number to 1. Slow start in simulation and emulation is now very similar, but there are still differences to look for.
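A rough sketch of the local-subnet check described above. The function name, the /24 prefix and the non-local default of 1 are assumptions made for illustration; only the local default of 4 comes from the post, and the real stack does this inside the kernel rather than with anything like this.

```python
import ipaddress

SS_FLIGHT_LOCAL = 4   # local default, as noted in the post
SS_FLIGHT_REMOTE = 1  # assumed default for non-local destinations


def initial_flight_segments(dst: str, interfaces: list[str],
                            local_flight: int = SS_FLIGHT_LOCAL,
                            remote_flight: int = SS_FLIGHT_REMOTE) -> int:
    """Pick the slow start flight size (in segments) for a destination.

    interfaces holds addresses with prefix lengths, e.g. "192.168.1.2/24".
    """
    dst_ip = ipaddress.ip_address(dst)
    for iface in interfaces:
        # The destination counts as local if it sits on any interface's subnet.
        if dst_ip in ipaddress.ip_interface(iface).network:
            return local_flight
    return remote_flight


if __name__ == "__main__":
    ifaces = ["192.168.1.2/24"]  # prefix length is an assumption
    print(initial_flight_segments("192.168.1.1", ifaces))
    print(initial_flight_segments("192.168.6.2", ifaces))
    # The workaround from the post: force the local value down to 1.
    print(initial_flight_segments("192.168.1.1", ifaces, local_flight=1))
```

With both nodes on the same subnet in simulation, the sender would start with 4 segments in flight, while a routed emulation path would start with the smaller value, which matches the kind of slow start mismatch seen in the graphs.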

Another monday morning

Filed under: Network Simulation Cradle — sammydre @ 9:58 am

Let’s have a look at some zoomed in time sequence graphs:

16 Jan 2004

Differences in simulation and emulation

Filed under: Network Simulation Cradle — sammydre @ 3:53 pm

… Continue to mount. Let’s have a look at slow start:

These are vastly different graphs. When zoomed out, they also show how the RTOs differ. I created the graphs with tcptrace. I modified my BSDAgent so it can make packet traces in DAG ERF format, which can then be converted to PCAP easily. This also allows viewing what is going on in tcpdump and so on.

15 Jan 2004

RTO and RTT

Filed under: Network Simulation Cradle — sammydre @ 9:24 pm

Simulation and emulation are very different. First, I fixed up the HZ value on both routers to be 1000. Then I made the RTTs of simulation and emulation smaller: 100 ms RTT now. I checked the RTO differences from that: 780 ms in simulation, 540 ms in emulation.

I wonder why they are so different? Also, big differences were encountered in congestion met from slow start: lots in simulation, pretty much none in emulation.

Progress?

Filed under: Network Simulation Cradle — sammydre @ 1:19 pm

No progress. Not getting anywhere at the moment. From what I can tell, I never really see RTOs in emulation; the duplicate acks do their work (as they should in simulation). I wonder what is up with the window number changing then? Perhaps I should see how this number changes in emulation…

14 Jan 2004

Little or no progress

Filed under: Network Simulation Cradle — sammydre @ 7:27 pm

Nothing interesting has been found out today really. TCP does strange stuff. I can’t figure out what NSC is doing at times. I’ve looked at all sorts of things, sequence number graphs, socket buffer graphs, gone through in gdb looking at stuff, rtt graphs, etc.

If the window field changes in what would otherwise be duplicate ack packets, the receiving end does not treat the packets as duplicate acks and only pays attention to the fact that the window size has changed. In fact, the duplicate ack counter is reset to 0. This is happening in simulation… Why?
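The counter behaviour can be modelled with a small sketch. This is a simplified illustration of BSD-style duplicate ack counting, with made-up class and function names; it ignores sequence number wrap-around and the other conditions a real stack checks. The ack and window values in the usage lines are taken from the 20 Jan packet dump.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    snd_una: int      # highest acknowledged sequence number
    snd_wnd: int      # last window advertised by the peer
    dupacks: int = 0  # consecutive duplicate acks seen


def on_ack(state: SenderState, ack: int, window: int, payload_len: int = 0) -> bool:
    """Process one incoming ACK; return True when the third dup ack arrives."""
    if ack > state.snd_una:
        # New data acknowledged: not a duplicate, so start counting again.
        state.snd_una, state.snd_wnd, state.dupacks = ack, window, 0
        return False
    if payload_len == 0 and window == state.snd_wnd:
        # Same ack, no data, same window: a genuine duplicate ack.
        state.dupacks += 1
        return state.dupacks == 3
    # Same ack but the window changed: treated as a window update only.
    state.snd_wnd = window
    state.dupacks = 0
    return False


if __name__ == "__main__":
    s = SenderState(snd_una=4265607931, snd_wnd=25340)
    for win in (26152, 27688, 29224):
        on_ack(s, 4265609379, win)
    print(s.dupacks)  # the growing window keeps resetting the counter
```

In this model a run of acks with the same ack number never reaches the fast retransmit threshold as long as each one carries a different window, so loss recovery would have to fall back on the retransmission timer.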

I’ve noticed all sorts of behaviour in simulation. I haven’t actually found any of it to be incorrect at all. However, it doesn’t agree with emulation at all. I need to figure out what the next step from here is.

13 Jan 2004

Late night investigation

Filed under: Network Simulation Cradle — sammydre @ 10:09 pm

Me and Perry did lots of investigation on the weird outcomes. We’ve tracked down the weirdness somewhat. It seems that simulations that SHOULD be identical aren’t. For some reason. This needs to be resolved ASAP.

I’ve got to the point where I’m looking at 5 second long TCP transfers in an effort to see exactly what is happening. There are certainly differences, and in these little transfers they are caused by congestion loss. This is OK, but they differ heaps where I don’t believe they should. If the socket buffer is always full in a simulation, then each simulation should get exactly the same results, no matter what the presented load is: the presented load is only there to keep the buffer full. Currently this doesn’t seem to be the case, so next I’ll check whether the socket buffer is staying full. It should be.

RTT’s Today

Filed under: Network Simulation Cradle — sammydre @ 11:32 am

The huge amount of variation NSC shows from the emulation on the graph linked to yesterday is concerning. I created some graphs showing RTT to try and see why the retransmission timeout (RTO) was varying a bit between simulations, as I believe that is what is causing such a discrepancy from simulation to simulation. I haven’t got anything solid to go on, though; all I’ve found is that I can create some pretty graphs. This DOES illustrate how easy it is to get interesting information out of TCP with NSC. Unfortunately it is not giving quite the same results as emulation though…

A couple of RTT graphs: graph for 1.72 presented load and graph for 1.74 presented load. These two selected points of the graph presented yesterday get quite dramatically different throughputs. I’m not sure what the RTT graphs here tell me; very little, I believe, at the moment. Also, I do not understand why srtt is at the top in those graphs. I think the units are still incorrect. RTO should always be more than srtt!

The main thing I’ve got to go on is the fact that router queues were not the same length in simulation and emulation. Now re-running all simulations and emulations.
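As a sanity check on the “RTO should always be more than srtt” claim, here is a floating-point sketch of the standard estimator from RFC 2988 (later RFC 6298). It is not the FreeBSD implementation, which keeps these values in scaled integer form, and it leaves out the minimum RTO clamp. The class name and the 1 ms granularity are assumptions.

```python
class RtoEstimator:
    """Textbook smoothed RTT / RTO estimator (all times in seconds)."""

    ALPHA = 1 / 8  # gain for srtt
    BETA = 1 / 4   # gain for rttvar
    K = 4          # variance multiplier

    def __init__(self, granularity: float = 0.001):
        self.granularity = granularity  # clock granularity, assumed 1 ms
        self.srtt = None
        self.rttvar = None
        self.rto = None

    def update(self, rtt: float) -> float:
        """Feed one RTT measurement and return the new RTO."""
        if self.srtt is None:
            # First measurement initialises both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # rttvar is updated first, using the old srtt.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        return self.rto


if __name__ == "__main__":
    est = RtoEstimator()
    for sample in (0.100, 0.110, 0.095, 0.140, 0.100):  # made-up RTT samples
        est.update(sample)
        print(f"srtt={est.srtt:.4f} rttvar={est.rttvar:.4f} rto={est.rto:.4f}")
```

Because the RTO is srtt plus a strictly positive term, a plot where srtt sits above RTO does point at a units or scaling problem in the plotted values rather than at the estimator itself.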

Really need to get writing for SIGCOMM. Things to look at include:

  • Performance: look at stuff in 420, + show how data copying is the bottleneck
  • Methodology: 3-4 pages might well be ok. Higher level stuff, like parser interesting, just sum up the hard stuff (init. stuff)
  • Do emulations of the interesting simulations to show correctness

12 Jan 2004

1 bit/s Monday

Filed under: Network Simulation Cradle — sammydre @ 4:05 pm

Some interesting results in emulation today:

64 bytes from 192.168.6.2: icmp_seq=0 ttl=62 time=672100.433 ms
64 bytes from 192.168.6.2: icmp_seq=1 ttl=62 time=1343097.396 ms

Maybe because of this:

machine4:~ $ ipfw pipe 2 show
00002:   1.000 bit/s   100 ms   50 sl. 1 queues (1 buckets) droptail
    mask: 0x00 0x00000000/0x0000 -> 0x00000000/0x0000
BKT Prot ___Source IP/port____ ____Dest. IP/port____ Tot_pkt/bytes Pk
  0 icmp     192.168.3.2/0         192.168.6.2/0     1704   143136 50

1 bit/second is not very fast.
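The ping times above line up with simple arithmetic. The pipe counters give 143136 bytes over 1704 packets, so 84 bytes per packet, which is 672 bits. The sketch below assumes a single FIFO link, pings sent one second apart, and a return path that adds nothing; the function name is made up.

```python
def queued_rtts_s(packet_bytes: int, bandwidth_bps: float, delay_s: float,
                  send_times_s: list[float]) -> list[float]:
    """Round-trip time of each packet through one FIFO bandwidth+delay pipe."""
    tx_time = packet_bytes * 8 / bandwidth_bps  # time to clock one packet out
    link_free_at = 0.0
    rtts = []
    for sent in send_times_s:
        start = max(sent, link_free_at)  # wait behind any earlier packet
        link_free_at = start + tx_time
        rtts.append(link_free_at + delay_s - sent)
    return rtts


if __name__ == "__main__":
    packet_bytes = 143136 // 1704  # bytes per packet from the pipe counters
    rtts = queued_rtts_s(packet_bytes, 1.0, 0.100, [0.0, 1.0])
    print(packet_bytes, [round(r, 1) for r in rtts])
```

That gives about 672.1 s for the first ping and about 1343.1 s for the second, against the measured 672100.433 ms and 1343097.396 ms.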

Managed to eventually get some emulation done. Initial results here.
