===========================================================================
CCR Review #43A
Updated Wednesday 20 Jan 2010 4:49:24am EST
---------------------------------------------------------------------------
Paper #43: Investigating the Impact of Service Provider NAT on Residential Broadband Users
---------------------------------------------------------------------------

Timeliness: 5. This topic is likely to become hot in the next year
Novelty: 1. Little to add to existing work
Technical correctness: 3. One major or several minor technical errors
Clarity: 3. Clear, but with rough patches
Recommendation: 3. Reject

===== Summary of contribution =====

This work uses DSL traffic traces from an ISP to conduct a simulation study of the workload of an SPNAT session table. Two parameters are evaluated: the rate of table insertions plus deletions, and the peak size of the table. The authors find that a significant number of table entries are due to single-packet UDP flows, and they propose using a shorter inactivity timeout for single-packet UDP flows to substantially reduce the table size. This is a useful study, but its contributions (the workload characterization and the simple simulation results) are rather limited and not sufficient for a CCR publication. The authors might want to improve their paper based on the detailed comments below.

===== Detailed comments =====

The paper tries to answer the question: how many customers should an SPNAT serve? However, it does not explicitly describe the methodology that should be used to provision an SPNAT, and it does not argue for the suitability of that methodology. The paper focuses on measuring the peak transaction rate per user, which I assume would be used to provision an SPNAT simply by extrapolating the maximum per-user rate to the capacity of the SPNAT. Why is this a good approach to provisioning an SPNAT? Wouldn't it result in dramatic over-provisioning and underutilization of the available resources?
Would statistical techniques be suitable here? In any case, the provisioning method should be explained and justified better. It would also help to add a figure showing the architecture of an SPNAT.

The authors say “The values presented in Figure 1 were produced by dividing the total number of transactions during the busy second by the number of subscribers that were active during the entire thirty minute period”. I do not understand this sentence. Doesn't this approach miss subscribers that might have been active during the busy second but not during the entire thirty-minute period?

The high number of timed-out UDP flows that later reappear, attributed to the described BitTorrent bias, leaves the reader with a mixed understanding. What conclusions should one draw from it? Could this number help in selecting a timeout value? The authors should study this in more depth.

===========================================================================
CCR Review #43B
Updated Saturday 23 Jan 2010 2:31:33pm EST
---------------------------------------------------------------------------
Paper #43: Investigating the Impact of Service Provider NAT on Residential Broadband Users
---------------------------------------------------------------------------

Timeliness: 4. Hot topic with a considerable amount of active work
Novelty: 3. Distinct addition to the state of the art
Technical correctness: 3. One major or several minor technical errors
Clarity: 4. In good shape, with minor improvements necessary
Recommendation: 2. Revise-and-resubmit to next issue

===== Summary of contribution =====

This paper is a first attempt at looking into potential SPNAT resource consumption in ISPs, and at how the NAT session table size depends on flow expiry timeouts. The authors use packet traces from the border of a real ISP.
They find that the table size can be very large because of a large number of UDP sessions that last for only a single packet transmission; payload analysis shows that these are almost always due to a single application (BitTorrent). The authors recommend using a short (few-second) timeout for expiring UDP sessions.

===== Detailed comments =====

This paper is a first attempt at looking into potential SPNAT resource consumption in ISPs, and at how the NAT session table size depends on flow expiry timeouts. The authors use packet traces from the border of a real ISP. They find that the table size can be very large because of a large number of UDP sessions that last for only a single packet transmission; payload analysis shows that these are almost always due to a single application (BitTorrent). The authors further explore the impact of changing the UDP session expiry timeout on the table size, and find that bringing it down from 2 min to a couple of seconds significantly changes the distribution of the number of entries. As the authors point out, this can lead to an increase in "churn" in the session table, as well as premature expiry of session entries.

The paper is well written and explores the SPNAT timeout configuration problem well. There are a few concerns:

- A high-level question relates to the business aspect. The ISP studied did not have an SPNAT deployment, and hence had a sufficient public IP address pool for its customers. Would this ISP have an incentive to shift to a small number of NATs and thus leave most of its IP address pool unused? Perhaps a more practical approach for such ISPs is to use more NATs, and thus fewer users per NAT, after considering the cost trade-off?
- It is not uncommon today to see transparent but stateful firewalls (without NAT) in ISPs, deployed to mitigate DoS attacks. These firewalls perform a job similar to that of SPNATs. What are the state-maintenance and processing overheads in these firewall deployments? Can any lessons from them be reused?
- Traces: this may be confidential information, but it would help to know the scale of the ISP being studied. How many users? What geography? How representative are the ISP and its customer base? Does the ISP have a firewall (see also the next point)?
- The paper also mentions that the ISP did not have an SPNAT deployment when the traces were collected. How significantly do you think the traffic patterns would change once a NAT is deployed? For example, some applications (such as Skype and BitTorrent) change their session-initiation behaviour (e.g., relaying, hole punching) when they are behind a NAT.
- Sec. 5: You recommend that the short session expiry timeout be used only for the first packet in a UDP session. What distribution of the number of packets per UDP flow did you observe in the traces (under the 1 s and 2 min timeouts)?
- Can you comment on the processing (CPU and memory) overhead and the latency caused by significant "churn" in the session table? These factors need to be considered when using shorter timeout values, especially at rates as high as 0.9 transactions/s/subscriber.

Overall, the paper is good but can be made stronger. A notable gap in the analysis is a small accompanying study of overheads in a real SPNAT, as detailed above.

===========================================================================
CCR Review #43C
Updated Monday 25 Jan 2010 10:57:48am EST
---------------------------------------------------------------------------
Paper #43: Investigating the Impact of Service Provider NAT on Residential Broadband Users
---------------------------------------------------------------------------

Timeliness: 4. Hot topic with a considerable amount of active work
Novelty: 1. Little to add to existing work
Technical correctness: 3. One major or several minor technical errors
Clarity: 3. Clear, but with rough patches
Recommendation: 2. Revise-and-resubmit to next issue

===== Summary of contribution =====

This paper provides practical recommendations for efficient Service Provider NAT (SPNAT) implementations. Through the analysis of a real-world traffic dataset, the authors observe that short-lived UDP sessions occur often and may have an impact on SPNAT resources. An analysis of the value that should be used to expire UDP sessions shows that decreasing the expiry threshold would significantly decrease NAT table utilization.

===== Detailed comments =====

The authors fail to concretely justify why their study is important. Implementing a different UDP session expiry threshold might be important, but it is not clear that there is a great need for this. Furthermore, you provide no concrete evidence that NAT resource utilization actually _is_ a critically important issue, so the paper's entire motivation is weak. (Perhaps there is good evidence; if so, you need to present it early in the paper.) For instance, I would suggest evaluating the performance gain that such a technique would bring to other streams traversing the SPNAT. The rest of the paper is interesting, but it is hard to be excited when the work isn't well motivated.

Another concern I have with the results presented in this paper is the extent to which the values obtained from this 4-day dataset are generic. Surprisingly, the UDP session analysis shows that roughly 95% of the sessions are due to BitTorrent DHT messages. It is therefore clear that BitTorrent DHT messages and usage may greatly impact the results presented here. What about ASes, or even countries, where BitTorrent DHT usage is not prevalent and BitTorrent users mostly rely on communication with trackers? I believe this may have a big impact on the values and results presented in this paper.
At the very least, a thorough analysis is needed to better understand whether BitTorrent really drives the UDP session results. Ideally, a comparison with traffic free of BitTorrent UDP would help, even in a controlled environment.

===========================================================================
CCR Review #43D
Updated Monday 25 Jan 2010 6:39:26pm EST
---------------------------------------------------------------------------
Paper #43: Investigating the Impact of Service Provider NAT on Residential Broadband Users
---------------------------------------------------------------------------

Timeliness: 5. This topic is likely to become hot in the next year
Novelty: 2. Solid but incremental contribution
Technical correctness: 3. One major or several minor technical errors
Clarity: 4. In good shape, with minor improvements necessary
Recommendation: 2. Revise-and-resubmit to next issue

===== Summary of contribution =====

This paper studies the number of active connections in SPNATs. By replaying a real trace over a NAT emulator, the authors derive this number and show that the connections are mainly short UDP connections. This is simply because of the large timeout value. The proposed solution to this problem is to reduce the timeout value.

===== Detailed comments =====

The reviewers and I have several concerns about the paper. The paper can be resubmitted once all of them have been addressed satisfactorily. Here is a summary; please refer to the reviewers' comments for more details:

- Better motivation of the problem. Why is a large number of entries in a NAT bad? Can this impact be quantified in some way?
- Simulation setup. The replayed traffic does not come from a real NAT. Why would this traffic not change once a NAT is deployed? Some applications change their behavior. Can the authors explain how faithfully the considered traffic represents the traffic traversing NATs?
- What would be the impact of changing timeouts on the applications traversing the NAT? Reducing the timeout splits some UDP flows, and splitting a UDP flow would change its public IP address and/or port. How would this impact the applications?
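To make the timeout trade-off discussed in these reviews concrete, here is a minimal, hypothetical sketch (not the authors' simulator) of replaying packets through an SPNAT session table. It assumes each packet is reduced to a (timestamp, flow key) pair, that an entry expires after an inactivity timeout, and that entries which have seen only one packet use a short timeout while all others use a long one, following the recommendation as described in Review #43B. The names `replay` and `Entry`, the toy trace, and the 2 s / 120 s values are illustrative choices. The sketch reports the peak table size, the insertions and deletions (the "transactions" counted in the paper), and how many separate mappings each flow ended up with; a flow with more than one mapping is a split flow of the kind Review #43D asks about.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One hypothetical SPNAT session-table entry."""
    last_seen: float  # timestamp of the most recent packet, in seconds
    packets: int      # packets seen on this mapping so far


def replay(packets, short_timeout, long_timeout):
    """Replay (timestamp, flow_key) packets through a toy NAT session table.

    Entries that have seen a single packet expire after `short_timeout`
    seconds of inactivity; all other entries use `long_timeout`.
    Returns (peak_table_size, insertions, deletions, mappings_per_flow).
    """
    table = {}
    mappings_per_flow = {}
    peak = insertions = deletions = 0

    def timeout_of(entry):
        return short_timeout if entry.packets == 1 else long_timeout

    for ts, flow in sorted(packets, key=lambda p: p[0]):
        # Expire entries whose inactivity timeout has elapsed by time ts.
        # (Expiry is only checked when a packet arrives; enough for a sketch.)
        expired = [k for k, e in table.items() if ts - e.last_seen > timeout_of(e)]
        for k in expired:
            del table[k]
            deletions += 1

        entry = table.get(flow)
        if entry is None:
            # New mapping. A real SPNAT may pick a different public address or
            # port here, so a second mapping for the same flow is a split flow.
            table[flow] = Entry(last_seen=ts, packets=1)
            insertions += 1
            mappings_per_flow[flow] = mappings_per_flow.get(flow, 0) + 1
        else:
            entry.last_seen = ts
            entry.packets += 1

        peak = max(peak, len(table))

    return peak, insertions, deletions, mappings_per_flow


# Toy trace: flow A sends 3 packets; B, D and E send a single packet each
# (e.g., unanswered queries); C goes quiet for 5 s after its first packet.
trace = [(0.0, "A"), (0.5, "A"), (10.0, "A"), (1.0, "B"), (2.0, "C"),
         (4.0, "D"), (7.0, "C"), (8.0, "E")]

print(replay(trace, short_timeout=120, long_timeout=120))
print(replay(trace, short_timeout=2, long_timeout=120))
```

On this toy trace, a uniform 120 s timeout gives a peak of 5 entries with 5 insertions and no deletions, while the 2 s / 120 s policy lowers the peak to 3 at the cost of 6 insertions and 4 deletions, and splits flow "C" into two mappings. This mirrors the three effects raised in the reviews: a smaller table, more churn, and flows whose public mapping may change mid-conversation.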