Massively Scalable…
by Craig Kaes
Scalability has always been a hallmark of Jabber XCP. In fact, it was the primary reason we forked away from the jabberd open source code back in 2000, and it is why we have almost no code in common with that project today.
Scale, followed by extensibility and broad interoperability, were the driving forces for our largest customers back then. When we delivered the real-time messaging service for Disney’s Go network, we were tops in scalability by just about every measure. Today, scale remains a key issue for really big customers, including many carriers, which is why we just completed some heavy load and scalability testing.
When we stepped into the test center recently, we already knew we could scale, with one telecom customer having already certified Jabber XCP as capable of supporting 100,000 concurrent users on a single box. We walked out a few weeks later in awe of our server and the 1,020,000 concurrent users we have now been independently certified as being able to support across a single domain.
For this test, the baseline was 420,000 concurrent users on a single T2000. Each concurrent user was programmed to methodically exchange messages, presence, and subscription requests to simulate likely scenarios in a carrier environment. At the top end of this test we had 1.8 million registered users with an average roster size of 30. Each concurrent user performed an action every 120 seconds for the duration of the test, resulting in more than 15,000 delivered stanzas per second. These were not stagnant accounts.
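As a rough sanity check on those figures, the arithmetic can be sketched in Python. This assumes the 15,000 stanzas per second figure was measured at the 1,020,000-user peak, which the post does not state explicitly; the function names are illustrative only.

```python
# Back-of-the-envelope check of the load figures quoted in the post.
# Assumption: the ">15,000 stanzas/s" figure was measured at the
# 1,020,000-user peak; the post does not say so explicitly.

PEAK_CONCURRENT_USERS = 1_020_000      # from the post
ACTION_INTERVAL_SECONDS = 120          # one action per user every 120 s (from the post)
DELIVERED_STANZAS_PER_SECOND = 15_000  # lower bound ("more than 15,000")


def actions_per_second(users: int, interval_seconds: int) -> float:
    """Average rate of user-initiated actions across all concurrent users."""
    return users / interval_seconds


def stanzas_per_action(stanzas_per_second: float, users: int,
                       interval_seconds: int) -> float:
    """Average number of delivered stanzas attributable to one user action."""
    return stanzas_per_second / actions_per_second(users, interval_seconds)


if __name__ == "__main__":
    rate = actions_per_second(PEAK_CONCURRENT_USERS, ACTION_INTERVAL_SECONDS)
    fanout = stanzas_per_action(DELIVERED_STANZAS_PER_SECOND,
                                PEAK_CONCURRENT_USERS,
                                ACTION_INTERVAL_SECONDS)
    print(f"user actions per second: {rate:,.0f}")
    print(f"delivered stanzas per action (lower bound): {fanout:.2f}")
```

Under that assumption, the peak works out to 8,500 user actions per second, so each action results in somewhat more than one delivered stanza on average. That would be consistent with some actions, such as presence changes, being delivered to more than one recipient.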
What amazed even us was that as the scale of the test increased, the efficiency of Jabber XCP also increased while CPU utilization decreased significantly. For Jabber, Inc., this furthers our hypothesis that the scale of Jabber XCP - even in a single domain - has no known limit, clearly reaffirming that when it comes to scalability, Jabber XCP is in a class by itself.
If you’re interested, I’m happy to answer publicly or privately any questions about our methodology and results, along with the specific hardware requirements necessary to achieve supreme scale using Jabber XCP.
Please post your questions here as comments to this post or email me personally at ckaes AT jabber.com.

November 16, 2006 at 10:56 am
What was the specific hardware configuration, and how much and what kind of tuning did you do with the operating system? (I assume Solaris 10.)
———-
Craig Kaes Says:
November 17, 2006 at 11:31 am
We used two 4-core T2000s for our connection managers and six 8-core T2000s for our routers. Because of the nature of the baseline and the constraints of the test, we had 3 of the 8 cores disabled on each of the 6 router boxes. It would be a delight to see what we could do if we used all 8 cores for those tests.
As for system tuning, we got access to a beta tool from Sun called CoolTuner (http://www.sun.com/download/products.xml?id=4501bae1). It added the following to the system file, but I’m not sure that any of it was actually needed. We were using Solaris 10 and the OS really never got in our way.
set ip:ip_squeue_bind=0
set ip:ip_squeue_fanout=1
set ipge:ipge_tx_syncq=1
set ipge:ipge_srv_fifo_depth=16000
set ipge:ipge_bcopy_thresh=512
set ipge:ipge_dvma_thresh=1
set pcie:pcie_aer_ce_mask=0x1 # Not necessary in T2000+ shipping June 2006
set rlim_fd_cur=260000
set rlim_fd_max=260000
set maxphys=1048576
set md:md_maxphys=1048576
November 16, 2006 at 2:00 pm
“What amazed even us was that as the scale of the test increased, the efficiency of Jabber XCP also increased while CPU utilization decreased significantly.” Umm… what? Am I reading that right? You’ve discovered an inexhaustible source of CPU cycles?
——–
Craig Kaes Says:
November 17, 2006 at 4:06 pm
To clarify: as we increased the number of processors, CPU utilization per box decreased, not overall CPU utilization. What this showed was that we could have added more users with each box that we added and that the growth was indeed linear. It would be absurd to assert exponential scalability.
November 17, 2006 at 10:10 am
Do you mean exactly what you wrote, or is there a “per user” missing?
——–
Craig Kaes Says:
November 17, 2006 at 4:06 pm
Yeah, CPU decreased per box, not overall. This decrease was linear, which, together with the linear addition of users, shows overall linear scalability. It also shows that we were too conservative in the number of users that we added as we added hardware.
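To illustrate why per-box utilization can fall while total capacity grows, here is a minimal sketch using only the two endpoints given in the post. It assumes that the single baseline T2000 and the six router boxes at the top end can be compared directly and that users were spread evenly across boxes; neither is confirmed in the post, and the function name is hypothetical.

```python
# Users carried per box at the two endpoints reported in the post.
# Assumptions (not confirmed by the post): the baseline "single T2000"
# is comparable to one router box, and load is spread evenly.


def users_per_box(concurrent_users: int, boxes: int) -> float:
    """Average number of concurrent users each box has to serve."""
    if boxes <= 0:
        raise ValueError("boxes must be a positive integer")
    return concurrent_users / boxes


if __name__ == "__main__":
    baseline = users_per_box(420_000, 1)   # baseline: one T2000
    top_end = users_per_box(1_020_000, 6)  # top end: six router boxes
    print(f"baseline users per box: {baseline:,.0f}")
    print(f"top-end users per box:  {top_end:,.0f}")
    print(f"top end as share of baseline: {top_end / baseline:.0%}")
```

Under those assumptions, each box at the top end carried 170,000 users against 420,000 at the baseline. A lower per-box load is what one would expect to accompany lower per-box CPU utilization, and it fits the remark that users were added too conservatively.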
May 18, 2007 at 12:32 am
Do you have any metrics on XCP (5.2 or earlier) running on a Linux server vs. the Solaris 10 configuration that you presented here?
Craig Kaes replies:
XCP is a highly configurable platform, so despite a number of large load tests we’ve done on Linux, none would serve as a direct comparison to the 1M user Solaris test. The closest one to that configuration was a 200k user test spread over five XCP routers, each running on a dual-processor 3.6 GHz Xeon machine. Additionally, we used 16 connection manager processes spread over eight similar machines. The two largest differences between this test and the Solaris test were 1) PostgreSQL 8.1 (Linux) vs. Oracle 10 (Solaris) and 2) the Linux test was run before a number of configuration and code optimizations were identified and implemented. In the Linux test, the database was definitely the limiting factor, whereas in the Solaris test we were only using about 1/10 of the resources on the database server.
It would certainly be interesting to run a large scale Linux test similar to what we’ve already done on Solaris — especially with the optimizations in and with the knowledge we’ve gained.