A Terminal Anywhere for Web2Py?

Anyterm – Introduction

Anyterm

A Terminal Anywhere

Introduction

Have you ever wanted SSH or telnet access to your system from an
“internet desert” – from behind a strict firewall,
from an internet cafe, or even from a mobile phone? Anyterm is a
combination of a web page and a process that runs on your web server
that provides this access – see the demos.

Anyterm can use almost any web browser and even works through
firewalls. If you join my.anyterm.org you can
access your systems straight away via our server with no software to
install anywhere. Alternatively, you can run the Anyterm software
on your own system – see the deployment examples.

We can also help you to integrate Anyterm-type functionality into
your own applications, for example to web-enable a legacy system, or
an embedded system. Contact us for details.

How It Works

Anyterm consists of some Javascript on a web page, an
XMLHttpRequest channel on standard ports back to the server, an
HTTP proxy such as Apache’s mod_proxy and the Anyterm daemon.
The daemon uses a pseudo-terminal to communicate with
a shell or other application, and includes terminal emulation. Key
presses are picked up by the Javascript which sends them to the
daemon; changes to the emulated screen are sent from the
daemon to the Javascript which updates its display. Performance is
quite reasonable and SSL can be used to secure the connection.
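
The pseudo-terminal step can be illustrated with a minimal Python sketch. It is not Anyterm's code: the function name `run_in_pty` and the use of `/bin/sh` are assumptions made for illustration, and it only shows how a process can drive a shell through a pseudo-terminal and collect what the shell writes back. It assumes a Unix-like system.

```python
import os
import pty
import select
import subprocess


def run_in_pty(command, timeout=5.0):
    """Run `command` under /bin/sh attached to a pseudo-terminal; return its output."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        ["/bin/sh", "-c", command],
        stdin=slave_fd, stdout=slave_fd, stderr=slave_fd,
        close_fds=True,
    )
    os.close(slave_fd)  # the parent only talks to the master side
    chunks = []
    try:
        while True:
            ready, _, _ = select.select([master_fd], [], [], timeout)
            if not ready:
                break  # nothing arrived within the timeout
            try:
                data = os.read(master_fd, 1024)
            except OSError:
                break  # Linux reports an I/O error once the child side is closed
            if not data:
                break
            chunks.append(data)
    finally:
        os.close(master_fd)
        proc.wait()
    return b"".join(chunks).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The terminal layer typically turns "\n" into "\r\n" in the output.
    print(repr(run_in_pty("echo hello")))
```

A real daemon would keep the shell running, forward key presses to the master side, and feed the bytes it reads into a terminal emulator instead of returning them.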

my.anyterm.org

my.anyterm.org is designed for systems
administrators and others who want the benefit of access from
anywhere using Anyterm, but who don’t want to risk
installing the Anyterm software on their own servers. For a small
charge you can use our Anyterm installation to connect to your own
systems.

Status

Anyterm’s stable 1.0 branch provides a fairly reliable
implementation of the basic Anyterm functionality. Bug fixes will
continue to be applied to this branch if necessary. It uses an
Apache module rather than a separate daemon. This version is now
rather old, and new users are encouraged to instead use the 1.1
development branch.

The 1.1 branch is where development will continue.
Instead of the Apache module, this branch has a stand-alone Anyterm
daemon. It is now quite stable and will be designated “stable”
in due course. If you have a suggestion or would like to help,
do please get in touch via the forums.

Requirements

Anyterm is developed on Linux but there is a good chance that
it will run on other Unix-like operating systems.
Mozilla-based browsers and Internet Explorer 6 and 7 work;
Opera 9 also works, and Konqueror was partially functional when last tried.
Feedback about other browsers would be appreciated.

License

The Anyterm code is licensed under the GNU General Public
License (GPL).

So you are free to use Anyterm in any
application, including commercial use. If you want to distribute
something that includes the Anyterm code, then that must also be
distributed under the same free license.

(Please get in touch if you are unclear about your obligations
under the GPL or if you’d like to discuss other licensing possibilities.)

Support

Anyterm has no warranty. The main support channel for Anyterm is
the online forums. Please do ask in the
forums if you have any problems, questions or suggestions.

If your business would like to deploy Anyterm on your servers,
or add Anyterm-like functionality to your own product, please
get in touch. We may be able to help.

The Author

Anyterm is the work of Phil Endecott. I’m also responsible for
Decimail and QWAZERTY. Contact email here.

Getting started

If you want to install Anyterm, first decide whether you
want to get the old stable version or the much better new
development version. Then just get the code from the download page and follow the
appropriate installation instructions. Alternatively you can just
join my.anyterm.org and get the benefits
without the effort!


HOW to do SITEMAP.xml right … Understanding your website



Sitemaps XML format

This document describes the XML schema for the Sitemap protocol.

The Sitemap protocol format consists of XML tags. All data values in a Sitemap must be entity-escaped. The file itself must be UTF-8 encoded.

The Sitemap must:

  • Begin with an opening <urlset> tag and end with a closing </urlset> tag.
  • Specify the namespace (protocol standard) within the <urlset> tag.
  • Include a <url> entry for each URL, as a parent XML tag.
  • Include a <loc> child entry for each <url> parent tag.

All other tags are optional. Support for these optional tags may vary among search engines. Refer to each search engine's documentation for details.

Also, all URLs in a Sitemap must be from a single host, such as www.example.com or store.example.com. For further details, refer to the Sitemap file location section.
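
As a rough illustration of these requirements, the following Python sketch builds a small Sitemap with the standard library. The function name `build_sitemap` and the dictionary layout of the entries are assumptions made for this example, not part of the protocol; `xml.etree.ElementTree` takes care of entity-escaping the values. It assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a Sitemap from dicts with a 'loc' key and optional tag values."""
    # Root element with the protocol namespace declared on it.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")           # one <url> per page
        ET.SubElement(url, "loc").text = entry["loc"]  # <loc> is mandatory
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:                         # optional tags
                ET.SubElement(url, tag).text = str(entry[tag])
    # Returns UTF-8 encoded bytes that start with an XML declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    xml_bytes = build_sitemap([
        {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
         "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.example.com/catalog?item=12&desc=vacation_hawaii"},
    ])
    print(xml_bytes.decode("utf-8"))
```

Note that the `&` in the second URL is written as `&amp;` in the output without any extra work.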

Sample XML Sitemap

The following example shows a Sitemap that contains just one URL and uses all optional tags. The optional tags are <lastmod>, <changefreq> and <priority>.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
</urlset>

Also see our example with multiple URLs.

XML tag definitions

The available XML tags are described below.

Tag   Description
<urlset> required

Encapsulates the file and references the current protocol standard.

<url> required

Parent tag for each URL entry. The remaining tags are children of this tag.

<loc> required

URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters.

<lastmod> optional

The date of last modification of the file. This date should be in W3C Datetime format. This format allows you to omit the time portion, if desired, and use YYYY-MM-DD.

Note that this tag is separate from the “If-Modified-Since (304)” header the server can return, and search engines may use the information from both sources differently.

<changefreq> optional

How frequently the page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values are:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

The value “always” should be used to describe documents that change each time they are accessed. The value “never” should be used to describe archived URLs.

Note that the value of this tag is considered a hint and not a command. Even though search engine crawlers may consider this information when making decisions, they may crawl pages marked “hourly” less frequently than that, and they may crawl pages marked “yearly” more frequently than that. Crawlers may also periodically crawl pages marked “never” so that they can handle unexpected changes to those pages.

<priority> optional

The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. This value does not affect how your pages are compared to pages on other sites; it only lets the search engines know which pages you deem most important for the crawlers.

The default priority of a page is 0.5.

Note that the priority you assign to a page is not likely to influence the position of your URLs in a search engine's result pages. Search engines may use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your most important pages are present in a search index.

Also note that assigning a high priority to all of the URLs on your site is not likely to help you. Since the priority is relative, it is only used to select between URLs on your site.
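
For the <lastmod> value, Python's standard `isoformat()` methods produce strings of the kind shown in the examples on this page. The helper below is a small sketch; the name `format_lastmod` is hypothetical, and the insistence on timezone-aware datetimes is a choice made for this example.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Format a date or timezone-aware datetime for a <lastmod> tag."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("use a timezone-aware datetime")
        # Drop microseconds to get the form YYYY-MM-DDThh:mm:ss+hh:mm.
        return value.replace(microsecond=0).isoformat()
    # Plain dates use the short YYYY-MM-DD form.
    return value.isoformat()


if __name__ == "__main__":
    print(format_lastmod(date(2005, 1, 1)))
    # 2005-01-01
    print(format_lastmod(datetime(2004, 12, 23, 18, 0, 15, tzinfo=timezone.utc)))
    # 2004-12-23T18:00:15+00:00
```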

Entity escaping

Your Sitemap file must be UTF-8 encoded; you can generally set this when you save the file. As with all XML files, any data values (including URLs) must use entity escape codes for the characters listed in the table below.

Character       Symbol   Escape code
Ampersand       &        &amp;
Single quote    '        &apos;
Double quote    "        &quot;
Greater than    >        &gt;
Less than       <        &lt;

In addition, all URLs (including the URL of your Sitemap) must be URL-escaped and encoded so that the web server on which they are located can read them. However, if you are using any sort of script, tool, or log file to generate your URLs (anything except typing them in by hand), this is usually already done for you. Make sure that your URLs follow the RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.

Below is an example of a URL that uses a non-ASCII character (ü), as well as a character that requires entity escaping (&):

http://www.example.com/ümlat.php&q=name

Below is the same URL, ISO-8859-1 encoded (for hosting on a server that uses that encoding) and URL-escaped:

http://www.example.com/%FCmlat.php&q=name

Below is the same URL, UTF-8 encoded (for hosting on a server that uses that encoding) and URL-escaped:

http://www.example.com/%C3%BCmlat.php&q=name

Below is the same URL, but also entity-escaped:

http://www.example.com/%C3%BCmlat.php&amp;q=name
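
The two escaping steps can be reproduced with Python's standard library. This is a sketch: the helper names `url_escape` and `entity_escape` are made up for the example, and the set of characters left unescaped (`:/?&=`) is an assumption that fits this particular URL rather than a general rule.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def url_escape(url, encoding="utf-8"):
    """Percent-encode non-ASCII characters using the server's encoding."""
    # Keep the URL delimiters used in this example as they are.
    return quote(url, safe=":/?&=", encoding=encoding)


def entity_escape(value):
    """Entity-escape a value for use inside an XML Sitemap."""
    # escape() handles &, < and > by default; quotes are added explicitly.
    return escape(value, {"'": "&apos;", '"': "&quot;"})


if __name__ == "__main__":
    raw = "http://www.example.com/ümlat.php&q=name"
    print(url_escape(raw, encoding="iso-8859-1"))
    # http://www.example.com/%FCmlat.php&q=name
    print(url_escape(raw))
    # http://www.example.com/%C3%BCmlat.php&q=name
    print(entity_escape(url_escape(raw)))
    # http://www.example.com/%C3%BCmlat.php&amp;q=name
```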

Sample XML Sitemap

The following example shows a Sitemap in XML format. The Sitemap in the example contains a small number of URLs, each using a different set of optional parameters.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
      <changefreq>weekly</changefreq>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
      <lastmod>2004-12-23</lastmod>
      <changefreq>weekly</changefreq>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
      <lastmod>2004-12-23T18:00:15+00:00</lastmod>
      <priority>0.3</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=83&amp;desc=vacation_usa</loc>
      <lastmod>2004-11-23</lastmod>
   </url>
</urlset>

Using Sitemap index files (to group multiple Sitemap files)

You can provide multiple Sitemap files, but each Sitemap file must contain no more than 50,000 URLs and must be no larger than 10 MB (10,485,760 bytes). If you like, you can compress your Sitemap files with gzip to reduce your bandwidth requirement; however, the Sitemap file must be no larger than 10 MB once uncompressed. If you want to list more than 50,000 URLs, you must create multiple Sitemap files.

If you provide multiple Sitemaps, you should list each of them in a Sitemap index file. Sitemap index files may not list more than 50,000 Sitemaps and must be no larger than 10 MB (10,485,760 bytes), although they can be compressed. You can have more than one Sitemap index file. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file.

The Sitemap index file must:

  • Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag.
  • Include a <sitemap> entry for each Sitemap as a parent XML tag.
  • Include a <loc> child entry for each <sitemap> parent tag.

The optional <lastmod> tag is also available for Sitemap index files.

Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.susitio.es/sitemap_index.xml can include Sitemaps on http://www.susitio.es, but not on http://www.ejemplo.es or http://suhost.susitio.es. As with Sitemaps, your Sitemap index file must be UTF-8 encoded.
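
A minimal Python sketch of this splitting step follows. The helper names `split_urls` and `build_sitemap_index` are hypothetical; the sketch only enforces the 50,000-entry limit (not the 10 MB size limit) and assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def split_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks of at most `limit` entries."""
    urls = list(urls)
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_sitemap_index(sitemap_urls):
    """Build a Sitemap index listing the given Sitemap locations."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for location in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = location
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    pages = [f"http://www.example.com/catalog?item={n}" for n in range(120_001)]
    chunks = split_urls(pages)
    print([len(chunk) for chunk in chunks])
    # [50000, 50000, 20001]
    names = [f"http://www.example.com/sitemap{n}.xml.gz"
             for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(names).decode("utf-8"))
```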

Sample XML Sitemap index

The following example shows a Sitemap index that lists two Sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>

Note: Sitemap URLs, like all values in your XML files, must be entity-escaped.

Sitemap index XML tag definitions

Tag   Description
<sitemapindex> required   Encapsulates information about all of the Sitemaps in the file.
<sitemap> required   Encapsulates information about an individual Sitemap.
<loc> required

Identifies the location of the Sitemap.

This location can be a Sitemap, an Atom file, an RSS file or a text file.

<lastmod> optional

Identifies the time at which the corresponding Sitemap file was modified. It is not the modification time of any of the pages listed in that Sitemap. The value of the lastmod tag should be in W3C Datetime format.

By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index; that is, a crawler may retrieve only the Sitemaps that were modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.

Other Sitemap formats

The Sitemap protocol lets you provide details about your pages to search engines, and we encourage its use because it lets you provide additional information about your site's pages beyond just the URLs. However, in addition to the XML protocol, we also support RSS feeds and text files, which provide more limited information.

Syndication feed

You can provide an RSS (Really Simple Syndication) 2.0 feed or an Atom 0.3 or 1.0 feed. Generally, you would use this format only if your site already has a syndication feed. Note that this method may not let search engines know about all the URLs on your site, since the feed may only provide information on recent URLs. Search engines can still use that information to find out about other pages on your site during their normal crawling, by following links inside the pages in the feed. Make sure that the feed is located in the highest-level directory you want search engines to crawl. Search engines extract the information from the feed as follows:

  • <link> field: indicates the URL.
  • Modified date field (the <pubDate> field for RSS feeds and the <modified> field for Atom feeds): indicates when each URL was last modified. Use of the modified date field is optional.

Text file

You can provide a simple text file that contains one URL per line. The text file must follow these guidelines:

  • The text file must have one URL per line. The URLs cannot contain embedded new lines.
  • You must fully specify URLs, including the http://.
  • Each text file can contain a maximum of 50,000 URLs and must be no larger than 10 MB (10,485,760 bytes). If your site includes more than 50,000 URLs, you can split the list into multiple text files and add each one separately.
  • The text file must use UTF-8 encoding. You can specify this when you save the file; for instance, in Notepad the option is in the Encoding menu of the Save As dialog box.
  • The text file must contain nothing other than the list of URLs.
  • The text file must contain no header or footer information.
  • If you like, you can compress your Sitemap text file with gzip to reduce your bandwidth requirement.
  • You can name the text file anything you wish. Make sure that your URLs follow the RFC-3986 standard for URIs and the RFC-3987 standard for IRIs.
  • Upload the text file to the highest-level directory you want search engines to crawl, and make sure that the text file does not list URLs located in a higher-level directory.

Sample entries in the text file are shown below.

http://www.example.com/catalog?item=1

http://www.example.com/catalog?item=11

Sitemap file location

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.

If you have permission to change http://example.org/path/sitemap.xml, it is assumed that you also have permission to provide information for URLs with the prefix http://example.org/path/. Examples of URLs considered valid in http://example.com/catalog/sitemap.xml include:

http://example.com/catalog/show?item=23

http://example.com/catalog/show?item=233&user=3453

URLs not considered valid in http://example.com/catalog/sitemap.xml include:

http://example.com/image/show?item=23

http://example.com/image/show?item=233&user=3453

https://example.com/catalog/page1.php

Note that all URLs listed in the Sitemap must use the same protocol (http, in this example) and reside on the same host as the Sitemap. For instance, if the Sitemap is located at http://www.ejemplo.es/sitemap.xml, it cannot include URLs from http://subdominio.ejemplo.es.

URLs that are not considered valid are dropped from further consideration. We strongly recommend that you place your Sitemap in the root directory of your web server. For example, if your web server is at ejemplo.es, then your Sitemap index file would be at http://ejemplo.es/sitemap.xml. In certain cases you may need to create several Sitemaps for different paths, for example if security permissions split write access across different directories.

If you submit a Sitemap using a path with a port number, you must include that port number as part of the path in every URL listed in the Sitemap file. For instance, if your Sitemap is located at http://www.ejemplo.es:100/sitemap.xml, then every URL listed in it must begin with http://www.ejemplo.es:100.
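
The location rules above (same protocol, same host and port, and a path under the Sitemap's directory) can be checked with a few lines of Python. This is a sketch of one reading of those rules; the function name `allowed_in_sitemap` is an assumption made for the example.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url may be listed in the Sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Directory that contains the Sitemap file, with a trailing slash.
    base_path = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        page.scheme == sitemap.scheme        # same protocol
        and page.netloc == sitemap.netloc    # same host, including any port
        and page.path.startswith(base_path)  # at or below the Sitemap's directory
    )


if __name__ == "__main__":
    sitemap = "http://example.com/catalog/sitemap.xml"
    for candidate in (
        "http://example.com/catalog/show?item=23",
        "http://example.com/image/show?item=23",
        "https://example.com/catalog/page1.php",
    ):
        print(candidate, allowed_in_sitemap(sitemap, candidate))
```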

Sitemaps and cross submits

To submit Sitemaps for multiple hosts from a single host, you need to “prove” ownership of the hosts whose URLs are being submitted in a Sitemap. Here is an example. Suppose that you want to submit Sitemaps for three hosts:

www.host1.com with Sitemap file sitemap-host1.xml
www.host2.com with Sitemap file sitemap-host2.xml
www.host3.com with Sitemap file sitemap-host3.xml

Moreover, you want to place all three Sitemaps on a single host: www.sitemaphost.com. The Sitemap URLs will then be:

http://www.sitemaphost.com/sitemap-host1.xml

http://www.sitemaphost.com/sitemap-host2.xml

http://www.sitemaphost.com/sitemap-host3.xml

 

By default, this will result in a “cross submission” error, since you are trying to submit URLs for www.host1.com through a Sitemap that is hosted on www.sitemaphost.com (and the same applies to the other two hosts). One way to avoid the error is to prove that you own (that is, have the authority to modify files on) www.host1.com. You can do this by modifying the robots.txt file on www.host1.com so that it points to the Sitemap on www.sitemaphost.com.

In this example, the robots.txt file at http://www.host1.com/robots.txt would contain the line “Sitemap: http://www.sitemaphost.com/sitemap-host1.xml”. By modifying the robots.txt file on www.host1.com and having it point to the Sitemap on www.sitemaphost.com, you have implicitly proven that you own www.host1.com. In other words, whoever controls the robots.txt file on www.host1.com trusts the Sitemap at http://www.sitemaphost.com/sitemap-host1.xml to contain URLs for www.host1.com. The same process can be repeated for the other two hosts.

Now you can submit the Sitemaps on www.sitemaphost.com.

When the robots.txt file of a particular host, say http://www.host1.com/robots.txt, points to a Sitemap or a Sitemap index on another host, all the URLs in the target Sitemaps, such as http://www.sitemaphost.com/sitemap-host1.xml, are expected to belong to the host that points to it. This is because, as noted earlier, a Sitemap is expected to contain URLs from a single host only.

Validating your Sitemap

The following XML schemas define the elements and attributes that can appear in your Sitemap file. You can download the schemas from the links below:

For Sitemaps: http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd
For Sitemap index files: http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd

There are a number of tools available to help you validate the structure of your Sitemap against this schema. You can find a list of XML-related tools at the following locations:

http://www.w3.org/XML/Schema#Tools
http://www.xml.com/pub/a/2000/12/13/schematools.html

To validate your Sitemap or Sitemap index file against a schema, the XML file needs additional headers, as shown below.

Sitemap:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      ...
   </url>
</urlset>

Sitemap index file:

<?xml version='1.0' encoding='UTF-8'?>
<sitemapindex xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/siteindex.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      ...
   </sitemap>
</sitemapindex>

Extending the Sitemaps protocol

You can extend the Sitemaps protocol using your own namespace. Simply specify this namespace in the root element. For example:

<?xml version='1.0' encoding='UTF-8'?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
         xmlns:example="http://www.example.com/schemas/example_schema"> <!-- namespace extension -->
   <url>
      <example:example_tag>
         ...
      </example:example_tag>
      ...
   </url>
</urlset>

Informing search engine crawlers

Once you have created the Sitemap file and placed it on your web server, you need to inform the search engines that support this protocol of its location. You can do this by:

  • submitting it via the search engine's submission interface
  • specifying its location in your site's robots.txt file
  • sending an HTTP request

The search engines can then retrieve your Sitemap and make the URLs available to their crawlers.

Submitting your Sitemap via the search engine's submission interface

To submit your Sitemap directly to a search engine, which lets you receive status information and any processing errors, refer to the documentation of the search engine in question.

Specifying the Sitemap location in your robots.txt file

You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line:

Sitemap: http://www.example.com/sitemap.xml

This directive is independent of the user-agent line, so it doesn't matter where you place it in the file. If you have a Sitemap index file, you can include the location of just that file. You don't need to list each individual Sitemap listed in the index file.

You can specify more than one Sitemap file per robots.txt file.

Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
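
To show how a consumer might read these lines, here is a small Python sketch that collects every Sitemap location from the text of a robots.txt file. The function name `sitemap_urls_from_robots` is hypothetical, and treating everything after `#` as a comment is a simplifying assumption.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the Sitemap locations declared in the text of a robots.txt file."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


if __name__ == "__main__":
    sample = """\
User-agent: *
Disallow: /private/
Sitemap: http://www.example.com/sitemap-host1.xml
Sitemap: http://www.example.com/sitemap-host2.xml
"""
    print(sitemap_urls_from_robots(sample))
```

Because the directive is independent of the user-agent line, the sketch looks at every line regardless of where it appears in the file.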

Submitting your Sitemap via an HTTP request

To submit your Sitemap using an HTTP request (replace <searchengine_URL> with the URL provided by the search engine), issue your request to the following URL:

<searchengine_URL>/ping?sitemap=sitemap_url

For example, if your Sitemap is located at http://www.example.com/sitemap.gz, your URL will be:

<searchengine_URL>/ping?sitemap=http://www.example.com/sitemap.gz

URL-encode everything after /ping?sitemap=:

<searchengine_URL>/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.gz

You can issue the HTTP request using wget, curl, or any other mechanism you prefer. A successful request returns an HTTP 200 response code; if you receive a different response, you should resubmit your request. The HTTP 200 response code only indicates that the search engine has received your Sitemap, not that the Sitemap itself or the URLs it contains are valid. An easy way to do this is to set up an automated job that generates and submits Sitemaps on a regular basis.
Note: If you provide a Sitemap index file, you only need to issue one HTTP request that includes the location of the Sitemap index file; you do not need to issue separate requests for each Sitemap listed in the index.
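
The URL-encoding step can be done with Python's `urllib.parse.quote`. In this sketch the function name `build_ping_url` and the host `http://searchengine.example` are placeholders; the real base URL has to come from the search engine, and the sketch only builds the URL without sending anything.

```python
from urllib.parse import quote


def build_ping_url(search_engine_url, sitemap_url):
    """Build the ping URL, encoding everything after /ping?sitemap=."""
    # safe="" makes quote() encode ':' and '/' as well.
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_url.rstrip('/')}/ping?sitemap={encoded}"


if __name__ == "__main__":
    print(build_ping_url("http://searchengine.example",
                         "http://www.example.com/sitemap.gz"))
    # http://searchengine.example/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.gz
```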

Excluding content

The Sitemap protocol lets you tell search engines what content you would like indexed. To tell search engines what content you don't want indexed, use a robots.txt file or the robots meta tag. For more information on how to exclude content from search engines, see robotstxt.org.

 



Quick Benchmark of Apache, Nginx, Cherokee, Lighttpd (via Arnisoft)


I was recently searching for some benchmarks of the nginx, Apache, and Cherokee web servers. Frank Arnold (Arnisoft) published this comparison benchmark. This is one of the better comparisons I found that actually used relatively current versions of all three applications. Enjoy!

A short benchmark of these 4 servers with dynamic and static content.
All servers provide the same functionality, such as URL rewriting and password-protected folders/files, but each has its own way of doing it.
For shared hosting where the user has no access to the server configuration, Apache with its .htaccess file support is a good choice.

I used Apache's benchmark tool: ab -n 50000 -c 20
php-cgi is reached over a Unix socket

Dynamic Content (PHP)

Server            PHP backend   Requests/sec   Transfer rate (KB/s)
Apache 2.2.14                   2125           11684
Nginx 0.7.65      php-cgi       1734           9436
Nginx 0.7.65      php-fpm       1861           10115
Cherokee 1.0.0    php-cgi       2119           11562
Cherokee 1.0.0    php-fpm       2103           11454

Static content (14785 Bytes HTML file)

Server            Requests/sec   Transfer rate (KB/s)
Apache 2.2.14     6768           99593
Lighttpd 2.4.26   15782          213866
Cherokee 1.0.0    7602           111285
Nginx 0.7.65      10912          159828

How to tell whether Cloud Hosting is right for you

RealCloud: when you need real cloud hosting | dinahosting

As you well know, the reality of each project is complex, and each one
needs special treatment. Want to find out why RealCloud fits what you
need so well? We explain how it behaves in specific situations.

Cloud hosting for ambitious projects

Your website needs a high-quality infrastructure, and you have every
right to be as optimistic as you like about how much it will grow. What
if you fall short when choosing a VPS? And why run the risk of renting a
server that is too big if it then turns out you don't need it? Forget
about migrations: you have a cloud hosting system that grows with you,
so keep adding resources as you need them. The people at Tuenti, when
they started out, probably had no idea what was coming their way, right?

Cloud hosting for sites with cyclical peaks: hours, days…

Are there times of the day or the week when activity grows
significantly? No problem: you can schedule resource allocation for
those cycles when you need more power. An online store such as Eroski's
or Privalia's surely has periods with a large influx of customers at
certain times of day, repeating very regularly. You know what to do!
Design the basic server structure you need and combine it with resource
scheduling in anticipation of the moments of high demand. Always with
the certainty that, if your customers are determined to clear out the
available stock, your cloud hosting's auto-scaler makes sure your site
doesn't even notice.

Cloud hosting for major launches

It is impossible to know the peak of resources you will need for portals
devoted to a specific event that lasts only a few days, right? People go
crazy a few days beforehand trying to get tickets for a U2 or Miley
Cyrus concert (we indie fans are less passionate), or are anxious to
access the video streaming once a major bloggers' convention has ended;
then visits slump… until next year. With our cloud hosting solution,
combining resource scheduling with a more sensitive auto-scaler setting
ensures you are running at full performance during the busiest moments.

Cloud hosting for sites with irregular growth

If you work on a news portal of a certain size, you know very well that
audience behaviour is highly unpredictable; think of sites like Menéame,
or the effect it can have on the pages of traditional newspapers or the
most popular blogs. How could you know where the news will take you?
Your cloud hosting's auto-scaler takes care of preventing scares,
ensuring the best performance and stability regardless of the demand
that may arise, whatever happens.

Do the math! Switch to RealCloud

You have surely been monitoring the resources you need and how your
current project, or the one you are working on, behaves. So do the math
and switch to real cloud hosting:

  • IMMEDIATE SIGN-UP: 2 €. Flat fee!
  • BLOCK OF 1 CPU AND 2 GB OF RAM: 0.084 €/hour
  • TRANSFER: 0.04 €/GB

Real Cloud – TOP cloud hosting

Real cloud hosting is called RealCloud

RealCloud is much more than a simple VPS or dedicated server. Can you
imagine having all the power and resources your project needs at every
moment? It is possible, to the minute and with no surprises! A very
powerful and elastic cloud hosting platform, designed for projects that
handle significant levels of both traffic and resource consumption.

RealCloud demo

Look at these prices. No monthly fees

  • IMMEDIATE SIGN-UP: 2 €. Flat fee!
  • BLOCK OF 1 CPU AND 2 GB OF RAM: 0.084 €/hour
  • TRANSFER: 0.04 €/GB
  • SAS STORAGE: 0.0014 € per GB/hour
  • SATA STORAGE: 0.0007 € per GB/hour
  • Billing is always by the minute.
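
As a rough aid to doing the math, the Python sketch below turns the list
prices above into a usage estimate. It assumes that per-minute billing
simply prorates the hourly rates and that the sign-up fee is charged once
and separately; neither detail is spelled out on this page, and the
function name usage_cost is made up for illustration.

```python
from decimal import Decimal

# List prices quoted above, in euros.
SETUP_FEE = Decimal("2")
BLOCK_PER_HOUR = Decimal("0.084")      # one block = 1 CPU + 2 GB of RAM
TRANSFER_PER_GB = Decimal("0.04")
SAS_PER_GB_HOUR = Decimal("0.0014")
SATA_PER_GB_HOUR = Decimal("0.0007")


def usage_cost(blocks, minutes, transfer_gb=0, sas_gb=0, sata_gb=0):
    """Estimate the usage charge in euros, excluding the sign-up fee.

    Assumption: minutes are billed as a linear fraction of the hourly rate.
    Pass ints or Decimals (not floats) to keep the arithmetic exact.
    """
    hours = Decimal(minutes) / 60
    compute = blocks * BLOCK_PER_HOUR * hours
    storage = (sas_gb * SAS_PER_GB_HOUR + sata_gb * SATA_PER_GB_HOUR) * hours
    transfer = transfer_gb * TRANSFER_PER_GB
    return compute + storage + transfer


if __name__ == "__main__":
    # Two blocks for 90 minutes plus 10 GB of transfer.
    print(usage_cost(2, 90, transfer_gb=10))
```

Decimal is used instead of float so that small per-hour rates add up
without rounding drift.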

The power you need at every moment

Spin up as many servers as you like, instantly, to build the structure
you have designed: for Apache, for MySQL, load balancers… whatever you
need! RealCloud guarantees 100% stability for your project on the net:
from a minimum of one processor core and 2 GB of RAM up to a maximum of
15 cores and 30 GB of RAM per server.

In addition, our cloud hosting system adds or removes resources on the
fly, as you need them and in a matter of seconds. You have an automated
mode that effectively protects you from unexpected overloads, while you
keep complete freedom to schedule in advance the resources you want to
assign depending on the day or the hour. And best of all, you only ever
pay for what you use.

All the control of RealCloud is yours

You have full control over the system. You have superuser privileges and
an advanced monitoring system, so that you never miss a detail of what
is happening. Manage your project with the calm and care it requires,
and plan the next steps at your own pace.

Backups are a click away, for your complete peace of mind: you can fall
back on them and bring your cloud hosting system up again in a few
seconds. You can even schedule the creation of restore points as often
as you like.

A cloud hosting designed for maximum performance

RealCloud goes beyond the range covered by the most advanced VPSs and
high-end dedicated servers. Not only because of its ability to adapt to
your specific needs, but because it is built and optimised on
latest-generation hardware that gives you exclusive and truly
spectacular performance and flexibility: high-end Dell servers
(quad-core R410) and SAS disks, which we especially recommend. Is there
another cloud hosting that offers you the same?

Asynchronous Servers in Python

Nicholas Piël » Socket Benchmark of Asynchronous Servers in Python:

A lot has already been written on the C10K problem, and it is known that the only viable option for handling LOTS of concurrent connections is to handle them asynchronously. This also shows that for massively concurrent problems, such as lots of parallel comet connections, the GIL in Python is a non-issue, as we handle the concurrent connections in a single thread.

In this post I am going to look at a selection of asynchronous servers implemented in Python.

Asynchronous Server Specs

Since Python is really rich with (asynchronous) frameworks, I collected a few and looked at the following features:

  • What License does the framework have?
  • Does it provide documentation?
  • Does the documentation contain examples?
  • Is it used in production somewhere?
  • Does it have some sort of community (mailing list, IRC, etc.)?
  • Is there any recent activity?
  • Does it have a blog (from the owner)?
  • Does it have a Twitter account?
  • Where can I find the repository?
  • Does it have a Thread Pool?
  • Does it provide access to a TCP Socket?
  • Does it have any Comet features?
  • Is it using EPOLL?
  • What kind of server is it? (greenlets, callbacks, generators etc..)

This gave me the following table.

Name          Lic.    Doc   Ex.  Prod.    Com.  Act.  Blog  Twt  Repo   Pool  WSGI  Socket  Comet  Epoll  Test  Style
Twisted       MIT     Yes   Yes  Yes      Huge  Yes   Lots  No   Trac   Yes   Yes   Yes     No     Yes    Yes   Callback
Tornado       Apache  Yes   Yes  F.Feed   Yes   Yes   FB    Yes  GHub   No    Lim.  Yes     No     Yes    No    Async
Orbited       MIT     Yes   Yes  Yes      Yes   Yes   Yes   No   Trac   No    No    Yes     Yes    Yes    Yes   Callback
DieselWeb     BSD     Yes   Yes  STalk    Yes   Yes   Yes   Yes  BitB.  No    Lim.  Yes     Yes    Yes    No    Generator
MultiTask     MIT     Some  No   No       No    No    Yes   No   Bzr    No    No    No      No     No     No    Generator
Chiral        GPL2    API   No   No       IRC   No    No    No   Trac   No    Yes   Yes     Yes    Yes    Yes   Coroutine
Eventlet      MIT     Yes   Yes  S. Life  Yes   Yes   Yes   No   BitB.  Yes   Yes   Yes     No     Yes    Yes   Greenlet
FriendlyFlow  GPL2    Some  One  No       No    No    No    Yes  Ggle   No    No    Yes     No     No     Yes   Generator
Weightless    GPL2    Yes   No   Yes      No    No    No    Yes  SF     No    No    Yes     No     No     Yes   Generator
Fibra         MIT     No    No   No       No    No    Yes   No   Ggle   No    No    Yes     No     No     No    Generator
Concurrence   MIT     Yes   Yes  Hyves    Yes   Yes   No    No   GHub   No    Yes   Yes     No     Yes    Yes   Tasklet
Circuits      MIT     Yes   Yes  Yes      Yes   Yes   Yes   Yes  Trac   No    Yes   Yes     No     No     Yes   Async
Gevent        MIT     Yes   Yes  Yes      Yes   Yes   Yes   Yes  BitB.  No    Yes   Yes     No     Yes    Yes   Greenlet
Cogen         MIT     Yes   Yes  Yes      No    Yes   Yes   Yes  Ggle   No    Yes   Yes     No     Yes    Yes   Generator

This is quite a list and I probably still missed a few. The main reason for using a framework rather than implementing something yourself is that you hope to accelerate your own development process by standing on the shoulders of other developers. I therefore think it is important that there is documentation, some sort of developer community (a mailing list, for example), and that the project is still active. If we take this as a requirement we are left with the following solutions:

  • Orbited / Twisted (callbacks)
  • Tornado (async)
  • Dieselweb (generator)
  • Eventlet (greenlet)
  • Concurrence (stackless)
  • Circuits (async)
  • Gevent (greenlet)
  • Cogen (generator)

To quickly summarize this list: Twisted has been the de facto standard for async programming with Python. It has an immense community and a wealth of tools, protocols and features. It has grown big and complex; some say it reminds them of shirtless men drinking Jager-bombs. This is also one of the biggest reasons why people are looking elsewhere. Recently Facebook released the code of its async approach, called Tornado, which also uses callbacks; recent benchmarks show that it outperforms Twisted.

A commonly heard argument against programming with callbacks is that it can get overly complex. A programmatically cleaner approach (imho) is to use light-weight threads. This can be achieved by using a different Python implementation, Stackless (as Concurrence does), or a plugin for regular Python, Greenlet (as Eventlet and Gevent do). Another approach is to simulate these light-weight threads with Python generators, as Dieselweb and Cogen do.
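
To make the generator idea concrete, here is a minimal sketch of a round-robin scheduler that treats each generator as a light-weight thread. It is not how Dieselweb or Cogen are implemented; the names worker and run_round_robin are invented for illustration, and real frameworks resume a task when its socket is ready rather than in strict rotation.

```python
from collections import deque


def worker(name, steps, log):
    """A 'light-weight thread': does one unit of work, then yields control."""
    for i in range(steps):
        log.append((name, i))
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Resume each generator in turn until all of them are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)          # run the task up to its next yield
        except StopIteration:
            continue            # task finished; do not reschedule it
        queue.append(task)      # still alive; put it at the back of the line


if __name__ == "__main__":
    log = []
    run_round_robin([worker("a", 2, log), worker("b", 3, log)])
    print(log)
```

Every switch between tasks happens at an explicit yield, which is why generator-based frameworks end up with yield statements throughout the user code.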

This should already show that while all these frameworks provide asynchronous concurrency, each does so in its own way. I want to invite you to look at these frameworks, as they all have their own code gems. For example, Concurrence has a non-blocking interface to MySQL, Eventlet has a neat thread pool, Tornado can pre-fork over CPUs, Gevent offloads HTTP header parsing and DNS lookups to Libevent, Cogen has sendfile support, and Twisted probably already has a factory doing exactly what you are planning to do next.

The Ping Pong Benchmark

In this benchmark I am going to focus on how well each framework listens on a socket and writes to incoming connections. The client pings the socket by opening it; the server responds with a ‘Pong!’ and closes the socket. This should be really simple, but it is a pain to create something that does this in an asynchronous and non-blocking way from scratch, and that is exactly the reason why we are looking at these frameworks. It is all about making our lives easier.

Ok, for this benchmark I am going to use httperf, a high-performance tool that understands the HTTP protocol. If we want httperf to play along in our Ping-Pong benchmark we have to make it understand the ‘Pong!’ response. We can do this by mimicking an HTTP server and having our server respond with:

HTTP/1.0 200 OK
Content-Length: 5

Pong!

instead of just ‘Pong!’. Also, since most default server configurations are not set up to handle a large number of concurrent requests, we need to make a few adjustments:

  • Raise the per-process file limit by compiling httperf after some adjustments.
  • Raise the per-user file limit: set ‘ulimit -n 10000’ on both server and client.
  • Raise the kernel limit on file handles: ‘echo "128000" > /proc/sys/fs/file-max’.
  • Increase the kernel's queue for incoming packets: ‘sysctl -w net.core.netdev_max_backlog=2500’.
  • Raise the cap on the listen backlog of pending connections: ‘sysctl -w net.core.somaxconn=250000’.
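
The per-process file limit can also be inspected and raised from inside a Python server, up to the hard limit set by the administrator. The sketch below uses the standard-library resource module (Unix only); the helper names target_soft_limit and raise_nofile_limit are illustrative, and the default of 10000 simply mirrors the ulimit value above.

```python
import resource


def target_soft_limit(wanted, hard, unlimited=resource.RLIM_INFINITY):
    """Return the soft limit to request: `wanted`, capped by the hard limit."""
    if hard == unlimited:
        return wanted
    return min(wanted, hard)


def raise_nofile_limit(wanted=10000):
    """Try to raise this process's open-file soft limit towards `wanted`.

    An unprivileged process may only raise its soft limit up to the hard
    limit, so the result can be lower than requested.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < wanted:
        new_soft = target_soft_limit(wanted, hard)
        if new_soft > soft:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("open-file soft limit is now", raise_nofile_limit())
```

This only affects the calling process and its children; the kernel-wide settings in the list still have to be changed with sysctl or through /proc.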

With these settings my Debian Lenny system was ready to hammer the different servers up to rates far beyond the capacity of the frameworks. I used the following command:

httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1

I then increased the rate in steps of 100, from 400 up to 9000 requests per second, for a total of 40,000 requests at each step.
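
A sweep like this is easy to script. The sketch below only builds the httperf command lines, using the flags from the command shown above; the function name sweep_commands is invented, and how the original runs were actually driven is not stated.

```python
# Flags copied from the httperf command above; only --rate varies.
BASE_FLAGS = [
    "--hog", "--timeout=60", "--client=0/1", "--server=localhost",
    "--port=10000", "--uri=/", "--send-buffer=4096",
    "--recv-buffer=16384", "--num-conns=40000", "--num-calls=1",
]


def sweep_commands(start=400, stop=9000, step=100):
    """Return one httperf argument list per request rate, inclusive of stop."""
    return [
        ["httperf", "--rate=%d" % rate] + BASE_FLAGS
        for rate in range(start, stop + 1, step)
    ]


if __name__ == "__main__":
    for command in sweep_commands():
        # Printed rather than executed, since httperf may not be installed;
        # subprocess.run(command) would run each step in turn.
        print(" ".join(command))
```

With the defaults this yields 87 runs of 40,000 connections each.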

Code

What follows is the implementation of the server side in each of the frameworks. It should show the different approaches the frameworks take.

Twisted

Gentlemen, start your reactor!

from twisted.internet import epollreactor
epollreactor.install()  # must run before twisted.internet.reactor is imported
from twisted.internet.protocol import Protocol, Factory
from twisted.internet import reactor
 
class Pong(Protocol):
    def connectionMade(self):
        self.transport.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
        self.transport.loseConnection()
 
# Start the reactor
factory = Factory()
factory.protocol = Pong
reactor.listenTCP(8000, factory)
reactor.run()

Tornado

Tornado does not hide the raw socket interface, which makes this example lengthier than the others.

import errno
import functools
import socket
from tornado import ioloop, iostream
 
def connection_ready(sock, fd, events):
    while True:
        try:
            connection, address = sock.accept()
        except socket.error, e:
            if e[0] not in (errno.EWOULDBLOCK, errno.EAGAIN):
                raise
            return
        connection.setblocking(0)
        stream = iostream.IOStream(connection)
        stream.write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n", stream.close)
 
if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setblocking(0)
    sock.bind(("", 8010))
    sock.listen(5000)
 
    io_loop = ioloop.IOLoop.instance()
    callback = functools.partial(connection_ready, sock)
    io_loop.add_handler(sock.fileno(), callback, io_loop.READ)
    try:
        io_loop.start()
    except KeyboardInterrupt:
        io_loop.stop()
        print "exited cleanly"

Dieselweb

While this example is beautifully small, I do not really enjoy the generator approach, which sprinkles ‘yield’ all over the place.

from diesel import Application, Service
 
def server_pong(addr):
    yield "HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n"
 
app = Application()
app.add_service(Service(server_pong, 8020))
app.run()

Circuits

I think the Circuits code is the most beautiful of them all: very elegant.

from circuits.net.sockets import TCPServer
 
class PongServer(TCPServer):
 
    def connect(self, sock, host, port):
        self.write(sock, 'HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n')
        self.close(sock)
 
PongServer(('localhost', 8050)).run()

Eventlet

Eventlet uses a greenlet approach.

from eventlet import api
 
def handle_socket(sock):
    sock.makefile('w').write("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()
 
server = api.tcp_listener(('localhost', 8030))
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    api.spawn(handle_socket, new_sock)

Gevent

Gevent is presented as a rewrite of eventlet focussing on performance.

import gevent
from gevent import socket
 
def handle_socket(sock):
    sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    sock.close()
 
server = socket.socket()
server.bind(('localhost', 8070))
server.listen(500)
while True:
    try:
        new_sock, address = server.accept()
    except KeyboardInterrupt:
        break
    # handle every new connection with a new coroutine
    gevent.spawn(handle_socket, new_sock)

Concurrence

Concurrence uses the Tasklet approach; it can be run under Greenlet and under Stackless Python. In this benchmark there was no real performance difference between the two engines.

from concurrence import dispatch, Tasklet
from concurrence.io import BufferedStream, Socket
 
def handler(client_socket):
    stream = BufferedStream(client_socket)
    writer = stream.writer
    writer.write_bytes("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
    writer.flush()
    stream.close()
 
def server():
    server_socket = Socket.new()
    server_socket.bind(('localhost', 8040))
    server_socket.listen()
 
    while True:
        client_socket = server_socket.accept()
        Tasklet.new(handler)(client_socket)
 
if __name__ == '__main__':
    dispatch(server)

Cogen

Cogen uses the generator approach as well.

import sys
 
from cogen.core import sockets
from cogen.core import schedulers
from cogen.core.coroutines import coroutine
 
@coroutine
def server():
    srv = sockets.Socket()
    adr = ('0.0.0.0', len(sys.argv)>1 and int(sys.argv[1]) or 1200)
    srv.bind(adr)
    srv.listen(500)
    while 1:
        conn, addr = yield srv.accept()
        fh = conn.makefile()
        yield fh.write("HTTP/1.0 200 OK\r\nContent-Length: 12\r\n\r\nHello World!\r\n")
        yield fh.flush()
        conn.close()
 
m = schedulers.Scheduler()
m.add(server)
m.run()
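
For comparison with the framework versions above, the same ping-pong exchange can be sketched with nothing but the standard library. This assumes Python 3 and its asyncio module, which is not one of the frameworks benchmarked here and says nothing about their relative speed. The demo helper and the use of an OS-assigned port are additions for illustration.

```python
import asyncio

# The body is exactly five bytes, matching the Content-Length header.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!"


async def handle(reader, writer):
    # Like the framework examples, reply as soon as the client connects,
    # without reading the request, then close the connection.
    writer.write(RESPONSE)
    await writer.drain()
    writer.close()


async def ping(host, port):
    # Illustrative client: connect and read until the server closes.
    reader, writer = await asyncio.open_connection(host, port)
    data = await reader.read()
    writer.close()
    await writer.wait_closed()
    return data


async def demo():
    # Port 0 asks the OS for any free port (an assumption for this demo;
    # the benchmark servers each listened on a fixed port).
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    try:
        return await ping("127.0.0.1", port)
    finally:
        server.close()
        await server.wait_closed()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The coroutine style reads much like the greenlet examples, except that every point where control may switch is marked with await.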

Results

[Figure: Successful Connection Rate on an increasing amount of requests (more is better). Horizontal axis: request rate, 0 to 8000. Vertical axis: successful connection rate, 0 to 8000. Series: circuits, concurrence, diesel, eventlet, tornado, twisted, gevent, cogen.]

The first graph clearly shows at which connection rate (on the horizontal axis) the successful connection rate starts to degrade. It shows a huge difference between the best performer, Tornado, with 7400 requests per second, and the worst, Circuits, with 1400 requests per second (which doesn’t use EPOLL). This connection rate was sustained for at least 40,000 requests. We can see that when the hammering of the server continues beyond rates the server can handle, the performance drops. This is caused by connection errors or timeouts.
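
Each point on a graph like this comes from the summary httperf prints at the end of a run. As a sketch of how such numbers can be extracted, the code below parses three summary lines copied from a reader's Circuits run quoted in the comments further down; the function name parse_summary and the choice of fields are illustrative, not the author's tooling.

```python
import re

# Summary lines copied from an httperf run quoted in the comments below.
SAMPLE = """\
Total: connections 40000 requests 33288 replies 33216 test-duration 204.995 s
Reply status: 1xx=0 2xx=33216 3xx=0 4xx=0 5xx=0
Errors: total 6784 client-timo 6784 socket-timo 0 connrefused 0 connreset 0
"""


def parse_summary(text):
    """Pull connection, 2xx-reply and error counts out of httperf output."""
    total = re.search(r"Total: connections (\d+) requests (\d+) replies (\d+)", text)
    status = re.search(r"Reply status: .*?2xx=(\d+)", text)
    errors = re.search(r"Errors: total (\d+)", text)
    connections = int(total.group(1))
    replies_2xx = int(status.group(1))
    return {
        "connections": connections,
        "replies_2xx": replies_2xx,
        "errors": int(errors.group(1)),
        # Share of attempted connections that came back with a 2xx status.
        "success_ratio": replies_2xx / connections,
    }


if __name__ == "__main__":
    print(parse_summary(SAMPLE))
```

In that sample the 2xx replies and the errors add up to the 40,000 attempted connections, so roughly 83% of the connections succeeded.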

[Figure: Response Time on an increasing amount of requests (less is better). Horizontal axis: request rate, 0 to 8000. Vertical axis: response time in ms, 0 to 400. Series: circuits, concurrence, diesel, eventlet, tornado, twisted, gevent, cogen.]

This graph shows the response time. It is clearly visible that once the maximum connection rate has been reached, the overall response time starts to increase.

[Figure: Error Rate on an increasing amount of requests (less is better). Horizontal axis: request rate, 0 to 8000. Vertical axis: error rate, 0 to 400. Series: circuits, concurrence, diesel, eventlet, tornado, twisted, gevent, cogen.]

The last graph shows the number of errors, i.e. requests for which httperf detected no 200 response. We can see a correlation between the performance of a server and the errors it returns at a given request rate: the better-performing servers return fewer errors overall. There is, however, one exception. Cogen was able to answer ALL its requests successfully no matter how hard it was hammered, and is therefore not visible in this graph. This is interesting: at 9000 requests per second it was still able to answer all requests. However, the average connection time (from socket open until socket close) was about 7 seconds, meaning that Cogen was serving about 28000 concurrent connections at somewhat reduced performance, but not dropping them.
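
The relation between connection time and concurrency can be checked with Little's law, which says the average number of connections in flight equals the completion rate times the average time each one stays open. The sketch below applies it to the two figures quoted above. The resulting rate of about 4000 completions per second is an inference from those figures, not a number reported in this post.

```python
def concurrent_connections(completions_per_s, avg_time_s):
    """Little's law: connections in flight = completion rate x time open."""
    return completions_per_s * avg_time_s


def implied_throughput(concurrent, avg_time_s):
    """Little's law rearranged: completion rate = in flight / time open."""
    return concurrent / avg_time_s


if __name__ == "__main__":
    # About 28000 connections open for about 7 seconds each.
    print(implied_throughput(28000, 7))
    # For comparison: completing the full offered 9000 per second at
    # 7 seconds each would mean this many connections open at once.
    print(concurrent_connections(9000, 7))
```

Read this way, the quoted figures suggest Cogen was queuing work rather than completing it at the full offered rate, which fits the description of reduced performance without dropped connections.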

Notes

This post should make it clear that Python has a rich set of options for asynchronous programming. All tested frameworks show great performance. I mean, even Circuits’ result of 1300 requests per second isn’t too bad. Tornado really blew me away with its performance at 7400 requests per second. But if I had to choose a favorite I would probably go with Gevent; I am really digging its greenlet style.

The clean Greenlet / Stackless style is really cool, especially since Stackless Python is keeping up with CPython nowadays. There was some talk on a mailing list about Gevent running on Stackless. The Concurrence framework already runs on Stackless and can thus be a great option if you are looking for specific features of Stackless Python such as tasklet pickling.

I want to make clear that this test only shows how these frameworks perform at a relatively simple task. It could be that when more is going on in the background the results will change. However, I feel that this benchmark is a great indicator of how each framework handles a socket connection.

In the coming days I plan to investigate this some more. I will also check out how these Python frameworks stack up against their equivalents in different languages, e.g. Ape, CometD, NodeJS. Stay tuned!

71 Responses to “Asynchronous Servers in Python”

  1. Social comments and analytics for this post…

    This post was mentioned on Twitter by Nichol4s: New post: Asynchronous Servers in Python http://bit.ly/8UHKhK #in…

  2. Vadim S. says:

    Really nice benchmarks! Thanks for you work. It is very valuable.

  3. Lenni says:

    Nice comparison!

    I however found it difficult to read the graphs and tell which line corresponds to which framework. The indicators in the legend are too small.

  4. Henk Punt says:

    Congrats, great benchmark!,

    Some remarks (me being the author of Concurrence :-)

    You list the Concurrence license as ‘Hyves’, but I think it should be MIT (at least that is the intention…, why did you think otherwise?).

    Also in your matrix you did not put an entry for ‘automated tests’. I think that should also count when considering frameworks, e.g. how mature is their developement process. For instance Twisted has a very large and comprehensive test-suite, and this is also something I strive for with Concurrence.

    It would also be nice if you mentioned memory usage for the various frameworks when having many connections at the same time. The stable version of Concurrence you tested for instance has a problem with using more memory than needed (current trunk version has that fixed).

    • Hi Henk,

      Thanks for the remarks and opening up concurrence! I was not able to find any known license on the concurrence website or the repo. I only found this License on Github. As I am not that versed in the license-business I just named it Hyves to be sure. But since you say it is supposed to be MIT, i will update the matrix. Btw, this matrix does have a column ‘test’. But I agree that this doesn’t really say much.

      Benchmarking a lot of different frameworks is hard and i cut some corners here and there. Monitoring memory usage is one of them, maybe i’ll do this in the upcoming cross-language benchmark.

    • Henk, I’m checking out Concurrence now. I like that the buffer management per client is clearly coded. I read the commit history and saw “buffer sharing” reduces memory. What does that really mean though?

  5. Thank you, great post but… where is Kamaelia ?
    http://www.kamaelia.org

    Best Regards

  6. dh1r says:

    event driven != asynch*

    nice graphs.

    *posix

  7. === popurls.com === popular today…

    yeah! this story has entered the popular today section on popurls.com…

  8. Event driven programming with greenlet is great. I’ve written about how to build a webserver using python’s BaseHTTPServer and coroutines at my blog.

    http://erik.gorset.no/2009/12/building-comet-enabled-http-server-in.html

  9. testibus says:

    thanks, great work!

  10. Greg says:

    Was surprised to see Tornado do so well.

  11. a says:

    What about the good old asyncore based medusa?
    http://svn.zope.org/Zope/trunk/src/ZServer/medusa/

  12. [...] Nicholas Piël » Socket Benchmark of Asynchronous Servers in Python (tags: python concurrency benchmark server network webdev library blog) [...]

  13. Absolutely excellent post. It’s extremely gratifying to see so much interest in async/coroutine network servers. Since I started using Twisted in 2000 there has been little understanding or acceptance of this technique, so it’s extremely gratifying to see all the interest it has been getting lately.

    I agree that benching medusa and kamaelia would be excellent.

    One little nit, you did not specify the listen backlog parameter in either the gevent or eventlet cases. Specifying a large backlog might help reduce the error rate.

  14. Michael Fischer says:

    You didn’t post your server specs. How much RAM, how many CPUs (and what speed), and how much memory? Please also post the memory consumption (RSS/VSZ) of each server.

    Also, it would have been useful to rate for each module the quality of documentation (Twisted’s, for example, exists but is abysmal), whether it supports generic TCP connections easily, and whether it supports SSL and peer certificate authentication.

    • The benchmark was run on a pretty old MacBook Pro 2.2Ghz core duo with 4 gigs of ram running Debian Lenny.

      I will look at the memory consumption in the upcoming cross-language benchmark.

      Concerning your other ‘requests’, a qualitative comparison of documentation is really hard, I have been focussing mainly on the framework acting as some sort of Comet / Websocket server and in that situation I don’t think SSL support is an issue as i imagine that in a production setting these daemons would be sitting behind a proxy anyway.

      • Michael Fischer says:

        The reason I asked about generic TCP socket handling is that the title of your post is “Socket Benchmark of Asychronous Servers in Python,” not “Socket Benchmark of Asychronous HTTP Servers in Python.” There is a significant difference between the two.

        Also, SMP scalability is very important, which I why I asked about your server specs. Knowledge of a rate on 1 CPU is interesting (assuming there is no latency in request processing), but not very interesting if it doesn’t scale somewhat linearly with the number of CPUs in the system.

        SSL is important, for Comet or otherwise. Proxies obscure remote IP addresses and are best avoided if you care about DOS attack mitigation.

  15. Hey there

    Great post: I linked to it from my related question on stackoverflow (http://stackoverflow.com/questions/1824418/a-clean-lightweight-alternative-to-pythons-twisted) as I thought it might be useful for anyone landing there (seems to be a popular one from the number of votes).

    One point to note is that some of these frameworks are not fully portable. I looked at a lot of them when trying to decide what to use as a replacement for tornado (not portable) and found this to be an issue. It might be worth adding an extra column to the table at the top.

    Thanks for taking the time to do these tests though.

    Jamie

    • Yes, well i am mainly interested in Linux (and thus Epoll) but i might add that to this or the upcoming cross-language benchmark.

      In this case all frameworks that use Libevent (Gevent fe) should work on Windows.

      Thanks for adding a link from stackoverflow.com

  16. Thomas Hervé says:

    Making sure the backlog is the same everywhere would be nice. For example, you’re specifying 5000 in the Tornado code, whereas the Twisted default value is 50. Passing backlog=5000 to reactor.listenTCP would fix that.

  17. Excellent!

    Good to see Twisted drop down.

    Sad to see eventlet showing more errors than gevent.

  18. slav0nic says:

    what about basic asyncore ? )

  19. Ryan Williams says:

    Great roundup! The Eventlet benchmark could benefit from a little tweaking. I used the following code and achieved significantly better performance.

    from eventlet.green import socket
    from eventlet import api,hubs
    hubs.use_hub("pyevent")
     
    def handle_socket(sock):
        sock.sendall("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
        sock.close()
     
    server = socket.socket()
    server.bind(('localhost', 8030))
    server.listen(500)
    while True:
        try:
            new_sock, address = server.accept()
        except KeyboardInterrupt:
            break
        # handle every new connection with a new coroutine
        api.spawn(handle_socket, new_sock)

    The major difference here is that makefile() and its consequent object creation cost are not called. This alone makes a big difference, and has the side benefit of being more readable and closer in spirit to the other tests.

    Also, this code uses pyevent for event dispatching. Pyevent is disabled by default within Eventlet because it is not compatible with threads, but in this sort of test, it provides better performance.

    • Ryan Williams says:

      Oh wow that is bad formatting. Sorry, I hope you get the idea. Thanks again for doing this great roundup.

      • I fixed the markup, you can enclose code between

        [ + py + ] …code here… [ + /py + ]

        tags, at least on this blog. However, most other WordPress blogs support the [/code] tag, just FYI. I will try the sock.sendall approach + py events somewhere after xmas.

        Thanks for your suggestion!

  20. [...] Nicholas Piël » Socket Benchmark of Asynchronous Servers in Python (tags: python asynchronous benchmark concurrency server performance network twisted) [...]

  21. Denis says:

    I’d like to point out that gevent’s wsgi server is build on top of libevent-http module. As such it should be more efficient than most pure Python alternatives.

  22. [...] Benchmark of Asynchronous Servers in Python [...]

  23. Amir says:

    Great post, thanks a lot! Also looking forward to any updates or followups.

    By the way, Chiral is GPLv2:
    http://chiral.j4cbo.com/trac/changeset?new=51%40%2F&old=50%40%2F

    P.S. it would also be appreciated if anybody who follows up with a comparison across languages includes Erlang’s OTP as well.

  24. SMiGL says:

    Good post! Thanks

  25. Amir says:

    Actually, it appears as though Circuits does have epoll support. You only need to provide the right class from circuits.net.pollers as a kwarg to circuits.net.Server():
    http://trac.softcircuit.com.au/circuits/browser/circuits/net/sockets.py#L394

  26. I feel this is largely misinforming to someone who doesn’t understand how the frameworks work as the approaches and server architectures wildly differ:

    For example, tornado would fork in two processes and that’s the only reason it won the benchmark. It’s a nice feature but one could argue that this should be handled elsewhere (loadbalancer/clustering).

    gevent is largely written in C (well, pyrex apparently) and that’s the reason it’s the second fastest (it would clearly win if tornado would run in single process mode).

    Also, the server code approaches differ, some start coroutines/greenlets/whateverlets on a new connection and some don’t. This is a very important difference and the results can differ much depending on this aspect.

    The error rates worry me – I think it’s a good exercise to find out why they happen.
    I would add portability and test code coverage to the comparison table.

    • The benchmarked tornado does NOT prefork and thus uses only one process. This functionality has only been added very recently to the trunk.

      I don’t really think it is an issue how a certain framework has been implemented. The end user only cares about the ease of use and its performance. Not wether the heavy lifting is being done by an external library (such as libevent) or an optimized inner loop.

      I agree with you that there is some inconsistency with the functionality of the different implementations. A more thorough test could make those irrelevant. Ie, instead of a single ping-pong make the server respond to multiple ‘ping!’ requests by a single client, each fired with a certain interval.

      • True that.

        However, I noticed that you use smaller backlog and response for the cogen sample. Also, if you spawn a coroutine for each new connection and avoid makefile you can get a bit more response rate out of it. Eg:

        import sys
        from cogen.core import sockets
        from cogen.core import schedulers
        from cogen.core.coroutines import coroutine

        @coroutine
        def server():
            srv = sockets.Socket()
            adr = ('0.0.0.0', len(sys.argv)>1 and int(sys.argv[1]) or 1200)
            srv.bind(adr)
            srv.listen(5000)
            while 1:
                conn, addr = yield srv.accept()
                m.add(handler, args=(conn,))

        @coroutine
        def handler(conn):
            yield conn.send("HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nPong!\r\n")
            conn.close()

        m = schedulers.Scheduler()
        m.add(server)
        m.run()

  27. Landreville says:

    Would using asynchronous code in twisted make a difference. I just notice that in diesel you are using non-blocking code by using the generator syntax where in the Twisted code you are using blocking code instead of returning a deferred or using a generator. Does this make a difference in the tests?

    Just to note, twisted also can use the generator syntax.

    • Denis says:

      Twisted example is asynchronous.

      transport.write() call merely buffers the data. Actual sending happens when the descriptor is ready for writing.

  28. I tried Circuits (on a Linode 360 running Linux 2.6) because its component architecture looked interesting and the example code was clean.

    $ sysctl net.core.somaxconn net.core.netdev_max_backlog
    net.core.somaxconn = 250000
    net.core.netdev_max_backlog = 2500

    $ cat /proc/sys/fs/file-max
    1001000

    $ ulimit -n
    1001000

    I changed port to 10000 in Circuits example code to match your httperf command-line.

    I also followed the instructions for installing a custom httperf. Be careful to use the proper httperf if you already had it installed (e.g. by default it installs to /usr/local/bin/).

    I also made sure that Circuits was using epoll:

    PongServer((‘localhost’, 10000), poller=EPoll, backlog=500).run()

    $ httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1

    httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE

    Maximum connect burst length: 3

    Total: connections 40000 requests 33288 replies 33216 test-duration 204.995 s

    Connection rate: 195.1 conn/s (5.1 ms/conn, <=16450 concurrent connections)
    Connection time [ms]: min 0.1 avg 12921.3 max 45608.6 median 0.5 stddev 20278.0
    Connection time [ms]: connect 12973.1
    Connection length [replies/conn]: 1.000

    Request rate: 162.4 req/s (6.2 ms/req)
    Request size [B]: 62.0

    Reply rate [replies/s]: min 0.0 avg 162.0 max 400.0 stddev 195.6 (41 samples)
    Reply time [ms]: response 16.2 transfer 0.0
    Reply size [B]: header 38.0 content 5.0 footer 0.0 (total 43.0)
    Reply status: 1xx=0 2xx=33216 3xx=0 4xx=0 5xx=0

    CPU time [s]: user 37.28 system 167.49 (user 18.2% system 81.7% total 99.9%)
    Net I/O: 16.6 KB/s (0.1*10^6 bps)

    Errors: total 6784 client-timo 6784 socket-timo 0 connrefused 0 connreset 0
    Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

    No errors (all 2xx).

  29. Sorry, you also need: from circuits.net.pollers import EPoll

    I also tried the same test on a monster 2x quad core xeon 5520

    httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=400 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1
    Maximum connect burst length: 1

    Total: connections 40000 requests 40000 replies 40000 test-duration 100.001 s

    Connection rate: 400.0 conn/s (2.5 ms/conn, FD_SETSIZE; limiting max. # of open files to FD_SETSIZE

    Maximum connect burst length: 40

    Total: connections 40000 requests 39975 replies 20859 test-duration 114.926 s

    Connection rate: 348.0 conn/s (2.9 ms/conn, <=26959 concurrent connections)
    Connection time [ms]: min 233.0 avg 6634.8 max 72470.5 median 3327.5 stddev 9396.5
    Connection time [ms]: connect 4989.3
    Connection length [replies/conn]: 1.000

    Request rate: 347.8 req/s (2.9 ms/req)
    Request size [B]: 62.0

    Reply rate [replies/s]: min 0.0 avg 189.6 max 1384.3 stddev 392.4 (22 samples)
    Reply time [ms]: response 3230.8 transfer 0.0
    Reply size [B]: header 38.0 content 5.0 footer 0.0 (total 43.0)
    Reply status: 1xx=0 2xx=20859 3xx=0 4xx=0 5xx=0

    CPU time [s]: user 2.43 system 112.48 (user 2.1% system 97.9% total 100.0%)
    Net I/O: 28.7 KB/s (0.2*10^6 bps)

    Errors: total 19141 client-timo 583 socket-timo 0 connrefused 0 connreset 18558
    Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

  30. Your comment system ate my last comment and spit it back out in 2 parts.

    httperf --hog --timeout=60 --client=0/1 --server=localhost --port=10000 --uri=/ --rate=4000 --send-buffer=4096 --recv-buffer=16384 --num-conns=40000 --num-calls=1

    httperf: warning: open file limit > FD_SETSIZE; limiting max. # of open files to FD_SETSIZE

    Maximum connect burst length: 40

    Total: connections 40000 requests 39975 replies 20859 test-duration 114.926 s

    Connection rate: 348.0 conn/s (2.9 ms/conn, <=26959 concurrent connections)
    Connection time [ms]: min 233.0 avg 6634.8 max 72470.5 median 3327.5 stddev 9396.5
    Connection time [ms]: connect 4989.3
    Connection length [replies/conn]: 1.000

    Request rate: 347.8 req/s (2.9 ms/req)
    Request size [B]: 62.0

    Reply rate [replies/s]: min 0.0 avg 189.6 max 1384.3 stddev 392.4 (22 samples)
    Reply time [ms]: response 3230.8 transfer 0.0
    Reply size [B]: header 38.0 content 5.0 footer 0.0 (total 43.0)
    Reply status: 1xx=0 2xx=20859 3xx=0 4xx=0 5xx=0

    CPU time [s]: user 2.43 system 112.48 (user 2.1% system 97.9% total 100.0%)
    Net I/O: 28.7 KB/s (0.2*10^6 bps)

    Errors: total 19141 client-timo 583 socket-timo 0 connrefused 0 connreset 18558
    Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0

  31. I take that back – there are a number of timeouts and connreset errors. Interesting.

    • Thanks for your remarks though.

      The client-timeout errors are caused by the timeout (60 seconds in your case) set on the httperf command line. You can make most of these go away by increasing the timeout, but it is interesting to see the difference between all the frameworks. I am not really sure what causes the connection reset errors.

      I am still planning to post an update to this benchmark, but I will need some extra machines for that.

      ps,
      I am curious how the maximum / optimum request rates of both machines differ.


Benchmark of Python WSGI Servers

Nicholas Piël | March 15, 2010

It has been a while since the Socket Benchmark of Asynchronous Servers. That benchmark looked specifically at the raw socket performance of various frameworks, which was measured by doing a regular HTTP request against the TCP server. The server itself was dumb and did not actually understand the headers being sent to it. In this benchmark I will be looking at how different WSGI servers perform at exactly that task: the handling of a full HTTP request.

I should immediately start with a word of caution. I tried my best to present an objective benchmark of the different WSGI servers. And I truly believe that a benchmark is one of the best methods to present an unbiased comparison. However, a benchmark measures the performance on a very specific domain and it could very well be that this domain is slanted towards certain frameworks. But, if we keep that in mind we can actually put some measurements behind all those ‘faster than’ or ‘lighter than’ claims you will find everywhere. It is my opinion that such comparison claims without any detailed description of how they are measured are worse than a biased but detailed benchmark. The specific domain of this benchmark is, yet again, the PingPong benchmark as used earlier in my Async Socket Benchmark. However, there are some differences:

  • We will fire multiple requests over a single connection, when possible, by using an HTTP/1.1 keep-alive connection
  • It is a distributed benchmark with multiple clients
  • We will use an identical WSGI application for all servers instead of specially crafted code to return the reply
  • We expect the server to understand our HTTP request and reply with the correct error codes

This benchmark is a conceptually simple one and you could claim that it is not representative of most common web applications, which rely heavily on blocking database connections. I agree with that to some extent, as this is mostly the case. However, the push towards HTML5’s websockets and highly interactive web applications will require servers that are capable of serving lots of concurrent connections with low latency.

The benchmark

We will run the following WSGI application ‘pong.py’ on all servers.

def application(environ, start_response):
    status = '200 OK'
    output = 'Pong!'
 
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]
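
Before pointing a load generator at a server, it can help to call the application directly as a plain function. The sketch below is only an illustration and assumes Python 3, where a WSGI response body has to be bytes (the pong.py above targets Python 2.6 and returns a str). The helper `call_wsgi` is a hypothetical name, not part of any library.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # Python 3 variant of pong.py: the body is bytes instead of str.
    status = '200 OK'
    output = b'Pong!'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]


def call_wsgi(app):
    """Call a WSGI app once with a minimal fake environ (hypothetical helper)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, PATH_INFO, ...
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application hands to the server.
        captured['status'] = status
        captured['headers'] = dict(headers)

    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_wsgi(application)
print(status, headers['Content-Length'], body)
```

Every server in this benchmark does essentially the same thing on each request: build an environ, pass in a start_response callable, and write out whatever the application returns.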

We will also tune both client and server by running the following commands. This basically enables the server to open LOTS of concurrent connections.

echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240
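
Whether the raised descriptor limit actually applies to a given server process can be checked from Python itself. This is a minimal sketch, assuming a Unix system and Python 3; `fd_limits` and `enough_descriptors` are hypothetical helper names, and the value 10240 simply mirrors the ulimit command above.

```python
import resource


def fd_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def enough_descriptors(needed):
    """True if the soft limit allows at least `needed` open descriptors."""
    soft, _hard = fd_limits()
    # RLIM_INFINITY means that no limit is enforced at all.
    return soft == resource.RLIM_INFINITY or soft >= needed


soft, hard = fd_limits()
print("open file limit: soft=%s hard=%s" % (soft, hard))
print("enough for 10240 descriptors:", enough_descriptors(10240))
```

Every open connection costs the server one file descriptor, so a soft limit below the intended number of concurrent connections will show up as errors in the benchmark rather than as a slow server.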

The server is a virtual machine with only one assigned processor. I have explicitly limited the number of available processors to make sure that it is a fair comparison. Whether or not the server scales over multiple processors is an interesting and useful feature, but this is not something I will measure in this benchmark. The reason for this is that it isn’t that difficult to scale up your application to multiple processors by using a reverse proxy and multiple server processes (this can even be managed for you by special applications such as Spawning or Grainbows). The server and clients run Debian Lenny with Python 2.6.4 on the amd64 architecture. I made sure that all WSGI servers have a backlog of at least 500 and that (connection/error) logging is disabled; when this was not directly possible from the callable, I modified the library. The server and the clients have 1 GB of RAM.

I benchmarked the HTTP/1.0 request rate of all servers and the HTTP/1.1 request rate on the subset of servers that support pipelining multiple requests over a single connection. While the lack of HTTP 1.1 keepalive support is most likely a non-issue in current deployment situations, I expect it to become an important feature in applications that depend heavily on low latency connections. Think of comet-style web applications or applications that use HTML5 websockets.

I categorize a server as HTTP/1.1 capable by its behaviour, not by its specs. For example the Paster server says that it has some support for HTTP 1.1 keep alives but I was unable to pipeline multiple requests. This reported bug might be relevant to this situation and might apply to some of the other “HTTP 1.0 Servers”.

The benchmark will be performed by running a recompiled httperf (which bypasses the statically compiled file limit in the Debian package) on 3 different, specially set up client machines. To initialize the different request rates and aggregate the results I will use a tool called autobench. Note: this is not ApacheBench (ab).

The command to benchmark HTTP/1.0 WSGI servers is:

httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=1

And the command for HTTP/1.1 WSGI servers is:

httperf --hog --timeout=5 --client=0/1 --server=tsung1 --port=8000 --uri=/ --rate=<RATE> --send-buffer=4096 --recv-buffer=16384 --num-conns=400 --num-calls=10
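
The two command lines differ only in the number of calls per connection. As a rough sketch (in this benchmark the rates were driven by autobench, not by a script like this), the argument list could be assembled as follows. The function name `httperf_command` and the rate of 1000 are illustrative assumptions; the flags are the ones from the commands above.

```python
def httperf_command(rate, num_calls, server="tsung1", port=8000):
    """Build the httperf argument list used in this benchmark (hypothetical helper)."""
    return [
        "httperf", "--hog", "--timeout=5", "--client=0/1",
        "--server=%s" % server, "--port=%d" % port, "--uri=/",
        "--rate=%d" % rate, "--send-buffer=4096", "--recv-buffer=16384",
        "--num-conns=400", "--num-calls=%d" % num_calls,
    ]


# HTTP/1.0 style: one request per connection.
http10 = httperf_command(rate=1000, num_calls=1)
# HTTP/1.1 style: ten requests over each connection.
http11 = httperf_command(rate=1000, num_calls=10)

print(" ".join(http10))
print(" ".join(http11))
```

A list in this form can be handed to `subprocess.run` without going through a shell.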

The Contestants

Python is really rich in WSGI servers. I have made a selection of different servers, which are listed below.

Name         Version      HTTP 1.1  Flavour              Repo.      Blog              Community
Gunicorn     0.6.4        No        processor/thread     GIT        ?                 #gunicorn
uWSGI        Trunk (253)  Yes       processor/thread     repo       ?                 Mailing List
FAPWS3       0.3.1        No        processor/thread     GIT        William Os4y      Google Groups
Aspen        0.8          No        processor/thread     SVN        Chad Whitacre     Google Groups
Mod_WSGI     3.1          Yes       processor/thread     SVN        Graham Dumpleton  Google Groups
wsgiref      Py 2.6.4     No        processor/thread     SVN        None              Mailing List
CherryPy     3.1.2        Yes       processor/thread     SVN        Planet CherryPy   Planet, IRC
Magnum Py    0.2          No        processor/thread     SVN        Matt Gattis       Google Groups
Twisted      10.0.0       Yes       processor/thread     SVN        Planet Twisted    Community
Cogen        0.2.1        Yes       callback/generator   SVN        Maries Ionel      Google Groups
GEvent       0.12.2       Yes       lightweight threads  Mercurial  Denis Bilenko     Google Groups
Tornado      0.2          Yes       callback/generator   GIT        Facebook          Google Groups
Eventlet     0.9.6        Yes       lightweight threads  Mercurial  Eventlet          Mailinglist
Concurrence  tip          Yes       lightweight threads  GIT        None              Google Groups

Most of the information in this table should be rather straightforward, I specify the version benchmarked and whether or not the server has been found capable of HTTP 1.1. The flavour of the server specifies the concurrency model the server uses and I identify 3 different flavours:

Processor / Thread model

The p/t model is the most common flavour. Every request runs in its own cleanly separated thread. A blocking request (such as a synchronous database call or a function call in a C extension) will not influence other requests. This is convenient as you do not need to worry about how everything is implemented, but it does come at a price. The maximum number of concurrent connections is limited by your number of workers or threads, and this is known to scale badly when you need to serve lots of concurrent users.
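
The limit imposed by the number of workers can be made visible with a small experiment. This is a toy sketch, assuming Python 3 and the standard library thread pool rather than any of the servers benchmarked here; the names `blocking_request`, `WORKERS`, `active` and `peak` are made up for the illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = 2          # size of the thread pool
active = 0           # requests being handled right now
peak = 0             # highest value `active` ever reached
lock = threading.Lock()


def blocking_request(_):
    """Stand-in for a request handler that makes a blocking call."""
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # simulates a synchronous database call
    with lock:
        active -= 1


# Six requests arrive, but only WORKERS of them can be served at once.
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    list(pool.map(blocking_request, range(6)))

print("peak concurrent requests:", peak)
```

However many requests are queued, the peak never exceeds the pool size, which is exactly why this flavour needs careful tuning of the worker or thread count.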

Callback / Generator model

The callback/generator model handles multiple concurrent connections in a single thread, thereby removing the thread barrier. However, a single blocking call will block the whole event loop and has to be prevented. The servers that have this flavour usually provide a threadpool to integrate blocking calls in their async framework or provide alternative non-blocking database connectors. In order to provide flow control this flavour uses callbacks or generators. Some think that this is a beautiful way to create a form of event driven programming; others think that it is a snake pit that quickly changes your clean code into an entangled mess of callbacks or yield statements.
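
The generator half of this model can be sketched with nothing but the standard library. The toy scheduler below is an assumption-laden illustration (Python 3, no real sockets, hypothetical names `handler` and `run`) and not how Cogen or Tornado are implemented; it only shows how yield hands control back to a single-threaded loop.

```python
from collections import deque


def handler(name, steps, log):
    """A toy 'connection handler'; each yield hands control back to the loop."""
    for i in range(steps):
        log.append((name, i))
        yield  # a real framework would yield an I/O operation here


def run(tasks):
    """Round-robin over generator tasks in one thread until all are finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)           # run the task up to its next yield
        except StopIteration:
            continue             # task is done, do not reschedule it
        queue.append(task)       # otherwise put it at the back of the queue


log = []
run([handler("a", 2, log), handler("b", 3, log)])
print(log)
```

The two handlers interleave even though there is only one thread. If one of them called a blocking function instead of yielding, the other would make no progress, which is the event loop problem described above.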

Lightweight Threads

The lightweight flavour uses greenlets to provide concurrency. This also works by providing concurrency from a single thread, but in a less obtrusive way than with the callback or generator approach. Of course, one still has to be careful with blocking connections as these will stop the event loop. To prevent this from happening, Eventlet and Gevent can monkeypatch the socket module to stop it from blocking, so when you are using a pure Python database connector this should never block the loop. Concurrence provides an asynchronous database adapter.

Implementation specifics for each WSGI server

Aspen

Ruby might be full of all kinds of rockstar programmers (whatever that might mean) but if I have to nominate just one Python programmer for some sort of ‘rockstar award’ I would definitely nominate Chad Whitacre. It’s not only the great tools he created (Testosterone, Aspen, Stephane), but mostly how he promotes them with the most awesome screencasts I have ever seen.

Anyway, Aspen is a neat little web server which is also able to serve WSGI applications. It can be easily installed with ‘pip install aspen’ and uses a special directory structure for configuration. If you want more information, I am going to point you to his screencasts.

CherryPy

CherryPy is actually an object oriented Python framework but features an excellent WSGI server. Installation can be done with a simple ‘pip install cherrypy’. I ran the following script to test out the performance of the WSGI server:

from cherrypy import wsgiserver
from pong import application
 
# Here we set our application to the script_name '/'
wsgi_apps = [('/', application)]
 
server = wsgiserver.CherryPyWSGIServer(('0.0.0.0', 8070), wsgi_apps,
                                       request_queue_size=500,
                                       server_name='localhost')
 
if __name__ == '__main__':
    try:
        server.start()
    except KeyboardInterrupt:
        server.stop()

Cogen

The code to have Cogen run a WSGI application is as follows:

from cogen.web import wsgi
from cogen.common import *
from pong import application
 
m = Scheduler(default_priority=priority.LAST, default_timeout=15)
server = wsgi.WSGIServer(
            ('0.0.0.0', 8070),
            application,
            m,
            server_name='pongserver')
m.add(server.serve)
try:
    m.run()
except (KeyboardInterrupt, SystemExit):
    pass

Concurrence

Concurrence is an asynchronous framework under development by Hyves (you might call it the Dutch Facebook) built upon Libevent (I used the latest stable version, 1.4.13). I fired up the pong application as follows:

from concurrence import dispatch
from concurrence.http import WSGIServer
from pong import application
server = WSGIServer(application)
# Concurrence has a default backlog of 512
dispatch(server.serve(('0.0.0.0', 8080)))

Eventlet

Eventlet is a full-featured asynchronous framework which also provides WSGI server functionality. It is in development by Linden Lab (makers of Second Life). To run the application I used the following code:

import eventlet
from eventlet import wsgi
from pong import application
wsgi.server(eventlet.listen(('', 8090), backlog=500), application, max_size=8000)

FAPWS3

FAPWS3 is a WSGI server built around the LibEV library (I used version 3.43-1.1). Once LibEV has been installed, FAPWS3 can be easily installed with pip. The philosophy behind FAPWS3 is to stay the simplest and fastest webserver. The script I used to start up the WSGI application is as follows:

import fapws._evwsgi as evwsgi
from fapws import base
from pong import application
 
def start():
    evwsgi.start("0.0.0.0", 8080)
    evwsgi.set_base_module(base)
 
    evwsgi.wsgi_cb(("/", application))
 
    evwsgi.set_debug(0)
    evwsgi.run()
 
if __name__=="__main__":
    start()

Gevent

Gevent was one of the best performing async frameworks in my previous socket benchmark. Gevent extends Libevent and uses its HTTP server functionality extensively. To install Gevent you need Libevent installed, after which you can pull in Gevent with pip.

from gevent import wsgi
from pong import application
wsgi.WSGIServer(('', 8088), application, spawn=None).serve_forever()

The above code will run the pong application without spawning a Greenlet on every request. If you leave out the argument ‘spawn=None’, Gevent will spawn a Greenlet for every new request.

Gunicorn

Gunicorn stands for ‘Green Unicorn’. Everybody knows that a unicorn is a mix of the awesome narwhal and the magnificent pony; the green, however, has nothing to do with the great greenlets, as Gunicorn really has a threaded flavour. Installation is easy and can be done with a simple ‘pip install gunicorn’. Gunicorn provides you with a simple command to run WSGI applications; all I had to do was:

gunicorn -b :8000 -w 1 pong:application

Update: I had some suggestions in the comment section that using a single worker and having a client connect to the naked server is not the correct way to work with Gunicorn. So I took their suggestions, moved Gunicorn behind NGINX and increased the worker count to the suggested number of workers, 2*N+1, where N is 1, which makes 3. The result of this is depicted in the graphs as gunicorn-3w.

Running Gunicorn with more workers can be done as follows:

gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application

MagnumPy

MagnumPy has to be the server with the most awesome name. This is still a very young project but its homepage is making some strong statements about its performance, so it is worth testing out. It does not feel as polished as the other contestants: installing is basically putting the ‘magnum’ directory on your PYTHONPATH and editing ‘./magnum/config.py’, after which you can start the server by running ‘./magnum/serve.py start’.

#config.py
import magnum
import magnum.http
import magnum.http.wsgi
from pong import application
 
WORKER_PROCESSES = 1
WORKER_THREADS_PER_PROCESS = 1000
HOST = ('', 8050)
HANDLER_CLASS = magnum.http.wsgi.WSGIWrapper(application)
DEBUG = False
PID_FILE = '/tmp/magnum.pid'

Mod_WSGI

Mod_WSGI is the successor of Mod_Python; it allows you to easily integrate Python code with the Apache server. My first Python web app experience was with mod_python and PSP templates. WSGI and cool frameworks such as Pylons have really made life a lot easier.

Mod_WSGI is a great way to get your application deployed quickly. Installing Mod_WSGI is really easy on most Linux distributions. For example:

aptitude install libapache2-mod-wsgi

Is all you need to do on a pristine Debian distro to get a working Apache (MPM-Worker) server with Mod_WSGI enabled. To point Apache to your WSGI app just add a single line to ‘/etc/apache2/httpd.conf’:

WSGIScriptAlias / /home/nicholas/benchmark/wsgibench/pong.py

The problem is, that most people already have Apache installed and that they are using it for *shudder* serving PHP. PHP is not thread safe, meaning that you are forced to use a pre-forking Apache server. In this benchmark I am using the threaded Apache version and use mod_wsgi in embedded mode (as it gave me the best performance).

I disabled all unnecessary modules and configured Apache to provide me with a single worker, lots of threads and disabled logging (note: i tried various settings):

<IfModule mpm_worker_module>
    ServerLimit         1
    ThreadLimit         1000
    StartServers          1
    MaxClients          1000
    MinSpareThreads     25
    MaxSpareThreads     75
    ThreadsPerChild     1000
    MaxRequestsPerChild   0
</IfModule>
CustomLog /dev/null combined
ErrorLog /dev/null

Paster

The Paster webserver is the webserver provided with Python Paste; it is Pylons’ default webserver. You can run a WSGI application as follows:

from pong import application
from paste import httpserver
httpserver.serve(application, '0.0.0.0', request_queue_size=500)

Tornado

Tornado is the non-blocking webserver that powers FriendFeed. It provides some WSGI server functionality which can be used as described below. In the previous benchmark I have shown that it provides excellent raw-socket performance.

import sys
# Make pong.py importable; this has to happen before the import below
sys.path.append('/home/nicholas/benchmark/wsgibench/')

import tornado.httpserver
import tornado.ioloop
import tornado.wsgi
from pong import application

def main():
    # Wrap the WSGI application so Tornado's HTTP server can call it
    container = tornado.wsgi.WSGIContainer(application)
    http_server = tornado.httpserver.HTTPServer(container)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()

if __name__ == "__main__":
    main()

Twisted

After installing Twisted with pip you get a tool, ‘twistd’, which allows you to easily serve WSGI applications, for example:

twistd --pidfile=/tmp/twisted.pid -no web --wsgi=pong.application --logfile=/dev/null

But you can also run a WSGI application as follows:

from twisted.web.server import Site
from twisted.web.wsgi import WSGIResource
from twisted.internet import reactor
from pong import application
 
resource = WSGIResource(reactor, reactor.getThreadPool(), application)
reactor.listenTCP(8000,Site(resource))
reactor.run()

uWSGI

uWSGI is a server written in C, it is not meant to run stand-alone but has to be placed behind a webserver. It provides modules for Apache, NGINX, Cherokee and Lighttpd. I have placed it behind NGINX which i configured as follows:

worker_processes  1;
 
events {
    worker_connections  30000;
}
 
http {
    include       mime.types;
    default_type  application/octet-stream;
 
    keepalive_timeout  65;
 
    upstream pingpong {
        ip_hash;
        server unix:/var/nginx/uwsgi.sock;
    }
 
    server {
        listen       9090;
        server_name  localhost;
 
        location / {
            uwsgi_pass  pingpong;
            include     uwsgi_params;
        }
 
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }
 
    }
 
}

This makes NGINX pass requests on to a unix socket. All I needed to do then was have uWSGI serve on that same unix socket, which I did with the following command:

./uwsgi -s /var/nginx/uwsgi.sock -i -H /home/nicholas/benchmark/wsgibench/ -M -p 1 -w pong -z 30 -l 500 -L

WsgiRef

WsgiRef is the default WSGI server included with Python since version 2.5. To have this server run my application I use the following code which disables logging and increases the backlog.

from pong import application
from wsgiref import simple_server
 
class PimpedWSGIServer(simple_server.WSGIServer):
    # To increase the backlog
    request_queue_size = 500
 
class PimpedHandler(simple_server.WSGIRequestHandler):
    # to disable logging
    def log_message(self, *args):
        pass
 
httpd = PimpedWSGIServer(('',8000), PimpedHandler)
httpd.set_app(application)
httpd.serve_forever()

Results

Below you will find the results as plotted with Highcharts, the line will thicken when hovered over and you can easily enable or disable plotted results by clicking on the legend.

HTTP 1.0 Server results

[Chart: Reply Rate on an increasing amount of requests (more is better). Axes: requests per second, 0 to 4000, against reply rate, 0 to 4000. Servers plotted: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w.]

Disqualified servers

From the above graph it should be clear that some of the web servers are missing. The reason is that I was unable to have them completely benchmarked, as they stopped replying when the request rate passed a certain critical value. The servers that are missing are:

  • MagnumPy: I was able to obtain a reply rate of 500 RPS, but when the request rate passed the 700 RPS mark, MagnumPy crashed.
  • Concurrence: I was able to obtain a successful reply rate of 700 RPS, but it stopped replying when we fired more than 800 requests a second at the server. However, since Concurrence does support HTTP/1.1 keep-alive connections and behaves correctly when benchmarked under a lower connection rate but higher request rate, you can find its results in the HTTP/1.1 benchmark.
  • Cogen: it was able to obtain a reply rate of 800 per second but stopped replying when the request rate was above 1500 per second. It does have a complete benchmark under the HTTP/1.1 test though.
  • WSGIRef: I obtained a reply rate of 352, but it stopped reacting when we passed the 1900 RPS mark.
  • Paster: it obtained a reply rate of 500 but failed when we passed the 2000 RPS mark.

Interpretation

Of the servers that passed the benchmark, we can see that they all have an admirable performance. At the bottom we have Twisted and Gunicorn. The performance of Twisted is somewhat expected as, well, it isn’t really tuned for WSGI performance. I find the performance of Gunicorn somewhat disappointing, also because, for example, Aspen, which is a pure Python server from a few years back, shows a significantly better performance. We can see, however, that increasing the worker count does in fact improve the performance, as it is then able to obtain a reply rate competitive with Aspen.

The other pure Python servers, CherryPy and Tornado, seem to be performing on par with ModWSGI. It looks like CherryPy has a slight performance edge over Tornado. So, if you are thinking of changing from ModWSGI or CherryPy to Tornado because of increased performance, you should think again. Not only does this benchmark show that there isn’t that much to gain, but you will also abandon the process/thread model, meaning that you should be cautious with code blocking your interpreter.

The top performers are clearly FAPWS3, uWSGI and Gevent. FAPWS3 has been designed to be fast and lives up to the expectations. This has been noted by others as well, as it looks like it is being used in production at eBay. uWSGI is used successfully in production at (and in development by) the Italian ISP Unbit. Gevent is a relatively young project but already very successful. Not only did it perform great in the previous async server benchmark, but its reliance on the Libevent HTTP server gives it a performance beyond the other asynchronous frameworks.

I should note that the difference between these top 3 is too small to declare a clear winner of the ‘reply rate contest’. However, I want to stress that with almost all servers I had to be careful to keep the number of concurrent connections low, since threaded servers aren’t that fond of lots of concurrent connections. The async servers (Gevent, Eventlet, and Tornado) were happy to work on whatever was being thrown at them. This really gives a great feeling of stability, as you do not have to worry about settings such as pool size, worker count, etc.

[Chart: Response Time on an increasing amount of requests (less is better). Axes: requests per second, 0 to 4000, against response time in ms, 0 to 4000. Servers plotted: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w.]

Most of the servers have an acceptable response time. Twisted and Eventlet are somewhat on the slow side, but Gunicorn unfortunately shows a dramatic increase in latency when the request rate passes the 1000 RPS mark. Increasing the Gunicorn worker count lowers this latency by a lot, but it is still on the high side compared with, for example, Aspen or CherryPy.

[Chart: Error Rate on an increasing amount of requests (less is better). Axes: requests per second, 0 to 4000, against error rate, 0 to 200. Servers plotted: aspen, cherrypy, eventlet, fapws3, gevent, gunicorn, modwsgi, tornado, twisted, uwsgi, gunicorn-3w.]

The low error rates for CherryPy, ModWSGI, Tornado, uWSGI should give everybody confidence in their suitability for a production environment.

HTTP 1.1 Server results

In the HTTP/1.1 benchmark we have a different list of contestants as not all servers were able to pipeline multiple requests over a single connection. In this test the connection rate is relatively low, for example a request rate of 8000 per second is about 800 connections per second with 10 requests per connection. This means that some servers that were not able to complete the HTTP/1.0 benchmark (with connection rates up to 5000 per second) are able to complete the HTTP/1.1 benchmark (Cogen and Concurrence for example).
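
The relation between request rate and connection rate used in the paragraph above is simple division. The snippet below spells it out for the two test configurations; `connection_rate` is a hypothetical helper, and Python 3 division is assumed.

```python
def connection_rate(request_rate, calls_per_connection):
    """Connections per second needed to reach a given request rate."""
    return request_rate / calls_per_connection


# HTTP/1.1 test: 10 requests over each connection.
print(connection_rate(8000, 10))
# HTTP/1.0 test: every request needs its own connection.
print(connection_rate(4000, 1))
```

At the same request rate the HTTP/1.1 test opens a tenth of the connections, which is why servers that struggle with connection setup can still complete it.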

[Chart: Successful Reply Rate on an increasing amount of requests (more is better). Axes: requests per second, 0 to 8000, against request rate, 0 to 10000. Servers plotted: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence.]

This graph shows the achieved request rate of the servers and we can clearly see that the achieved request rate is higher than in the HTTP/1.0 test. We could increase the total request rate even more by increasing the number of pipelined requests, but this would then lower the connection rate. I think that 10 pipelined requests is a reasonable approximation of a web browser opening an average page.

The graph shows a huge gap in performance. With the fastest server, Gevent, we are able to obtain about 9000 replies per second; with Twisted, Concurrence and Cogen we get about 1000. In the middle we have CherryPy and ModWSGI, with which we are able to obtain a reply rate of around 4000. It is interesting that Tornado, while being close to CherryPy and ModWSGI, seems to have an edge in this benchmark, compared to the edge CherryPy had in the HTTP/1.0 benchmark. This is along the lines of our expectations, as pipelined requests in Tornado are cheaper (since it is async) than in ModWSGI or CherryPy. We expect this gap to widen if we increase the number of pipelined requests. However, it remains to be seen how much of a performance boost this would provide in a deployment setup, as Tornado and CherryPy will then probably be sitting behind a reverse proxy, for example NGINX. In such a setting the connection type between the upstream and the proxy is usually limited to HTTP/1.0; NGINX, for example, does not even support HTTP/1.1 keep-alive connections to its upstreams.

The best performers are clearly uWSGI and Gevent. I benchmarked Gevent with the ‘spawn=None’ option to prevent Gevent from spawning a Greenlet, which seems fair in a benchmark like this. However, when you want to do something interesting with lots of concurrent connections, you want each request to have its own Greenlet, as this allows you to have thread-like flow control. Thus I also benchmarked that version, which can be seen in the graph under the name ‘Gevent-Spawn’. From its results we can see that the performance penalty is small.

[Chart: Response Time on an increasing amount of requests (less is better). Axes: requests per second, 0 to 8000, against response time in ms, 0 to 3000. Servers plotted: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence.]

Cogen is getting a high latency after about 2000 requests per second, Eventlet and Twisted show an increased latency fairly early as well.

[Chart: Error Rate on an increasing amount of requests (less is better). Axes: requests per second, 0 to 8000, against error rate, 0 to 25. Servers plotted: uwsgi, modwsgi, cherrypy, twisted, cogen, gevent-spawn, gevent, tornado, eventlet, concurrence.]

The error rate shows that Twisted, Concurrence and Cogen have some trouble keeping up, I think all other error rates are acceptable.

Memory Usage

I also monitored the memory usage of the different frameworks during the benchmark. The figure noted below is the peak memory usage of all accumulated processes. As this benchmark does not really benefit from additional processes (there is only one available processor), I limited the number of workers when possible.

[Figure: Accumulated Peak Memory Usage per WSGI server, in megabytes. Servers: Aspen, CherryPy, Cogen, Concurrence, Eventlet, FAPWS3, gevent, Gunicorn, Gunicorn-3w, MagnumPy, ModWSGI, Paster, Tornado, Twisted, uWSGI, WsgiRef.]

From these results there is one thing that really stands out, and that is the very low memory usage of uWSGI, Gevent and FAPWS3, especially if we take their performance into account. It looks like Cogen is leaking memory, but I haven’t really looked into that. Gunicorn-3w shows a relatively high memory usage compared with Gunicorn. It should be noted, however, that this is mainly caused by the switch from the naked deployment to the deployment behind NGINX, as we now also have to add the memory usage of NGINX. A single Gunicorn worker only takes about 7.5 MB of memory.

Let’s Kick it up a notch

The first part of this post focussed purely on the RPS performance of the different frameworks under a high load. When the WSGI server was working hard enough it could simply answer all requests from a certain user and move on to the next user. This keeps the number of concurrent connections relatively low, making such a benchmark suitable for threaded web servers.

However, if we are going to increase the number of concurrent connections we will quickly run into system limits, as explained in the introduction. This is commonly known as the C10K problem. Asynchronous servers use a single thread to handle multiple connections and, when efficiently implemented with for example epoll or kqueue, are perfectly able to handle a large number of concurrent connections.
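To make the single-thread idea concrete, here is a minimal sketch using only Python's standard `selectors` module. It is not how any of the benchmarked servers is implemented; the function name `serve`, the canned reply and the timeout are assumptions chosen for illustration. One thread watches the listening socket and every open client socket at once, and only touches a socket when the operating system reports it as readable.

```python
import selectors
import socket

# Canned HTTP/1.0 reply; the body text is only an illustration.
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 4\r\n\r\nPong"


def serve(listener, max_requests, timeout=2.0):
    """Handle connections from one thread until `max_requests` are done; return the count."""
    selector = selectors.DefaultSelector()  # picks epoll, kqueue, ... when available
    listener.setblocking(False)
    selector.register(listener, selectors.EVENT_READ)
    handled = 0
    try:
        while handled < max_requests:
            ready = selector.select(timeout)
            if not ready:
                break  # nothing happened within the timeout, stop waiting
            for key, _mask in ready:
                if key.fileobj is listener:
                    # A new client: watch it alongside all the others.
                    connection, _address = listener.accept()
                    connection.setblocking(False)
                    selector.register(connection, selectors.EVENT_READ)
                else:
                    # An existing client sent data (or hung up): reply and close.
                    connection = key.fileobj
                    if connection.recv(4096):
                        connection.sendall(RESPONSE)
                    selector.unregister(connection)
                    connection.close()
                    handled += 1
    finally:
        selector.close()
    return handled
```

A caller would bind and `listen()` on a socket and pass it to `serve`. No thread is ever parked waiting on a slow client, which is why the number of open connections can grow far beyond what a thread-per-connection design tolerates.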

So that is what we are going to do: we are going to take the top 3 performing WSGI servers, namely Tornado, Gevent and uWSGI (FAPWS3’s lack of HTTP/1.1 support made it unsuitable for this benchmark), and give them 5 minutes of ping-pong mayhem.

You see, ping-pong is a simple game, and it isn’t really the complexity that makes it interesting; it is the speed and the reaction of the players. Now, what is 5 minutes of ping-pong mayhem? Imagine that for 5 minutes, every second, an Airbus loaded with ping-pong players lands (500 clients) and each of those players is going to slam you exactly 12 balls (with a 5 second interval). This would mean that after 5 seconds you would already have to return the volleys of 2000 different players at once.

Tsung Benchmark Setup

To perform this benchmark I am going to use Tsung, which is a multi-protocol distributed load testing tool written in Erlang. I will then have 3 different machines simulating the ping-pong rampage. I used the following Tsung script.

<?xml version="1.0"?>
<!DOCTYPE tsung SYSTEM "/usr/share/tsung/tsung-1.0.dtd" []>
<tsung loglevel="warning">
 
    <clients>
        <client host="tsung2" use_controller_vm="false" maxusers="800"/>
        <client host="tsung3" use_controller_vm="false" maxusers="800"/>
        <client host="bastet" use_controller_vm="false" maxusers="800"/>
    </clients>
    <servers>
        <server host="tsung1" port="8000" type="tcp"/>
    </servers>
    <monitoring>
        <monitor host="tsung1" type="erlang"/>
    </monitoring>
 
    <load>
        <arrivalphase phase="1" duration="5" unit="minute">
            <users interarrival="0.002" unit="second"/>
        </arrivalphase>
    </load>
 
    <sessions>
        <session name='wsgitest' probability='100'  type='ts_http'>
            <for from="0" to="12" incr="1" var="counter">
                <request>
                    <http url='http://tsung1:8000/' version='1.1' method='GET'/>
                </request>
                <thinktime random='false' value='5'/>
            </for>
        </session>
    </sessions>
 
</tsung>
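As a rough sanity check on the load this script describes, the arithmetic can be sketched in a few lines of Python. The constants are read off the XML above and the "12 balls" figure from the text; the function and variable names are illustrative only.

```python
# Rough load estimate derived from the Tsung script above.
# Values are read off the XML; the names are illustrative only.
INTERARRIVAL_S = 0.002      # <users interarrival="0.002" unit="second"/>
PHASE_DURATION_S = 5 * 60   # <arrivalphase duration="5" unit="minute">
REQUESTS_PER_USER = 12      # "exactly 12 balls" per player, as described in the text


def load_estimate(interarrival_s, duration_s, requests_per_user):
    """Return (new users per second, total users, total requests)."""
    users_per_second = round(1 / interarrival_s)
    total_users = users_per_second * duration_s
    total_requests = total_users * requests_per_user
    return users_per_second, total_users, total_requests


users_per_second, total_users, total_requests = load_estimate(
    INTERARRIVAL_S, PHASE_DURATION_S, REQUESTS_PER_USER)
print(users_per_second, total_users, total_requests)  # 500 150000 1800000
```

That is 500 new clients per second, 150,000 clients over the 5 minutes, and on the order of 1.8 million requests per tested server.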

Tsung Benchmark Results

[Figure: Concurrent Connections, measured over time (in seconds).]

[Figure: System Load.]

[Figure: CPU Usage.]

[Figure: Free memory.]

Let me first state that all three frameworks are perfectly capable of handling this kind of load; none of the frameworks dropped connections or ignored requests. I must say that is already quite an achievement, considering that they had to handle about 2 million requests each.

Below the concurrent connection graph we can see the system load, the CPU usage and the free memory on the system during the benchmark. We can clearly see that Gevent put less strain on the system, as the CPU and load graphs indicate. In the memory graph we can see that all frameworks used a consistent amount of memory.

Readers who are still paying close attention to this article should note that the memory graph displays 4 lines instead of 3. The fourth line is Gevent compiled against Libevent 2.0.4a; this new release of Libevent is said to show considerable performance improvements in its HTTP server. But it is still an alpha version, and the memory graph shows that this version is leaking memory. Not something you want on your production site.

[Figure: Server Latency, response time (ms) measured over time (in seconds).]

The final graph shows the latency of the 3 frameworks. We can see a clear difference between Tornado and its competitors: Tornado’s response time hovers around 100ms, uWSGI’s around 5ms and Gevent’s around 3ms. This is quite a difference, and I am really amazed by the low latency of both Gevent and uWSGI during this onslaught.

Summary and Remarks

The above results show that as Python web developers we have lots of different methods to deploy our applications. Some of these seem to perform better than others, but by focussing only on server performance I do not do justice to most of the tested servers, as they differ greatly in functionality. Also, if you are going to take some stock web framework and won’t do any optimizations or caching, the performance of your web server is not going to matter, as it will not be the bottleneck. If there is one thing this benchmark made clear, it is that most Python web servers offer great performance, and if you feel things are slow, the first thing to look at is really your own application.

When you are just interested in quickly hosting your threaded application, you really can’t go wrong with Apache ModWSGI. Even though Apache ModWSGI might put a little more strain on your memory requirements, there is a lot going for it in terms of functionality. For example, protecting part of your website by using an LDAP server is as easy as enabling a module. Standalone CherryPy also shows great performance and functionality and is really a viable (fully Python) alternative which can lower memory requirements.

When you are a little more adventurous you can look at uWSGI and FAPWS3. They are relatively new compared to CherryPy and ModWSGI, but they show a significant performance increase and have lower memory requirements.

Concerning Tornado and performance, I do not think Tornado is an alternative to CherryPy or even ModWSGI. Not only does it hardly show any increase in performance, but it also requires you to rethink your code. But Tornado can be a great option if you do not have any code using blocking connections or just want to look at something new.

And then there is Gevent. It really showed amazing performance at a low memory footprint. It might need some adjustments to your legacy code, but then again the monkey patching of the socket module could help, and I really love the cleanness of Greenlets. There have already been some reports of deploying Gevent successfully, even with SQLAlchemy.

And if you want to dive into high performance websockets with lots of concurrent connections you really have to go with an asynchronous framework. Gevent seems like the perfect companion for that, at least that is what we are going to use.

90 Responses to “Benchmark of Python WSGI Servers”

  1. [...] WSGI server is based on the libevent’s built-in HTTP server, making it super fast. [...]

  2. Hi Nicholas,

    I’m curious if you verified that the threadpools used in each server were of the same size (for those servers using threadpools). This could make a significant difference in the results. It might also be interesting to learn beyond what point increasing the threadpool size no longer helps performance. It’s also worth noting that several of the top performers, by not using threads, are not actually implementing a general purpose, scalable WSGI server. They take a valuable shortcut which aids performance, but this should be considered when selecting a server, since it could lead to disastrous performance for certain applications.

    I’m also curious if you did any analysis of the errors some of the servers encountered. From my investigations, I’ve commonly found that this is closely tied to request throughput rate; when a server begins to lag behind the request rate, if it does not continue to accept new connections, many TCP/IP stacks will begin to reject incoming TCP connections themselves. This is somewhat useful to know, but I think it’s worth separating from a failure that actually occurs within the software being tested, particularly since in this case it’s mostly redundant with the information about the number of requests/second each server can respond to.

    One last thing. :) I wonder if you have any information on the distribution of response times. The graphs of mean (I assume) times are interesting, but knowing what the raw data looks like is also important (and actually necessary in order to correctly interpret the rest of the data you’ve presented here).

    Great work so far. I hope you keep it up. I also hope that at some point there’s something downloadable that people can use to reproduce your results, as well as extend the analysis done on them.

    • I tried to maximize the performance of the various frameworks by optimizing the threadpool, but this is kind of painful to do because when a pool gets too big it can crash the server. So I assume that some gains are possible, but I suspect that those gains will be relatively small.

      Concerning the errors, you can see a difference in the kind of errors between the servers that are able to complete the benchmark and those that aren’t. But yes, I could have specified whether it was a connection reset or a timeout, but the article was getting really long already.

      The mean values are indeed depicted in the graph. While I agree that the STD would be interesting to show as well, the graphs are already very crowded, and I find that the curve can give me some indication of the stability of the mean. For example, compare uWSGI against FAPWS3 after the 6000 RPS mark.

  3. yml says:

    Very interesting benchmark. It confirms my personal experience with the WSGI web servers I have tested: mod_wsgi, cherrypy and uwsgi.

    It would be interesting to know which version of each web server you used.

    Regards,
    –yml

  4. Aigars Mahinovs says:

    Please use more distinct colors for your graphs and also make the legend lines thicker, 10 px thick at least – it is impossible to understand what line belongs to what server.

  5. Nicholas,

    Just a note on the gunicorn numbers. Running the server with a single worker process is constraining any ability for concurrency in its responses. It’s meant to run with 2-4x the number of processors you have on the machine.

    There’s also a slight gotcha in the motivation for implementing HTTP/1.1. Nginx’s proxy is only HTTP/1.0, so if you need to use it to scale out multiple python server processes there’s no benefit from HTTP/1.1, which is why gunicorn didn’t bother to implement it (as it’s designed to be proxied by nginx).

    If I get some time later I’ll try and rerun some of these numbers on bigger hardware. I’ve personally seen gunicorn run that HTTP/1.0 benchmark 10K req/s faster than gevent does.

    Thanks for the writeup,
    Paul Davis

    • I understand these gotchas and I mentioned them in the article. If I can be of any help, please let me know.

      I could rerun the bench somewhere in the future with more assigned processors and workers to specifically test out that case, if you want.

      • Nicholas,

        Remember that gunicorn isn’t like the rest of the web servers in terms of its process utilization. Even when it’s only got a single core allocated for use it will still benefit from an increase in the number of workers allocated. Configuring gunicorn with a single worker is like configuring all the threaded servers to use a single thread; it’s just not how it was intended to be run.

        Also, in your httperf invocation, did you keep the number of connections a constant for every test? Ie, were the 4K r/s tests taking 1/10th of a second? That might explain some of the noise in the graphs.

        HTH,
        Paul

        • Ok,

          I am running Gunicorn right now with 3 workers, thus I have a master process and 3 workers (and of course the NGINX processes).

          gunicorn -b unix:/var/nginx/uwsgi.sock -w 3 pong:application

          I’ll add it to the benchmark when it’s done. The main reason why I did not try multiple workers was that this would have a negative influence on the memory statistics and I did not expect any performance increase. From the initial results I’m getting back from the current benchmark, it does seem to improve the results for Gunicorn, moving it to a more respectable position.

          Cheers,
          Nicholas

          • Nicholas,

            Excellent!

            Feel free to report the sum of the process memory. There are some oddities with copy-on-write semantics but I’ve never heard of a good way to tease those apart for proper usage reports.

            Thanks,
            Paul

  6. Steve Losh says:

    You mentioned putting uWSGI behind nginx but didn’t say anything about doing the same for gunicorn. Does that mean you ran the benchmarks without nginx proxying for gunicorn?

    Gunicorn isn’t designed to be used like that — it’s supposed to live behind nginx (or something similar) just like uWSGI.

    http://gunicorn.org/deployment.html describes the way you’re supposed to deploy gunicorn.

    • That’s correct; however, I did try to put it behind NGINX (via a Unix socket) and that did not give me any performance increase.

      I am also having a difficult time seeing how that would improve performance, as I only use a single worker.

      • In this specific case it doesn’t matter whether you run gunicorn behind nginx as the wsgi app and the clients are both super duper fast. Gunicorn depends on having a buffering proxy to deal with client load as described at [1]. Slowloris is obviously an extreme example of slow client behavior but a public facing server will obviously be exposed to the entire spectrum of client speed between super fast and super slow.

        HTH,
        Paul

        [1] http://ha.ckers.org/slowloris/

  7. Twisted is a processor/thread flavor ?

  8. Idan Gazit says:

    Yeah, seconding a request to make the charts legible.

    A 1-pixel-thick legend makes it impossible to pick out which color belongs to which server, making all your hard work practically useless as I’m unable to read the chart.

  9. Fantastic information, thanks.

  10. I noticed that you’ve used “from hello import application” (instead of “from pong”) for FAPWS3 and Paster.
    Is it the same application though?

  11. Name says:

    Can you add Rocket to the tests?

    https://launchpad.net/rocket

  12. Richard Shea says:

    Great article. Enjoyed reading it. Thanks for all your work writing it.

  13. Passy says:

    Really, really impressive post. Thanks for sharing your research. It’s been about time for a comprehensive comparison like that.

  14. Bram Cohen says:

    It looks like in your last set of tests tornado was just barely able to handle the load from a CPU standpoint, which might account for its high delays. If you run a slightly less difficult test, or on a faster machine, so that the CPU load of tornado is more like 70% than 90%, does the server latency drop to being similar to the others, or is that endemic?

  15. Using ‘ThreadsPerChild 16000′ for Apache/mod_wsgi is just plain stupid. It is directly because of that that it had such a large memory footprint. If you drop that value down to below 100 you will probably find the same performance and yet a lot less memory being used. If your test program is merely a hello world application that returns a few bytes, you could possibly get away with a lot less threads than 100. Some high performance sites with a lot of requests, but where Apache and the application has been tuned properly, get away with 2 or 3 threads in single daemon mode process.

    When using Apache/mod_wsgi, forcing use of a single process is also going to make performance suffer due to limitations of the GIL. The strength of Apache/mod_wsgi is that you can fire up multiple processes and avoid these GIL issues, especially where using a multi processor/core system.

    I suggest you go back and redo your Apache/mod_wsgi tests starting with the Apache default of 25 threads in one process for embedded. If you see it start to suffer under high number of concurrent requests, then add more processes as well and not just threads, with more processes and dropping threads actually better.

    • Thanks for your remarks, Graham.

      I obviously did not have the number of threads set to that insane amount of 16k, as this would invoke the OOM killer on my machine. The 16k setting is a leftover from when I tried to have Apache compete in the Tsung benchmark; that didn’t work.

      As noted, I experimented with some of the settings to find an optimal balance without increasing the error rate (in the HTTP 1.1 benchmark, which can force a lot of concurrent connections). For the benchmark I used a thread setting of 1000; lowering this number would raise the error rate. With this setting the memory usage starts at a relatively low 21 MB, but as the benchmark progresses it reaches 64 MB.

      I could try splitting the threads over multiple processes, because the issues you mention could indeed hold back ModWSGI’s performance. But I suppose that this would increase the memory usage; at least, this was the main reason why I decided to limit it to one process.

      I will see when and if I re-benchmark it. By the way, your other comment just popped in. I did not find out how to disable logging on Apache; can you give me a pointer?

  16. Oh, and turn off Apache access logging, don’t just send it to /dev/null. Turning it off completely is better than sending it to /dev/null, as Apache then doesn’t have to do the actual processing and writing of the messages.

  17. RJ Ryan says:

    Thank you so much for putting the time into writing this. It was very interesting and informative to read.

  18. [...] WSGI web server benchmark was published. It’s a decent benchmark, despite some criticisms. But it benchmarks what [...]

  19. Kyle says:

    Unless Twisted has changed recently, you need to specifically import the epoll reactor, otherwise you get the select reactor which is significantly slower.

    http://twistedmatrix.com/documents/current/core/howto/choosing-reactor.html#epoll

  20. I find interesting folks blaming Nicholas for not using the right configuration whilst not pointing to where each server documents those performance settings. People should get a grip.

  21. Josh says:

    This is a completely off-topic question, but what software do you use to create the charts? Is it open source?

    • As explained in the article, the data gets collected by autobench (which commands httperf) and then gets converted to Highcharts javascript code by a simple Python script.

      Autobench and Highcharts are both open source.

      • Quick correction… Highcharts is under the “Creative Commons Attribution-NonCommercial 3.0 License”. As such, it’s not under one of the ‘approved’ licenses by the OSI. In fact, it fails the first point of The Open Source Definition:

        “””
        1. Free Redistribution
        The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.
        “””

        However, it’s a pretty sharp charting tool and worth every penny (if using in a commercial context). It’s just not Open Source.

        Anyway, I very much enjoyed your post. Thanks for sharing your findings with the rest of us! You’ve saved me (and I’m sure many others) hours of testing :)

        Kind regards,
        Gabe

  22. [...] Piël has produced an interesting comparison using a small WSGI function on a Debian Lenny/AMD64 machine with Python 2.6.4. For each [...]

  23. Hi Nicholas,

    Really interesting benchmark, but I think it could be really interesting to turn it into a challenge and make it more ‘real world usage’ by only providing apps:
    – a simple ping/pong app
    – a simple django app
    – a django app + template
    – a django app + sqlite reads and/or memcache read/write
    and asking each server’s community to tune and provide the optimum setup for its specific server to achieve the best performance.

    Also, use hardware that is more up to date, because I don’t know of any serious job getting done on a single core anymore :)
    I think that a quad core and 2/4 GB of RAM is a better view of the server market.

    This challenge could be run on EC2 instances, for example.

    And it should be noted that even the worst server in your benchmark (with 1000 r/s on a single core) would provide enough performance for 99% of current website needs.
    So features and ease of use should be taken into account when choosing a WSGI server.

    Great article, can’t wait to read the next one!

  24. DavidG says:

    First of all: nice benchmarks!

    I must admit however that for choosing a Python WSGI server, I’d base my choice for at least 50% on benchmarks with basic POST data handling to see what it’s capable of… Maybe a follow up?

  25. [...] Nicholas Piël » Benchmark of Python Web Servers – An extremely techie article – comparing the performance characteristics of different wsgi server implementations for hosting python sites. [...]

  26. MyEyes! says:

    Is it just my ageing CRT-irradiated eyes, or are the colours in those graphs almost impossible to distinguish? I can’t match the legend to the lines in the graph without opening it in GIMP!

  27. MyEyes! says:

    Ignore me, I just found the interactive mouse-overs! Nice ;-)
    Thanks for the benchmarks!

  28. [...] Nicholas Piel wrote a few lines of code in order to test and give a satisfactory report on the various WSGI servers available for Python. The result shows how these servers can offer excellent solutions and good efficiency. You can find the details in this article. I would underline one sentence by the author: speed depends a lot on how you write your code. A great truth that developers often tend to forget, focusing only on frameworks and servers. [...]

  29. [...] Not only for hardcore Pythonistas, but also for webmasters who want to get the most out of their boxes: Benchmark of Python Web Servers. [...]

  30. [...] Filed under: Uncategorized | A few days ago I ran into an interesting post by Ian Bicking about a benchmark that Nicholas Piël ran on WSGI servers. Go ahead and read the original posts, the the skinny is [...]

  31. [...] web applications with an eye to performance. Nicholas Piël has done some great work testing and documenting many of them. Gevent looks like a great option as does CherryPy, but uWSGI caught my eye because it [...]

  32. david fries says:

    Great post, Nicholas! As luck would have it, I was just looking for a benchmark like this as part of a work project. You saved me a lot of work :)

    I am curious, though. How did you measure the memory footprints? Using free, pmap, good old top or something home-brewed? Memory usage is what I’m most concerned about.

  33. UWSGI FAN says:

    Excellent article. Thanks for putting this benchmark online. I’m looking strongly at uWSGI and Gevent now. I haven’t deployed a Django app for some time now (about 9 months) because it’s previously just been a hobby. I’ve used MOD_WSGI and FCGI with Flup, but now I need something for an enterprise deployment and I’m so effing excited to see that Python is getting the sort of cool tools that RoR has enjoyed. Ahhhhh, this is the first time I’ve been so excited of some effing code :)

  34. [...] I have one particularly critical Pylons app currently deployed with paster, and after reading this Benchmark of Python WSGI servers I decided on uWSGI for this [...]

  35. [...] web | I’ve had more time to work on Labour (originally posted here, inspired by this and that), the WSGI Server Durability Benchmark. I’m relatively happy with progress so far, and will [...]

  36. [...] The app server determines how fast the whole system responds. Based on Nicholas Piel’s “Benchmark of Python WSGI Servers”, I shortlisted the following servers (modules): mod_wsgi for [...]

  37. gmong says:

    Excellent article! very helpful.

  38. Hi Nicholas,

    it seems you have failed to configure at least gunicorn properly. Gunicorn can actually use eventlet *or* gevent to serve its requests, see http://gunicorn.org/deployment.html

    Other than that, this only goes to show that libevent is very good at handling concurrency if you ask me. :-)

    Regards

    • Hi Ludvig,

      At the time I wrote this article this was in a separate package (Grainbows, which is mentioned above). The functionality merger is quite recent.

      Cheers,
      Nicholas

      • Peter Portante says:

        Have you any interest in running the gunicorn config with gevent to see if it performs any differently. I am curious. Thanks, -peter

  39. [...] And from there, in particular, a very detailed and good review of a large number of WSGI servers: http://nichol.as/benchmark-of-python-web-servers [...]

  40. Nicholas, you did awesome job! Thanks.
    Only question is why you decided on unix-socket for ucgi and gunicorn under nginx, I’ve read that localhost tcp/ip may actually outperform unix sockets.

  41. Yang Zhang says:

    Hi, thanks for the great benchmark. Would be very interested to see results for gevent with greenlet spawning enabled, and perhaps even results for evio. Thanks!

  42. Andrew Stromnov says:

    Yet another WSGI server for Python: http://pypi.python.org/pypi/meinheld.

  43. Really really cool article. Thanks so much!!!

  44. [...] Servers: Benchmark of Python Web Servers Cache Mechanisms: Evaluating Django Caching [...]

  45. [...] Servers: Benchmark of Python Web Servers Cache Mechanisms: Evaluating Django Caching [...]

  46. [...] threads, and greenlets and such. I also came across the Green Unicorn project that, though not very speedy with its default worker class, has recently integrated gevent to make it a very attractive [...]

  47. [...] by david, on Sep 24, 2010 4:36:19 PM. After seeing Nicholas Piël benchmark a bunch of Python web servers, I was just itching to try some different configurations. So, I [...]

  48. [...] for nginx and so on; for details see “Python WSGI Server Brawl (Rev.2)”, “Benchmark of Python WSGI Servers” and “WSGI [...]

  49. [...] that hits throughput, still OOTB the stack is doing Ok. An extensive benchmarking experiment at http://nichol.as/benchmark-of-python-web-servers shows that Java and Jetty even in threaded mode is just as good as some fo the event mode [...]

  50. Great article… thanks man

  51. Impressive coverage of the status of all the servers. However once you put Django or something like that into the mix, most the stats don’t really mean much any more.

  52. Seb says:

    Great article, well done!
    Good to see gevent among the best which was my first choice and I’m happy with it.

  53. FILLY says:

    Do u use a special software to benchmark? (like Apache Benchmark Tool?)
    If Yes which one?

  54. Mayur says:

    Very very useful test. Thanks for sharing it.

    Tornado looks great if all your communications are short bursts, but I’m looking at gevent because some of our responses can be quite large. As a result, we would be unable to use the “normal” gevent.wsgi server (which is really the libevent server if I understand correctly) because we can’t afford to buffer the messages in RAM for all those connections.

    I was wondering whether you have timing for gevent using the pywsgi server, which supports SSL and chunked response. I imagine that it would have very different characteristics.

    Thanks again.

  55. fp says:

    Response time curves look incorrect – look more like a throughput curves. Something is wrong in the tests.

  56. Massimo says:

    Any reason for not including rocket? https://launchpad.net/rocket
    It is quite popular considering that web2py uses it.

  57. fijal says:

    It would be cool to see how pure-python web servers (without parts in C) benefit from PyPy. I know twisted web is sped up something like 2x.

  58. Vladimir says:

    Excellent article.

  59. [...] excellent [review] of Python web server performance that not only goes into the details of the tests and checks many servers, but also [...]

  60. [...] The app server determines how fast the whole system responds. Based on Nicholas Piel’s “Benchmark of Python WSGI Servers”, I shortlisted the following servers (modules): mod_wsgi for [...]

  61. HG says:

    Thanks for this very useful piece!

  62. jell says:

    next time try to turn on epoll (on linux) or kqueue (on bsd) in twisted – it’s only two lines of code:
    from twisted.internet import epollreactor
    epollreactor.install()

  63. [...] try to avoid fat appache, which is the official Mediacore recommendation. I


Web2py + Nginx + FCGI : Installation Notes

web2py install on nginx? – web2py-users | Google Groups:

I have installed it, but there is a problem using routes.py with NGINX
+ FCGI; I have posted about that earlier.
But using NGINX + CherryPy + web2py works fine, and you can find many
other posts discussing that.

I think that NGINX + FCGI is so much faster than NGINX + CherryPy, I
will benchmark the performance soon.

Here are my notes about installing NGINX + FCGI + web2py:

NGINX ——> Socket file <——- web2py

1- Have the latest release of NGINX.

2- Edit the configuration file of your site so that it can use FCGI as
follows:

fastcgi_pass unix:/tmp/web2py.sock;
include fastcgi_params;

3- Edit fcgihandler.py which is located in the root directory of
web2py, change the socket file to web2py.sock

4- Run fcgihandler.py: python fcgihandler.py

5- Change the permissions of /tmp/web2py.sock to something that is
accessible by NGINX and web2py,
for testing use chmod 777 /tmp/web2py.sock
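Outside of testing, step 5 can be done with something tighter than 777. The sketch below is only an illustration, under the assumption that the NGINX worker user and the user running web2py share a Unix group; the helper name `restrict_socket_permissions` and the default mode are hypothetical and not part of web2py or NGINX.

```python
import os
import stat


def restrict_socket_permissions(path, mode=0o660):
    """Give owner and group read/write access to `path`; return the resulting mode bits.

    Assumes the NGINX worker user and the web2py user share a group.
    """
    os.chmod(path, mode)
    return stat.S_IMODE(os.stat(path).st_mode)


# Example call once fcgihandler.py has created the socket (path from the notes above):
# restrict_socket_permissions("/tmp/web2py.sock")
```

If the two processes do not share a group, this mode is too strict and NGINX will fail to connect to the socket, so the group setup has to be sorted out first.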


Nginx is better than Lighttpd

zen and the art of adding bugs to an empty text file:

Lately I’ve gotten fed up with Lighttpd. There have been outstanding bugs that are so familiar they’ve acquired names. The project’s lead, Jan Kneschke, seems more interested in schmoozing up to the Rails crowd than providing a decent product (which is ironic given that Mongrel aims to make Lighttpd irrelevant in the Rails world).

Anyway, a discussion today on the TurboGears list brought up an alternative.

Bob Ippolito replied to a discussion between myself and another person about whether to use Pound or Lighttpd as a reverse-proxy in front of TurboGears applications. I held that Pound is the correct solution as it is a proxy, whereas Lighttpd is a web server that can act as one. Further, I expressed my frustration regarding the state of Lighttpd development and the unmaintainability of its config files.

Bob offered up the following information:

One problem with Lighty is that it leaks memory like a sieve [1]. I audited it for a little bit and I gave up, it’s a mess. I’d steer clear of it, it will quickly ruin your day if you throw a lot of traffic at it.

The only solution I know of that’s extremely high performance and offers all of the features that you want is nginx [2], but its documentation is largely in Russian. I can’t read Russian, but I was able to figure it out (the configuration language isn’t Russian, neither is C source). I currently have nginx doing reverse proxy of tens of millions of HTTP requests per day (that’s a few hundred per second) on a single server. At peak load it uses about 15MB RAM and 10% CPU on my particular configuration (FreeBSD 6).

Under the same kind of load, apache falls over (after using 1000 or so processes and god knows how much RAM), pound falls over (too many threads, and using 400MB+ of RAM for all the thread stacks), and lighty leaks more than 20MB per hour (and uses more CPU, but not significantly more).

[1] http://trac.lighttpd.net/trac/ticket/758

[2] http://sysoev.ru/en/
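
Bob’s figures can be sanity-checked with a little arithmetic. The sketch below is only an illustration: the sample inputs (10 and 30 million requests per day) are assumptions chosen to bracket “tens of millions”, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def requests_per_second(requests_per_day):
    """Average request rate implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


def leaked_mb(rate_mb_per_hour, hours):
    """Memory growth for a process leaking at a constant hourly rate."""
    return rate_mb_per_hour * hours


if __name__ == "__main__":
    for per_day in (10_000_000, 30_000_000):
        print(per_day, "requests/day is about", round(requests_per_second(per_day)), "requests/sec")
    # A leak of 20 MB per hour, as reported for lighty, over one day:
    print(leaked_mb(20, 24), "MB leaked in 24 hours")
```

Ten million requests per day averages roughly 116 per second and thirty million roughly 347, so “a few hundred per second” is consistent with the daily total. A 20 MB per hour leak adds up to 480 MB per day.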

I found this interesting, first off because I know that Bob has used Pound in the past and because, well, it’s Bob. Also, someone on the #cherokee channel had suggested Nginx as an option, but since the docs were in Russian I was reluctant to commit to it.

Well after Bob’s email, I started searching and it turns out that while the official docs are in Russian, there’s a bit of English documentation on the web, and apparently some happy users as well:

Also of interest is that Nginx happens to live in Gentoo’s portage.




On Aug 24 eliott said:

nginx (engine-x?). Sounds good. I may have to take it for a spin…too bad I don’t know russian.

On Aug 25 Joshua said:

So.. nginx lives in portage? What, like a parasite?

On Aug 14 Nginx vs. Lighttpd said:

Wow. I had no idea Lighttpd still has memory leaks issues older than 3 years ago. BTW, in the meantime the ticket’s URL has changed to http://redmine.lighttpd.net/issues/show/758

On Aug 16 Dan Dascalescu said:

I wanted to find Bob’s original message, and that turned out to be not so easy, due to an odd search bug with Google Groups (http://wiki.dandascalescu.com/blog/google_groups_search_fail). It just won’t find that message when you “Search this group” for “how much RAM”. I managed to find it with Google Search itself (?!) on other TurboGears mailing list mirrors.


Nginx vs. Lighttpd for a small VPS

Nginx vs. Lighttpd for a small VPS « HostingFu:

January 10, 2007 @ 4:41 am

I have been using Lighttpd for almost a year and Nginx for a month on my servers. I know that they were created to be massively scalable, solving the C10k problem. However, their asynchronous I/O model and small memory footprint also make them suitable as alternative HTTP servers for a memory-limited VPS. Alternative = anything but the current de facto standard, Apache.

I will be writing more about Lighttpd and Nginx later in the year, but will try to use this post to draw some comparison between Nginx, the new darling of these light-weight web servers, and Lighttpd, many Web 2.0 developers’ all-time favourite.

Lighttpd

I have been running Lighttpd (pronounced “lighty”) on my home servers and development boxes since the beginning of 2006. It is a great replacement for Apache if you have the whole box to yourself, i.e. you don’t need to worry about supporting .htaccess files that your users might use. Currently this website is hosted on lighttpd-1.4.13 on a Gentoo VPS.

Pros

  • Light weight. A clean restart of 1.4.13 takes no more than 2 MB RSS on this 64-bit VPS. It binds the port, drops privileges and that’s it! A single process does all the tricks even when you have hundreds of concurrent connections. No more pre-fork MPM with a mis-configured MaxClients that sends you to swap hell.
  • Speed. Very fast static file serving. Very fast FastCGI serving. Very fast proxy serving.
  • Modules, and lots of them. Good comprehensive documentation as well. It even has SCGI for your Quixote apps.
  • Mod_magnet. Want a scripting engine right inside your web server? Mod_magnet integrates Lua into lighttpd, so your World of Warcraft scripting skillz can be put to better use.
  • Community. It has a blog, a wiki/bug tracker and a forum. It is easy to find help when you need it.

Cons

  • Stability (or lack of it, according to the RoR folks). I had quite a lot of issues using Lighttpd as a proxy+HTTPS front-end for our Python stuff, but the same app runs fine with just lighttpd + proxy without HTTPS.
  • Mod_rewrite (or again, lack of it). The built-in rewriting engine sucks, and porting Apache mod_rewrite rules over can be non-trivial sometimes. Update: Here’s an article I have written on Drupal clean URLs on Nginx and Lighttpd, which looks at the URL rewrite options of these two web servers.
  • Memory leak. The RSS of my lighty process grows by about 1.5 MB per day, but then I don’t have lots of traffic (less than 50k requests a day). In the end I just need to restart it once a week. I have heard that many people have far worse memory leak issues.

Nginx

I have been running Nginx (pronounced “engine X”) on my development box and two of my VPSes since December 2006. It is Russian, fast and very configurable. I am currently using 0.5.5 for my sites, but don’t be deceived by its version number: it is very stable.

Pros

  • Light weight. It is not as light weight as lighttpd when it clean-starts. At least two processes are needed: one master process running as root that binds to the port, and one or more worker processes that handle the actual requests. Around 7 MB RSS together on my 64-bit VPS (and only 4.5 MB on a 32-bit VPS). Still beats Apache hands down.
  • Fast. Some benchmarks have shown that Nginx has a slight edge over Lighttpd, but so far I haven’t been able to notice any. Again, much faster than Apache at static file serving or proxying, especially when you turn up the value of keep alive (more than 1 minute for example).
  • Modules. There are many modules available for Nginx. Some are very useful, and some are just plain weird. While lighttpd has Lua embedded, you can now also embed the whole Perl interpreter inside Nginx.
  • Better Rewrite Module. A much better rewrite module than Lighttpd’s, with support for complex conditions. Porting mod_rewrite rules from Apache is actually now feasible without touching the apps themselves.
  • Stable and not leaking. I have been running Nginx on a production site doing PHP-FastCGI, and have had no issues whatsoever.

Cons

  • Lack of community. Where can I find help regarding Nginx? There’s only IRC as far as I know. And while the lead developer writes beautiful code, all documentation was initially in Russian, which was a big stumbling block before the English docs came along.
  • No CGI support. Oh well, maybe I am the only one who still hacks small CGI scripts. Apparently Nginx does not spawn CGI or FastCGI processes, which means you need to either (1) convert the script into an externally spawned FastCGI process, or (2) proxy to another web server that does CGI.
  • No simple virtual host support. Lighttpd has mod_simple_vhost and mod_evhost to let you quickly deploy lots of name-based virtual hosts. You can do something similar by using $server_name in root and a wildcard in server_name, but it’s still not as clean as lighttpd. In the end you will find Nginx configuration files much more verbose if you run lots of small sites off a single web server.
  • No X-sendfile support. I found Lighttpd’s X-sendfile support very useful when my scripts need to send back large files, and was disappointed to find out that Nginx does not have it. X-Accel-Redirect is different as it requires extra configuration on the web server, which makes your web app less portable.

Conclusion

I don’t think I am a suitable judge to say which one is better, as (1) I have only been running Nginx for a month, and (2) my level of traffic does not really stress test these high-performing web servers. At the moment I think I like Nginx better, purely because it does not leak and because its rewrite module enables me to run many off-the-shelf open source PHP apps with clean URLs.

Again, I might change my mind in 3 months’ time when I find out more warts about Nginx. We will see.

56 Comments

  1. matt wrote:

    Nice comparison. I was looking to dabble w/ Nginx more, but seems Lightty is still best for my needs.

    January 10, 2007 @ 3:40 pm

  2. Tomislav wrote:

    Nginx has a mailing list where you will get an answer (from the author) within 24h. It also has X-Accel-Redirect which is similar to X-sendfile.

    January 10, 2007 @ 6:05 pm

  3. scotty wrote:

    Tomislav, you can achieve the same with X-Accel-Redirect, but I think their philosophies are different.

    With X-sendfile, you can basically return any file on the local FS from your scripts, and no configuration is needed on the web server side. I think it is more flexible, but it requires smarter scripts to figure out where private data is.

    With X-Accel-Redirect, the scripts and the web server configuration share that responsibility. Scripts tell the web server which file to return from a predefined location, and the web server defines the root of that location, abstracting it from the scripts. I guess the idea is that you can copy the scripts onto a different deployment without modification and only need to configure the web server properly.

    However, I find that is often not the case, as the scripts usually also need to know the absolute location of the files to return. For example, I need to check the existence, size, geometry if it is an image file, etc. You might as well provide X-sendfile, since the scripts already know the absolute path to the files.

    January 10, 2007 @ 8:40 pm
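
    The difference scotty describes can be sketched from the application side. The following is a minimal WSGI illustration, not a deployment recipe: the paths /srv/private/report.pdf and /protected/report.pdf are hypothetical, and the exact X-Sendfile header name depends on the server and version in use. With X-Accel-Redirect, nginx additionally needs a location (typically marked internal) that maps the URI to disk.

```python
def sendfile_headers(absolute_path):
    """X-Sendfile style: the script names the absolute filesystem path itself."""
    return [("X-Sendfile", absolute_path)]


def accel_redirect_headers(internal_uri):
    """X-Accel-Redirect style: the script names a URI; the web server maps it to disk."""
    return [("X-Accel-Redirect", internal_uri)]


def make_app(offload_headers):
    """Build a WSGI app that sends an empty body and lets the web server deliver the file."""
    def app(environ, start_response):
        headers = [("Content-Type", "application/octet-stream")] + offload_headers
        start_response("200 OK", headers)
        return [b""]
    return app


def call(app):
    """Invoke a WSGI app with a minimal environ; return (status, headers dict, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/download"}, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    print(call(make_app(sendfile_headers("/srv/private/report.pdf"))))
    print(call(make_app(accel_redirect_headers("/protected/report.pdf"))))
```

    In the first form the filesystem layout lives in the script; in the second it lives in the web server configuration, which is the shared responsibility described above.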

  4. Scott Lamb wrote:

    The embedded languages sound more novel than useful. As they say in the mod_magnet documentation:

    Keep in mind that the magnet is executed in the core of lighty. EVERY long-running operation is blocking ALL connections in the server. You are warned. For time-consuming or blocking scripts use mod_fastcgi and friends.

    So nothing CPU-intensive, and no database access. (The latter’s not a fundamental restriction – it’s possible to write a non-blocking one that returns to the server’s event loop while waiting for a response – but it’s difficult enough that probably no one will bother any time soon. Just use mod_fastcgi as they suggest.)

    The memory leak sounds like a deal-breaker, but at the same time, it should be easy to fix.

    January 11, 2007 @ 2:14 am

  5. scotty wrote:

    Thanks Scott.

    The same is said about Nginx’s embedded Perl module. You won’t expect to write any “application” that would block the whole web server for significant amount of time, but they are good replacement for otherwise lacking mod_rewrite support.

    January 11, 2007 @ 4:21 am

  6. Anonymous wrote:

    Did you try what it says here in the last comment?

    http://trac.lighttpd.net/trac/ticket/758

    January 11, 2007 @ 11:38 am

  7. Diego wrote:

    You should test out LiteSpeed web server as well.

    http://www.litespeedtech.com/

    Have been using them since 2004 and they are top notch even with the free as in beer version.

    January 31, 2007 @ 7:06 am

  8. scotty wrote:

    Testing out Litespeed is certainly on my list of todo’s. Thanks for reminding! :)

    January 31, 2007 @ 7:11 am

  9. Dan Kubb wrote:

    I second the suggestion for Litespeed. I have a 256 MB VPS at slicehost, and Nginx + 2 Mongrel processes w/Rails didn’t leave a lot of memory for other things. I switched to Litespeed + Rails via LSAPI, and I found it uses a lot less memory running the same number of ruby processes.

    I’ve tested a lot of front-ends like Apache, LightTPD, Nginx with different backends (Mongrel, FastCGI, SCGI) and Litespeed is the most memory efficient way I’ve found to run Rails.

    February 6, 2007 @ 4:55 am

  10. zap wrote:

    I don’t know about nginx 0.5.5, but the latest as of today (I think), 0.5.10, has pretty easy to use virtual server support. I don’t know how you missed it; you just have to define multiple server{} sections, and inside each of them use the appropriate listen/server_name directives. I’ve ported my 5 sites running on the same server pretty easily, some even using https (but on a port different from 443, of course).

    February 9, 2007 @ 11:54 am

  11. scotty wrote:

    zap, yes the ”server” statement was there in 0.5.5. In fact I was surprised at how fast Nginx has progressed over the last couple of weeks (I’m running 0.5.11 on two production sites now).

    I am actually talking about “massive” virtual hosting, where you can add new virtual hosts without modifying the configuration file. Nice for a blog farm :)

    For example, Lighttpd’s mod_simple_vhost or Apache’s mod_vhost_alias.

    February 9, 2007 @ 12:21 pm

  12. Anonymous wrote:

    nginx question. Do you know how to configure nginx to not serve up certain files? I tried for a while, but the only solution I could come up with was to use location and return a 403 on a directory match. This however will break things like drupal with .css files in misc, modules and themes. I’m back on lighttpd till I figure this out.

    February 15, 2007 @ 1:55 am

  13. scotty wrote:

    Sorry not sure what you meant? Are you looking for functionality similar to X-send-file? Or looking for rewrite rules for Drupal? Can you give me an example?

    February 15, 2007 @ 3:19 am

  14. Anonymous wrote:

    Setup a drupal site using nginx and try to prevent a user from downloading the source of say http://www.example.com/modules/node/node.module . This would be a drupal rewrite rule.

    February 15, 2007 @ 12:13 pm

  15. Anonymous wrote:

    I’m no pro by any stretch and learning more every day but wouldn’t the following rule work (modules directory for example)?

    location /modules {
    allow 10.0.0.0/24; # my local network
    deny all;
    }

    February 15, 2007 @ 5:13 pm

  16. Anonymous wrote:

    Actually scratch that earlier post, this might offer some assistance to get the brain juices flowing. Since I’m about to bring a dedicated drupal server online using nginx I thought you brought up a great point.

    February 15, 2007 @ 5:33 pm

  17. Anonymous wrote:

    How about this:

    For valid_referers use all ip addresses associated with that web server.

    location /modules/ {
    valid_referers 10.0.0.10 10.0.0.11;
    if ($invalid_referer) {

    return 403;

    }
    }

    February 15, 2007 @ 7:03 pm

  18. Anonymous wrote:

    Here is what I ended up doing.

    location ~* /(modules|themes)/ {
      if (-f $request_filename) {
       rewrite \.(module|inc|info|engine|sql|sh)$  / permanent;
      }
    }
    

    (There is a backslash before the “.” after the rewrite.)

    February 16, 2007 @ 1:07 am
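
    The pattern in that rule can be checked outside nginx. Below is a small Python sketch that mirrors only the two regular expressions: a case-insensitive match on /modules/ or /themes/ (the ~* location) and the extension match from the rewrite. It does not reproduce the -f $request_filename existence test, and the sample URIs are made up for illustration.

```python
import re

# Mirrors: location ~* /(modules|themes)/   (~* is a case-insensitive regex match)
SCOPE = re.compile(r"/(modules|themes)/", re.IGNORECASE)
# Mirrors: rewrite \.(module|inc|info|engine|sql|sh)$  / permanent;
BLOCKED = re.compile(r"\.(module|inc|info|engine|sql|sh)$")


def is_redirected(uri):
    """True if the URI is inside the location and ends in a blocked extension."""
    return bool(SCOPE.search(uri) and BLOCKED.search(uri))


if __name__ == "__main__":
    for uri in ("/modules/node/node.module",
                "/modules/system/system.css",
                "/themes/garland/style.css",
                "/includes/common.inc"):
        print(uri, is_redirected(uri))
```

    Source files such as node.module are caught while .css files under modules and themes are still served, which addresses the concern raised in comment 12. Files outside those two directories, such as /includes/common.inc, are not covered by this location.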

  19. KpoH wrote:

    I am actually talking about “massive” virtual hosting, where you can add new virtual hosts without modifying the configuration file. Nice for a blog farm :)

    For “mass hosting” you just need in main config file

    include vhosts/*.conf;

    and some “vhost.conf” files.

    To make nginx reload its config, send kill -HUP to its pid. With that configuration you can have as many vhosts as you want.
    A frontend, such as a web interface, that generates those files can easily be implemented.

    May 22, 2007 @ 4:36 pm
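
    The generator KpoH mentions could look something like the following Python sketch. Everything specific here is an assumption for illustration: the /var/www document root layout, the one-file-per-domain naming, and the function names. Only the server, listen, server_name and root directives are taken from nginx itself.

```python
import tempfile
from pathlib import Path

# Minimal per-site server block; doubled braces are literal braces for str.format.
TEMPLATE = """server {{
    listen 80;
    server_name {domain};
    root {root}/{domain};
}}
"""


def render_vhost(domain, root="/var/www"):
    """Return the text of a minimal server block for one domain."""
    return TEMPLATE.format(domain=domain, root=root)


def write_vhost(conf_dir, domain, root="/var/www"):
    """Write <conf_dir>/<domain>.conf and return its path."""
    path = Path(conf_dir) / f"{domain}.conf"
    path.write_text(render_vhost(domain, root))
    return path


def demo():
    """Generate two vhost files in a temporary directory and list their names."""
    with tempfile.TemporaryDirectory() as conf_dir:
        for domain in ("example.com", "blog.example.org"):
            write_vhost(conf_dir, domain)
        return sorted(p.name for p in Path(conf_dir).glob("*.conf"))
```

    After writing a new file into the included directory, nginx still has to be told to reload its configuration, as described in the comment.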

  20. rogerdpack wrote:

    Did you find any speed differences?
    I assume it uses least memory mostly because it replaces apache with a small exec — litespeed. Good for least mem used :)

    May 25, 2007 @ 12:05 am

  21. Nicholas Orr wrote:

    I’ve just setup litespeed 3.2 on my new slicehost vps (finally getting around to setting this vps up!)

    It works pretty well & I like the webadmin interface its awesome!

    Had a bit of a tricky time getting PHP5 w/ lsapi going since its not in portage and I’d never compiled anything without before…

    Not sure on how secure in a shared environment it would be since I use the same username for each site in under the same grouping. This means that even though they are separate the files have the same permissions…
    Under Apache2.2 I ran mpm-itk and every vhost ran as a different group:user and my assigned permissions on the files was like this
    – unix = znd
    – www user = znd:apache02
    – www chmod = 660
    + unix user = john
    + www = john:apache03
    + www chmod = 660
    – unix user = john
    – www user = john:apache04
    – www chmod = 660

    So as you can see each vhost couldn’t read anyone else’s files, just wasn’t possible :)

    July 19, 2007 @ 8:30 am

  22. Nicholas Orr wrote:

    I just read that comment again and it doesn’t make sense (the last bit so here it is again)

    http://www.domain1.tld
    unix user = znd
    vhost user = apache20:apache20
    unix file permission = znd:apache20 | -rw-r-----
    +====+
    http://www.domain2.tld
    unix user = znd
    vhost user = apache21:apache21
    unix file permission = znd:apache21 | -rw-r-----
    

    Right, so as can be seen clearly now: the same unix user account is used to access the files and change whatever, but each vhost has a different userid:group, so a vhost simply can’t read any other vhost’s files :)

    July 23, 2007 @ 1:52 pm

  23. Steven wrote:

    I’m looking for a url re-write solution for WordPress MU, where i can get both subdomains and directories fully under control.
    e.g. chicago.myspace.com and myspace.com/chicago-the-band

    can one of the two solutions above accomplish that? I know with MU and Apache, I can only do one or the other.

    August 22, 2007 @ 5:02 pm

  25. Anonymous wrote:

    I performed a quite aggressive test with MILLIONS of HTTP requests and was UNABLE to confirm memory leaks, at least in the lighttpd core itself plus the basic modules enabled.

    My test machine (actually my desktop)
    Configuration: AMD Athlon 64 3800+ x2, 1 GB RAM, etc.
    OS: Kubuntu x64 version 7.10
    Server: lighttpd 1.4.18 (default version taken from repositories, seems to be the current one)
    Enabled modules are: “mod_alias”, “mod_access”, “mod_status”, “mod_accesslog”

    What has been done?
    1) I measured memory consumption immediately after daemon startup. This is not completely fair, since the server is not yet in “cruise mode” (no users served), but to be completely honest we will measure ALL the memory it eats. So here we go. All sizes are in kB.
    VmSize: 45460
    VmRss: 924

    2) I performed 1 000 000 HTTP requests, 500 in parallel, using localhost for speed. I used the http_load tool (a great load-testing tool from the thttpd web site). The file urls contains just one URL: a single file about 4.7 kB in size.

    Command was the following:

    # ./http_load -checksum -verbose -parallel 500 -fetches 1000000 urls
    --- 60.0011 secs, 565964 fetches started, 565934 completed, 30 current
    1000000 fetches, 500 max parallel, 4.804e+09 bytes, in 105.534 seconds
    4804 mean bytes/connection
    9475.63 fetches/sec, 4.55209e+07 bytes/sec
    msecs/connect: 4.04955 mean, 9003.25 max, 0.024 min
    msecs/first-response: 1.26731 mean, 659.119 max, 0.129 min
    HTTP response codes:
      code 200 -- 1000000
    

    3) Looking at the memory eaten:
    VmSize 47036
    VmRss 2692
    Hey? Looks like a memory leak? Really? Nope, wrong! It is just entering “cruise mode”. Some proof needed, yeah? Easily.

    4) Re-running this command another 4 times. That is it: 4M more HTTP requests done.

    5) Let’s look at memory; it’s all about RAM, right?
    VmSize 47036
    VmRss 2692
    Wow. There is still 2692 kB eaten on my x64 system and the number no longer increases, even after an extra 4M requests were served. So this value is stable enough.

    Repeated more and more. 9M requests done, no change. Cool, yeah? So where are the leaks? And wtf do I have to restart the server for?

    P.S. I can assume some module may be flawed. It also looks like the server can increase memory usage when needed and is not willing to give the memory back. That is still not a memleak, though, since this value is static, never grows, and depends only on the number of simultaneous connections and the size of the file being sent. Yes, if you do millions of transfers of bigger files, more memory can be used (I have seen up to 48 MB VmSize and ~4 MB VmRss on 1 000 000 fetches of a 9 MB file, 500 in parallel). However, this is a peak value; it DOES NOT increase further. You can re-execute millions of requests again and again and the number will not increase.

    November 13, 2007 @ 4:28 am
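
    The VmSize and VmRSS figures quoted in that comment are fields of /proc/<pid>/status on Linux. A minimal Python sketch for reading them and comparing two samples follows; the function names are illustrative, and the sample text in the demo simply reuses the numbers from the comment.

```python
def parse_vm_fields(status_text, fields=("VmSize", "VmRSS")):
    """Extract selected kB-valued fields from /proc/<pid>/status text."""
    result = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in fields:
            result[key] = int(value.split()[0])  # value looks like "   45460 kB"
    return result


def read_vm_fields(pid="self"):
    """Read the fields for a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_fields(handle.read())


def growth(before, after, field="VmRSS"):
    """Difference in kB between two samples of the same field."""
    return after[field] - before[field]


if __name__ == "__main__":
    startup = parse_vm_fields("Name:\tlighttpd\nVmSize:\t   45460 kB\nVmRSS:\t     924 kB\n")
    loaded = parse_vm_fields("Name:\tlighttpd\nVmSize:\t   47036 kB\nVmRSS:\t    2692 kB\n")
    print("RSS growth after first 1M requests:", growth(startup, loaded), "kB")
    print("RSS growth across later runs:", growth(loaded, loaded), "kB")
```

    Sampling before and after each load run, as the commenter did by hand, separates a one-time warm-up increase from growth that continues run after run.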

  26. scotty wrote:

    What is handled at that URL? Static file? FastCGI? Proxy?

    On my slicehost VPS, Lighttpd 1.4.18 on Gentoo grows around 4MB VmRSS every 100k requests (mix of static files + FastCGI). I haven’t had time to figure out the issue…

    November 13, 2007 @ 4:40 am

  27. Anonymous wrote:

    I used one simple static ~4.7 kB file as the URL for the “fast” test; over 10M requests were issued in total for this file without any sign of memleaks. I also tried 1M downloads of a bigger 9 MB file (only one run with memory consumption monitoring, since this test takes a noticeable amount of time). Still no memleaks. If you have a better idea for the set of files, you’re welcome. However, please be aware that I do not have much time for tests; it’s a hobby, nothing more. At the very most, the reward I can get from it is filing a bug report if a memleak is proven and then using the fixed version. Duh. Still a pretty nice reward, though.

    Actually my test is very basic and covers only core server functionality, i.e. the ability of the server to handle HTTP requests under quite heavy load without issues for itself.

    I have not tested FastCGI so far, nor lots of other modules. This requires a decent amount of time and it does not look like I have enough of it right now. However, in the long term I will be glad to catch the issue and file a bug report. Or, better, you can do this too :)

    November 14, 2007 @ 2:13 am

  28. Aaron Mason wrote:

    You can do CGI, but you must do it through FastCGI. There’s a guide in the nginx wiki on how to do it. The script may need a few mods, but it works and it wouldn’t be difficult making it start on boot.

    January 24, 2008 @ 7:51 am

  29. fak3r wrote:

    likely outdated by now, but there are plenty of good howtos out there to get nginx to do cgi/fastcgi:

    I did it on lenny the other night from a fresh install; took about 20 mins. Great article btw, care to write an update since this is now +1 year old?

    February 7, 2008 @ 6:29 pm

  30. scotty wrote:

    I never said that it cannot do FastCGI. What Nginx can’t do is spawn FastCGI processes itself. Not that spawning them is always a good thing, but I would like to see something like the FastCGI process lifetime management in mod_fcgi for Apache.

    And Nginx still can’t do CGI. There are plenty of CGI-only apps out there that do not want to be converted to FastCGI (for example, only executed a few times a day). But then it is easy for Nginx to proxy those CGI requests to a small webserver (boa, thttpd) just to serve CGI traffic.

    February 7, 2008 @ 10:48 pm

  31. David wrote:

    It’s been over a year since this article was posted; it would be nice to have up-to-date comparisons, including all the developments in lighty & nginx within the past year.

    I’ve started a wiki to do just that: Lighttpd vs Nginx – WikiVS

    February 11, 2008 @ 7:25 pm

  32. scotty wrote:

    Thanks David.

    February 11, 2008 @ 10:29 pm

  33. Erick wrote:

    It looks like the article is old even if the comments are recent. But it would be nice to hear how people compare Nginx, Lighttpd, and Litespeed. The last one seems to be the easiest to get running and administer with a web based tool. Does anyone have any experience running these servers on WHM/Cpanel machines?

    February 19, 2008 @ 6:40 am

  34. sms wrote:

    Is there any table overview showing which features each of the web servers has or lacks?

    April 29, 2008 @ 7:04 am

  35. ezequiel wrote:

    Nice comparison. What is your opinion about Nginx now? I am really surprised that soup.io uses it, so it seems to be good!

    April 30, 2008 @ 1:32 pm

  36. Calomel wrote:

    The speed differences between Lighttpd and Nginx are minimal. In truth, both web servers do a good job if you set them up properly. What makes Nginx better is the active development community, the author’s attention to security and the flexible scripting language in the config file. This allows you to filter traffic on patterns you see in the log file and secure your server from malicious bots and bad clients.

    Nginx web server "how to" (nginx.conf)
    http://calomel.org/nginx.html
    
    May 6, 2008 @ 3:18 pm

  37. kpss wrote:

    Nice comparsion. What is your opinion about Nginx now? I am really surprised that soup.io uses it, so it seems to be good!

    July 29, 2008 @ 6:21 am

  38. kpss wrote:

    Nginx web server “how to” (nginx.conf)
    http://calomel.org/nginx.html

    July 29, 2008 @ 6:22 am

  39. Rajeev Jha wrote:

    You do not have to create and include a new vhost file. Nginx allows you to create document roots with $host in them, so requests for a domain can be served from /var/www/$host, which is of course dynamic and very much like Apache’s mod_vhost_alias. You can also have a catch-all domain entry. That way I do not think Nginx lacks anything.

    November 27, 2008 @ 5:19 pm

  40. kraloyun wrote:

    Nice comparsion. What is your opinion about Nginx now? I am really surprised that soup.io uses it, so it seems to be good!a

    April 6, 2009 @ 9:49 pm

  41. j0rd wrote:

    SO what’s the verdict now? Lighttpd or Nginx . I’m also curious as I’m about to launch this endeavor myself. I would have used lighttpd, but what I’ve been reading seems to suggest people use nginx now.

    I’ve always used apache with mod_proxy and sending static requests to lighttpd. Speeds everything up and lowers memory significantly. But now that I’m going to a VPS with a smaller memory footprint I might go for just Lighttpd or Nginx.

    May 21, 2009 @ 4:25 pm

  42. scotty wrote:

    I am now using Nginx on pretty much all my servers. The progress of Lighttpd development seems to have stalled…

    May 27, 2009 @ 2:17 pm

  43. SP wrote:

    Interesting article, out of interest, how did you measure the memory usage of the webservers? Did you use a specific linux command?

    June 29, 2009 @ 7:26 am

  44. the_guv wrote:

    well Tom, maybe it was! Thing is, Lighty’s a bit resource-heavy, or so I’ve found on a 360MB Linode. Strikes me Nginx is being better developed these days too, although it could do with a control panel mod for something like Webmin. I realise your comment is old ..

    Scotty .. big cheers for this post. It influenced my choice when i looked beyond Apache. Just finished a VPS/Nginx setup series and referenced your article for newbies to consider their web server options ..

    Set Up Unmanaged VPS (4 Newbies) – VPS BIBLE Part 1: VPS vs Shared vs Dedicated

    My question is, do you have an updated opinion? I notice you’re using Nginx for all your sites now. I’d be fascinated to hear a review of the why’s and wherefore’s.

    Best to you.

    July 27, 2009 @ 2:17 pm

  45. Hugo wrote:

    You should also take a look at Hiawatha (http://www.hiawatha-webserver.org/). It has been written with security in mind. It’s easy to configure, fast and lightweight.

    August 9, 2009 @ 11:24 am

  46. Debianero Rumbero wrote:

    @Hugo, Hiawatha officially supports only MacOS X, Windows and FreeBSD packages.

    There isn’t an official Linux package.

    October 29, 2009 @ 4:32 am

  47. Debianero Rumbero wrote:

    Note: Sorry, there’s a source tarball for Linux ^_^

    October 29, 2009 @ 4:33 am

  48. Light Weight Web Servers (Light weight alternatives to apache) « Shabir Imam’s Blog wrote:

    [...] two most popular free, open source light weight web servers. Comparison between these two servers: http://hostingfu.com/article/nginx-vs-lighttpd-for-a-small-vps http://superjared.com/entry/benching-lighttpd-vs-nginx-static-files/ [...]

    November 25, 2009 @ 8:58 pm

  49. Some Guy wrote:

    LOL @ Debianero Rumbero comment #46 in response to #45. Hugo wrote Hiawatha so he knows best =) Nice server by the way.

    December 7, 2009 @ 2:49 pm

  50. knight online wrote:

    The progress of Lighttpd development seems to be stalled.

    April 9, 2010 @ 1:04 am

  51. Björn Lindqvist wrote:

    No it hasn’t stalled, but there hasn’t been any major release for over three years. But the lighty 1.5 rewrite that was undertaken sometime in 2006 still hasn’t delivered any releases. The 1.4 release line is actively maintained though, with the 1.4.26 release as late as Feb 26, 2010. I’d still take nginx over lighty any day because of the much faster development speed.

    May 11, 2010 @ 10:34 pm

  52. Bop wrote:

    @Björn – so it is stalled then ;-)

    June 4, 2010 @ 7:03 am

  53. Jason Balasuriya wrote:

    We have switched all our production and hosted customer sites over to NginX. It seems lighttpd is in its last throes…

    June 4, 2010 @ 7:06 am

  54. Jason Balasuriya wrote:

    BTW Scotty – how about an updated article on this ? OR just an experts view on setting up NginX? We would be well up for that :-)

    June 4, 2010 @ 7:09 am

  55. How To Configure Hunchentoot Behind Nginx – kadir pekel wrote:

    [...] has not been solved yet (#758). Also you should read a post about nginx vs .lighttpd comparison here. For more information please refer to the survey of web server usage statistics at netcraft because [...]

    June 24, 2010 @ 5:56 pm

  56. Buzzknow wrote:

    any update of this review?

    i would like to know how ur think about nginx today :)

    regards

    October 27, 2010 @ 1:51 am
