XML is often referred to as "self-describing data": it is designed so that the schema is repeated for each record in the document. On the one hand, this self-describing feature gives XML great flexibility and allows it to be used in many application domains. On the other hand, it introduces the main disadvantage of XML documents: their large size. As a result, the amount of data that has to be transmitted, processed and stored is often larger than with other data formats.
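To make the size overhead concrete, the following minimal sketch encodes the same records once as XML, where the element names are repeated for every record, and once as CSV, where the field names appear only in the header. The record contents and the helper names `to_xml` and `to_csv` are hypothetical and chosen only for illustration.

```python
import csv
import io
from xml.sax.saxutils import escape

# Hypothetical sample records, used only to illustrate the overhead.
RECORDS = [
    {"id": i, "name": f"item{i}", "price": i * 10}
    for i in range(100)
]


def to_xml(records):
    """Encode records as XML; every record repeats its element names."""
    parts = ["<items>"]
    for rec in records:
        fields = "".join(
            f"<{key}>{escape(str(value))}</{key}>" for key, value in rec.items()
        )
        parts.append(f"<item>{fields}</item>")
    parts.append("</items>")
    return "".join(parts)


def to_csv(records):
    """Encode records as CSV; field names are written once, in the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    xml_size = len(to_xml(RECORDS).encode("utf-8"))
    csv_size = len(to_csv(RECORDS).encode("utf-8"))
    print(f"XML: {xml_size} bytes, CSV: {csv_size} bytes")
```

The exact numbers depend on the data, but the repeated opening and closing tags account for most of the difference in this example.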
Several XML compression tools have been proposed in the literature in recent years. XML compression techniques offer several advantages: they reduce the network bandwidth required to transmit XML data, the disk space required to store the documents, and the main memory required to represent them.
This project presents an extensive experimental evaluation and benchmark of state-of-the-art XML compression tools. The XML corpus of this project consists of 57 documents covering the different natures and scales of XML documents. The experiments are executed on two platforms: one with high computing resources and one with limited computing resources. The XML compressors are evaluated with three metrics: compression ratio, compression time and decompression time. Most of the experiments were executed using the default parameters of the evaluated compressors. The remaining experiments were executed using tuned values of the compression-level parameter provided by some compressors.
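The three metrics can be sketched as follows. This is only an illustration, not the benchmark harness used in the project: it uses Python's general-purpose `zlib` module as a stand-in for an XML compressor, it assumes that compression ratio means compressed size divided by original size (so smaller is better), and the helpers `measure` and `make_sample_xml` are hypothetical names.

```python
import time
import zlib


def measure(data: bytes, level: int = -1) -> dict:
    """Measure the three metrics for one document at one compression level.

    level=-1 selects zlib's default level; 1 is fastest, 9 compresses most.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compression_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompression_time = time.perf_counter() - start

    if restored != data:
        raise ValueError("round trip did not reproduce the original document")

    return {
        # Assumed definition: compressed size / original size.
        "ratio": len(compressed) / len(data),
        "compression_time": compression_time,
        "decompression_time": decompression_time,
    }


def make_sample_xml(records: int = 1000) -> bytes:
    """Build a small synthetic XML document (hypothetical test input)."""
    rows = "".join(
        f"<record><id>{i}</id><name>item{i}</name></record>"
        for i in range(records)
    )
    return f"<records>{rows}</records>".encode("utf-8")


if __name__ == "__main__":
    doc = make_sample_xml()
    # Default parameters first, then two tuned compression levels.
    for level in (-1, 1, 9):
        print(level, measure(doc, level))
```

Running the loop with the default level and then with explicit levels mirrors the split between default-parameter experiments and tuned compression-level experiments described above. A real benchmark would typically repeat each measurement several times and report an average, since single timings are noisy.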