When to use a custom classpath?

ReadTools relies on several libraries that use Service Provider Interfaces (SPI) to make applications extensible. Common reasons to add an extension to ReadTools include:

  • java.nio.file.spi.FileSystemProvider for IO operations in different file systems.
  • org.apache.hadoop.io.compress.CompressionCodec for custom compression for IO in Hadoop.
  • Other Hadoop services.
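
Service providers are conventionally registered through files under META-INF/services/ inside the jar, each named after the interface it implements (the standard java.util.ServiceLoader convention). To see which services an extension jar declares before adding it to the classpath, something like the following may help. This is a minimal sketch, assuming the Info-ZIP unzip tool is installed; list_services is a hypothetical helper name, not part of ReadTools.

```shell
# Hypothetical helper (not part of ReadTools): print the service interfaces
# that a jar registers under META-INF/services/, one per line.
# Assumes Info-ZIP `unzip` is available (-Z1 lists entry names only).
list_services() {
    unzip -Z1 "$1" | sed -n 's|^META-INF/services/\(..*\)$|\1|p'
}

# Example (not executed here):
#   list_services service1.jar
```

If the jar registers a file system provider, the output should include java.nio.file.spi.FileSystemProvider.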

How to run ReadTools with a custom classpath

Pass a list of jar files separated by : (; on Windows) to the -cp option of java, including ReadTools.jar itself. For example, to include one or two services (packaged in service1.jar and service2.jar):

# only service 1
java -cp ReadTools.jar:service1.jar org.magicdgs.readtools.Main
# service 1 and 2
java -cp ReadTools.jar:service1.jar:service2.jar org.magicdgs.readtools.Main
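
When several extension jars are involved, the classpath string can be assembled rather than typed by hand. The following is a minimal sketch for a POSIX shell; build_classpath is a hypothetical helper name, and the jar names are the same placeholders used above.

```shell
# Hypothetical helper: join all arguments with ':' into one classpath string.
# The function body runs in a subshell, so changing IFS does not leak out.
build_classpath() (
    IFS=':'
    printf '%s\n' "$*"
)

# Example (not executed here):
#   java -cp "$(build_classpath ReadTools.jar service1.jar service2.jar)" \
#        org.magicdgs.readtools.Main
```

Quoting the command substitution keeps the classpath as a single argument even if a jar path contains spaces.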

Bundled services

The ReadTools jar file already bundles several SPI extensions, providing out-of-the-box support for:

Example usage: 4mc compression for Distmap

One common use of a custom classpath is to support a non-default compression format in your Hadoop cluster that integrates with the Distmap pipeline.

For example, 4mc compression can make uploads and downloads faster. After downloading the packaged jar (e.g., hadoop-4mc-2.0.0.jar), run ReadTools as follows; outputs whose file names end in .4mc are then compressed with this codec:

java -cp ReadTools.jar:hadoop-4mc-2.0.0.jar org.magicdgs.readtools.Main \
         ReadsToDistmap -I input.bam -O hdfs://output.4mc
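
Note that java does not report an error for classpath entries that do not exist, so a mistyped jar name typically surfaces only later, for example when the expected codec is not found. A small pre-flight check can catch this early. This is a sketch only; require_jars is a hypothetical helper name, not a ReadTools feature.

```shell
# Hypothetical helper: return non-zero (and name the culprits on stderr)
# if any of the given jar paths is not an existing regular file.
require_jars() {
    rj_missing=0
    for rj_jar in "$@"; do
        if [ ! -f "$rj_jar" ]; then
            echo "missing jar: $rj_jar" >&2
            rj_missing=1
        fi
    done
    return "$rj_missing"
}

# Example (not executed here):
#   require_jars ReadTools.jar hadoop-4mc-2.0.0.jar &&
#       java -cp ReadTools.jar:hadoop-4mc-2.0.0.jar org.magicdgs.readtools.Main \
#            ReadsToDistmap -I input.bam -O hdfs://output.4mc
```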