When to use a custom classpath?
ReadTools relies on several libraries that use Service Provider Interfaces (SPI) to support extensions. Some common use cases for adding an extension to ReadTools are:
- java.nio.file.spi.FileSystemProvider for IO operations in different file systems.
- org.apache.hadoop.io.compress.CompressionCodec for custom compression for IO in Hadoop.
- Other Hadoop services.
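Under the standard java.util.ServiceLoader convention, a jar declares its service providers in files under META-INF/services/, each named after the interface it implements. As a minimal sketch of how to check which interfaces an extension jar provides, the helper below filters a jar listing read from standard input. The name list_services is hypothetical and not part of ReadTools.

```shell
# Hypothetical helper: read a jar listing (one entry per line) from stdin
# and print the service interface names declared under META-INF/services/.
list_services() {
  # Keep only entries inside META-INF/services/ (the bare directory entry
  # is skipped), then strip the directory prefix to leave the interface name.
  grep '^META-INF/services/.' | sed 's|^META-INF/services/||'
}
```

For instance, piping the output of jar tf service1.jar into list_services would print one line per declared interface, such as java.nio.file.spi.FileSystemProvider if the jar provides a file system.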
How to run ReadTools with a custom classpath
A list of jar files separated by : should be passed to the -cp option of java, in addition to ReadTools.jar. For example, to include one or two services (packaged in service1.jar and service2.jar):
# only service 1
java -cp ReadTools.jar:service1.jar org.magicdgs.readtools.Main
# service 1 and 2
java -cp ReadTools.jar:service1.jar:service2.jar org.magicdgs.readtools.Main
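When several extension jars are involved, the classpath string can be assembled rather than typed by hand. The following is a minimal sketch, assuming a POSIX shell. The function name build_classpath is hypothetical and not part of ReadTools.

```shell
# Hypothetical helper: join the main jar and any number of extension jars
# with ':' (the Unix classpath separator; Windows uses ';' instead).
build_classpath() {
  cp_value="$1"   # first argument is the main jar, e.g. ReadTools.jar
  shift
  for jar in "$@"; do
    cp_value="${cp_value}:${jar}"
  done
  printf '%s\n' "$cp_value"
}
```

It could then be used as java -cp "$(build_classpath ReadTools.jar service1.jar service2.jar)" org.magicdgs.readtools.Main, which is equivalent to the second command above.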
Bundled services
The ReadTools jar file already bundles several SPI extensions, providing out-of-the-box support for:
- Hadoop File System (HDFS) paths
- Google Cloud Storage (GCS) paths
- Hadoop defaults
Example usage: 4mc compression for Distmap
One common use of a custom classpath is to support a non-default compression format in your Hadoop cluster that integrates with the Distmap pipeline. For example, 4mc compression can make upload and download faster. You can download the packaged jar (e.g., hadoop-4mc-2.0.0.jar). Output files whose names end in .4mc are compressed with this codec when ReadTools is run as follows:
java -cp ReadTools.jar:hadoop-4mc-2.0.0.jar org.magicdgs.readtools.Main \
ReadsToDistmap -I input.bam -O hdfs://output.4mc