De Novo Assembly features
Support for both short-read and long-read assembly, as well as paired-end data.
Cross-platform support
CLC bio's de novo assembly supports all major types of sequencing data: Sanger, Illumina, Roche/454, and LIFE Technologies/SOLiD. We will continue to add support for more data types, including more comprehensive SOLiD support. We are also working actively with third-generation sequencing companies to support their technologies once they are available on the market.
Hybrid assemblies
CLC bio's assembler is one of the very few assembly algorithms on the market that let you combine data from different sequencing technologies, including traditional Sanger sequencing data, in the same analysis.
High-speed assemblies
Because it uses SIMD (single instruction, multiple data) instructions to parallelize and accelerate the analysis, CLC bio's de novo assembly algorithm is by far the fastest assembler currently available. On large assemblies, e.g. de novo assembly of human or plant genomes using paired-end data, the CPU time is usually less than one tenth of that required by other assemblers.
Modest hardware requirements
Unlike all other assembly algorithms on the market, CLC bio's de novo assembler requires only a single computer to run de novo assemblies of large datasets. For example, you can assemble a human genome on a single computer with 8 processor cores and 48 gigabytes of RAM in only 7 hours.
Check out our benchmarks to see what other algorithms require for the same assembly.
Upcoming features
CLC bio updates its software frequently, and our de novo assembly algorithm is no exception. In addition to improving overall assembly quality across various types of data, we plan the following features for upcoming versions:
- Report scaffolds
- Remove duplicates
- Quality filtering
- Better documentation of recommended workflows on various data sets
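To show what the planned "Remove duplicates" and "Quality filtering" steps commonly mean in read preprocessing, here is a minimal, generic sketch. It is not CLC bio's implementation. It assumes each read is a sequence paired with per-base Phred quality scores. The mean-quality cutoff of 20 and the exact-match notion of "duplicate" are illustrative assumptions, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: a read is modeled as (sequence, per-base Phred quality scores).
Read = Tuple[str, List[int]]


def mean_quality(qualities: List[int]) -> float:
    """Average Phred score of a read; 0.0 for an empty read."""
    return sum(qualities) / len(qualities) if qualities else 0.0


def quality_filter(reads: Iterable[Read], min_mean_q: float = 20.0) -> List[Read]:
    """Keep reads whose mean Phred quality is at least min_mean_q (hypothetical cutoff)."""
    return [r for r in reads if mean_quality(r[1]) >= min_mean_q]


def remove_exact_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first occurrence of each distinct sequence (exact matches only)."""
    seen = set()
    unique: List[Read] = []
    for seq, quals in reads:
        if seq not in seen:
            seen.add(seq)
            unique.append((seq, quals))
    return unique


reads = [
    ("ACGTACGT", [30] * 8),
    ("ACGTACGT", [35] * 8),  # exact duplicate of the first read
    ("TTGCAAGC", [10] * 8),  # mean quality 10: filtered out
    ("GGCATTCA", [25] * 8),
]
cleaned = remove_exact_duplicates(quality_filter(reads))
print([seq for seq, _ in cleaned])  # ['ACGTACGT', 'GGCATTCA']
```

Filtering runs before duplicate removal here, so low-quality copies never count as the "first" occurrence. A real tool might instead keep the highest-quality copy of each duplicate or treat paired reads as a unit.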
Note on SOLiD support: paired-end SOLiD data is applied in a second step of the de novo assembly to link, merge, and expand contigs generated from Sanger, Illumina, and/or Roche/454 data.
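The idea of using paired-end data to link contigs can be illustrated with a simplified, generic sketch. It is not CLC bio's algorithm. It assumes each read pair has already been aligned and is summarized as the contig each mate landed on (`None` if unaligned). A pair whose mates fall on two different contigs is evidence that those contigs are neighbors in the genome. The `min_pairs` support threshold and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Assumption: for each read pair, the contig each mate aligned to (None if unaligned).
MatePlacement = Tuple[Optional[str], Optional[str]]


def count_contig_links(placements: Iterable[MatePlacement]) -> Counter:
    """Count read pairs whose two mates land on two different contigs."""
    links: Counter = Counter()
    for contig_a, contig_b in placements:
        if contig_a is None or contig_b is None or contig_a == contig_b:
            continue  # unaligned mate, or both mates on one contig: no link
        # Sort the pair so (A, B) and (B, A) count as the same link.
        links[tuple(sorted((contig_a, contig_b)))] += 1
    return links


def supported_links(
    placements: Iterable[MatePlacement], min_pairs: int = 3
) -> Dict[Tuple[str, str], int]:
    """Keep only contig pairs backed by at least min_pairs read pairs (hypothetical threshold)."""
    return {pair: n for pair, n in count_contig_links(placements).items() if n >= min_pairs}


placements = [
    ("contig1", "contig2"), ("contig2", "contig1"), ("contig1", "contig2"),
    ("contig2", "contig3"),  # a single, weakly supported link
    ("contig3", "contig3"),  # both mates on the same contig
    ("contig1", None),       # one mate unaligned
]
print(supported_links(placements))  # {('contig1', 'contig2'): 3}
```

Requiring several supporting pairs guards against chimeric or misaligned pairs. Production scaffolders also check mate orientation and the expected insert size to order contigs and estimate gap lengths, which this sketch omits.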