this post was submitted on 10 Mar 2024
Open Source
Bioinformatics, perhaps? But I think a lot of those are just specific analyses done in notebooks. Being able to submit a notebook and have it computed would be pretty handy, though, I imagine.
This is an excellent field for our proof of concept; we're just looking for a specific app to start with.
I'm a bioinformatician. The problem with using bioinformatics software here is that the input or output data size is huge for most tasks, which makes submitting jobs off-site much more difficult.
Bacterial genome assembly isn't too bad, though. I use Nanopore sequencing data, and the input is usually on the order of a few gigabytes per task for an output file of a few megabytes (pulling numbers outta my butt, but they shouldn't be too far off). But multiply this by 48 or 96, which is the number of samples our machine can run at the same time, and you're getting into hundreds of gigabytes of input data. It's just tough to manage this with cloud services.
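To put the scaling in numbers, here is a back-of-the-envelope sketch. The per-sample sizes are assumed round values standing in for "a few gigabytes" in and "a few megabytes" out; they are not measurements.

```python
# Rough per-run data volume for a batch of Nanopore bacterial assemblies.
# Both per-sample sizes below are assumptions, not measured values.
INPUT_GB_PER_SAMPLE = 3.0   # assumed: "a few gigabytes" of reads per sample
OUTPUT_MB_PER_SAMPLE = 5.0  # assumed: "a few megabytes" of assembly per sample


def batch_volume(samples: int) -> tuple[float, float]:
    """Return (total input in GB, total output in MB) for one sequencing run."""
    return samples * INPUT_GB_PER_SAMPLE, samples * OUTPUT_MB_PER_SAMPLE


for n in (48, 96):
    inp, out = batch_volume(n)
    print(f"{n} samples: ~{inp:.0f} GB in, ~{out:.0f} MB out")
```

Under these assumptions a full 96-sample run is roughly 288 GB of reads for well under a gigabyte of assemblies, so almost all of the transfer cost is on the upload side.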
But if you go simpler, you could offer a BLAST server: you just need to host your own database and accept queries. I'm not sure whether you can split it into smaller tasks, though. If you segment the main database, your E-values (and the p-values derived from them) will change, because BLAST's significance statistics depend on the total database size.
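Why splitting the database shifts significance: BLAST's E-values come from the Karlin-Altschul formula E = K·m·n·e^(-λS), where m is the query length and n the database length. The sketch below uses that simplified formula (no edge-effect corrections) with assumed λ and K values, a hypothetical 4-billion-residue database, and a hypothetical alignment score, to show that searching a quarter-sized shard makes the same hit look four times more significant unless it is scored against the full database size.

```python
import math

# Illustrative Karlin-Altschul parameters (assumed values for a gapped protein
# search); real BLAST derives these from the scoring matrix and gap penalties.
LAMBDA = 0.267
K = 0.041


def evalue(score: float, query_len: int, db_len: int) -> float:
    """Karlin-Altschul E-value, E = K * m * n * exp(-lambda * S).

    Simplified: ignores the effective-length (edge-effect) corrections BLAST applies.
    """
    return K * query_len * db_len * math.exp(-LAMBDA * score)


FULL_DB_LEN = 4_000_000_000   # hypothetical total residues in the whole database
SHARD_LEN = FULL_DB_LEN // 4  # the same database split into 4 equal shards

score, qlen = 120.0, 300      # one hypothetical alignment score and query length

e_full = evalue(score, qlen, FULL_DB_LEN)   # hit found by searching everything
e_shard = evalue(score, qlen, SHARD_LEN)    # same hit, scored against one shard only
e_fixed = evalue(score, qlen, FULL_DB_LEN)  # shard hit rescored with the full size

print(f"full: {e_full:.3g}  shard: {e_shard:.3g}  corrected: {e_fixed:.3g}")
```

BLAST+ has a `-dbsize` option for setting the effective database length, which should let each shard search report E-values on the whole database's scale. Splitting the query set instead of the database sidesteps the problem, since each query's E-value depends only on its own length and the database it is searched against.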
Is that data compressible? A few GB for an input or output file isn't entirely unmanageable from our perspective. Not ideal, but workable. What are some popular OSS tools used in your field?
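One quick way to answer the compressibility question for a specific dataset is to stream a file through DEFLATE (the algorithm behind gzip) and compare sizes. This is a minimal sketch assuming plain-text read files such as FASTQ; the demo record is synthetic and highly repetitive, so its ratio says nothing about real Nanopore data.

```python
import io
import zlib
from typing import BinaryIO


def deflate_sizes(stream: BinaryIO, level: int = 6,
                  chunk_size: int = 1 << 20) -> tuple[int, int]:
    """Stream bytes through zlib/DEFLATE; return (original bytes, compressed bytes).

    Reads in chunks so multi-gigabyte files never have to fit in memory.
    """
    comp = zlib.compressobj(level)
    raw = packed = 0
    while chunk := stream.read(chunk_size):
        raw += len(chunk)
        packed += len(comp.compress(chunk))
    packed += len(comp.flush())  # emit whatever the compressor still buffers
    return raw, packed


# Demo on a synthetic FASTQ-like record repeated many times (not real data).
record = b"@read\nACGTACGTTTGACCA\n+\n%%%%&&&''((()))\n"
raw, packed = deflate_sizes(io.BytesIO(record * 10_000))
print(f"{raw} bytes -> {packed} bytes ({raw / packed:.1f}x)")
```

On a real file the same function works with `open(path, "rb")` as the stream; `path` here stands for whatever read file you want to test.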
Snakemake is a popular tool for defining bioinformatics analysis workflows (Nextflow is an equivalent). There's also KBase, a web UI for running different jobs, though I'm not sure whether that is open source.