Meta Segment Anything Model Audio

(ai.meta.com)

200 points | by megaman821 2 days ago

15 comments

blagie 48 minutes ago
For future ML developers: A post like this should include system requirements.
It's not clear from the blog post, the git page, or most other places whether this will run on, even in big-O terms:
* CPU
* 16GB GPU
* 240GB server (of the type most businesses can afford)
* Meta/Google/Open AI/Anthropic-style data center
hbn 11 hours ago
I hope we keep making progress in isolating tracks in music. I love listening to stems of my favorite songs, I find all sorts of neat parts I missed out on. Listening to isolated harmonies is cool too.
- TacticalCoder 10 hours ago
  It should also make it possible to produce higher-quality re-recordings of stuff that's impossible to find in good quality. Like that cover that band played only once at that obscure concert, which was recorded on an old tape. Or many very old reggae songs: although many from Jamaica/Kingston had great recordings (there was know-how and there were great recording studios there), there's also a shitload of old reggae songs that are barely listenable because the recording is so poor (and no, it's not an artistic choice by the artist: it's just, you know, a crappy recording).
locusofself 9 hours ago
As someone recording myself playing music, I've been meaning to see if any of these tools are good enough yet to not only separate vocals from another instrument (acoustic guitar, for example), but do so without any loss of fidelity (or at least not a perceivable one).
The reason I'm interested in this is that recording with multiple microphones (one on the guitar, one on the vocal) has its own set of problems with phase relationships and bleed between the microphones, which causes issues when mixing.
Being able to capture a singing guitarist with a single microphone placed in just the right spot, but still being able to process the tracks individually (with EQ, compression, reverb, etc), could be really helpful.
ortusdux 11 hours ago
Would be great for the hearing impaired and CAPD sufferers when combined with Meta glasses or the like.
- djabatt 10 hours ago
  very cool idea
moss_dog 4 hours ago
This is incredible! I wouldn't have thought it was possible to cleanly separate tracks like that. I wonder to what extent the model is filling in gaps, akin to Samsung's "ultra zoom" moon.
kace91 7 hours ago
Funny that:
- This feature is awesome for sample-based music
- Sample music is not what it was due to difficulties related to legal rights
- This model was probably created by not giving a damn about said rights
tasty_freeze 7 hours ago
I use moises frequently for track separation when learning songs. It does pretty dang well. I was shocked that moises is ranked way worse than just about everything else, including lalal.ai, which I also used before buying moises. Perhaps lalal.ai has gotten better since I last tried it.
- Reubend 7 hours ago
  Maybe I'm totally misinterpreting, but the chart I'm looking at says "Net Win Rate of SAM Audio vs. SoTA Separation (text prompted)", so perhaps a lower number means that the alternative model is better?
cyberax 7 hours ago
Can this be used to nuke the laugh tracks?!?
- helpfulclippy 6 hours ago
  It’s really a shame how popular it was to mar shows with this… I saw a DVD set of a show once with a no-laugh-track version. It sucked because the actors pause for the laughs after each line. This is bad enough with the laugh track in place, but if it’s just dead air it makes every scene feel awkward.
  - AmbroseBierce 1 hour ago
    AI can remove those pauses by the actors too so maybe that would work.
  - cyberax 1 hour ago
    I don't even mind awkward pauses. I tried using the laugh track silencer on an episode of Black Adder, and it worked out OK.
htrp 2 days ago
super amazing demo performance, being able to separate out music, voice, and background noises. do you have to explicitly specify what type of noise to separate?
mwmisner 11 hours ago
Playing with the background, I tried to isolate just the espresso machine and the train sounds in one of their demos, and it seemed to fail. Maybe that's not the intended use case, but I thought it was odd that I could break it so easily on the sample material.
- gpm 10 hours ago
  Footsteps worked pretty well when I tried that, on the other hand. I wonder if a lot of it has to do with how well the model understands what the English description of the sound should sound like...
  - freeflow1448 2 hours ago
    i do think that’s the case. i tried a few different ways to write x and got meaningfully varied results
Escapade5160 7 hours ago
From my brief testing in the playground, it is not very good. Maybe it needs better prompting than the 1 word examples.
- gpm 6 hours ago
  For me it either worked great or not at all. Extracting footsteps, the air conditioner noise, voices, and one particular person's voice (identified by gender) all worked great (across multiple clips for most of those).
  A few prompts failed almost entirely though, "train noises", "background noise" and "clatter"... so definitely sensitive to either prompting or the kind of noise being extracted.
brador 2 hours ago
How about they fix their MusicGEN model on hugging face first.
nmstoker 5 hours ago
Would be interesting to leverage the non spoken/environment noises to guide what level of detail and style of speech a chatbot replied with, for instance being more casual, gentle, with a touch more detail if in a quiet home/office environment, but more curt and concise with emphasized diction if the person is traveling, such as in a noisy train concourse. People tend to do that subconsciously but bots ignorantly wittering on can be annoying and hard to use because they miss the cues.
almosthere 10 hours ago
mSAMA haha, get it
emsign 2 days ago
[flagged]
- dang 11 hours ago
  "Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative."
  "Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."
  https://news.ycombinator.com/newsguidelines.html
- qoez 11 hours ago
  Basically the same thing musicians said about the synth and music made by computers back in the day
  - motoxpro 11 hours ago
    100%. The music world has gone through the "but what will we do now?" at least 6-7 times. Music videos ("video killed the radio star"), sampling, the DAW (and time aligning), home studios, auto tune, plugins and amp simulators, napster/piracy, etc, etc.
    - alex1138 3 hours ago
      > napster/piracy
      This one rankles me because of a) the benefits piracy has (third-world consumers can now discover you, for starters) and b) the absolutely bad-faith way in which the industry acts, screwing over artists and unethically going after Pirate Bay by turning it into a trade war with Sweden (I think)
- subdavis 1 day ago
  That’s pretty much been the story since the Neolithic revolution though?