Trying Out an AI Digital Human Project

cheva (超哥) in STEEM CN/中文 • 2 years ago (edited)

Lately I have been exploring a few interesting AI projects, one of which is SadTalker. It animates a still photo from an audio clip: the voice drives the photo to produce a talking-head video. With it, you can easily create a digital human for a self-media channel. First, generate a picture of a beautiful woman with Stable Diffusion. Then generate the narration with a text-to-speech tool, and you have a virtual news anchor.
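The workflow above comes down to one SadTalker call per video: a portrait and a narration track go in, an MP4 comes out. Below is a minimal sketch that assembles that call. The flag names (`--driven_audio`, `--source_image`, `--result_dir`, `--enhancer`, `--still`, `--preprocess full`) come from my reading of the SadTalker README and may differ between versions; the function name and file names are hypothetical, and the text-to-speech step is left out because no specific tool is named here.

```python
import sys


def build_sadtalker_cmd(image, audio, result_dir="./results",
                        enhancer="gfpgan", full_body=False):
    """Assemble a SadTalker inference command for one photo plus one voice track.

    Hypothetical helper; flag names follow the SadTalker README as I read it,
    so check them against your own checkout.
    """
    cmd = [
        sys.executable, "inference.py",   # assumes this Python has SadTalker's deps
        "--driven_audio", str(audio),     # the text-to-speech narration
        "--source_image", str(image),     # the Stable Diffusion portrait
        "--result_dir", str(result_dir),  # where the finished MP4 is written
    ]
    if enhancer:
        # Face-restoration pass to offset the low native resolution (slow)
        cmd += ["--enhancer", enhancer]
    if full_body:
        # Keep the whole picture instead of only the cropped face
        cmd += ["--still", "--preprocess", "full"]
    return cmd
```

Run it from inside the cloned repository, for example `subprocess.run(build_sadtalker_cmd("host.png", "napoleon.wav"), cwd="SadTalker", check=True)`; the video should then appear in a subfolder of `./results`.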

The project has been out for a while, about half a year. The instructions in its GitHub repository are very clear, and by following them I deployed it locally without much trouble. To make it easier to use, I wrote a simple graphical interface with the help of AI: you select the picture and the audio you want to use, then click Generate.
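A launcher like this can be very small. The sketch below is not the exact interface mentioned above; it is an illustration, assuming Python's built-in tkinter and the README flags `--driven_audio`, `--source_image` and `--enhancer`. It lets you pick an image and an audio file and runs SadTalker when you click Generate. The accepted file extensions, the `sadtalker_dir` parameter and all function names are my own hypothetical choices, and the launcher must run in the same Python environment as SadTalker.

```python
import subprocess
import sys
from pathlib import Path

# Extensions this launcher accepts (my assumption, not a SadTalker rule)
IMAGE_TYPES = {".png", ".jpg", ".jpeg"}
AUDIO_TYPES = {".wav", ".mp3"}


def check_inputs(image, audio):
    """Return a list of problems with the chosen files; empty means OK."""
    problems = []
    if Path(image).suffix.lower() not in IMAGE_TYPES:
        problems.append(f"unsupported image: {image!r}")
    if Path(audio).suffix.lower() not in AUDIO_TYPES:
        problems.append(f"unsupported audio: {audio!r}")
    return problems


def build_gui(sadtalker_dir):
    """Build (but do not start) a tiny window around SadTalker's inference.py."""
    # Imported lazily so check_inputs works even where tkinter is not installed
    import tkinter as tk
    from tkinter import filedialog, messagebox

    root = tk.Tk()
    root.title("SadTalker launcher")
    image_var, audio_var = tk.StringVar(), tk.StringVar()

    def pick(var, filetypes):
        path = filedialog.askopenfilename(filetypes=filetypes)
        if path:  # an empty string means the dialog was cancelled
            var.set(path)

    def generate():
        problems = check_inputs(image_var.get(), audio_var.get())
        if problems:
            messagebox.showerror("Invalid input", "\n".join(problems))
            return
        cmd = [sys.executable, "inference.py",
               "--driven_audio", audio_var.get(),
               "--source_image", image_var.get(),
               "--enhancer", "gfpgan"]
        # Blocks the window until SadTalker exits; acceptable for a personal tool
        result = subprocess.run(cmd, cwd=sadtalker_dir)
        if result.returncode == 0:
            messagebox.showinfo("Done", "Video saved under results/")
        else:
            messagebox.showerror("Failed", f"SadTalker exited with {result.returncode}")

    tk.Button(root, text="Choose image",
              command=lambda: pick(image_var, [("Images", "*.png *.jpg *.jpeg")])).pack(fill="x")
    tk.Label(root, textvariable=image_var).pack()
    tk.Button(root, text="Choose audio",
              command=lambda: pick(audio_var, [("Audio", "*.wav *.mp3")])).pack(fill="x")
    tk.Label(root, textvariable=audio_var).pack()
    tk.Button(root, text="Generate", command=generate).pack(fill="x")
    return root
```

Call `build_gui("path/to/SadTalker").mainloop()` to open the window. Because SadTalker runs synchronously, the window freezes until the video is finished; moving the call to a background thread would be the natural next step.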

Of course, like all AI software, it is fairly demanding on the GPU. The output resolution is also rather low, so the result needs another pass through a face-enhancement model, and because this is video, that takes a long time. On a computer with a mediocre graphics card, generating a four- to five-minute video took about six hours, whereas a computer with a better card produced roughly six minutes of video in about an hour. So I took an earlier post of mine about Napoleon, used it as the script for a video presented by a beautiful anchorwoman, and uploaded it to a YouTube channel I had not touched in a long time. To my surprise, it got over a hundred views within two days. It seems beautiful women really are the primary productive force.
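To put the two timings side by side, it helps to convert them into minutes of computing per minute of finished video. This is plain arithmetic on the figures above; taking 4.5 minutes as the midpoint of "four to five minutes" is my assumption, and the function name is hypothetical.

```python
def compute_minutes_per_video_minute(video_minutes, compute_hours):
    """Minutes of generation time needed for one minute of finished video."""
    return compute_hours * 60 / video_minutes


slow = compute_minutes_per_video_minute(4.5, 6)  # weaker GPU: ~4.5 min of video in ~6 h
fast = compute_minutes_per_video_minute(6, 1)    # better GPU: ~6 min of video in ~1 h
print(f"slow: {slow:.0f}, fast: {fast:.0f}, speed-up: {slow / fast:.0f}x")
```

By this rough measure the better machine is about eight times faster, which is the difference between an overnight job and a lunch break for a typical five-minute clip.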

I am considering running a YouTube channel this way, but YouTube's monetization requirements are quite high: 500 subscribers and more than 3,000 watch hours within a year. Fortunately, a tool like SadTalker keeps the cost and effort low, so it suits a new channel in its early, no-income phase and makes it relatively easy to keep going.
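To see why 3,000 watch hours is a high bar for a new channel, it helps to convert it into views. The average view duration of two minutes below is a purely hypothetical figure for illustration; real retention varies widely, and the function name is my own.

```python
import math


def views_needed(watch_hours=3000, avg_view_minutes=2.0):
    """Views required to reach a watch-hour target at a given average view length."""
    return math.ceil(watch_hours * 60 / avg_view_minutes)


yearly = views_needed()        # assumes viewers stay about 2 minutes on average
daily = math.ceil(yearly / 365)
print(yearly, daily)           # 90000 247
```

Even at that optimistic retention, the channel would need roughly 250 views every day for a year, far more than the hundred or so views the first video earned in two days.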

Just a few days ago, however, Alibaba Group released a similar digital-human application called DreamTalk. It only crops the head out of the picture and drives it with sound, but its results seem better, and it can be driven not only by speech but also by singing. Still, if it only produces a low-resolution head, it is less practical than SadTalker, which has a more complete workflow. The AI field moves fast anyway, so I will wait and see.

#cn #whalepower #lifestyle #cn-reader #life #stemsocial #ai #zzan #dblog #diamondtoken #marlians #upfundme #actnearn