I've seen a lot of people playing with speech interfaces lately, so here is my own attempt.

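A long time ago I wanted to do a project where my router talks. My router is a small x86 box with an audio output jack. All the drivers were installed along with the system, so plugging in a pair of headphones is enough to play music.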

On the command line, music is played with mplayer:

mplayer filename.mp3

and that is all.

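Next comes a text-to-speech (TTS) interface. There are plenty of them online, and the Python ones are easy to install, but they only support English.
For Chinese speech, iFLYTEK (科大讯飞) is probably the best option. It has its own open platform: register an account there and download the SDK.
The SDK contains an appkey, and it seems every developer gets a differently built package.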

Go into ttsdemo under the Example folder and run make, which produces an executable. Running it generates a pcm file, and mplayer can play that file back as sound.

Now modify ttsdemo.c so that,
instead of writing a file, it writes the audio stream straight to standard output. Remember to fill in your own appkey.


#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

#include "include/qtts.h"
#include "include/msp_cmn.h"
#include "include/msp_errors.h"


typedef int SR_DWORD;
typedef short int SR_WORD ;

// WAV header layout
struct wave_pcm_hdr
{
        char            riff[4];                // = "RIFF"
        SR_DWORD        size_8;                 // = FileSize - 8
        char            wave[4];                // = "WAVE"
        char            fmt[4];                 // = "fmt "
        SR_DWORD        dwFmtSize;              // = size of the fmt chunk that follows: 16

        SR_WORD         format_tag;             // = PCM: 1
        SR_WORD         channels;               // = number of channels: 1
        SR_DWORD        samples_per_sec;        // = sample rate: 8000 | 6000 | 11025 | 16000
        SR_DWORD        avg_bytes_per_sec;      // = bytes per second: samples_per_sec * bits_per_sample / 8
        SR_WORD         block_align;            // = bytes per sample: bits_per_sample / 8
        SR_WORD         bits_per_sample;        // = bits per sample: 8 | 16

        char            data[4];                // = "data"
        SR_DWORD        data_size;              // = length of the PCM payload: FileSize - 44
} ;

// Default header values
struct wave_pcm_hdr default_pcmwavhdr =
{
        { 'R', 'I', 'F', 'F' },
        0,
        {'W', 'A', 'V', 'E'},
        {'f', 'm', 't', ' '},
        16,
        1,
        1,
        16000,
        32000,
        2,
        16,
        {'d', 'a', 't', 'a'},
        0
};

int text_to_speech(const char* src_text ,const char* params)
{
        struct wave_pcm_hdr pcmwavhdr = default_pcmwavhdr;
        const char* sess_id = NULL;
        int ret = 0;
        unsigned int text_len = 0;
        char* audio_data;
        unsigned int audio_len = 0;
        int synth_status = 1;
        FILE* fp = NULL;

        /* stdout carries the audio stream, so diagnostics go to stderr */
        fprintf(stderr, "begin to synth...\n");
        if (NULL == src_text)
        {
                fprintf(stderr, "text is null!\n");
                return -1;
        }
        text_len = (unsigned int)strlen(src_text);
        fp = stdout;
        if (NULL == fp)
        {
                fprintf(stderr, "open stdout error\n");
                return -1;
        }
        sess_id = QTTSSessionBegin(params, &ret);
        if ( ret != MSP_SUCCESS )
        {
                fprintf(stderr, "QTTSSessionBegin: qtts begin session failed Error code %d.\n",ret);
                return ret;
        }

        ret = QTTSTextPut(sess_id, src_text, text_len, NULL );
        if ( ret != MSP_SUCCESS )
        {
                fprintf(stderr, "QTTSTextPut: qtts put text failed Error code %d.\n",ret);
                QTTSSessionEnd(sess_id, "TextPutError");
                return ret;
        }
        fwrite(&pcmwavhdr, sizeof(pcmwavhdr) ,1, fp);
        while (1)
        {
                const void *data = QTTSAudioGet(sess_id, &audio_len, &synth_status, &ret);
                if (NULL != data)
                {
                   fwrite(data, audio_len, 1, fp);
                   pcmwavhdr.data_size += audio_len; // accumulate the PCM payload size
                }
                if (synth_status == 2 || ret != 0)
                        break;
        }

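        // Compute the final sizes for the header
        pcmwavhdr.size_8 += pcmwavhdr.data_size + 36;

        // Patch the sizes into the header. This only works when stdout is a
        // regular file; on a pipe fseek fails and the sizes written earlier stay 0.
        if (fseek(fp, 4, SEEK_SET) == 0)
        {
                fwrite(&pcmwavhdr.size_8, sizeof(pcmwavhdr.size_8), 1, fp);
                fseek(fp, 40, SEEK_SET);
                fwrite(&pcmwavhdr.data_size, sizeof(pcmwavhdr.data_size), 1, fp);
        }
        fflush(fp);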

        ret = QTTSSessionEnd(sess_id, NULL);
        if ( ret != MSP_SUCCESS )
        {
                fprintf(stderr, "QTTSSessionEnd: qtts end failed Error code %d.\n",ret);
        }
        return ret;
}

int main(int argc, char* argv[])
{
        // Do not change the APPID casually
        const char* login_configs = " appid = xxxxxxxxx, work_dir =   .  ";
        const char* text  = NULL;
        const char* param = "aue = speex-wb;3, vcn=xiaoyan,  spd = 50, vol = 50, tte = utf8";
        int ret = 0;

        // The text to speak is the first command-line argument
        if (argc < 2)
        {
                fprintf(stderr, "usage: %s TEXT\n", argv[0]);
                return 1;
        }
        text = argv[1];

        // Log in
        ret = MSPLogin(NULL, NULL, login_configs);
        if ( ret != MSP_SUCCESS )
        {
                fprintf(stderr, "MSPLogin failed , Error code %d.\n",ret);
        }
        // Synthesize the audio
        ret = text_to_speech(text,param);
        if ( ret != MSP_SUCCESS )
        {
                fprintf(stderr, "text_to_speech: failed , Error code %d.\n",ret);
        }
        // Log out
        MSPLogout();
        return 0;
}

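Compile it and rename the binary to say_stdout. The say_stdout program turns its command-line argument into an audio stream on standard output; the next step is to play that stream.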
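
For reference, the 44-byte header described by the C struct above can be sketched in Python with the struct module. This is only an illustration of the layout, assuming 16 kHz, mono, 16-bit PCM as in default_pcmwavhdr; the function name build_wav_header is made up for this example and is not part of the SDK.

```python
import struct

def build_wav_header(data_size, sample_rate=16000, channels=1, bits_per_sample=16):
    """Pack a canonical 44-byte RIFF/WAVE header for raw PCM data (hypothetical helper)."""
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF",
        36 + data_size,      # size_8: file size minus 8
        b"WAVE",
        b"fmt ",
        16,                  # size of the fmt chunk
        1,                   # format tag 1 = PCM
        channels,
        sample_rate,
        byte_rate,
        block_align,
        bits_per_sample,
        b"data",
        data_size,           # length of the PCM payload
    )

header = build_wav_header(32000)  # one second of 16 kHz mono 16-bit audio
print(len(header))
```

The two size fields sit at byte offsets 4 and 40, which is why the C code seeks to exactly those positions when it patches the header.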

mplayer can play an audio stream from standard input.
Write a shell script named say:

#!/bin/sh
say_stdout "$1" | mplayer -cache 8192 - > /dev/null 2>&1

After that, the say command makes the router talk.

say "我会讲话了!"

Heh, the speed and volume parameters can be tuned too.
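
The say script is just a two-stage pipeline. A Python equivalent might look like the sketch below. run_pipeline and say are hypothetical helpers, and the say_stdout and mplayer command lines are the ones from this post. The demo at the bottom uses echo and cat as stand-ins so it runs without either program installed.

```python
import subprocess

def run_pipeline(producer_cmd, consumer_cmd):
    """Run producer | consumer and return the consumer's stdout as bytes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    producer.stdout.close()  # let the producer see SIGPIPE if the consumer exits
    output, _ = consumer.communicate()
    producer.wait()
    return output

def say(text):
    """Assumed equivalent of the shell script: say_stdout TEXT | mplayer -cache 8192 -"""
    return run_pipeline(["say_stdout", text], ["mplayer", "-cache", "8192", "-"])

# Stand-in commands so the sketch can be tried on any Unix-like system.
demo_output = run_pipeline(["echo", "hello"], ["cat"])
print(demo_output)
```

Passing the text as one list element means spaces in it never get split into separate arguments, which is the same reason the shell version should quote "$1".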

Now that the router can talk, I had it announce the time on the hour.
Add one line to crontab:

0 * * * * say "现在时刻`date +\%I`点整" > /dev/null 2>&1
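
The announcement text is built from the 12-hour clock value that date +%I prints. As a small sketch of the same formatting in Python (the function name hourly_announcement is invented for this example):

```python
from datetime import datetime

def hourly_announcement(now):
    """Build the sentence passed to say; %I is the zero-padded 12-hour clock hour."""
    return "现在时刻{}点整".format(now.strftime("%I"))

print(hourly_announcement(datetime(2024, 1, 1, 15, 0)))
```

Because %I is a 12-hour value, 3 a.m. and 3 p.m. produce the same sentence; %H would give the 24-hour value instead.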

Done!
See http://open.voicecloud.cn/

======================================

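I forgot to mention: having it only announce the hour gets boring, so I also built an interface for speaking arbitrary text, as a CGI program written in Python.
The CGI program does not run as root and could not produce sound. I don't know the exact reason, but at least audio cannot be played directly from CGI.
So I split it into a client and a server. The client is the CGI program, running as the www user. The server starts at boot and keeps running in the background. The two talk over a Unix domain socket.
CGI side (client):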

#!/usr/bin/python
# coding:utf8
print "Content-type: text/html"
print
import cgi,socket,sys
try:
    form = cgi.FieldStorage()
    content = form.getvalue('saywhat')
except:
    content = ''

if content:
    content = content.replace(' ','_')
    content = content.replace('\t','_')
    content = content.replace('\r','_')
    content = content.replace('\n','_')
    try:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect("/tmp/say.sock")
        sock.send(content)
        print sock.recv(1024)
        sock.close()
    except:
        print "Error", sys.exc_info()[0]

print "<html>"
print "<head><title>Talking router</title><meta charset='UTF-8'></head>"
print "<body>"
print "said:", cgi.escape(content or '')
print "<form method='post' action=''>"
print "    <input type='text' name='saywhat' />"
print "    <input type='submit' value='SAY' />"
print "</form>"
print "</body>"
print "</html>"

Server side:

#!/usr/bin/python
import socket,os,subprocess,stat

def main():
    sockfile="/tmp/say.sock"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    if os.path.exists(sockfile):
        os.unlink(sockfile)
    sock.bind(sockfile)
    sock.listen(5)
    os.chmod(sockfile,stat.S_IRWXU|stat.S_IRWXG|stat.S_IRWXO)
    while True:
        connection,address = sock.accept()
        content = connection.recv(1024)
        if content:
            print 'say "%s"' % content
            # Pass the text as a single argument so no shell ever parses it
            proc = subprocess.Popen(['say', content],
                                    stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
            rs = proc.communicate()[0]
            connection.send(rs)
        connection.close()

if __name__ == "__main__":
    main()

Now the router can be told to speak from a web page.
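
The client/server split described above can be tried in isolation with the following Python 3 sketch. It is not the code from this post: the helper names serve_once and ask are invented, the socket lives in a temporary directory rather than /tmp/say.sock, and the handler only echoes the text instead of calling say.

```python
import os
import socket
import tempfile
import threading

def serve_once(server_sock, handler):
    """Accept one connection, pass the received text to handler, send back the reply."""
    conn, _ = server_sock.accept()
    with conn:
        request = conn.recv(1024).decode("utf-8")
        conn.sendall(handler(request).encode("utf-8"))

def ask(sock_path, text):
    """Client side: the role the CGI script plays."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(sock_path)
        client.sendall(text.encode("utf-8"))
        return client.recv(1024).decode("utf-8")

# Demo: bind and listen first, then serve a single request in a background thread.
sock_path = os.path.join(tempfile.mkdtemp(), "say.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)
worker = threading.Thread(target=serve_once, args=(server, lambda t: "said: " + t))
worker.start()
reply = ask(sock_path, "hello")
worker.join()
server.close()
print(reply)
```

The point of the design is that only the long-running server needs permission to use the sound device; the web-facing process needs nothing beyond write access to the socket file.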

Assume nginx, python2.7, pip and virtualenv are already installed.

Install uwsgi:

pip install uwsgi
ln -s /usr/local/python2.7/bin/uwsgi /usr/bin/

Configure the nginx virtual host:

server
{
    listen 80;
    server_name <SERVER_NAME>;
    root <HOME_DIR>;

    location / {
        try_files $uri @uwsgi;
    }

    location @uwsgi {
        include uwsgi_params;
        uwsgi_pass 127.0.0.1:89;
        uwsgi_param UWSGI_SCRIPT <SCRIPT_NAME>;
        uwsgi_param UWSGI_PYHOME <HOME_DIR>/venv;
        uwsgi_param UWSGI_CHDIR <HOME_DIR>;
    }
    location ~ \.(py|pyc|cgi)$ {
        return 404;
    }
}

SCRIPT_NAME is the name of the module that provides the WSGI application, without the .py extension.

Modify the program so that it is compatible with the uwsgi interface.
Taking web.py as an example:

import web
app = web.application(urls, globals())

if __name__ == "__main__":
    app.run()
else:
    application = app.wsgifunc()
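
By default uwsgi looks for a module-level callable named application that follows the WSGI convention, which is what app.wsgifunc() provides. A framework-free sketch of such a callable is shown below, called by hand with a made-up environ purely for illustration; fake_start_response and captured are names invented for the demo.

```python
def application(environ, start_response):
    """Minimal WSGI callable: returns the request path as plain text."""
    body = ("path: " + environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Call it the way a WSGI server would, with a hand-built environ.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

result = b"".join(application({"PATH_INFO": "/hello"}, fake_start_response))
print(captured["status"], result)
```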

Start uwsgi:

uwsgi -s :89 -H venv/ -w controller

Watch for errors; if there are any, check the log and fix them.

Make it start at boot:

vi /etc/init.d/uwsgi
#! /bin/sh
# chkconfig: 2345 55 25
# Description: Startup script for uwsgi webserver on Debian. Place in /etc/init.d and
# run 'update-rc.d -f uwsgi defaults', or use the appropriate command on your
# distro. For CentOS/Redhat run: 'chkconfig --add uwsgi'

### BEGIN INIT INFO
# Provides:          uwsgi
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: starts the uwsgi web server
# Description:       starts uwsgi using start-stop-daemon
### END INIT INFO

# Author:   licess
# website:  http://lnmp.org

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="uwsgi daemon"
NAME=uwsgi
DAEMON=/usr/local/python2.7/bin/uwsgi
CONFIGFILE=/etc/$NAME.ini
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

set -e
[ -x "$DAEMON" ] || exit 0

do_start() {
    $DAEMON $CONFIGFILE || echo -n "uwsgi already running"
}

do_stop() {
    kill -9 `cat $PIDFILE` || echo -n "uwsgi not running"
    rm -f $PIDFILE
    echo "$DAEMON STOPPED."
}

do_reload() {
    $DAEMON --reload $PIDFILE || echo -n "uwsgi can't reload"
}

do_status() {
    ps aux|grep $DAEMON
}

case "$1" in
 status)
    echo -en "Status $NAME: \n"
    do_status
 ;;
 start)
    echo -en "Starting $NAME: \n"
    do_start
 ;;
 stop)
    echo -en "Stopping $NAME: \n"
    do_stop
 ;;
 reload|graceful)
    echo -en "Reloading $NAME: \n"
    do_reload
 ;;
 *)
    echo "Usage: $SCRIPTNAME {status|start|stop|reload}" >&2
    exit 3
 ;;
esac

exit 0

chmod +x /etc/init.d/uwsgi

chkconfig --add uwsgi
chkconfig uwsgi on

Create the uwsgi configuration file:

vi /etc/uwsgi.ini
[uwsgi]
socket = 127.0.0.1:89
master = true
vhost = true
no-site = true
#workers = 2
#reload-mercy = 10
#vacuum = true
#max-requests = 1000
#limit-as = 512
#buffer-size = 30000
pidfile = /var/run/uwsgi.pid
daemonize = /home/wwwlogs/uwsgi/uwsgi.log
disable-logging = true
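
One easy mistake is letting the socket value in uwsgi.ini drift from the uwsgi_pass address in the nginx config. Below is a small sketch of a consistency check using Python's configparser. The helper names and the inline strings are assumptions for the example; in practice you would read the real files.

```python
import configparser
import re

def uwsgi_socket(ini_text):
    """Return the socket value from the [uwsgi] section of an ini string."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser["uwsgi"]["socket"]

def nginx_uwsgi_pass(conf_text):
    """Return the first uwsgi_pass target found in an nginx config string."""
    match = re.search(r"uwsgi_pass\s+([^;]+);", conf_text)
    return match.group(1).strip() if match else None

ini_text = """
[uwsgi]
socket = 127.0.0.1:89
master = true
#workers = 2
pidfile = /var/run/uwsgi.pid
"""
conf_text = "location @uwsgi { include uwsgi_params; uwsgi_pass 127.0.0.1:89; }"

print(uwsgi_socket(ini_text) == nginx_uwsgi_pass(conf_text))
```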

Found some very thorough documentation, thumbs up:
http://www.cnblogs.com/xiongpq/p/3381069.html
http://www.cnblogs.com/zhouej/archive/2012/03/25/2379646.html

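It turns out that deploying a Python website depends on a lot of knowledge:
nginx, python, uwsgi, virtualenv and so on.
So overall, deploying a Python website is hard. Hats off to everyone who has put in the effort to get one deployed.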

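I haven't written anything for a long time because I've been busy with python+flask lately. I meant to publish a flask post first, but this one came out sooner. It is also the first post I've published since getting to know him.
He has a project that needs a Python crawler to fetch pages and store the analyzed results in a database. Some fields are missing from the crawled data, so a relational database would waste space.
So, in the spirit of thrift, we went with a loosely structured non-relational database to store the data (the motive was sound, but little did we know that a non-relational database wastes even more disk space). Either way, the non-relational database won, mostly because leaving a new technology unused felt like a waste too.
So the project will be built with python+mongodb.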

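Installing mongodb: there is plenty of material on this. apt-get install mongodb works, and so does building from source; it is not the focus here.
Once the mongodb server is installed, run the mongo command to enter the interactive shell: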

cf@ubuntu:~$ mongo
MongoDB shell version: 2.0.4
connecting to: test
> 

View the built-in help:

> help
	db.help()                    help on db methods
	db.mycoll.help()             help on collection methods
	rs.help()                    help on replica set methods
	help admin                   administrative help
	help connect                 connecting to a db help
	help keys                    key shortcuts
	help misc                    misc things to know
	help mr                      mapreduce

	show dbs                     show database names
	show collections             show collections in current database
	show users                   show users in current database
	show profile                 show most recent system.profile entries with time >= 1ms
	show logs                    show the accessible logger names
	show log [name]              prints out the last segment of log in memory, 'global' is default
	use <db_name>                set current database
	db.foo.find()                list objects in collection foo
	db.foo.find( { a : 1 } )     list objects in foo where a == 1
	it                           result of the last line evaluated; use to further iterate
	DBQuery.shellBatchSize = x   set default number of items to display on shell
	exit                         quit the mongo shell
> 

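The largest unit in mongodb is the server (host). A server holds any number of databases (db), a database holds any number of collections, and a collection holds any number of documents. This mirrors a relational database, except that tables become collections and rows become documents.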
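
To make the hierarchy concrete, here is a plain-Python model of it using nested dicts, with no MongoDB involved. The database and collection names are invented for the example. Note that the two documents do not share the same fields, which is the property that made a document store attractive for crawled pages with missing data.

```python
# server -> databases -> collections -> documents
server = {
    "crawler": {                      # a database
        "pages": [                    # a collection
            {"url": "http://example.com/a", "title": "A", "author": "someone"},
            {"url": "http://example.com/b"},  # same collection, fewer fields
        ],
    },
}

def field_names(collection):
    """Collect every field name used by at least one document."""
    names = set()
    for document in collection:
        names.update(document)
    return sorted(names)

pages = server["crawler"]["pages"]
print(len(pages), field_names(pages))
```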

show dbs lists the database names
show collections lists the collection names
use foo switches to the foo database

> db.help()
DB methods:
	db.addUser(username, password[, readOnly=false])
	db.auth(username, password)
	db.cloneDatabase(fromhost)
	db.commandHelp(name) returns the help for the command
	db.copyDatabase(fromdb, todb, fromhost)
	db.createCollection(name, { size : ..., capped : ..., max : ... } )
	db.currentOp() displays the current operation in the db
	db.dropDatabase()
	db.eval(func, args) run code server-side
	db.getCollection(cname) same as db['cname'] or db.cname
	db.getCollectionNames()
	db.getLastError() - just returns the err msg string
	db.getLastErrorObj() - return full status object
	db.getMongo() get the server connection object
	db.getMongo().setSlaveOk() allow this connection to read from the nonmaster member of a replica pair
	db.getName()
	db.getPrevError()
	db.getProfilingLevel() - deprecated
	db.getProfilingStatus() - returns if profiling is on and slow threshold 
	db.getReplicationInfo()
	db.getSiblingDB(name) get the db at the same server as this one
	db.isMaster() check replica primary status
	db.killOp(opid) kills the current operation in the db
	db.listCommands() lists all the db commands
	db.logout()
	db.printCollectionStats()
	db.printReplicationInfo()
	db.printSlaveReplicationInfo()
	db.printShardingStatus()
	db.removeUser(username)
	db.repairDatabase()
	db.resetError()
	db.runCommand(cmdObj) run a database command.  if cmdObj is a string, turns it into { cmdObj : 1 }
	db.serverStatus()
	db.setProfilingLevel(level,<slowms>) 0=off 1=slow 2=all
	db.shutdownServer()
	db.stats()
	db.version() current version of the server
	db.getMongo().setSlaveOk() allow queries on a replication slave server
	db.fsyncLock() flush data to disk and lock server for backups
	db.fsyncUnock() unlocks server following a db.fsyncLock()
> 

These are not used very often...
db.getCollectionNames() returns the list of collection names

> db.test.help()
DBCollection help
	db.test.find().help() - show DBCursor help
	db.test.count()
	db.test.dataSize()
	db.test.distinct( key ) - eg. db.test.distinct( 'x' )
	db.test.drop() drop the collection
	db.test.dropIndex(name)
	db.test.dropIndexes()
	db.test.ensureIndex(keypattern[,options]) - options is an object with these possible fields: name, unique, dropDups
	db.test.reIndex()
	db.test.find([query],[fields]) - query is an optional query filter. fields is optional set of fields to return.
	                                              e.g. db.test.find( {x:77} , {name:1, x:1} )
	db.test.find(...).count()
	db.test.find(...).limit(n)
	db.test.find(...).skip(n)
	db.test.find(...).sort(...)
	db.test.findOne([query])
	db.test.findAndModify( { update : ... , remove : bool [, query: {}, sort: {}, 'new': false] } )
	db.test.getDB() get DB object associated with collection
	db.test.getIndexes()
	db.test.group( { key : ..., initial: ..., reduce : ...[, cond: ...] } )
	db.test.mapReduce( mapFunction , reduceFunction , <optional params> )
	db.test.remove(query)
	db.test.renameCollection( newName , <dropTarget> ) renames the collection.
	db.test.runCommand( name , <options> ) runs a db command with the given name where the first param is the collection name
	db.test.save(obj)
	db.test.stats()
	db.test.storageSize() - includes free space allocated to this collection
	db.test.totalIndexSize() - size in bytes of all the indexes
	db.test.totalSize() - storage allocated for all data and indexes
	db.test.update(query, object[, upsert_bool, multi_bool])
	db.test.validate( <full> ) - SLOW
	db.test.getShardVersion() - only for use with sharding
	db.test.getShardDistribution() - prints statistics about data distribution in the cluster
> 

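db.test.ensureIndex() creates an index, including unique keys
db.test.insert() inserts a document
save calls either insert or update depending on its argument. If a document with the same _id already exists, insert reports an error while save replaces the existing document; if it does not exist, both perform the same insertion. In short: update it if it is there, add it if it is not.
db.test.find() runs a query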
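
The difference between insert and save can be shown with a toy in-memory collection. This is a sketch of the semantics only, not how MongoDB implements them; ToyCollection and DuplicateKey are invented names, and unlike the real server the toy requires every document to carry an _id.

```python
class DuplicateKey(Exception):
    pass

class ToyCollection:
    """In-memory stand-in keyed by _id; not the MongoDB implementation."""
    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        # Refuse a second document with an _id that is already present.
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = dict(doc)

    def save(self, doc):
        # Replace when the _id exists, insert otherwise.
        self.docs[doc["_id"]] = dict(doc)

coll = ToyCollection()
coll.insert({"_id": 1, "x": 10})
coll.save({"_id": 1, "x": 11})      # existing _id: the document is replaced
coll.save({"_id": 2, "x": 8})       # new _id: the document is added
try:
    coll.insert({"_id": 1, "x": 99})
    duplicate_rejected = False
except DuplicateKey:
    duplicate_rejected = True
print(coll.docs[1]["x"], len(coll.docs), duplicate_rejected)
```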

> db.test.find().help()
find() modifiers
	.sort( {...} )
	.limit( n )
	.skip( n )
	.count() - total # of objects matching query, ignores skip,limit
	.size() - total # of objects cursor would return, honors skip,limit
	.explain([verbose])
	.hint(...)
	.showDiskLoc() - adds a $diskLoc field to each returned object

Cursor methods
	.forEach( func )
	.map( func )
	.hasNext()
	.next()
> 

These work much like their relational database counterparts.
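
The modifiers correspond roughly to ORDER BY, OFFSET and LIMIT in SQL. As far as I know, the server applies them in the order sort, skip, limit no matter how the calls are chained. The sketch below emulates that on a plain Python list; the function name run_query and the sample data are made up.

```python
def run_query(docs, sort_key=None, skip=0, limit=None):
    """Emulate a cursor: sort first, then skip, then limit."""
    rows = list(docs)
    if sort_key is not None:
        rows.sort(key=lambda d: d[sort_key])
    rows = rows[skip:]
    if limit is not None:
        rows = rows[:limit]
    return rows

docs = [{"x": 10}, {"x": 8}, {"x": 11}]
unsorted_page = [d["x"] for d in run_query(docs, skip=1, limit=2)]
sorted_page = [d["x"] for d in run_query(docs, sort_key="x", skip=1, limit=2)]
print(unsorted_page, sorted_page)
```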

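Those are a few of the simple commands.
Databases and collections in mongodb can be created implicitly: after use database_name, simply referring to db.collection_name is enough, and both come into existence on the first write.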

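Next come the language bindings. I mostly use php and python, so I need the extension modules for both.

For installing the php extension, see http://www.php.net/manual/en/mongo.installation.php
The python module pymongo can be installed with a single command: easy_install pymongo

Accessing mongodb from php: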

<?php

// Connect to the server
$m = new MongoClient();

// Select a database
$db = $m->database_name;

// Select a collection
$collection = $db->collection_name;

// Insert a document
$document = array( "title" => "Calvin and Hobbes", "author" => "Bill Watterson" );
$collection->insert($document);

// Add another document with a different structure
$document = array( "title" => "XKCD", "online" => true );
$collection->insert($document);

// Query all documents in the collection
$cursor = $collection->find();

// Iterate over the results
foreach ($cursor as $document) {
    echo $document["title"] . "\n";
}

?>

Accessing mongodb from python:

>>> import pymongo
>>> client = pymongo.MongoClient("localhost", 27017)
>>> db = client.test
>>> db.name
u'test'
>>> db.my_collection
Collection(Database(MongoClient('localhost', 27017), u'test'), u'my_collection')
>>> db.my_collection.save({"x": 10})
ObjectId('4aba15ebe23f6b53b0000000')
>>> db.my_collection.save({"x": 8})
ObjectId('4aba160ee23f6b543e000000')
>>> db.my_collection.save({"x": 11})
ObjectId('4aba160ee23f6b543e000002')
>>> db.my_collection.find_one()
{u'x': 10, u'_id': ObjectId('4aba15ebe23f6b53b0000000')}
>>> for item in db.my_collection.find():
...     print item["x"]
...
10
8
11
>>> db.my_collection.create_index("x")
u'x_1'
>>> for item in db.my_collection.find().sort("x", pymongo.ASCENDING):
...     print item["x"]
...
8
10
11
>>> [item["x"] for item in db.my_collection.find().limit(2).skip(1)]
[8, 11]