hello dato--graphlab create
Published: 2019-06-19


Install Dato(GraphLab Create)

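Dato requires registration before use and comes with a 30-day trial period.

Below, a Python virtual environment is used to set up a clean Dato test environment: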

    # Create a virtual environment named dato-env
    virtualenv dato-env
    # Activate the virtual environment
    source dato-env/bin/activate
    # Make sure pip is up to date
    pip install --upgrade pip
    # Install IPython Notebook (optional)
    pip install "ipython[notebook]"
    # Install your licensed copy of GraphLab Create
    pip install --upgrade --no-cache-dir https://get.dato.com/GraphLab-Create/1.5.2/EMAIL/KEY/GraphLab-Create-License.tar.gz

When upgrading from an older version, run the following inside dato-env: bin/pip install graphlab-create==1.5.2

Check that Dato works:

    ➜  dato-env  bin/python
    Python 2.7.8 (default, Oct 20 2014, 15:05:19)
    [GCC 4.9.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import graphlab as gl

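If no error is reported, the graphlab Python package is ready to use.

If you start the wrong interpreter, for example by running bin/python from outside dato-env or by typing plain python without activating the environment, the import fails with a module-not-found error for graphlab. The system already has its own Python, and that interpreter knows nothing about packages installed in the virtual environment, so you must run the virtual environment's python.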
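A quick way to confirm which interpreter is actually running is to ask Python itself. The sketch below is only an illustration: the helper name describe_interpreter is hypothetical, and it merely reports the interpreter path, its prefix, and whether a given module can be imported.

```python
import sys


def describe_interpreter(module_name="graphlab"):
    """Report which Python is running and whether module_name imports.

    describe_interpreter is a hypothetical helper for this post,
    not part of GraphLab Create.
    """
    try:
        __import__(module_name)
        available = True
    except ImportError:
        available = False
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # points inside dato-env when the virtualenv is used
        "module_available": available,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in sorted(info):
        print("%s: %s" % (key, info[key]))
```

If the reported executable does not live under dato-env, the system Python was started instead of the virtual environment's one.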

Then follow the guide:

Getting Started with GraphLab Create

1. Load the data as SFrames

SFrame: a tabular data structure, well suited to data wrangling and feature engineering

Graph: a structure well suited to sparse data

    vertices = gl.SFrame.read_csv('http://s3.amazonaws.com/dato-datasets/bond/bond_vertices.csv')
    edges = gl.SFrame.read_csv('http://s3.amazonaws.com/dato-datasets/bond/bond_edges.csv')

When reading a CSV file, gl infers the type of each column from the first line of the file:

bond_vertices: [str,str,int,int]
bond_edges: [str,str,str]
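To make the idea of type inference concrete, here is a minimal sketch in plain Python. It is not GraphLab Create's implementation: the functions infer_type and infer_column_types are hypothetical, and the sketch assumes the file has a header line followed by data, and that only int, float and str need to be distinguished.

```python
import csv


def infer_type(value):
    """Return int or float if the text parses as one, otherwise str."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            continue
    return str


def infer_column_types(lines):
    """Infer column types from the first data row after the header.

    Hypothetical helper for illustration; GraphLab Create's own
    inference rules may differ.
    """
    reader = csv.reader(lines)
    header = next(reader)
    first_row = next(reader)
    return header, [infer_type(value) for value in first_row]


sample = [
    "name,gender,license_to_kill,villian",
    "James Bond,M,1,0",
]
header, types = infer_column_types(sample)
print(header)
print([t.__name__ for t in types])
```

With the sample row taken from the vertices table, the two text columns come out as str and the two flag columns as int, which matches the types listed for bond_vertices.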

To inspect the vertices and edges, just enter the variable name:

    >>> vertices
    +----------------+--------+-----------------+---------+
    |      name      | gender | license_to_kill | villian |
    +----------------+--------+-----------------+---------+
    |   James Bond   |   M    |        1        |    0    |
    |       M        |   M    |        1        |    0    |
    |   Moneypenny   |   F    |        1        |    0    |
    |       Q        |   M    |        1        |    0    |
    |    Wai Lin     |   F    |        1        |    0    |
    | Inga Bergstorm |   F    |        0        |    0    |
    | Elliot Carver  |   M    |        0        |    1    |
    |  Paris Carver  |   F    |        0        |    1    |
    |   Gotz Otto    |   M    |        0        |    1    |
    |  Henry Gupta   |   M    |        0        |    1    |
    +----------------+--------+-----------------+---------+
    >>> edges
    +----------------+------------+------------+
    |      src       |    dst     |  relation  |
    +----------------+------------+------------+
    |    Wai Lin     | James Bond |   friend   |
    |       M        | James Bond |  worksfor  |
    | Inga Bergstorm | James Bond |   friend   |
    | Elliot Carver  | James Bond | killed_by  |
    |   Gotz Otto    | James Bond | killed_by  |
    |   James Bond   |     M      | managed_by |
    |       Q        |     M      | managed_by |
    |   Moneypenny   |     M      | managed_by |
    |       Q        | Moneypenny | colleague  |
    |       M        | Moneypenny |  worksfor  |
    +----------------+------------+------------+

2. Create a graph object and add the vertices and edges

    g = gl.SGraph()
    g = g.add_vertices(vertices=vertices, vid_field='name')
    g = g.add_edges(edges=edges, src_field='src', dst_field='dst')

Inspect the structure of the graph. Note that the vertices' name field has been renamed to __id, and the edges' src and dst fields to __src_id and __dst_id.

    >>> g
    SGraph({'num_edges': 20, 'num_vertices': 10})
    Vertex Fields:['__id', 'gender', 'license_to_kill', 'villian']
    Edge Fields:['__src_id', '__dst_id', 'relation']

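The graph object provides methods for retrieving its vertices and edges. Their output is similar to that of the original vertices and edges variables.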

    g.get_vertices()
    g.get_edges()
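The field renaming performed when rows are added to the graph can be mimicked with plain dictionaries. This is only a sketch of the bookkeeping, assuming rows are simple dicts; build_graph is a hypothetical function and does not reflect how SGraph stores data internally.

```python
def build_graph(vertex_rows, edge_rows, vid_field, src_field, dst_field):
    """Copy rows into a toy graph, renaming the id fields.

    Hypothetical stand-in for illustration only; it mirrors the
    __id / __src_id / __dst_id naming seen in the post.
    """
    vertices = []
    for row in vertex_rows:
        renamed = dict((k, v) for k, v in row.items() if k != vid_field)
        renamed["__id"] = row[vid_field]
        vertices.append(renamed)

    edges = []
    for row in edge_rows:
        renamed = dict(
            (k, v) for k, v in row.items() if k not in (src_field, dst_field)
        )
        renamed["__src_id"] = row[src_field]
        renamed["__dst_id"] = row[dst_field]
        edges.append(renamed)

    return {"vertices": vertices, "edges": edges}


toy = build_graph(
    vertex_rows=[
        {"name": "James Bond", "gender": "M"},
        {"name": "M", "gender": "M"},
    ],
    edge_rows=[{"src": "James Bond", "dst": "M", "relation": "managed_by"}],
    vid_field="name",
    src_field="src",
    dst_field="dst",
)
print(toy["vertices"][0])
print(toy["edges"][0])
```

The remaining columns (gender, relation, and so on) are carried over unchanged, which is why they still appear in the vertex and edge field lists.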

3. Compute PageRank on the graph

    >>> pr = gl.pagerank.create(g)
    PROGRESS: Counting out degree
    PROGRESS: Done counting out degree
    PROGRESS: +-----------+-----------------------+
    PROGRESS: | Iteration | L1 change in pagerank |
    PROGRESS: +-----------+-----------------------+
    PROGRESS: | 1         | 6.65833               |
    PROGRESS: | 2         | 4.65611               |
    PROGRESS: | 3         | 3.46298               |
    PROGRESS: | 4         | 2.55686               |
    PROGRESS: | 5         | 1.95422               |
    PROGRESS: | 6         | 1.42139               |
    PROGRESS: | 7         | 1.10464               |
    PROGRESS: | 8         | 0.806704              |
    PROGRESS: | 9         | 0.631771              |
    PROGRESS: | 10        | 0.465388              |
    PROGRESS: | 11        | 0.364898              |
    PROGRESS: | 12        | 0.271257              |
    PROGRESS: | 13        | 0.212255              |
    PROGRESS: | 14        | 0.159062              |
    PROGRESS: | 15        | 0.124071              |
    PROGRESS: | 16        | 0.0935911             |
    PROGRESS: | 17        | 0.0727674             |
    PROGRESS: | 18        | 0.0551714             |
    PROGRESS: | 19        | 0.0427744             |
    PROGRESS: | 20        | 0.0325555             |
    PROGRESS: +-----------+-----------------------+

As shown above, calling gl's pagerank.create method directly with the constructed graph object returns the pr object.

    >>> pr
    Class                                   : PagerankModel

    Graph
    -----
    num_edges                               : 20
    num_vertices                            : 10

    Results
    -------
    graph                                   : SGraph. See m['graph']
    change in last iteration (L1 norm)      : 0.0326
    vertex pagerank                         : SFrame. See m['pagerank']

    Settings
    --------
    maximun number of iterations            : 20
    convergence threshold (L1 norm)         : 0.01
    probablity of random jumps to any node in the graph: 0.15

    Metrics
    -------
    training time (secs)                    : 1.0853
    number of iterations                    : 20

    Queryable Fields
    ----------------
    training_time                           : Total training time of the model
    graph                                   : A new SGraph with the pagerank as a vertex property
    delta                                   : Change in pagerank for the last iteration in L1 norm
    reset_probability                       : The probablity of randomly jumps to any node in the graph
    pagerank                                : An SFrame with each vertex's pagerank
    num_iterations                          : Number of iterations
    threshold                               : The convergence threshold in L1 norm
    max_iterations                          : The maximun number of iterations to run
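The settings in the model summary (at most 20 iterations, an L1 convergence threshold of 0.01, and a random-jump probability of 0.15) are the usual knobs of an iterative PageRank computation. In this run the last L1 change (0.0326) is still above the threshold, so the loop appears to have stopped because it reached the iteration limit. The following is a textbook-style sketch in plain Python, not GraphLab Create's implementation: the function pagerank, its unnormalized update rule and the initial value of 1.0 per vertex are assumptions made for illustration, so it is not expected to reproduce the numbers in this post.

```python
def pagerank(edges, reset_probability=0.15, threshold=0.01, max_iterations=20):
    """Iterative PageRank over a list of (src, dst) pairs.

    Assumed update rule (unnormalized):
        rank[v] = reset_probability
                  + (1 - reset_probability) * sum(rank[u] / out_degree[u])
    summed over every edge u -> v. Iteration stops when the L1 change
    drops below threshold or max_iterations is reached.
    """
    vertices = set()
    out_degree = {}
    for src, dst in edges:
        vertices.add(src)
        vertices.add(dst)
        out_degree[src] = out_degree.get(src, 0) + 1

    ranks = dict((v, 1.0) for v in vertices)
    delta = 0.0
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        incoming = dict((v, 0.0) for v in vertices)
        for src, dst in edges:
            incoming[dst] += ranks[src] / out_degree[src]
        new_ranks = dict(
            (v, reset_probability + (1.0 - reset_probability) * incoming[v])
            for v in vertices
        )
        # L1 norm of the change between two successive iterations.
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in vertices)
        ranks = new_ranks
        if delta < threshold:
            break
    return ranks, delta, iteration


# Tiny example: two vertices pointing at a third one.
ranks, delta, iterations = pagerank([("b", "a"), ("c", "a")])
print(ranks, delta, iterations)
```

A vertex that many others point to accumulates rank from each of them, which is why a heavily referenced vertex ends up at the top of the ranking.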

All of the queryable fields listed above can be retrieved with pr.get():

    >>> pr.get('pagerank')
    +----------------+----------------+-------------------+
    |      __id      |    pagerank    |       delta       |
    +----------------+----------------+-------------------+
    |   Moneypenny   | 1.18363921275  |  0.00143637385736 |
    | Inga Bergstorm | 0.869872717136 |  0.00477951418076 |
    |  Henry Gupta   | 0.284762885673 | 1.89255522874e-05 |
    |  Paris Carver  | 0.284762885673 | 1.89255522874e-05 |
    |       Q        | 1.18363921275  |  0.00143637385736 |
    |    Wai Lin     | 0.869872717136 |  0.00477951418076 |
    |       M        | 1.87718696576  |  0.00666194771763 |
    |   James Bond   | 2.52743578524  |  0.0132914517076  |
    | Elliot Carver  | 0.634064732205 | 0.000113553313724 |
    |   Gotz Otto    | 0.284762885673 | 1.89255522874e-05 |
    +----------------+----------------+-------------------+

The result above is not sorted, however. Sorting by the pagerank column with topk gives the most important person: Bond!

    >>> pr.get('pagerank').topk(column_name='pagerank')
    +----------------+----------------+-------------------+
    |      __id      |    pagerank    |       delta       |
    +----------------+----------------+-------------------+
    |   James Bond   | 2.52743578524  |  0.0132914517076  |
    |       M        | 1.87718696576  |  0.00666194771763 |
    |   Moneypenny   | 1.18363921275  |  0.00143637385736 |
    |       Q        | 1.18363921275  |  0.00143637385736 |
    | Inga Bergstorm | 0.869872717136 |  0.00477951418076 |
    |    Wai Lin     | 0.869872717136 |  0.00477951418076 |
    | Elliot Carver  | 0.634064732205 | 0.000113553313724 |
    |  Henry Gupta   | 0.284762885673 | 1.89255522874e-05 |
    |  Paris Carver  | 0.284762885673 | 1.89255522874e-05 |
    |   Gotz Otto    | 0.284762885673 | 1.89255522874e-05 |
    +----------------+----------------+-------------------+
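The same kind of top-k selection can be sketched outside GraphLab with the standard library. The rows below are copied from the unsorted pr.get('pagerank') output in this post; the helper top_k and its k parameter are hypothetical and only illustrate the idea of picking the largest values of one column.

```python
import heapq

# PageRank values copied from the unsorted table in this post.
pagerank_rows = [
    {"__id": "Moneypenny", "pagerank": 1.18363921275},
    {"__id": "Inga Bergstorm", "pagerank": 0.869872717136},
    {"__id": "Henry Gupta", "pagerank": 0.284762885673},
    {"__id": "Paris Carver", "pagerank": 0.284762885673},
    {"__id": "Q", "pagerank": 1.18363921275},
    {"__id": "Wai Lin", "pagerank": 0.869872717136},
    {"__id": "M", "pagerank": 1.87718696576},
    {"__id": "James Bond", "pagerank": 2.52743578524},
    {"__id": "Elliot Carver", "pagerank": 0.634064732205},
    {"__id": "Gotz Otto", "pagerank": 0.284762885673},
]


def top_k(rows, column_name, k=3):
    """Return the k rows with the largest value in column_name.

    Hypothetical helper; it is not SFrame.topk.
    """
    return heapq.nlargest(k, rows, key=lambda row: row[column_name])


top_three = top_k(pagerank_rows, "pagerank")
for row in top_three:
    print("%-12s %s" % (row["__id"], row["pagerank"]))
```

Using a heap avoids sorting the whole table when only the first few rows are needed, which matters more for large graphs than for this ten-vertex example.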

dato userguide

Reposted from: http://sszox.baihongyu.com/
