<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5274885697566202243</id><updated>2011-04-21T12:45:06.489-07:00</updated><category term='网页分类、语料'/><category term='c++ urldecode'/><category term='new、delete、free、malloc'/><category term='BM25'/><category term='Behavioral Targeting、BT、行为营销、Behavioural Targeting'/><category term='Slope One、Slope One算法'/><category term='struct、class'/><category term='libcurl'/><category term='Firefox Cookie、cookie'/><category term='reset form'/><category term='解析JavaScript'/><category term='asp.net reset'/><title type='text'>Binary World</title><subtitle type='html'>The world includes only 0 and 1</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>34</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-7427686911365187223</id><published>2009-05-06T20:11:00.000-07:00</published><updated>2009-05-06T22:56:32.669-07:00</updated><title type='text'>attention for using libsvm</title><content type='html'>Many beginners use the following procedure now:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Transform data to the format of an SVM package&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Randomly try a few kernels and parameters&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Test&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;We propose that beginners try the following procedure first:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Transform data to the format of an SVM package&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Conduct simple scaling on the data&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Consider the RBF kernel K(x, y) = &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_9qaFnw4SFD4/SgJTD9e1CRI/AAAAAAAAACM/ySBoBq9w_zY/s1600-h/libsvm.jpg"&gt;&lt;img id="BLOGGER_PHOTO_ID_5332916236268669202" style="WIDTH: 81px; CURSOR: pointer; HEIGHT: 22px" alt="" src="http://1.bp.blogspot.com/_9qaFnw4SFD4/SgJTD9e1CRI/AAAAAAAAACM/ySBoBq9w_zY/s320/libsvm.jpg" border="0" /&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Use cross-validation to find the best parameters C and γ&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Use the best parameters C and γ to train on the whole training set&lt;/li&gt;&lt;li&gt;Test&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-7427686911365187223?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/7427686911365187223/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=7427686911365187223' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7427686911365187223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7427686911365187223'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/05/attention-for-using-libsvm.html' title='attention for using libsvm'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_9qaFnw4SFD4/SgJTD9e1CRI/AAAAAAAAACM/ySBoBq9w_zY/s72-c/libsvm.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-6141263068578913503</id><published>2009-05-02T20:36:00.000-07:00</published><updated>2009-05-02T20:44:57.405-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='reset form'/><category scheme='http://www.blogger.com/atom/ns#' term='asp.net reset'/><title type='text'>asp.net reset button</title><content type='html'>ASP.NET has no server-side (runat="server") Button control of type reset, so you have to use a plain HTML control instead.&lt;br /&gt;Like this:&lt;br /&gt;&lt;br /&gt;&amp;lt;input type="button" id="btn_reset" value="Reset" onclick="javascript:document.getElementById('frm_id').reset();"&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-6141263068578913503?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/6141263068578913503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=6141263068578913503' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6141263068578913503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6141263068578913503'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/05/aspnet-reset-button.html' title='asp.net reset button'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-7641825754200773615</id><published>2009-04-29T01:07:00.000-07:00</published><updated>2009-05-02T20:52:55.422-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c++ urldecode'/><title type='text'>easiest C++ urldecode code</title><content type='html'>This decodes a URL-encoded string. I believe this version is about as simple as it gets.&lt;br /&gt;&lt;pre&gt;&lt;br /&gt;std::string url_util::url_decode( const std::string&amp;amp; url_src )&lt;br /&gt;{&lt;br /&gt; std::string url_ret = "";&lt;br /&gt; int len = (int)url_src.length();&lt;br /&gt; char tmpstr[5] = { '0', 'x', '_', '_', '\0' };&lt;br /&gt;&lt;br /&gt; for ( int i=0; i &amp;lt; len; i++ )&lt;br /&gt; {&lt;br /&gt;  char ch = url_src.at( i );&lt;br /&gt;  if ( ch == '%' )&lt;br /&gt;  {&lt;br /&gt;   if ( i+2 &gt;= len ) // truncated %XX escape: give up&lt;br /&gt;    return "";&lt;br /&gt;   tmpstr[2] = url_src.at( ++i );&lt;br /&gt;   tmpstr[3] = 
url_src.at( ++i );&lt;br /&gt;   ch = (char)strtol( tmpstr, NULL, 16 ); // parse "0xHH" as one byte&lt;br /&gt;  }&lt;br /&gt;  url_ret += ch;&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt; return url_ret;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-7641825754200773615?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/7641825754200773615/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=7641825754200773615' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7641825754200773615'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7641825754200773615'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/easiest-urldecode-code-with-c.html' title='easiest C++ urldecode code'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-5886829792389439109</id><published>2009-04-25T00:11:00.000-07:00</published><updated>2009-04-25T00:51:28.139-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Firefox Cookie、cookie'/><title type='text'>How to parse Firefox cookies programmatically?</title><content 
type='html'>Today I wrote a small program for reading cookies. So far it can only read IE cookies; I will add support for Firefox cookies when I have time.&lt;div&gt;I have seen many people abroad asking "how do I parse Firefox cookies?", so, to bring the blog a little more traffic, the rest is written in English. Heh.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;How to parse cookies in Firefox?&lt;/div&gt;&lt;div&gt;As you know, in IE (or any browser built on the IE engine), the Win32 API InternetGetCookie (or InternetGetCookieEx) retrieves a specific cookie and its value. But how do you do the same for Firefox? &lt;/div&gt;&lt;div&gt;It is a little more involved. &lt;/div&gt;&lt;div&gt;1. Where are Firefox's cookie files?&lt;br /&gt;Firefox stores its cookie information in two files: cookies.txt and hostperm.1. (Older versions used "cookperm.txt" instead of hostperm.1.)&lt;/div&gt;&lt;div&gt; &lt;p&gt;Starting in Firefox 3.0, cookie information is stored in "cookies.sqlite" and "permissions.sqlite". &lt;/p&gt; &lt;table style="BORDER-RIGHT: #aaa 1px solid; BORDER-TOP: #aaa 1px solid; BACKGROUND: #fcfcfc; MARGIN: 1em 1em 1em 0px; BORDER-LEFT: #aaa 1px solid; BORDER-BOTTOM: #aaa 1px solid; BORDER-COLLAPSE: collapse" cellspacing="0" cellpadding="4" border="2"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;th&gt;File &lt;/th&gt; &lt;th&gt;Description &lt;/th&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td valign="top"&gt;cookies.txt&lt;br /&gt;cookies.sqlite &lt;/td&gt; &lt;td&gt;Holds all of your cookies, including login information, session data, and preferences. &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td valign="top"&gt;hostperm.1&lt;br /&gt;permissions.sqlite &lt;/td&gt; &lt;td&gt;Holds preferences about which sites you allow or prohibit to set cookies, to display images, to open popup windows and to install extensions. 
&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;cookies.txt (cookies.sqlite) and hostperm.1 (permissions.sqlite) are files in the profile folder.&lt;/div&gt;&lt;div&gt;&lt;p&gt;The profile folder location on Windows depends on the OS version, as listed below. Some of these folders may be hidden.&lt;/p&gt;&lt;table style="BORDER-RIGHT: #aaa 1px solid; BORDER-TOP: #aaa 1px solid; BACKGROUND: #fcfcfc; MARGIN: 1em 1em 1em 0px; BORDER-LEFT: #aaa 1px solid; BORDER-BOTTOM: #aaa 1px solid; BORDER-COLLAPSE: collapse" cellspacing="0" cellpadding="4" border="2"&gt; &lt;tbody&gt; &lt;tr&gt; &lt;th&gt;Operating system &lt;/th&gt; &lt;th&gt;Profile folder location(s) &lt;/th&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;Windows NT (NT 4.x, 2000, XP, and Vista) &lt;/td&gt; &lt;td&gt;"%APPDATA%\Mozilla\" &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;Windows 95 (without Desktop Update) &lt;/td&gt; &lt;td&gt;C:\Windows\Mozilla &lt;/td&gt;&lt;/tr&gt; &lt;tr&gt; &lt;td&gt;Windows 95 (with Desktop Update)/98/Me &lt;/td&gt; &lt;td&gt;C:\Windows\Application Data\Mozilla\&lt;br /&gt;C:\Windows\Profiles\&amp;lt;Windows user name&amp;gt;\Application Data\Mozilla\ &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;2. 
Format of cookies.txt and cookies.sqlite&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;"&gt;&lt;p&gt;A typical line from cookies.txt might read (the spaces are tabs in the original file): &lt;/p&gt; &lt;dl&gt; &lt;tt&gt;www.taobao.com FALSE / FALSE 1158030396 cna +7aB31T1YyKBA&lt;/tt&gt;&lt;/dl&gt;&lt;p&gt;The meaning of the above line is as follows: &lt;/p&gt; &lt;table style="BORDER-RIGHT: #aaa 1px solid; BORDER-TOP: #aaa 1px solid; BACKGROUND: #fcfcfc; MARGIN: 1em 1em 1em 0px; BORDER-LEFT: #aaa 1px solid; BORDER-BOTTOM: #aaa 1px solid; BORDER-COLLAPSE: collapse" cellspacing="0" cellpadding="4" border="2"&gt; &lt;tbody&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;tt&gt;www.taobao.com&lt;/tt&gt; &lt;/td&gt; &lt;td&gt;The name of the website (server) that stored the cookie. &lt;/td&gt;&lt;/tr&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;tt&gt;FALSE&lt;/tt&gt; &lt;/td&gt; &lt;td&gt;Whether the cookie can be read by other machines at the same domain (taobao.com); in this case, no. &lt;/td&gt;&lt;/tr&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;tt&gt;/&lt;/tt&gt; &lt;/td&gt; &lt;td&gt;The directory path for which the cookie is relevant; in this case, &lt;tt&gt;/&lt;/tt&gt; denotes the root of the site. &lt;/td&gt;&lt;/tr&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;tt&gt;FALSE&lt;/tt&gt; &lt;/td&gt; &lt;td&gt;Whether the cookie requires a secure connection; in this case, no. 
&lt;/td&gt;&lt;/tr&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;span class="Apple-style-span"   style=" ;font-family:-webkit-monospace;font-size:13px;"&gt;1158030396&lt;br /&gt;&lt;/span&gt;&lt;/td&gt; &lt;td&gt;The time at which the cookie will expire (the number of seconds since &lt;a class="extiw" title="wikipedia:Unix_time" href="http://www.wikipedia.org/wiki/Unix_time"&gt;12 a.m., January 1, 1970&lt;/a&gt;).  &lt;/td&gt;&lt;/tr&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;span class="Apple-style-span"   style=" ;font-family:-webkit-monospace;font-size:13px;"&gt;cna&lt;/span&gt;&lt;/td&gt; &lt;td&gt;The name of the cookie. &lt;/td&gt;&lt;/tr&gt; &lt;tr valign="top"&gt; &lt;td&gt;&lt;span class="Apple-style-span"   style=" ;font-family:-webkit-monospace;font-size:13px;"&gt;+7aB31T1YyKBA&lt;br /&gt;&lt;/span&gt;&lt;/td&gt; &lt;td&gt;The value of the cookie. &lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/span&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse; "&gt;cookies.sqlite is a SQLite database file, see:&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;"&gt;&lt;a href="http://www.sqlite.org/"&gt;http://www.sqlite.org/&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a class="wiki" href="http://hackerstv.blogspot.com/2008/07/convert-firefox-3-cookiessqlite-to.html"&gt;Hackers  TV: Firefox 3 cookies.sqlite to Firefox 2 cookies.txt&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="border-collapse: collapse;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-5886829792389439109?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/5886829792389439109/comments/default' title='Post 
Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=5886829792389439109' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/5886829792389439109'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/5886829792389439109'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/firefoxcookie.html' title='How to parse Firefox cookies programmatically?'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-559998724349488783</id><published>2009-04-24T01:16:00.000-07:00</published><updated>2009-05-02T20:50:16.231-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='libcurl'/><title type='text'>Using libcurl</title><content type='html'>libcurl is a powerful transfer library that supports many protocols.&lt;br /&gt;In a recent project I had to port a spider (crawler) that ran fine on FreeBSD to 64-bit Linux. It compiled, but dumped core as soon as it ran.&lt;br /&gt;I spent a long time looking for the cause without success, but the rough picture was clear: the spider was built on a libcurl wrapper whose classes leaned heavily on the STL. While debugging I even saw a string change right after it had been inserted into a set&amp;lt;string&amp;gt; variable, which looked like outright memory corruption.&lt;br /&gt;Rather than patch that code, I decided to write the wrapper again: a C++ skeleton around C-style code. It is fairly simple, mostly code assembled from the official manual, plus something like a "pool" that holds n CURL handles.&lt;br /&gt;&lt;br /&gt;That fixed the crash, but the result was still not what I wanted: the downloaded pages were incomplete. Why?&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt;* Only the code relevant to this description is kept.&lt;br /&gt;*/&lt;br /&gt;&lt;br /&gt;int pool_curl::initialize( int pool_size, int timeout, int max_content_len )&lt;br /&gt;{&lt;br /&gt;this-&gt;m_pool_size = pool_size;&lt;br /&gt;this-&gt;m_timeout = timeout;&lt;br /&gt;pool_curl::m_max_conten_len = max_content_len;&lt;br 
/&gt;&lt;br /&gt;// allocate the pointer arrays&lt;br /&gt;this-&gt;m_pool = (CURL**)malloc( sizeof(CURL*) * this-&gt;m_pool_size );&lt;br /&gt;if ( this-&gt;m_pool == NULL )&lt;br /&gt;return -1;&lt;br /&gt;this-&gt;m_url = (char**)malloc( sizeof(char*) * this-&gt;m_pool_size );&lt;br /&gt;if ( this-&gt;m_url == NULL )&lt;br /&gt;return -2;&lt;br /&gt;this-&gt;m_content = (char**)malloc( sizeof(char*) * this-&gt;m_pool_size );&lt;br /&gt;if ( this-&gt;m_content == NULL )&lt;br /&gt;return -3;&lt;br /&gt;&lt;br /&gt;// allocate what each pointer actually points to&lt;br /&gt;for( int i=0; i &amp;lt; this-&gt;m_pool_size; i++ )&lt;br /&gt;{&lt;br /&gt;CURL* curl = NULL;&lt;br /&gt;curl = curl_easy_init();&lt;br /&gt;if ( curl == NULL )&lt;br /&gt;return -2;&lt;br /&gt;// numeric options must be passed as long&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_VERBOSE, 0L );&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_NOPROGRESS, 1L );&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_TIMEOUT, (long)this-&gt;m_timeout );&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_USERAGENT, pool_curl::usr_agent );&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_NOSIGNAL, 1L );&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_HEADER, 0L );&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, pool_curl::cb_wr ); // callback that stores content as it is downloaded&lt;br /&gt;&lt;br /&gt;*(this-&gt;m_pool + i) = curl;&lt;br /&gt;*(this-&gt;m_url + i) = (char *)malloc( URL_MAX_LENGTH );&lt;br /&gt;*(this-&gt;m_content + i) = (char *)malloc( pool_curl::m_max_conten_len + sizeof(int) ); // special buffer: the first sizeof(int) bytes hold the current content length&lt;br /&gt;curl_easy_setopt( curl, CURLOPT_WRITEDATA, *(this-&gt;m_content+i) );&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;this-&gt;m_idle = (char*)calloc( this-&gt;m_pool_size, 1 );&lt;br /&gt;if ( this-&gt;m_idle == NULL )&lt;br /&gt;return -2;&lt;br /&gt;&lt;br /&gt;this-&gt;m_multi_handle = curl_multi_init();&lt;br /&gt;if ( this-&gt;m_multi_handle == NULL )&lt;br /&gt;return -2;&lt;br /&gt;&lt;br /&gt;return 0;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;// the write callback mentioned above (libcurl expects it to return size_t)&lt;br /&gt;size_t pool_curl::cb_wr( char* data, size_t size, size_t nmemb, char* dest )&lt;br 
/&gt;{&lt;br /&gt;int have_len = *(int*)dest; // the first sizeof(int) bytes store the current length&lt;br /&gt;&lt;br /&gt;int n_cur = (int)(size * nmemb);&lt;br /&gt;if ( have_len+n_cur &gt;= pool_curl::m_max_conten_len )&lt;br /&gt;return 0; // buffer full: returning a different count makes libcurl abort the transfer&lt;br /&gt;&lt;br /&gt;memcpy( dest+sizeof(int)+have_len, data, n_cur );&lt;br /&gt;have_len += n_cur;&lt;br /&gt;*(dest+sizeof(int)+have_len) = '\0';&lt;br /&gt;*(int*)dest = have_len;&lt;br /&gt;&lt;br /&gt;return n_cur;  // libcurl requires this callback to return exactly size * nmemb&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;int pool_curl::is_completed( std::string&amp;amp; url, std::string&amp;amp; content )&lt;br /&gt;{&lt;br /&gt;int ret = -1;&lt;br /&gt;&lt;br /&gt;// ... n lines of code omitted here ...&lt;br /&gt;&lt;br /&gt;CURLMsg* msg; /* for picking up messages with the transfer status */&lt;br /&gt;int msgs_left; /* how many messages are left */&lt;br /&gt;&lt;br /&gt;/* See how the transfers went */&lt;br /&gt;msg = curl_multi_info_read( this-&gt;m_multi_handle, &amp;amp;msgs_left );&lt;br /&gt;if ( msg != NULL )&lt;br /&gt;{&lt;br /&gt;if ( msg-&gt;msg == CURLMSG_DONE )&lt;br /&gt;{&lt;br /&gt;/* copy what we need first: msg is no longer valid once the handle is removed */&lt;br /&gt;CURL* easy = msg-&gt;easy_handle;&lt;br /&gt;int result = msg-&gt;data.result;&lt;br /&gt;curl_multi_remove_handle( this-&gt;m_multi_handle, easy );&lt;br /&gt;&lt;br /&gt;int idx = 0;&lt;br /&gt;bool found = false;&lt;br /&gt;&lt;br /&gt;/* Find out which handle this message is about */&lt;br /&gt;for ( idx=0; found==false &amp;amp;&amp;amp; idx &amp;lt; this-&gt;m_pool_size; idx++ )&lt;br /&gt;found = ( easy == *(this-&gt;m_pool+idx) );&lt;br /&gt;&lt;br /&gt;if ( found == true )&lt;br /&gt;{&lt;br /&gt;idx = idx - 1;&lt;br /&gt;*(this-&gt;m_idle + idx) = (char)0;&lt;br /&gt;url = *(this-&gt;m_url + idx);&lt;br /&gt;content = *(this-&gt;m_content + idx ) + sizeof(int);&lt;br /&gt;&lt;br /&gt;ret = result;&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;}&lt;br /&gt;return ret;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;My mistake was this: libcurl requires the write callback to return exactly size * nmemb. I overlooked that at first and carelessly returned the length of the string currently in the buffer, so msg-&gt;data.result often came back as 23 (CURLE_WRITE_ERROR), which means the write callback failed.&lt;br /&gt;&lt;br 
/&gt;After fixing that, everything worked.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-559998724349488783?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/559998724349488783/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=559998724349488783' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/559998724349488783'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/559998724349488783'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/libcurl.html' title='Using libcurl'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-6722776097706207589</id><published>2009-04-15T19:49:00.000-07:00</published><updated>2009-04-15T20:12:09.618-07:00</updated><title type='text'>Static variables inside class member functions in C#</title><content type='html'>&lt;div&gt;In C#, you cannot declare a static variable inside a class member function. This differs from C/C++, where you can.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;The reason is probably that C# aims to be a purely object-oriented language. In C/C++, a static variable inside a member function is in fact kept in static storage too, just like the class's static members, so the designers of C# simply dropped function-level static variables as a "feature".&lt;/div&gt;&lt;br /&gt;&lt;div&gt;You can declare and define a static field in the class and use it from the member function, which gives the same effect as a static variable inside that function.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-6722776097706207589?l=se-study.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/6722776097706207589/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=6722776097706207589' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6722776097706207589'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6722776097706207589'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/c_15.html' title='Static variables inside class member functions in C#'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-3363416288876629700</id><published>2009-04-12T22:25:00.000-07:00</published><updated>2009-04-12T22:32:39.962-07:00</updated><title type='text'>Autoconf manual</title><content type='html'>&lt;p&gt;I tidied the manual up a little and built a document in CHM format.&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.alibaby.org/alibaby_dl/autoconf.rar"&gt;Click here to download&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;&lt;span style="color:#999999;"&gt;Autoconf is a tool for producing shell scripts that automatically configure software source code packages to adapt to many kinds of Unix-like systems. The configuration scripts produced by Autoconf are independent of Autoconf when they are run, so their users do not need to have Autoconf.&lt;br /&gt;The configuration scripts produced by Autoconf require no manual user intervention when run; they do not normally even need an argument specifying the system type. Instead, they individually test for the presence of each feature that the software package might need. (Before each check, they print a one-line message stating what they are checking for, so the user does not get too bored while waiting for the script to finish.) As a result, they deal well with systems that are hybrids or customized from the more common Unix variants. There is no need to maintain files that list the features supported by each release of each variant of Unix.&lt;br 
/&gt;For each software package that Autoconf is used with, it creates a configuration script from a template file that lists the system features that the package needs or can use. After the shell code to recognize and respond to a system feature has been written, Autoconf allows it to be shared by many software packages that can use (or need) that feature. If it later turns out that the shell code needs adjustment for some reason, it needs to be changed in only one place; all of the configuration scripts can be regenerated automatically to take advantage of the updated code.&lt;br /&gt;The Metaconfig package is similar in purpose to Autoconf, but the scripts it produces require manual user intervention, which is quite inconvenient when configuring large source trees. Unlike Metaconfig scripts, Autoconf scripts can support cross-compiling, if some care is taken in writing them.&lt;br /&gt;There are several jobs related to making portable software packages that Autoconf currently does not do. Among these are automatically creating `Makefile' files with all of the standard targets, and supplying replacements for standard library functions and header files on systems that lack them. Work is in progress to add those features in the future.&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-3363416288876629700?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/3363416288876629700/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=3363416288876629700' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3363416288876629700'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3363416288876629700'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/autoconf.html' title='Autoconf manual'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-6127262920906079888</id><published>2009-04-10T04:34:00.000-07:00</published><updated>2009-04-10T04:37:09.390-07:00</updated><title type='text'>Building a Chinese word segmenter</title><content type='html'>&lt;strong&gt;1. 
Goal&lt;/strong&gt;&lt;br /&gt;A word segmenter is a basic building block of natural language processing. A general-purpose segmenter can handle several languages and encodings and produces detailed output, typically the segmented words, their parts of speech, the relations between words, and even the sentence structure. Recognizing person names, place names, and new words is also part of its job. Developing a segmenter is work that never ends.&lt;br /&gt;Our goal here is modest: build a simple Chinese word segmenter that handles only GBK-encoded text and outputs only the segmented words. For example:&lt;br /&gt;&lt;span style="color:#000099;"&gt;我们的祖国是花园。&lt;/span&gt;&lt;br /&gt;A general-purpose segmenter would typically output &lt;span style="color:#000099;"&gt;我们\n 的\a 祖国\n 是\v 花园\n 。\i&lt;/span&gt;&lt;br /&gt;whereas the segmenter we are going to build will simply output:&lt;br /&gt;&lt;span style="color:#000099;"&gt;我们\祖国\花园&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;2. Approach&lt;/strong&gt;&lt;br /&gt;The implementation combines three methods: forward maximum matching, backward maximum matching, and word-frequency statistics with sentence weight calculation.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;3. Steps&lt;/strong&gt;&lt;br /&gt;(1) Obtain a corpus and count word frequencies.&lt;br /&gt;Corpus quality matters, and the larger the corpus the better. I am currently using the 1998 People's Daily corpus. It has already been segmented by hand, so a simple awk script is enough to count word frequencies and produce an initial dictionary.&lt;br /&gt;(2) "Clean" the dictionary.&lt;br /&gt;Remove punctuation and stop words (single characters such as "的" and "吗").&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;&lt;span style="color:#ff0000;"&gt;To be continued...&lt;/span&gt;&lt;/strong&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-6127262920906079888?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/6127262920906079888/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=6127262920906079888' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6127262920906079888'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6127262920906079888'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/blog-post_10.html' title='Building a Chinese word segmenter'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image 
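rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-3603936244490390163</id><published>2009-04-05T07:25:00.001-07:00</published><updated>2009-04-24T01:15:07.399-07:00</updated><title type='text'>C/C++ function arguments are really all integers</title><content type='html'>C/C++ function arguments are really all integers.&lt;br /&gt;&lt;br /&gt;First, char, short, and long are all integers; only their ranges differ.&lt;br /&gt;Now consider this function:&lt;br /&gt;char get_char( char* content, int pos );&lt;br /&gt;The parameter content has pointer type, but underneath it is just an address-sized number (a long on platforms where long is as wide as a pointer). The prototype above is written that way only to make it easier for us to read. In effect it is:&lt;br /&gt;char get_char( long content, int pos );&lt;br /&gt;&lt;br /&gt;So we can say that C/C++ functions always pass arguments by value (C++ references aside, and those are typically implemented as pointers too): what gets pushed onto the stack is the value of the (pointer) address, a long.&lt;br /&gt;Think it through:&lt;br /&gt;&lt;br /&gt;char p[] = { 'y','a','h','o','o','\0' };&lt;br /&gt;printf( "addr:%ld.\n", (long)p );&lt;br /&gt;set_char( p, 0, 'Y' );&lt;br /&gt;printf( "%s,addr:%ld.\n", p, (long)p );&lt;br /&gt;&lt;br /&gt;The value of p itself does not change, because arguments are passed by value: calling set_char() only pushes a copy of p's value onto the stack. 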
What does change? The content that p points to.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-3603936244490390163?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/3603936244490390163/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=3603936244490390163' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3603936244490390163'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3603936244490390163'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/cc.html' title='C/C++ function arguments are really all integers'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-7592713108650962777</id><published>2009-04-02T20:34:00.000-07:00</published><updated>2009-04-02T20:41:56.435-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='网页分类、语料'/><title type='text'>Corpora for automatic Chinese web page classification</title><content type='html'>&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;CCT2006&lt;/span&gt;: CCT2006, the March 2006 Chinese web page classification training set, ID YQ-CCT-2006-03. 
&lt;div&gt;根据常见的新闻类别而设定的分类体系，从新闻网站上抓取得到对应 类别的新闻网页作为训练集页面。它包括960个训练网页和240个测试网页， 分布在8个类别中。&lt;a href="http://www.cwirf.org/2006WebTrack/YQ-CCT-2006-03.tgz"&gt;下载&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;CCT2002-v1.1&lt;/span&gt;:2002年中文网页分类训练集CCT2002-v1.1, 编号YQ-WEBBENCH-V1.1, &lt;a href="http://www.cwirf.org/2005WebTrack/trainset_intro_v1.1.pdf"&gt;说明&lt;/a&gt;，在CCT2002-v1.0 的基础上对类别进行了部分修正. &lt;/div&gt;&lt;div&gt;是2002年秋天北京大学网络与分布式实验室天网小组通过动员不同专业的几十个学生， 人工选取形成了一个全新的基于层次模型的大规模中文网页样本集。 它包括11678个训练网页实例和3630个测试网页实例，分布在11个大类别中。 &lt;a href="http://www.cwirf.org/2005WebTrack/YQ-WEBBENCH-V1.1.tgz"&gt;下载&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;在这儿下载&lt;a href="http://www.cwirf.org/SharedRes/Tool/makeindex.c"&gt;读取原始训练集文件的工具&lt;/a&gt;, May 10, 2005&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-7592713108650962777?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/7592713108650962777/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=7592713108650962777' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7592713108650962777'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7592713108650962777'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/04/blog-post.html' title='中文网页自动分类语料'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-5268241997998295690</id><published>2009-03-30T05:00:00.000-07:00</published><updated>2009-03-30T05:08:17.485-07:00</updated><title type='text'>C4.5 and C5.0</title><content type='html'>Data Mining Tool(s) C4.5: &lt;a href="http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html"&gt;http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html&lt;/a&gt;&lt;br /&gt;Data Mining Tools C5.0 &amp;amp; See5 (not free!): &lt;a href="http://www.rulequest.com/see5-info.html"&gt;http://www.rulequest.com/see5-info.html&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-5268241997998295690?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/5268241997998295690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=5268241997998295690' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/5268241997998295690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/5268241997998295690'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/03/c45c50.html' title='C4.5 and C5.0'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-3313295774257477347</id><published>2009-03-29T20:13:00.000-07:00</published><updated>2009-03-29T20:21:30.090-07:00</updated><title type='text'>Warning: mysql_connect() [function.mysql-connect]: Access denied for user 'ODBC'@'localhost' (using password: NO) in</title><content type='html'>Warning: mysql_connect() [function.mysql-connect]: Access denied for user 'ODBC'@'localhost' (using password: NO) in&lt;br /&gt;&lt;br /&gt;The error above appears when connecting to a MySQL database.&lt;br /&gt;&lt;br /&gt;Possible causes:&lt;br /&gt;(1) mysql_connect() was not given a valid account name and password, so the system fell back to logging in with the default account ODBC. Check whether you accidentally passed global variables straight to mysql_connect() inside a function without first declaring them there:&lt;br /&gt;global $global_var;&lt;br /&gt;(2) The account has not been granted privileges in the MySQL database. To allow the account to log in, enter the mysql environment and run the following, substituting your own account name and password:&lt;br /&gt;grant all on *.* to account identified by "account_password";&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-3313295774257477347?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/3313295774257477347/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=3313295774257477347' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3313295774257477347'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3313295774257477347'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/03/warning-mysqlconnect-functionmysql.html' title='Warning: mysql_connect() [function.mysql-connect]: Access denied for user 
&apos;ODBC&apos;@&apos;localhost&apos; (using password: NO) in'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-8911176792007558944</id><published>2009-03-24T19:29:00.000-07:00</published><updated>2009-03-24T19:38:55.965-07:00</updated><title type='text'>The size_t data type on Linux</title><content type='html'>On Linux, the size_t data type is defined in the header &amp;lt;stddef.h&amp;gt;.&lt;br /&gt;&lt;br /&gt;Its definition usually takes one of the following two forms (typically the first on 32-bit systems and the second on 64-bit systems):&lt;br /&gt;typedef unsigned int size_t;&lt;br /&gt;or&lt;br /&gt;typedef unsigned long size_t;&lt;br /&gt;&lt;br /&gt;What is certain is that it is an unsigned type.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-8911176792007558944?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/8911176792007558944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=8911176792007558944' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/8911176792007558944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/8911176792007558944'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/03/linuxsizet.html' title='The size_t data type on Linux'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' 
height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-1796783252109003061</id><published>2009-03-24T19:11:00.000-07:00</published><updated>2009-03-24T19:26:43.787-07:00</updated><title type='text'>Why no gmon file is generated for gprof</title><content type='html'>gprof is a powerful program analysis tool on Unix/Linux. For C programs, it records run-time statistics in a log-like form: the time spent in each function, the call relationships between functions, the number of times each function is called, and so on. This helps programmers find the most time-consuming functions among many, and also helps them analyze the program's execution flow.&lt;br /&gt;There are plenty of gprof guides; a web search will turn them up.&lt;br /&gt;&lt;br /&gt;In real work you may find that no gmon file is generated; here is what I have learned from experience.&lt;br /&gt;(1) The program must be compiled with the -pg option, and -pg must also be given when linking (I linked against the -lc_p library as well); if either step is missing, no file is produced.&lt;br /&gt;(2) The program must not be a daemon.&lt;br /&gt;(3) The program must exit normally and return to the operating system; the gmon file (gmon.out) is written only at the moment the program exits. A forced termination does not produce a gmon file.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-1796783252109003061?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/1796783252109003061/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=1796783252109003061' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1796783252109003061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1796783252109003061'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/03/gprofgmon.html' title='Why no gmon file is generated for gprof'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-2149763536621290004</id><published>2009-03-05T19:22:00.000-08:00</published><updated>2009-03-05T19:29:11.652-08:00</updated><title type='text'>Joyo Amazon</title><content type='html'>&lt;a href="http://www.amazon.cn/"&gt;http://www.amazon.cn/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I used to buy all my books on china-pub, but several deliveries had problems and china-pub did not seem to take responsibility, so on a colleague's recommendation I started buying books on Amazon.&lt;br /&gt;&lt;br /&gt;The changes since Joyo was acquired by Amazon are obvious:&lt;br /&gt;1. It never pops up new pages; everything stays in the original window (this shows respect for the user).&lt;br /&gt;2. The recommendation engine is well done.&lt;br /&gt;3. Delivery quality has improved (the packaging and order slips are good, and larger orders arrive in a cardboard box), and the books themselves are in better condition.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-2149763536621290004?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/2149763536621290004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=2149763536621290004' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2149763536621290004'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2149763536621290004'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/03/blog-post.html' title='Joyo Amazon'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-1419983816383456790</id><published>2009-02-11T00:36:00.000-08:00</published><updated>2009-02-11T01:05:45.280-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Slope One、Slope One算法'/><title type='text'>The Slope One algorithm</title><content type='html'>Collaborative filtering is a technique by which a recommender system combines the opinions and preferences of different users to produce personalized suggestions. It falls into at least two classes: user-based and item-based.&lt;br /&gt;&lt;br /&gt;Slope One is an algorithm for item-based collaborative filtering proposed in 2005 by Professor &lt;a title="Daniel Lemire" href="http://www.daniel-lemire.com/en" target="_blank"&gt;Daniel Lemire&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;See &lt;a href="http://en.wikipedia.org/wiki/Slope_One"&gt;http://en.wikipedia.org/wiki/Slope_One&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-1419983816383456790?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/1419983816383456790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=1419983816383456790' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1419983816383456790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1419983816383456790'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/02/slope-one.html' title='The Slope One algorithm'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-796406413731866001</id><published>2009-02-10T18:28:00.000-08:00</published><updated>2009-02-10T18:30:52.148-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BM25'/><title type='text'>The BM25 algorithm</title><content type='html'>&lt;p&gt;In the field of information retrieval there is a ranking function (or rather a family of them) called BM25.&lt;br /&gt;&lt;/p&gt;&lt;p&gt;In &lt;a title="Information retrieval" href="http://en.wikipedia.org/wiki/Information_retrieval"&gt;information retrieval&lt;/a&gt;, &lt;b&gt;Okapi BM25&lt;/b&gt; is a &lt;a title="Ranking function" href="http://en.wikipedia.org/wiki/Ranking_function"&gt;ranking function&lt;/a&gt; used by &lt;a class="mw-redirect" title="Search engine" href="http://en.wikipedia.org/wiki/Search_engine"&gt;search engines&lt;/a&gt; to rank matching documents according to their &lt;a title="Relevance (information retrieval)" href="http://en.wikipedia.org/wiki/Relevance_%28information_retrieval%29"&gt;relevance&lt;/a&gt; to a given search query. It is based on the probabilistic retrieval framework developed in the 1970s and 1980s by Stephen E. Robertson, &lt;a title="Karen Spärck Jones" href="http://en.wikipedia.org/wiki/Karen_Sp%C3%A4rck_Jones"&gt;Karen Spärck Jones&lt;/a&gt;, and others.&lt;/p&gt; &lt;p&gt;The name of the actual ranking function is BM25. To set the right context, however, it is usually referred to as "Okapi BM25", since the Okapi information retrieval system, implemented at &lt;a title="London" href="http://en.wikipedia.org/wiki/London"&gt;London&lt;/a&gt;'s &lt;a title="City University, London" href="http://en.wikipedia.org/wiki/City_University,_London"&gt;City University&lt;/a&gt; in the 1980s and 1990s, was the first system to implement this function.&lt;/p&gt; &lt;p&gt;BM25, and its newer variants, e.g. 
BM25F (a version of BM25 that can take document structure and anchor text into account), represent state-of-the-art retrieval functions used in document retrieval, such as Web search.&lt;/p&gt;&lt;p&gt;See &lt;a href="http://en.wikipedia.org/wiki/Probabilistic_relevance_model_%28BM25%29"&gt;http://en.wikipedia.org/wiki/Probabilistic_relevance_model_(BM25)&lt;/a&gt;&lt;br /&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-796406413731866001?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/796406413731866001/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=796406413731866001' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/796406413731866001'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/796406413731866001'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/02/bm25.html' title='The BM25 algorithm'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-5905623560780147628</id><published>2009-02-09T19:22:00.000-08:00</published><updated>2009-02-09T19:23:11.972-08:00</updated><title type='text'>Top 10 algorithms in data mining</title><content type='html'>&lt;div&gt;#1: C4.5(61 votes), presented by Hiroshi Motoda&lt;/div&gt;&lt;div&gt;#2: K-Means(60 
votes), presented by Joydeep Ghosh&lt;/div&gt;&lt;div&gt;#3: SVM(58 votes), presented by Qiang Yang&lt;/div&gt;&lt;div&gt;#4: Apriori(52 votes), presented by Christos Faloutsos&lt;/div&gt;&lt;div&gt;#5: EM(48 votes), presented by Joydeep Ghosh&lt;/div&gt;&lt;div&gt;#6: PageRank(46 votes), presented by Christos Faloutsos&lt;/div&gt;&lt;div&gt;#7: AdaBoost(45 votes), presented by Zhi-Hua Zhou&lt;/div&gt;&lt;div&gt;#7: kNN(45 votes), presented by Vipin Kumar&lt;/div&gt;&lt;div&gt;#7: Naive Bayes(45 votes), presented by Qiang Yang&lt;/div&gt;&lt;div&gt;#10: CART(34 votes), presented by Dan Steinberg&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-5905623560780147628?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/5905623560780147628/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=5905623560780147628' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/5905623560780147628'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/5905623560780147628'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/02/top-10-algorithms-in-data-mining.html' title='Top 10 algorithms in data mining'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-4450051047558119817</id><published>2009-02-04T22:34:00.000-08:00</published><updated>2009-02-04T22:37:52.877-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Behavioral Targeting、BT、行为营销、Behavioural Targeting'/><title type='text'>What is Behavioral Targeting?</title><content type='html'>&lt;div&gt;Behavioral Targeting (also spelled Behavioural Targeting; BT for short below) is a marketing technique used to make advertisers' campaigns more effective.&lt;/div&gt;&lt;div&gt;Based on behavioral information collected about a user (for example, pages the user has visited, content the user has searched for, or products the user has bought), BT selects suitable ads to show to that user. This helps advertisers deliver their ads to the users who are more interested in them, which improves the ads' effectiveness.&lt;/div&gt;&lt;div&gt;BT can rely on user behavior alone, or combine it with other factors such as geography, demographics, or the surrounding context.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-4450051047558119817?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/4450051047558119817/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=4450051047558119817' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/4450051047558119817'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/4450051047558119817'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/02/behavioral-targeting.html' title='What is Behavioral Targeting?'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-2774757037463222185</id><published>2009-01-24T17:46:00.000-08:00</published><updated>2009-01-24T17:47:56.647-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='struct、class'/><title type='text'>The difference between struct and class in C++</title><content type='html'>In C++, the difference between struct and class is the default access: a struct's data members, member functions, and base classes are public by default, whereas a class's are private by default.&lt;br /&gt;Apart from that, the two are functionally the same.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-2774757037463222185?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/2774757037463222185/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=2774757037463222185' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2774757037463222185'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2774757037463222185'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/01/c-structclass.html' title='The difference between struct and class in C++'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' 
src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-6685088823243710390</id><published>2009-01-24T08:17:00.000-08:00</published><updated>2009-01-24T08:18:52.156-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='new、delete、free、malloc'/><title type='text'>Common questions about new (delete) and malloc (free) in C++</title><content type='html'>Q: Does "delete p" release the pointer "p", or the data it points to, "*p"?&lt;br /&gt;A: It releases the data that "p" points to, "*p".&lt;br /&gt;What "delete" really means is "delete the thing pointed to by"; likewise, free(p) releases what "p" points to.&lt;br /&gt;The pointer p itself is left unchanged. To make the program more robust, it is advisable to add the following line afterwards:&lt;br /&gt;p = NULL;&lt;br /&gt;This line matters most when p is used frequently in the program.&lt;br /&gt;&lt;br /&gt;Q: Can I free() memory allocated with "new", or "delete" memory allocated with malloc()?&lt;br /&gt;A: No.&lt;br /&gt;malloc/free and new/delete must be used in matching pairs!&lt;br /&gt;Freeing memory obtained from new, or deleting memory obtained from malloc, is undefined behavior and deserves a stern scolding.&lt;br /&gt;&lt;br /&gt;Q: What is the difference between new/delete and malloc/free?&lt;br /&gt;A: This is an old question, and a favorite of interviewers.&lt;br /&gt;It can be discussed from three angles: whether constructors/destructors are called, type safety, and whether overloading is possible:&lt;br /&gt;new/delete call constructors/destructors;&lt;br /&gt;malloc() returns a "void *", which is not type safe, whereas new returns a pointer of the correct type;&lt;br /&gt;new/delete are operators that can be overloaded; malloc/free cannot be overloaded.&lt;br /&gt;&lt;br /&gt;Q: Why doesn't C++ give "new" a counterpart to malloc's realloc()?&lt;br /&gt;A: To avoid surprises.&lt;br /&gt;A function like realloc() copies memory bitwise when it has to move a block, which would break most C++ objects.&lt;br /&gt;&lt;br /&gt;Q: How do I allocate/release an array with new/delete?&lt;br /&gt;A: Use new[] and delete[].&lt;br /&gt;int * p_obj = new int[100];&lt;br /&gt;// ...&lt;br /&gt;delete [] p_obj;&lt;br /&gt;If "[...]" is used in the "new" expression, "[]" must be used in the "delete" expression.&lt;br /&gt;&lt;br /&gt;Q: What happens if I forget the "[]" when deleting an array allocated with "new int[100]"?&lt;br /&gt;A: Disaster.&lt;br /&gt;Pairing new[] with delete[] correctly is the programmer's responsibility, not the compiler's. A corrupted heap is the most likely result; worse, the program may crash.&lt;br /&gt;&lt;br /&gt;Q: Is "delete this" legal?&lt;br /&gt;A: It is not recommended. 
That said, it is legal as long as we are careful enough. The points to watch are:&lt;br /&gt;1) Make sure "this" was allocated with "new" (not new[]);&lt;br /&gt;2) Make sure that after "delete this" nothing touches the object again, not even the "this" pointer itself: do not examine it or compare it with NULL.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-6685088823243710390?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/6685088823243710390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=6685088823243710390' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6685088823243710390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6685088823243710390'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/01/cnewdeletemallocfree.html' title='Common questions about new (delete) and malloc (free) in C++'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-8024986719194011024</id><published>2009-01-08T01:25:00.000-08:00</published><updated>2009-01-08T02:19:22.659-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='解析JavaScript'/><title type='text'>How to make a web spider parse JavaScript</title><content type='html'>Too many web pages make heavy use of JS, especially pages built with Ajax, which makes them hard for spiders to handle; at present most web spiders do not process JavaScript at all.&lt;br /&gt;&lt;br /&gt;One option is to have the spider use the SpiderMonkey engine to interpret the JS.&lt;br /&gt;&lt;br /&gt;JavaScript is widely used for client-side scripts that run in the browser. 
But Mozilla's JavaScript engine is a library that can be linked into any C/C++ program, not just a browser. Many applications can benefit from scripting. These programs can execute JavaScript code from C using the SpiderMonkey API.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.mozilla.org/js/spidermonkey/"&gt;http://www.mozilla.org/js/spidermonkey/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-8024986719194011024?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/8024986719194011024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=8024986719194011024' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/8024986719194011024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/8024986719194011024'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2009/01/javascript.html' title='How to make a web spider parse JavaScript'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-3980446070150374865</id><published>2008-12-17T17:41:00.000-08:00</published><updated>2008-12-17T17:42:59.969-08:00</updated><title type='text'>Using the openssl library</title><content type='html'>Today I helped a colleague look at some code. The problem was this:&lt;br /&gt;&lt;br /&gt;The code compiled fine, which shows that the header files and the like were all correct.&lt;br /&gt;Linking failed with "undefined reference to `MD5'":&lt;br /&gt; 
 /tmp/cc2c.o(.text+0x41): undefined reference to `MD5' collect2: ld returned 1 exit status&lt;br /&gt;&lt;br /&gt;/usr/lib contained libssl.a and its family of libraries.&lt;br /&gt;&lt;br /&gt;The cause:&lt;br /&gt;We tend to assume that using openssl just means linking libssl, but openssl actually provides two main libraries, libssl and libcrypto. MD5 is implemented in libcrypto, so the link step needs -lcrypto. Keep both libraries in mind whenever you use openssl.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-3980446070150374865?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/3980446070150374865/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=3980446070150374865' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3980446070150374865'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3980446070150374865'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/12/openssl.html' title='Using the openssl library'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-7190953403130967015</id><published>2008-12-13T18:24:00.000-08:00</published><updated>2008-12-13T18:27:16.541-08:00</updated><title type='text'>Prefer C++-style casts</title><content type='html'>C++ overcomes the shortcomings of C-style casts by introducing four new cast operators: static_cast, const_cast, dynamic_cast, and reinterpret_cast.&lt;br /&gt;&lt;br /&gt;static_cast has basically the same power and the same meaning as a C-style cast. It also has similar restrictions. For example, you cannot use static_cast, any more than you can use a C-style cast, 
to convert a struct into an int or a double into a pointer. In addition, static_cast cannot remove constness from an expression.&lt;br /&gt;&lt;br /&gt;const_cast is used to cast away the const or volatile qualification of an expression. By far the most common use of const_cast is to cast away the constness of an object.&lt;br /&gt;&lt;br /&gt;dynamic_cast is used to cast safely down or across an inheritance hierarchy. That is, you can use dynamic_cast to convert a pointer or reference to a base class into a pointer or reference to a derived class or a sibling class, and you can tell whether the cast succeeded. A failed cast returns a null pointer (when casting pointers) or throws an exception, std::bad_cast (when casting references).&lt;br /&gt;&lt;br /&gt;reinterpret_cast: the result of a conversion with this operator is nearly always implementation-defined. As a result, code that uses reinterpret_cast is hard to port. The most common use of reinterpret_cast is to convert between function pointer types.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-7190953403130967015?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/7190953403130967015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=7190953403130967015' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7190953403130967015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7190953403130967015'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/12/c_13.html' title='Prefer C++-style casts'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-1765053776427464140</id><published>2008-12-12T20:57:00.000-08:00</published><updated>2008-12-12T21:24:21.240-08:00</updated><title 
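type='text'>Linux/Unix screen</title><content type='html'>Do you often find yourself in this situation? You need to log in remotely to a Linux/Unix server to run a long job, one that takes anywhere from a few hours to several days. You open a terminal and start the job, and from then on you cannot close the terminal or end the session until the job finishes, because once the session ends, the job's process exits with it. Meanwhile, you would like to close the terminal and go to sleep, or spend time with your wife and children.&lt;br /&gt;If so, I recommend screen. I have introduced this tool to many of my colleagues, and they all find it very handy.&lt;br /&gt;&lt;br /&gt;GNU Screen official site: &lt;a href="http://www.gnu.org/software/screen/"&gt;http://www.gnu.org/software/screen/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Usage:&lt;br /&gt;1. Type the screen command at the command line&lt;br /&gt;# screen&lt;br /&gt;Screen creates a full-screen window running a shell. You can run any shell program in it, just as you would in an SSH window. Type exit in that window to close it.&lt;br /&gt;2. Run your job&lt;br /&gt;# cd ~/gbdt&lt;br /&gt;# ls&lt;br /&gt;# gbdt my.conf&lt;br /&gt;Enter your commands just as you would in an SSH terminal. Above, I am using gbdt to train on some data, which may take several hours.&lt;br /&gt;3. Leave the program running&lt;br /&gt;Press CTRL+A, then D, to detach from the screen session. The training job I started keeps running. From this point on you can forget about it, log out of the server, and close the terminal.&lt;br /&gt;4. Return to the earlier screen session and check the results&lt;br /&gt;A few hours have passed. In the meantime I had dinner and watched TV with my family. Then I remembered that it was time to check on the training job, so I turned on my computer, logged in to the server remotely, and typed:&lt;br /&gt;# screen -ls&lt;br /&gt;There are screens on:&lt;br /&gt;        8736.******       (Detached)&lt;br /&gt;1 Sockets in /root/.screen.&lt;br /&gt;I saw only one screen session on the system (there may be several, in which case you need to work out which one is yours), so I typed:&lt;br /&gt;# screen -r 8736&lt;br /&gt;&lt;br /&gt;And there I was, back in the screen session I had created earlier, with the training results waiting for me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-1765053776427464140?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/1765053776427464140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=1765053776427464140' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1765053776427464140'/><link rel='self' type='application/atom+xml'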
href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1765053776427464140'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/12/linuxunix-screen.html' title='Linux/Unix screen'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-7452222520292717351</id><published>2008-12-12T20:54:00.000-08:00</published><updated>2008-12-12T20:56:20.952-08:00</updated><title type='text'>Initializing static data members in C++</title><content type='html'>A static data member is declared in the class declaration and defined, with its initializer, in the source file that holds the class's member function definitions. The definition uses the scope resolution operator to name the class the static member belongs to. Only a static member of const integral or const enumeration type can be initialized inside the class declaration (these are the C++98/03 rules).&lt;br /&gt;&lt;br /&gt;Examples:&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;&lt;br /&gt;class base&lt;br /&gt;{&lt;br /&gt;    public:&lt;br /&gt;        static const int _num = 10; // OK: _num is of type const int&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;&lt;br /&gt;class base&lt;br /&gt;{&lt;br /&gt;    public:&lt;br /&gt;        static int _num = 10; // Error: _num is not const&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;&lt;br /&gt;class base&lt;br /&gt;{&lt;br /&gt;    public:&lt;br /&gt;        static const void * _NULL = (void *)0; // Error: _NULL is not of const integral or const enumeration type&lt;br /&gt;};&lt;br /&gt;&lt;br /&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;&lt;br /&gt;class base&lt;br /&gt;{&lt;br /&gt;    public:&lt;br /&gt;        static const void * _NULL; // declaration only, OK&lt;br /&gt;};&lt;br /&gt;
const void * base::_NULL = (void *)0; // the actual definition, OK; (btw) do not forget the base:: in front of _NULL&lt;br /&gt;&lt;br /&gt;A constant defined as static const is shared: it is visible to every instance (object) of the class, and even from global scope, always with the same value.&lt;br /&gt;As with global objects, a program may provide only one definition of a static data member. This means the initialization of a static data member should not go in a header file; it belongs in the file that contains the definitions of the class's non-inline functions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-7452222520292717351?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/7452222520292717351/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=7452222520292717351' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7452222520292717351'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/7452222520292717351'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/12/c.html' title='Initializing static data members in C++'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-2745663920425358535</id><published>2008-12-11T04:48:00.000-08:00</published><updated>2008-12-11T04:51:57.938-08:00</updated><title type='text'>The SIGBUS and SIGSEGV signals</title><content type='html'>Both signals are delivered because of an invalid memory operation. The difference is this:&lt;br /&gt;SIGBUS results from an alignment error detected after an address has been placed on the address bus, whereas SIGSEGV results from an illegal memory access that some later stage detects once the address is already on the address bus.&lt;br /&gt;In general, when we hit SIGBUS it is because an address is misaligned, whereas SIGSEGV is caused by an invalid memory address.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1'
src='https://blogger.googleusercontent.com/tracker/5274885697566202243-2745663920425358535?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/2745663920425358535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=2745663920425358535' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2745663920425358535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2745663920425358535'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/12/sigbussigsegv.html' title='The SIGBUS and SIGSEGV signals'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-999350571002447390</id><published>2008-11-27T01:12:00.000-08:00</published><updated>2009-01-06T03:49:54.954-08:00</updated><title type='text'>ROC curves in data mining</title><content type='html'>&lt;p&gt;Baidu Baike explains the "ROC curve" as follows:&lt;/p&gt;&lt;blockquote&gt;&lt;em&gt;&lt;span style="color: rgb(102, 102, 102);font-size:85%;" &gt;ROC curve: Receiver Operating Characteristic Curve, also called the sensitivity curve. The name reflects the fact that every point on the curve represents the same sensitivity: each is a response to the same signal stimulus, only obtained under a different decision criterion. The receiver operating characteristic curve is a plot with the false alarm probability on the horizontal axis and the hit probability on the vertical axis, drawn from the different results a subject produces under a given stimulus condition when applying different decision criteria.&lt;/span&gt;&lt;/em&gt;&lt;/blockquote&gt;I do not know the origin of the ROC curve, but judging from the explanation above it appears to be related to signal processing; I may be wrong about that. In data mining, a more fitting explanation of the ROC curve (compiled from sources on the Internet) is:&lt;br /&gt;It is a graphical way of showing the trade-off between a model's true positive rate and its false positive rate. Here are a few concepts used in an ROC plot:&lt;br /&gt;&lt;br /&gt;
True positive (TP): a positive sample that the model predicts as positive&lt;br /&gt;False negative (FN): a positive sample that the model predicts as negative&lt;br /&gt;False positive (FP): a negative sample that the model predicts as positive&lt;br /&gt;True negative (TN): a negative sample that the model predicts as negative&lt;br /&gt;True positive rate (TPR), or sensitivity&lt;br /&gt;　　TPR = TP / (TP + FN): positive samples predicted as positive / actual number of positive samples&lt;br /&gt;False negative rate (FNR)&lt;br /&gt;　　FNR = FN / (TP + FN): positive samples predicted as negative / actual number of positive samples&lt;br /&gt;False positive rate (FPR)&lt;br /&gt;　　FPR = FP / (FP + TN): negative samples predicted as positive / actual number of negative samples&lt;br /&gt;True negative rate (TNR), or specificity&lt;br /&gt;　　TNR = TN / (TN + FP): negative samples predicted as negative / actual number of negative samples&lt;br /&gt;&lt;br /&gt;The value of the target attribute that is selected as the one of interest is called "positive".&lt;br /&gt;Interpretation of a few key points on the ROC curve:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;(TPR=0,FPR=0): a model that predicts every instance as negative&lt;/li&gt;&lt;li&gt;(TPR=1,FPR=1): a model that predicts every instance as positive&lt;/li&gt;&lt;li&gt;(TPR=1,FPR=0): the ideal model&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;strong&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;What should the ROC curve of a good model look like?&lt;/span&gt;&lt;/strong&gt;&lt;/p&gt;A good classification model should lie as close as possible to the upper-left corner of the plot, while a model that guesses at random lies on the main diagonal connecting the points (TPR=0,FPR=0) and (TPR=1,FPR=1).&lt;br /&gt;The area under the ROC curve (AUC) provides another way to evaluate a model's average performance. A perfect model has AUC = 1, a model that simply guesses at random has AUC = 0.5, and if one model is better than another, its area under the curve is larger.&lt;br /&gt;&lt;br /&gt;Below is a description of ROC analysis as applied to binary classification problems:&lt;br /&gt;"The ROC analysis applies to binary classification problems. One of the classes is selected as a "positive" one. The ROC chart plots the true positive rate as a function of the false positive rate. It is parametrized by the probability threshold values. The true positive rate represents the fraction of positive cases that were correctly classified by the model. The false positive rate represents the fraction of negative cases that were incorrectly classified as positive. Each point on the ROC plot represents a true_positive_rate/false_positive_rate pair corresponding to a particular probability threshold. Each point has a corresponding confusion matrix. The user can analyze the confusion matrices produced at different threshold levels and select a probability threshold to be used for scoring.
The probability threshold choice is usually based on application requirements (i.e., acceptable level of false positives). The ROC does not represent a model. Instead it quantifies its discriminatory ability and assists the user in selecting an appropriate operating point for scoring."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-999350571002447390?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/999350571002447390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=999350571002447390' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/999350571002447390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/999350571002447390'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/roc.html' title='ROC curves in data mining'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-6259856014349865568</id><published>2008-11-19T06:57:00.000-08:00</published><updated>2008-11-19T07:17:12.369-08:00</updated><title type='text'>Hadoop - Introduction</title><content type='html'>&lt;div align="left"&gt;Apache Hadoop is an open-source platform for reliable, scalable, distributed computing.&lt;br /&gt;At the time of writing, Hadoop consists of the following four parts:&lt;br /&gt;Hadoop Core: the core subproject of Hadoop, which provides a distributed file system (HDFS) and distributed computing with MapReduce support.&lt;br 
/&gt;HBase: built on top of Hadoop Core, it provides a reliable, scalable distributed database.&lt;br /&gt;Pig: built on top of Hadoop Core, it is a high-level data-flow language with an execution framework for parallel computation.&lt;br /&gt;ZooKeeper: an efficient, scalable coordination system. Distributed applications can use ZooKeeper to store and coordinate critical shared state.&lt;br /&gt;&lt;br /&gt;Companies such as IBM, Google, Yahoo!, and Facebook currently use Hadoop. &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-6259856014349865568?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/6259856014349865568/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=6259856014349865568' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6259856014349865568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/6259856014349865568'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/hadoop.html' title='Hadoop - Introduction'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-3038192574702586314</id><published>2008-11-16T02:43:00.000-08:00</published><updated>2008-11-16T02:54:33.537-08:00</updated><title type='text'>Taobao Machine Learning Tools</title><content type='html'>&lt;a href="http://4.bp.blogspot.com/_9qaFnw4SFD4/SR_5wu9wTNI/AAAAAAAAABI/B6GO816qDDw/s1600-h/logo_bak.gif"&gt;&lt;img id="BLOGGER_PHOTO_ID_5269204704682855634" style="FLOAT: left; MARGIN: 0px 10px 10px 0px; WIDTH: 320px; CURSOR: hand; 
HEIGHT: 192px" alt="" src="http://4.bp.blogspot.com/_9qaFnw4SFD4/SR_5wu9wTNI/AAAAAAAAABI/B6GO816qDDw/s320/logo_bak.gif" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;Taobao Machine Learning Tools is a project I have been planning recently. I intend to build it in my spare time for later use in real work. Copyright belongs to Taobao. The tool mainly provides machine learning algorithms such as LogLinear, GBDT, and SVM, which can be used to train on and test data.&lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;div&gt;I think this is a fairly worthwhile project.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-3038192574702586314?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/3038192574702586314/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=3038192574702586314' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3038192574702586314'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/3038192574702586314'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/taobao-machine-learning-tools.html' title='Taobao Machine Learning Tools'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_9qaFnw4SFD4/SR_5wu9wTNI/AAAAAAAAABI/B6GO816qDDw/s72-c/logo_bak.gif' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-1159629945443179744</id><published>2008-11-15T23:49:00.000-08:00</published><updated>2008-11-15T23:51:12.375-08:00</updated><title type='text'>A serial number generator for SPSS 15.0</title><content type='html'>I came across a serial number generator for SPSS 15.0:&lt;br /&gt;&lt;a href="http://www.alibaby.org/alibaby_dl/"&gt;http://www.alibaby.org/alibaby_dl/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;It applies only to SPSS 15.0.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-1159629945443179744?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/1159629945443179744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=1159629945443179744' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1159629945443179744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/1159629945443179744'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/spss-150.html' title='A serial number generator for SPSS 15.0'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-452066665407036460</id><published>2008-11-11T06:19:00.000-08:00</published><updated>2008-11-11T06:29:03.689-08:00</updated><title type='text'>An amusing bug in Google Search</title><content 
type='html'>I happened to search Google for “没谱是什么意思” ("what does 没谱 mean"), and the results left me speechless: the entire first page was about MP3.&lt;br /&gt;Could Google have pulled out the string “没谱是”, taken the initial consonants of its pinyin, and ended up with "MP3"? Yet searching for “没谱是” on its own gives sensible results, and so does “没谱”.&lt;br /&gt;&lt;br /&gt;I cannot work out why this happens. Tomorrow I will ask a former classmate who works on Chinese word segmentation at Google.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-452066665407036460?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/452066665407036460/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=452066665407036460' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/452066665407036460'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/452066665407036460'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/googlebug.html' title='An amusing bug in Google Search'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-8403349762867366880</id><published>2008-11-10T19:37:00.000-08:00</published><updated>2008-11-10T19:39:57.443-08:00</updated><title type='text'>Querying CPU information on FreeBSD</title><content type='html'>On FreeBSD, you can view the current system's CPU information with /etc/bin/smbiosinfo&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-8403349762867366880?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://se-study.blogspot.com/feeds/8403349762867366880/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=8403349762867366880' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/8403349762867366880'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/8403349762867366880'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/freebsdcpu.html' title='Querying CPU information on FreeBSD'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5274885697566202243.post-2800089004654130339</id><published>2008-11-09T23:34:00.000-08:00</published><updated>2008-11-09T23:57:35.769-08:00</updated><title type='text'>Gradient Boosted Decision Trees(GBDT)</title><content type='html'>I have recently been looking at GBDT, that is, Gradient Boosted Decision Trees.&lt;br /&gt;In my own tests, its results are on par with those of TreeNet.&lt;br /&gt;&lt;br /&gt;Wiki page:&lt;br /&gt;&lt;a href="http://vis.berkeley.edu/courses/cs294-10-fa07/wiki/index.php/FP-Jerry_Ye_and_Jimmy_Chen"&gt;http://vis.berkeley.edu/courses/cs294-10-fa07/wiki/index.php/FP-Jerry_Ye_and_Jimmy_Chen&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Related papers:&lt;br /&gt;&lt;a href="http://www-stat.stanford.edu/~jhf/ftp/stobst.ps"&gt;Friedman, J. H. "Stochastic Gradient Boosting." (Feb. 1999a)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www-stat.stanford.edu/~jhf/ftp/trebst.ps"&gt;Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." 
(March 1999b)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www-stat.stanford.edu/~hastie/Papers/samme.pdf"&gt;Zhu, Ji; Rosset, Saharon; Zou, Hui; Hastie, Trevor "Multi-class Adaboost" (Jan. 2006)&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5274885697566202243-2800089004654130339?l=se-study.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://se-study.blogspot.com/feeds/2800089004654130339/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5274885697566202243&amp;postID=2800089004654130339' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2800089004654130339'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5274885697566202243/posts/default/2800089004654130339'/><link rel='alternate' type='text/html' href='http://se-study.blogspot.com/2008/11/gradient-boosted-decision-treesgbdt.html' title='Gradient Boosted Decision Trees(GBDT)'/><author><name>yan cheng</name><uri>http://www.blogger.com/profile/05662653437637157214</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='23' height='32' src='http://3.bp.blogspot.com/_9qaFnw4SFD4/SRfiWOaxFzI/AAAAAAAAAAs/izKGu-WZciY/S220/IMG_0596.gif'/></author><thr:total>0</thr:total></entry></feed>
