Search Engine: Zend_Lucene

Zend Lucene

1. General

Zend_Search_Lucene is a general-purpose text search engine written entirely in PHP 5. It stores its index on the filesystem and does not require a database server.

2. How to install Zend Lucene

- Download website: http://www.zend.com/community/downloads

- Zend Framework version: Zend Framework 1.9 minimal

Download Zend Framework 1.9 minimal from the download website.

Remove everything from the Zend folder except the following files and directories:

- Exception.php

- Loader/

- Loader.php

- Search/
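
Once the trimmed Zend folder is in place, PHP has to be able to resolve `require_once 'Zend/Search/Lucene.php'`. A minimal sketch of one way to do that, assuming the Zend folder was copied into a `library/` directory next to the script (the directory name is an assumption, not something the framework requires):

```php
<?php
// Assumption: the trimmed Zend/ folder lives in ./library (hypothetical location).
$libraryDir = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'library';

// Prepend the library directory to the include path so that
// require_once 'Zend/Search/Lucene.php' can be resolved from any script.
set_include_path($libraryDir . PATH_SEPARATOR . get_include_path());

// Report whether the entry file is reachable before requiring it.
$luceneFile = $libraryDir . '/Zend/Search/Lucene.php';
echo is_file($luceneFile)
    ? "Zend_Search_Lucene found\n"
    : "Zend_Search_Lucene not found in $libraryDir\n";
```

If the Zend folder sits directly beside the scripts, as the later examples seem to assume, no include path change should be needed.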


3. How to create an index

An example of creating an index is shown below:

<?php
// File name: createindex.php
require_once 'Zend/Search/Lucene.php';

// Sample product data; the 'lan' key holds the language code of each product.
$productsData = array(
    0 => array("PID" => 1, "url" => "http://www.cybozu.jp", "productName" => "garoon", "Description" => "garoon Description", "lan" => "en"),
    1 => array("PID" => 2, "url" => "http://www.cybozu.jp", "productName" => "share360", "Description" => "share360 Description", "lan" => "en"),
    2 => array("PID" => 3, "url" => "http://www.cybozu.jp a", "productName" => "日本語の製品名前", "Description" => "日本語の製品", "lan" => "jp"),
    3 => array("PID" => 4, "url" => "http://www.cybozu.jp a", "productName" => "中文产品名", "Description" => "中文产品描述", "lan" => "zh")
);

// The second argument (true) creates a new index in the 'index' directory.
$index = new Zend_Search_Lucene('index', true);
foreach ($productsData as $productData)
{
    // Create a fresh document per product so that fields do not accumulate.
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::keyword('PID', $productData['PID'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::text('url', $productData['url'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::text('productName', $productData['productName'], 'UTF-8'));
    $doc->addField(Zend_Search_Lucene_Field::text('Description', $productData['Description'], 'UTF-8'));
    // 'lan' is stored with the document but is not searchable.
    $doc->addField(Zend_Search_Lucene_Field::unIndexed('lan', $productData['lan'], 'UTF-8'));
    $index->addDocument($doc);
}
// Commit and optimize once, after all documents have been added.
$index->commit();
$index->optimize();
echo 'index has been created!';

In the KB project, the index data comes from a database. Using the method above, we can index all of the text stored in the database.

4. Searching the index

After creating an index, we can search it as shown below:

<?php
//File Name: search.php
require_once('Zend/Search/Lucene.php');
$index = new Zend_Search_Lucene('index');
$keywords='garoon';
echo "Index contains {$index->count()} documents.\n";
$query = Zend_Search_Lucene_Search_QueryParser::parse( $keywords, 'utf-8' );
$hits = $index->find($query);
foreach ($hits as $hit)
{
echo 'PID: '.$hit->PID.'<br>';
echo 'Score: '.$hit->score.'<br>';
echo 'url: '.$hit->url.'<br>';
echo 'productName: '.$hit->productName.'<br>';
echo 'lan: '.$hit->lan.'<br>';
}

To support searching text in multiple languages, we can read the stored lan value of each hit and display the results separately for each language.
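
Displaying results per language can be sketched as a small grouping step over the hits. The helper name `groupHitsByLanguage()` is hypothetical, and the sample hits are plain stand-in objects used only so the snippet runs without an index; real hits would come from `$index->find()`.

```php
<?php
/**
 * Group search hits by their stored 'lan' field.
 * $hits is any array of objects exposing a 'lan' property,
 * which is how the hits are read in search.php.
 */
function groupHitsByLanguage(array $hits)
{
    $groups = array();
    foreach ($hits as $hit) {
        $groups[$hit->lan][] = $hit;
    }
    return $groups;
}

// Stand-in hits for illustration only.
$sampleHits = array(
    (object) array('PID' => 1, 'productName' => 'garoon', 'lan' => 'en'),
    (object) array('PID' => 4, 'productName' => '中文产品名', 'lan' => 'zh'),
    (object) array('PID' => 2, 'productName' => 'share360', 'lan' => 'en'),
);

foreach (groupHitsByLanguage($sampleHits) as $lan => $group) {
    echo $lan . ': ' . count($group) . " hit(s)\n";
}
```

Hits within each group keep the order in which they were passed in, so sorting by score before grouping should carry through to the grouped output.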


5. Deleting and updating the index

To update a document in the index, we must first find it by keyword and delete it. After the old document has been deleted, we can add a new one. The example below updates the index by deleting the product with PID 1 and adding it again with a new description.

<?php
require_once('Zend/Search/Lucene.php');
$index = new Zend_Search_Lucene('index');
//new product data to update
$productNewData =array("PID"=>1,"url"=>"http://www.cybozu.jp","productName"=>"garoon","Description"=>"update garoon Description","lan"=>"en");
$keywords="PID:1";
$hits = $index->find($keywords);
//Delete PID:1
foreach ($hits as $hit)
{
echo 'PID: '.$hit->PID.' has been deleted <br>';
$index->delete($hit->id);
}
$index->commit();
//add new product data to index
$doc = new Zend_Search_Lucene_Document();
$doc->addField(Zend_Search_Lucene_Field::keyword('PID', $productNewData['PID'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('url', $productNewData['url'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('productName', $productNewData['productName'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::Text('Description', $productNewData['Description'], 'UTF-8'));
$doc->addField(Zend_Search_Lucene_Field::unIndexed('lan', $productNewData['lan'], 'UTF-8'));
$index->addDocument($doc);
$index->commit();
$index->optimize();

6. How to search Japanese or Chinese text with Lucene

By default, the Zend_Search_Lucene analyzer only handles English text. In this project, however, we must search English, Japanese and Chinese text, so we have to replace the default analyzer.

The class below extends the common analyzer base class of Zend_Search_Lucene:


<?php 
// File Name:chinese.php
require_once 'Zend/Search/Lucene/Analysis/Analyzer.php';
require_once 'Zend/Search/Lucene/Analysis/Analyzer/Common.php';

class CN_Lucene_Analyzer extends Zend_Search_Lucene_Analysis_Analyzer_Common
{
private $_position;
private $_cnStopWords = array( );

public function setCnStopWords( $cnStopWords )
{
$this->_cnStopWords = $cnStopWords;
}

/**
* Reset token stream
*/
public function reset()
{
$this->_position = 0;
// ASCII punctuation, followed by full-width (CJK) punctuation and the full-width space
$search = array(",", "/", "\\", ".", ";", ":", "\"", "!", "~", "`", "^", "(", ")", "?", "-", "'", "<", ">", "$", "&", "%", "#", "@", "+", "=", "{", "}", "[", "]", ":", ")", "(", ".", "。", ",", "!", ";", "“", "”", "‘", "’", "[", "]", "、", "—", "　", "《", "》", "-", "…", "【", "】", "?", "¥" );

$this->_input = str_replace( $search, '', $this->_input );
$this->_input = str_replace( $this->_cnStopWords, ' ', $this->_input );
}

/**
* Tokenization stream API
* Get next token
* Returns null at the end of stream
*
* @return Zend_Search_Lucene_Analysis_Token|null
*/
public function nextToken()
{
if ($this->_input === null)
{
return null;
}
$len = strlen($this->_input);
//print "Old string:".$this->_input."<br />";
while ($this->_position < $len)
{
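// Skip spaces at the beginning
while ($this->_position < $len && $this->_input[$this->_position] == ' ')
{
$this->_position++;
}
// Stop if only trailing spaces were left, to avoid reading past the end of the input
if ($this->_position >= $len)
{
break;
}
$termStartPosition = $this->_position;
$temp_char = $this->_input[$this->_position];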
$isCnWord = false;
if(ord($temp_char)>127)
{
$i = 0;
while( $this->_position < $len && ord( $this->_input[$this->_position] )>127 )
{
$this->_position = $this->_position + 3;
$i ++;
if($i==2)
{
$isCnWord = true;
break;
}
}

if($i==1) continue;
}
else
{
while ($this->_position < $len && ctype_alnum( $this->_input[$this->_position] ))
{
$this->_position++;
}
//echo $this->_position.":".$this->_input[$this->_position-1]."\n";
}
if ($this->_position == $termStartPosition)
{
$this->_position++;
continue;
}

$tmp_str = substr($this->_input, $termStartPosition, $this->_position - $termStartPosition);

$token = new Zend_Search_Lucene_Analysis_Token( $tmp_str, $termStartPosition,$this->_position );

$token = $this->normalize($token);

if($isCnWord)
{
$this->_position = $this->_position - 3;
}

if ($token !== null)
{
return $token;
}
}

return null;
}
}
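
The tokenization performed by nextToken() can be easier to follow on a small example. For a run of multi-byte characters it emits overlapping two-character terms, stepping back one character (3 bytes, which assumes 3-byte UTF-8 characters) after each term. The sketch below reproduces that idea on whole characters; `cjkBigrams()` is a hypothetical helper written for illustration and is not part of Zend Framework.

```php
<?php
/**
 * Illustration only: split a run of CJK characters into overlapping
 * two-character terms, similar to what the analyzer above produces.
 */
function cjkBigrams($text)
{
    // Split the UTF-8 string into individual characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $terms = array();
    for ($i = 0; $i + 1 < count($chars); $i++) {
        $terms[] = $chars[$i] . $chars[$i + 1];
    }
    return $terms;
}

echo implode(' | ', cjkBigrams('中文产品名')), "\n";
// prints: 中文 | 文产 | 产品 | 品名
```

As in the analyzer, a single isolated character yields no term. Since the same analyzer is meant to be applied to queries as well, a query such as 产品 would be expected to match the document containing 中文产品名.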

With the help of chinese.php, we can search Japanese and Chinese text in the KB. We must also add the code below before creating an index and before searching:


require_once 'chinese.php';

Zend_Search_Lucene_Analysis_Analyzer::setDefault(new CN_Lucene_Analyzer());


7. Does Zend Lucene require downtime?

Zend Lucene does not require any downtime. When a new article is added, we can add it to the index at the same time. When an article is edited, we need to delete the old document and add the updated one to the index.
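
Updating the index at the moment an article is saved could be wrapped in one small function. This is a sketch under assumptions: `replaceDocumentByPid()` is a hypothetical helper, and `$index` is expected to behave like a Zend_Search_Lucene instance, using only the find(), delete(), addDocument() and commit() calls shown in section 5.

```php
<?php
/**
 * Hypothetical helper: delete every document with the given PID,
 * then add the new version and commit the change.
 * Returns the number of old documents that were deleted.
 */
function replaceDocumentByPid($index, $pid, $newDoc)
{
    $deleted = 0;
    foreach ($index->find('PID:' . $pid) as $hit) {
        $index->delete($hit->id);
        $deleted++;
    }
    $index->addDocument($newDoc);
    $index->commit();
    return $deleted;
}

// Usage sketch (assumes $index and $doc were built as in section 5):
// $deleted = replaceDocumentByPid($index, 1, $doc);
```

For a brand-new article the search finds nothing to delete, so the same call covers both adding and editing.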



 
