Saturday, 17 August 2013

PHP: get matching keywords even if they are not exactly the same or next to each other

I'm looking for a way to find keywords (or word combinations) in a text
that do not necessarily match the keywords in a database literally, but
should still be recognized as such.
There is a database table with keywords:
id   keyword
..   ...
6    cities
7    hotel
8    visit Paris
9    swimming pool
..   ...

I already have the following code to get the exact matching keywords from
a text:
$text = isset($_POST['text']) ? $_POST['text'] : '';

// Connect to the database and stop on failure instead of carrying on.
$connection = new mysqli('localhost', 'root', 'password', 'database');
if ($connection->connect_errno) {
    die('Failed to connect to MySQL: ' . $connection->connect_error);
}

// Load all keywords and keep those that occur literally in the text.
$keywords_found = array();
$result = $connection->query('SELECT `keyword` FROM `keywords`');
// Use a separate $row variable so the result set is not overwritten.
while ($row = $result->fetch_assoc()) {
    if (stripos($text, $row['keyword']) !== false) {
        $keywords_found[] = $row['keyword'];
    }
}

// Print the matches, escaped for HTML output.
echo '<div>';
foreach ($keywords_found as $value) {
    echo '<p>' . htmlspecialchars($value) . '</p>';
}
echo '</div>';
Some examples of what this already achieves:
1.1 "You have visited a lot of cities." => ["cities"]
1.2 "She wants to visit Paris some day and stay in a hotel with a swimming
pool." => ["visit Paris", "hotel", "swimming pool"]
1.3 "When they visit cities such as Paris, they choose a 5-star hotel." =>
["cities", "hotel"]

Some examples of what I want to achieve:
2.1 "When they visit cities such as Paris, they choose a 5-star hotel." =>
["visit Paris", "cities", "hotel"]
2.2 "On the second day of my visit to Paris, I wanted to go swimming in
the hotel's pool." => ["visit Paris", "swimming pool", "hotel"]
2.3 "Paris is one of those cities I'd like to visit" => ["cities", "visit
Paris"]
2.4 "When I visited my uncle, ... [3 lines of text] ... he lived in
several cities, such as London and Paris" => ["cities"] ("visit" and
"Paris" are too far apart)
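
One way the behaviour in examples 2.1 to 2.4 might be approximated is to split the text into words and accept a multi-word keyword when all of its words occur close to each other, in any order. The following is only a sketch under assumptions: the function name findNearbyKeywords, the window of 10 tokens, and the crude prefix match (so that "visit" also matches "visited") are all hypothetical choices, and the keywords are passed in as an array rather than read from the database.

```php
<?php
// Hypothetical helper, not part of the original code.
// Returns every keyword whose words all occur (as word prefixes, in any
// order) within $maxGap tokens of the keyword's first word.
function findNearbyKeywords(string $text, array $keywords, int $maxGap = 10): array
{
    // Split the lowercased text into word tokens ("hotel's" stays one token).
    preg_match_all("/[a-z0-9']+/", strtolower($text), $m);
    $tokens = $m[0];

    $found = [];
    foreach ($keywords as $keyword) {
        $parts = preg_split('/\s+/', strtolower($keyword), -1, PREG_SPLIT_NO_EMPTY);
        if (!$parts) {
            continue;
        }
        // For each word of the keyword, list the token indexes it matches.
        $positions = [];
        foreach ($parts as $part) {
            $hits = [];
            foreach ($tokens as $i => $token) {
                if (strpos($token, $part) === 0) { // crude prefix match: visit -> visited
                    $hits[] = $i;
                }
            }
            if (!$hits) {
                continue 2; // one word is missing entirely, so skip this keyword
            }
            $positions[] = $hits;
        }
        // Accept if some occurrence of the first word has all other words nearby.
        foreach ($positions[0] as $anchor) {
            $allNear = true;
            foreach (array_slice($positions, 1) as $hits) {
                $near = false;
                foreach ($hits as $p) {
                    if (abs($p - $anchor) <= $maxGap) {
                        $near = true;
                        break;
                    }
                }
                if (!$near) {
                    $allNear = false;
                    break;
                }
            }
            if ($allNear) {
                $found[] = $keyword;
                break;
            }
        }
    }
    return $found;
}

$keywords = ['cities', 'hotel', 'visit Paris', 'swimming pool'];
print_r(findNearbyKeywords("Paris is one of those cities I'd like to visit", $keywords));
```

With the assumed window of 10 tokens, this sketch finds "cities" and "visit Paris" in example 2.3, and only "cities" in example 2.4 as written above. The prefix match is deliberately simple and will also accept unrelated words that merely start with a keyword word; a proper stemmer would be a more careful choice.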

I don't have much of a clue how to tackle this, and I'm not even sure
it is possible. I've been looking at regex: is it possible to limit the
search range (as needed for example 2.4)? Can I use an offset to limit
the search area around one word of a keyword when the keyword consists
of several words?
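
On the regex part of the question: a bounded quantifier can limit how many words may sit between two keyword words. The sketch below is only an illustration under assumptions: the helper name proximityPattern and its parameters are hypothetical, it handles two-word keywords only, and it treats each keyword word as a prefix so that "visit" also matches "visited".

```php
<?php
// Hypothetical helper: builds a case-insensitive pattern that matches $a and
// $b (each as a word prefix) with at most $maxBetween other words between
// them, in either order.
function proximityPattern(string $a, string $b, int $maxBetween): string
{
    $a = preg_quote($a, '/');
    $b = preg_quote($b, '/');
    // One separator, then up to $maxBetween "word + separator" groups.
    $gap = '\W+(?:\w+\W+){0,' . $maxBetween . '}?';
    return '/\b' . $a . '\w*' . $gap . $b . '\w*'
         . '|\b' . $b . '\w*' . $gap . $a . '\w*/i';
}

$pattern = proximityPattern('visit', 'Paris', 3);
var_dump(preg_match($pattern, 'On the second day of my visit to Paris') === 1);
```

Note that \w does not include the apostrophe, so a word such as "I'd" counts as two words toward the limit; the limit is therefore best chosen with some slack.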
