How do I check if a string contains a specific word?

+2599 votes
asked Dec 6, 2010 by charles-yeung

Consider:

$a = 'How are you?';
if ($a contains 'are') echo 'true';

Given the code above, what is the correct way to write the statement if ($a contains 'are')?

28 Answers

+405 votes
answered Dec 6, 2010 by breezer

You could use regular expressions. They are better for word matching than strpos(), which, as other users have mentioned, will also return true for strings such as fare, care, stare, etc. This is easily avoided in a regular expression by using word boundaries.

A simple match for are could look something like this:

$a = 'How are you?';
if (preg_match('/\bare\b/',$a)) echo 'true';

On the performance side, strpos() is about three times faster. Bear in mind that when I ran one million comparisons at once, preg_match() took 1.5 seconds to finish and strpos() took 0.5 seconds.
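A small side-by-side sketch of the two checks on a few sample strings. contains_substring() and contains_word() are hypothetical helper names, not built-in functions; the regex version uses the same \b word-boundary idea as the answer.

```php
<?php
// Hypothetical helper: plain substring test with strpos().
function contains_substring(string $haystack, string $needle): bool
{
    return strpos($haystack, $needle) !== false;
}

// Hypothetical helper: whole-word test with preg_match() and \b word boundaries.
function contains_word(string $haystack, string $word): bool
{
    // preg_quote() escapes any regex metacharacters in $word
    return preg_match('/\b' . preg_quote($word, '/') . '\b/', $haystack) === 1;
}

foreach (['How are you?', 'Who cares?', 'A hare ran by'] as $text) {
    printf(
        "%-15s strpos: %-5s preg_match: %s\n",
        $text,
        contains_substring($text, 'are') ? 'true' : 'false',
        contains_word($text, 'are') ? 'true' : 'false'
    );
}
```

strpos() reports a match for all three sample strings, while the word-boundary regex only matches the first.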

+5065 votes
answered Dec 6, 2010 by codaddict

You can use the strpos() function which is used to find the occurrence of one string inside another one:

$a = 'How are you?';
if (strpos($a, 'are') !== false) {
    echo 'true';
}

Note that the use of !== false is deliberate; strpos() returns either the offset at which the needle string begins in the haystack string, or the boolean false if the needle isn't found. Since 0 is a valid offset and 0 is "falsey", we can't use simpler constructs like !strpos($a, 'are').
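A quick sketch of the pitfall described above, using a haystack that begins with the needle so that strpos() returns 0. str_contains() exists only in PHP 8.0 and later, hence the function_exists() guard.

```php
<?php
$a = 'are you there?';

// strpos() finds 'are' at offset 0, and 0 is falsy
var_dump(strpos($a, 'are'));   // int(0)

// Wrong: negating the result treats offset 0 the same as "not found"
$naive = !strpos($a, 'are') ? 'not found' : 'found';

// Right: compare strictly against false
$strict = strpos($a, 'are') !== false ? 'found' : 'not found';

echo "naive: $naive, strict: $strict\n";   // naive: not found, strict: found

// PHP 8.0+ only: str_contains() returns a plain bool, so the 0-vs-false trap does not arise
if (function_exists('str_contains')) {
    var_dump(str_contains($a, 'are'));   // bool(true)
}
```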

+53 votes
answered Dec 6, 2010 by glutorange

Another option is strstr(), or stristr() if your search should be case-insensitive.
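A minimal sketch contrasting the two functions on a mixed-case string. Both return the part of the haystack starting at the first match, or false when there is none.

```php
<?php
$a = 'How ARE you?';

// strstr() is case-sensitive, so 'are' does not match 'ARE'
var_dump(strstr($a, 'are'));    // bool(false)

// stristr() ignores case and returns the haystack from the first match onwards
var_dump(stristr($a, 'are'));   // string(8) "ARE you?"

// The returned substring can itself be falsy, e.g. strstr('a0', '0') returns "0",
// so test against false explicitly rather than relying on truthiness
if (stristr($a, 'are') !== false) {
    echo "true\n";
}
```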

+105 votes
answered Dec 6, 2010 by ftdrbwlxw6

While most of these answers will tell you if a substring appears in your string, that's usually not what you want if you're looking for a particular word, and not a substring.

What's the difference? Substrings can appear within other words:

  • The "are" at the beginning of "area"
  • The "are" at the end of "hare"
  • The "are" in the middle of "fares"

One way to mitigate this would be to use a regular expression coupled with word boundaries (\b):

function containsWord($str, $word)
{
    return !!preg_match('#\\b' . preg_quote($word, '#') . '\\b#i', $str);
}
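Exercising the function on the substring cases listed above. The function is repeated so the snippet runs on its own, with (bool) in place of !!, which has the same effect.

```php
<?php
// Same pattern as containsWord() above: \b word boundaries, case-insensitive (i flag)
function containsWord($str, $word)
{
    return (bool) preg_match('#\b' . preg_quote($word, '#') . '\b#i', $str);
}

$samples = ['How are you?', 'Look at that area', 'A hare ran by', 'Fares went up', 'ARE you sure?'];
foreach ($samples as $s) {
    echo str_pad($s, 20), containsWord($s, 'are') ? 'word found' : 'no word', "\n";
}
```

Only the first and last samples report a word; "area", "hare" and "Fares" are rejected because "are" is not delimited by word boundaries there.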

This method doesn't have the same false positives noted above, but it does have some edge cases of its own. A word boundary matches between a word character (\w) and a non-word character (\W), and word characters are a-z, A-Z, 0-9, and _. That means digits and underscores count as part of a word, so scenarios like these will fail:

  • The "are" in "What _are_ you thinking?"
  • The "are" in "lol u dunno wut those are4?"
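If only letters should count as part of a word, one possible workaround (a swapped-in technique, not part of the answer above) is to replace \b with lookarounds that reject a neighbouring letter. containsLetterWord() is a hypothetical name; the u flag makes \p{L} match any Unicode letter and assumes UTF-8 input.

```php
<?php
// Hypothetical variant: the word must not be preceded or followed by a letter.
// Digits and underscores therefore act as separators, unlike with \b.
function containsLetterWord(string $str, string $word): bool
{
    $pattern = '#(?<!\p{L})' . preg_quote($word, '#') . '(?!\p{L})#iu';
    return preg_match($pattern, $str) === 1;
}

var_dump(containsLetterWord('What _are_ you thinking?', 'are'));     // bool(true)
var_dump(containsLetterWord('lol u dunno wut those are4?', 'are'));  // bool(true)
var_dump(containsLetterWord('fares', 'are'));                        // bool(false)
```

The trade-off is that "are4" now counts as the word "are", which may or may not be what you want.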

If you want anything more accurate than this, you'll have to start doing English language syntax parsing, and that's a pretty big can of worms (and assumes proper use of syntax, anyway, which isn't always a given).

+86 votes
answered Dec 6, 2010 by jose-vega

To determine whether a string contains another string you can use the PHP function strpos().

int strpos ( string $haystack , mixed $needle [, int $offset = 0 ] )

<?php
$haystack = 'how are you';
$needle = 'are';
if (strpos($haystack, $needle) !== false) {
    // double quotes, so that $haystack and $needle are interpolated
    echo "$haystack contains $needle";
}
?>
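The optional $offset argument in the signature above starts the search partway into the haystack. One use, sketched here with a hypothetical helper all_positions(), is collecting every position of the needle rather than just the first.

```php
<?php
// Hypothetical helper: collect every offset at which $needle occurs, using
// strpos()'s $offset argument to resume the search one byte past the previous
// match (so overlapping matches are found too).
function all_positions(string $haystack, string $needle): array
{
    $positions = [];
    $offset = 0;
    while ($needle !== '' && ($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1;
    }
    return $positions;
}

print_r(all_positions('here you are! how are you?', 'are'));
```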

CAUTION:

If the needle you are searching for is at the beginning of the haystack, strpos() returns position 0. A loose comparison (== false or != false) treats 0 as false, so it will not work; you need a strict comparison (=== false or !== false).

A == sign is a comparison and tests whether the variable / expression / constant to the left has the same value as the variable / expression / constant to the right.

A === sign is a comparison to see whether two variables / expressions / constants are equal AND have the same type - i.e. both are strings or both are integers.
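The difference is easy to see with a needle at position 0:

```php
<?php
// 'are' starts the haystack, so strpos() returns int(0)
$pos = strpos('are you there?', 'are');

var_dump($pos == false);    // bool(true):  loose comparison, 0 converts to false
var_dump($pos === false);   // bool(false): int and bool are different types
var_dump($pos !== false);   // bool(true):  the correct "was it found?" test
```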

+29 votes
answered Aug 20, 2012 by rafasashi

This is in response to the comments by SamGoody and Lego Stormtroopr.

If you are looking for a PHP algorithm to rank search results based on the proximity/relevance of multiple words, here is a quick and easy way of generating search results with PHP only.

Issues with the other boolean search methods, such as strpos(), preg_match(), strstr() or stristr():

  1. can't search for multiple words
  2. results are unranked

PHP method based on Vector Space Model and tf-idf (term frequency–inverse document frequency):

It sounds difficult but is surprisingly easy.

If we want to search for multiple words in a string, the core problem is how to assign a weight to each of them.

If we could weight the terms in a string based on how representative they are of the string as a whole, we could order our results by the ones that best match the query.

This is the idea of the vector space model, not far from how SQL full-text search works:

function get_corpus_index($corpus = array(), $separator = ' ')
{
    $dictionary = array();
    $doc_count = array();

    foreach ($corpus as $doc_id => $doc) {
        $terms = explode($separator, $doc);
        // number of terms in this document, used later for length normalisation
        $doc_count[$doc_id] = count($terms);

        // tf–idf, short for term frequency–inverse document frequency,
        // according to wikipedia is a numerical statistic that is intended to reflect
        // how important a word is to a document in a corpus
        foreach ($terms as $term) {
            if (!isset($dictionary[$term])) {
                $dictionary[$term] = array('document_frequency' => 0, 'postings' => array());
            }
            if (!isset($dictionary[$term]['postings'][$doc_id])) {
                $dictionary[$term]['document_frequency']++;
                $dictionary[$term]['postings'][$doc_id] = array('term_frequency' => 0);
            }
            $dictionary[$term]['postings'][$doc_id]['term_frequency']++;
        }

        // from http://phpir.com/simple-search-the-vector-space-model/
    }

    return array('doc_count' => $doc_count, 'dictionary' => $dictionary);
}

function get_similar_documents($query = '', $corpus = array(), $separator = ' ')
{
    $similar_documents = array();

    if ($query != '' && !empty($corpus)) {
        $words = explode($separator, $query);
        $corpus = get_corpus_index($corpus, $separator);
        $doc_count = count($corpus['doc_count']);

        foreach ($words as $word) {
            if (isset($corpus['dictionary'][$word])) {
                $entry = $corpus['dictionary'][$word];
                foreach ($entry['postings'] as $doc_id => $posting) {
                    // get term frequency–inverse document frequency
                    // (by operator precedence this evaluates log2($doc_count + (1 / df) + 1))
                    $score = $posting['term_frequency'] * log($doc_count + 1 / $entry['document_frequency'] + 1, 2);
                    if (isset($similar_documents[$doc_id])) {
                        $similar_documents[$doc_id] += $score;
                    } else {
                        $similar_documents[$doc_id] = $score;
                    }
                }
            }
        }

        // length normalise
        foreach ($similar_documents as $doc_id => $score) {
            $similar_documents[$doc_id] = $score / $corpus['doc_count'][$doc_id];
        }

        // sort from high to low
        arsort($similar_documents);
    }

    return $similar_documents;
}
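Written out, the score that get_similar_documents() computes for a document d and query q is as follows, where N is the number of documents in the corpus, tf_{t,d} is how often term t occurs in d (terms absent from d contribute nothing), df_t is the number of documents containing t, and |d| is the number of separator-delimited tokens in d. The logarithm's argument follows the expression as the PHP code evaluates it, so treat this as a description of this code rather than a canonical tf-idf definition; plugging in Case 1 reproduces its result.

```latex
\mathrm{score}(d, q) = \frac{1}{|d|} \sum_{t \in q} \mathrm{tf}_{t,d} \, \log_2\!\left(N + \frac{1}{\mathrm{df}_t} + 1\right)

\text{Case 1: } N = 1,\ \mathrm{df}_{are} = 1,\ \mathrm{tf}_{are,d} = 1,\ |d| = 3
\;\Rightarrow\; \mathrm{score} = \frac{\log_2 3}{3} \approx 0.52832
```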

CASE 1

$query = 'are';
$corpus = array(
    1 => 'How are you?',
);
$match_results = get_similar_documents($query, $corpus);
echo '<pre>';
print_r($match_results);
echo '</pre>';

RESULT

Array
(
    [1] => 0.52832083357372
)

CASE 2

$query = 'are';
$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?',
);
$match_results = get_similar_documents($query, $corpus);
echo '<pre>';
print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [1] => 0.54248125036058
    [3] => 0.21699250014423
)

CASE 3

$query = 'we are done';
$corpus = array(
    1 => 'how are you today?',
    2 => 'how do you do',
    3 => 'here you are! how are you? Are we done yet?',
);
$match_results = get_similar_documents($query, $corpus);
echo '<pre>';
print_r($match_results);
echo '</pre>';

RESULTS

Array
(
    [3] => 0.6813781191217
    [1] => 0.54248125036058
)

There are plenty of improvements to be made, but the model provides a way of getting good results from natural queries that, unlike strpos(), preg_match(), strstr() or stristr(), don't rely on boolean matching.

NOTA BENE

Optionally, eliminate redundancy from the words before searching. This:

  • reduces index size, resulting in lower storage requirements

  • means less disk I/O

  • gives faster indexing and consequently faster search.

1. Normalisation

  • Convert all text to lower case

2. Stopword elimination

  • Eliminate words from the text which carry no real meaning (like 'and', 'or', 'the', 'for', etc.)

3. Dictionary substitution

  • Replace words with others which have an identical or similar meaning (e.g. replace instances of 'hungrily' and 'hungry' with 'hunger').

  • Further algorithmic measures (such as Snowball stemming) may be applied to reduce words to their essential meaning.

  • Replacing colour names with their hexadecimal equivalents and reducing the precision of numeric values are other ways of normalising the text.
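A rough sketch of steps 1 and 2, applied before the text is handed to the indexer. normalise_text() and the stopword list are hypothetical and illustrative only; dictionary substitution and stemming (step 3) are not shown. It uses mb_strtolower(), so the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: lower-cases the text, strips punctuation and drops stopwords,
// returning a space-separated string that could be passed to an indexer.
function normalise_text(string $text, array $stopwords): string
{
    $text = mb_strtolower($text, 'UTF-8');                   // 1. Normalisation
    $text = preg_replace('/[^\p{L}\p{N}\s]+/u', '', $text);  // strip punctuation
    $terms = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $terms = array_filter($terms, function ($t) use ($stopwords) {
        return !in_array($t, $stopwords, true);              // 2. Stopword elimination
    });
    return implode(' ', $terms);
}

$stopwords = ['and', 'or', 'the', 'for'];   // illustrative list only
echo normalise_text('How are you? Are we done for the day?', $stopwords), "\n";
```

Normalising first means 'Are', 'are' and 'are!' are indexed as the same term, which the plain explode() tokenisation in get_corpus_index() does not do by itself.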


+12 votes
answered Oct 9, 2012 by mathias-stavrou

Maybe you could use something like this:

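<?php
findWord('Test all OK');

function findWord($text)
{
    // stristr() is case-insensitive, so 'ok' matches 'OK';
    // compare with !== false because the returned substring could itself be falsy
    if (stristr($text, 'ok') !== false) {
        echo 'Found a word';
    } else {
        echo 'Did not find a word';
    }
}
?>
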
+52 votes
answered Jul 9, 2013 by haim-evgi

Look at strpos():

<?php
$mystring = 'abc';
$findme = 'a';
$pos = strpos($mystring, $findme);

// Note our use of ===. Simply, == would not work as expected
// because the position of 'a' was the 0th (first) character.
if ($pos === false) {
    echo "The string '$findme' was not found in the string '$mystring'.";
} else {
    echo "The string '$findme' was found in the string '$mystring',";
    echo " and exists at position $pos.";
}
?>

+25 votes
answered Sep 19, 2013 by armfoot

I'm a bit surprised that none of the answers here that used strpos, strstr and similar functions have mentioned Multibyte String Functions yet (2015-05-08).

Basically, if you're having trouble finding words with characters specific to some languages, such as German, French, Portuguese, Spanish, etc. (e.g. ä, é, ô, ç, º, ñ), you may want to use the mb_-prefixed versions of these functions. The accepted answer would then use mb_strpos or mb_stripos (for case-insensitive matching) instead:

if (mb_strpos($a, 'are') !== false) {
    echo 'true';
}

If you cannot guarantee that all your data is 100% in UTF-8, you may want to use the mb_ functions.
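A small sketch of where the two families differ, assuming the mbstring extension is installed and the script is saved as UTF-8. strpos() counts bytes while mb_strpos() counts characters, and stripos() folds case byte by byte, so a two-byte letter such as 'É' never matches 'é'.

```php
<?php
// Requires the mbstring extension; assumes this file is saved as UTF-8.
$text = 'Café au lait';

var_dump(strpos($text, 'au'));                 // int(6): byte offset ('é' is 2 bytes in UTF-8)
var_dump(mb_strpos($text, 'au', 0, 'UTF-8'));  // int(5): character offset

$shout = 'ÉCOLE';
var_dump(stripos($shout, 'école'));                 // bool(false): byte-wise case folding
var_dump(mb_stripos($shout, 'école', 0, 'UTF-8'));  // int(0): Unicode-aware case folding
```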

A good article to understand why is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.

+22 votes
answered Oct 10, 2013 by jason-ooo

The function below also works and does not depend on any other function; it uses only native PHP string manipulation. Personally, I do not recommend this, but you can see how it works:

<?php
if (!function_exists('is_str_contain')) {
    function is_str_contain($string, $keyword)
    {
        // strict comparison, so that "0" is still treated as a valid string
        if ($string === '' || $keyword === '') return false;
        $keyword_first_char = $keyword[0];
        $keyword_length = strlen($keyword);
        $string_length = strlen($string);

        // case 1: the keyword is longer than the string
        if ($string_length < $keyword_length) return false;

        // case 2: same length, so the two must be identical
        if ($string_length == $keyword_length) {
            return $string === $keyword;
        }

        // case 3: single-character keyword
        if ($keyword_length == 1) {
            for ($i = 0; $i < $string_length; $i++) {
                // Check if keyword's first char == string's current char
                if ($keyword_first_char === $string[$i]) {
                    return true;
                }
            }
            return false;
        }

        // case 4: multi-character keyword; only start where the rest of the string is long enough
        for ($i = 0; $i <= $string_length - $keyword_length; $i++) {
            // Check if keyword's first char == string's current char
            if ($keyword_first_char === $string[$i]) {
                $match = 1;
                for ($j = 1; $j < $keyword_length; $j++) {
                    if ($keyword[$j] === $string[$i + $j]) {
                        $match++;
                    } else {
                        break; // mismatch: keep scanning from the next position
                    }
                }
                if ($match == $keyword_length) {
                    return true;
                }
            }
        }
        return false;
    }
}

Test:

var_dump(is_str_contain("test", "t")); //true
var_dump(is_str_contain("test", "")); //false
var_dump(is_str_contain("test", "test")); //true
var_dump(is_str_contain("test", "testa")); //false
var_dump(is_str_contain("a----z", "a")); //true
var_dump(is_str_contain("a----z", "z")); //true
var_dump(is_str_contain("mystringss", "strings")); //true 
...