One could isolate questions by analyzing the beginning of an input and looking for question words such as who, what, where, why, how, and so on.
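A minimal sketch of that idea follows, assuming the input is a single line of text. The helper name `leadingQuestionWord` is hypothetical, and the word list (which adds "when" and "which" to the examples above) is an assumption you would extend as needed. The `\b` word boundary keeps inputs like "Whatever" from being treated as questions.

```php
<?php
// Hypothetical helper, not from the original post: returns the leading
// question word (lowercased) if the input starts with one, or null otherwise.
// The word list below is an assumption; extend it as needed.
function leadingQuestionWord(string $input): ?string
{
    // ^\s* : allow leading whitespace
    // \b   : the question word must end at a word boundary ("Whatever" fails)
    // i    : case-insensitive
    $pattern = '/^\s*(who|what|when|where|which|why|how)\b/i';
    if (preg_match($pattern, $input, $m) === 1) {
        return strtolower($m[1]);
    }
    return null;
}

var_dump(leadingQuestionWord("What is a ball?"));    // string(4) "what"
var_dump(leadingQuestionWord("Throw me the ball.")); // NULL
```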
Which do you think is better: matching the entire string with a regular expression,
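<?php
// Part 1: regular expression. preg_match() overwrites $responses with
// [0] => the full match and [1] => the first capture group.
$responses = array();
$regex = "/\bWhat is a (\w+)\b/i";
preg_match($regex, "What is a ball?", $responses);

// Part 2: ...or matching parsed word substrings, with the word positions numbered.
// $parsed stands in for the output of a tokenizing function (not shown).
$parsed = array(0 => "What", 1 => "is", 2 => "a", 3 => "ball");
$responses[] = $parsed[3]; // append the word at position 3

print_r($responses);
?>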
NOTE: Above is one source code listing split into two parts. The first part uses a regular expression. The second part uses an array of strings that some function (not shown) has parsed into tokens. The source code is simplified for readability.
Both approaches work. Here each one inserts the substring "ball" into the same $responses array, as the print_r() output shows:
Array
(
    [0] => What is a ball // full match, set by preg_match()
    [1] => ball // inserted by the regular expression (capture group 1)
    [2] => ball // inserted from the array element
)
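The tokenizing function behind $parsed is not shown in the listing, so here is a hypothetical stand-in, assuming that splitting on anything other than word characters and apostrophes is good enough. The names `tokenize` and `matchWhatIsA` are illustrative only. The positional check compares tokens 0 to 2 case-insensitively and returns token 3, which is what `$parsed[3]` does above once the shape has been confirmed.

```php
<?php
// Hypothetical stand-in for the tokenizer the post leaves out: split on any
// run of characters that are not word characters or apostrophes, and drop
// empty pieces (e.g. the one produced after a trailing "?").
function tokenize(string $input): array
{
    return preg_split("/[^\w']+/", $input, -1, PREG_SPLIT_NO_EMPTY);
}

// Positional match for "What is a <word>": tokens 0-2 must be "what", "is",
// "a" (case-insensitive); returns token 3, or null if the shape differs.
function matchWhatIsA(array $tokens): ?string
{
    if (count($tokens) < 4) {
        return null;
    }
    $expected = array("what", "is", "a");
    for ($i = 0; $i < 3; $i++) {
        if (strtolower($tokens[$i]) !== $expected[$i]) {
            return null;
        }
    }
    return $tokens[3];
}

$parsed = tokenize("What is a ball?");  // What, is, a, ball
var_dump(matchWhatIsA($parsed));        // string(4) "ball"
```

Like the regular expression, this check only accepts the exact shape "What is a X"; other phrasings need their own patterns or position rules.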
I guess the question I am asking is: which approach do you think can handle a wider range of questions as user input?