[FX.php List] FMPHP API

DC dan.cynosure at dbmscan.com
Fri Aug 4 22:57:34 MDT 2006


So, I learned something today that I want to share about this Value
List parsing question that's been batted around for a few weeks...

Out of a somewhat morbid curiosity, I set up a file to do speed  
testing for each of the solutions suggested to this parsing problem  
posed by AC. This is the format of data that he wanted to parse:  
"integer space other text which may or may not include space characters"

I compared 6 parsing techniques with regard to speed only:

1. DC's preg_split
2. DC's sscanf
3. Hannah's strpos substr combo
4. AC's strcspn technique (similar to #3)
5. DC's explode shift implode array functions
6. Denman's strtok (suggested by Andrew Denman early on)

The results vary depending on what size your text is.

And the overall speed winner is: #6 strtok()
http://us3.php.net/manual/en/function.strtok.php

It handles small text chunks and large text chunks with aplomb.
I now have to take back what I said about this function earlier (that  
it was obscure); it's time to get comfortable with strtok().
It looks like the best choice for this type of simple but not dead  
simple parsing BOTH for speed and readability.
See below for a link to where you can download the test file and test  
it yourself.
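
For anyone who has not used strtok() before, here is a minimal sketch of how the two-part split might look. It is not taken from the test file: the function name parse_with_strtok is made up for illustration, and it assumes PHP 7 or later and a value that contains no newline character.

```php
<?php
// Hypothetical helper, not from the Parse Speed Test snippet.
// Assumes PHP 7+ and that the value contains no newline.
function parse_with_strtok(string $s): array
{
    // First call: pass the string and the delimiter; returns the text
    // before the first space.
    $num = strtok($s, ' ');
    // Second call: pass only a delimiter; strtok() continues where the
    // previous call stopped. Using "\n" as the delimiter returns the
    // rest of the string, spaces included.
    $text = strtok("\n");
    return [$num, $text];
}

list($num, $text) = parse_with_strtok('12 My Company Name');
echo "num: $num\ntext: $text\n";
```

The second call is what makes strtok() feel unusual: it takes no string argument because the function keeps its position between calls.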

Close behind for small text is #2 sscanf().
It also is very readable if you are familiar with the sprintf()  
conversion flags.
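
As a rough illustration of the sscanf() approach, the sketch below uses a scanset that stops at a newline. This is my own variation rather than the exact format string from the thread; the function name parse_with_sscanf is hypothetical, and it assumes PHP 7 or later and values with no embedded newline.

```php
<?php
// Hypothetical helper; assumes PHP 7+ and no newline inside the value.
function parse_with_sscanf(string $s): array
{
    // %d reads the leading integer, the space skips whitespace, and
    // %[^\n] reads every remaining character up to a newline or the end.
    // Double quotes are required so that \n is a real newline character.
    list($num, $text) = sscanf($s, "%d %[^\n]");
    return [$num, $text];
}

list($num, $text) = parse_with_sscanf('12 My Company Name');
echo "num: $num\ntext: $text\n";
```

Note that %d hands back an integer, not a string, which matters if you compare the ID with ===.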

#1 preg_split and #5 array functions are slowest at small size (but  
still fast), and #5 completely breaks down when the text size is  
high. #1 preg_split holds up well over all sizes but is slightly  
slower than the top 3 at higher sizes (#3,#4,#6). Also, preg_split is  
such a capable function that it is good to learn a little regexp to  
be able to use it for more complicated parsing.
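
To make #5 concrete, here is a sketch of the explode, shift, implode idea, assuming PHP 7 or later. The function names are invented for this example. My guess is that it slows down on large text because it builds one array element per word and then glues them all back together. The second function shows explode() with its limit argument, which was not one of the six techniques tested but avoids that extra work.

```php
<?php
// Hypothetical helpers for illustration; assumes PHP 7+.

// Technique #5 as described: split on every space, take the first
// word off the front, and join the remaining words again.
function parse_with_array_functions(string $s): array
{
    $parts = explode(' ', $s);      // one element per space-separated word
    $num   = array_shift($parts);   // remove and return the first word
    $text  = implode(' ', $parts);  // glue the remaining words back together
    return [$num, $text];
}

// Variation (not part of the speed test): a limit of 2 makes explode()
// stop after the first space, so the rest stays in one piece.
function parse_with_explode_limit(string $s): array
{
    // array_pad() guarantees two elements even if there is no space.
    return array_pad(explode(' ', $s, 2), 2, '');
}

print_r(parse_with_array_functions('12 My Company Name'));
print_r(parse_with_explode_limit('12 My Company Name'));
```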

#3 strpos substr combo and #4 AC's strcspn are not quite as fast as  
#2 sscanf and #6 strtok at small text sizes, but they hold up very  
evenly over all text sizes. So, if you don't know your text size and  
you expect you may be dealing with large numbers of characters, these  
are good overall techniques.
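
For reference, techniques #3 and #4 could be written roughly as follows. This is a sketch under my own assumptions (PHP 7 or later, hypothetical function names), not the exact code from the test file. Both find the position of the first space and cut the string there; they differ only in how they find it.

```php
<?php
// Hypothetical helpers for illustration; assumes PHP 7+.

// #3: strpos() finds the first space, substr() cuts on either side.
function parse_with_strpos(string $s): array
{
    $pos = strpos($s, ' ');
    if ($pos === false) {
        return [$s, ''];            // no space: the whole value is the ID
    }
    return [substr($s, 0, $pos), substr($s, $pos + 1)];
}

// #4: strcspn() returns the length of the leading run of characters
// that are not a space, which is the same number as the position of
// the first space (or the full length if there is none).
function parse_with_strcspn(string $s): array
{
    $len = strcspn($s, ' ');
    // The cast covers older PHP versions where substr() can return false.
    return [substr($s, 0, $len), (string) substr($s, $len + 1)];
}

print_r(parse_with_strpos('12 My Company Name'));
print_r(parse_with_strcspn('12 My Company Name'));
```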

You can test these yourself by grabbing the Parse Speed Test snippet:
http://www.dbmscan.com/snippets/index.php?cat_select=Miscellaneous
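
If the snippet is not available, a timing loop along these lines gives a rough idea of how such a comparison can be set up. This is not the Parse Speed Test file: the function time_parser, the two sample closures, and the input sizes are all made up for illustration, and it assumes PHP 7 or later. Absolute numbers will differ from machine to machine.

```php
<?php
// Hypothetical harness; assumes PHP 7+. Not the author's test file.

// Run $parser on $input $iterations times and return elapsed seconds.
function time_parser(callable $parser, string $input, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $parser($input);
    }
    return microtime(true) - $start;
}

// Two of the six techniques, wrapped as closures for illustration.
$parsers = [
    'strtok' => function (string $s): array {
        return [strtok($s, ' '), strtok("\n")];
    },
    'explode/shift/implode' => function (string $s): array {
        $parts = explode(' ', $s);
        $num = array_shift($parts);
        return [$num, implode(' ', $parts)];
    },
];

// Made-up inputs: one short value and one very long value.
$inputs = [
    'small' => '12 My Company Name',
    'large' => '12 ' . str_repeat('My Company Name ', 2000),
];

foreach ($inputs as $size => $input) {
    foreach ($parsers as $name => $parser) {
        printf("%-6s %-22s %.4f s\n", $size, $name, time_parser($parser, $input, 1000));
    }
}
```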

Hope you find this as entertaining as I did.
dan

On Aug 2, 2006, at 12:18 AM, DC wrote:

> Yes, AC, you are absolutely right. That code was broken! I'd like  
> to say it was some genius pedagogical method to help you become a  
> better PHP programmer but really it was my poor debugging. Thanks  
> for pointing out the deficiencies in the code I posted.
>
> Anyhow, it looks like you've got a tidy little 2-line function that  
> uses some more standard code substr() and strpos(). Congratulations!
>
> Since I can't believe I posted broken code to everyone on the list,  
> I've gone back to the drawing board and fixed them both.
>
> For posterity, here are the fixed versions of both the preg_split()
> and sscanf() Value List parsers. The %d in sscanf() tells the parser
> to grab an integer at the beginning of the string. The %[^$] is a
> scanset that grabs every following character that is not a literal
> dollar sign (the $ is not an end-of-string anchor here), so it runs
> to the end of the string as long as the text contains no $. The
> trailing s is treated as a literal character and is not needed.
>
> What can I say... I LOVE oneliners like this (you don't even need  
> to use list() since sscanf() has builtin assignment):
>
> list($num,$text)=sscanf($string_to_parse,'%d %[^$]s'); // pretty
> sscanf($string_to_parse,'%d %[^$]s',$num,$text); // prettier
>
> <?php
> // shows two ways to parse a slightly complicated string
> $value_list = array("311 Dell", "12 My Company Name");
>
> // parse the strings using preg_split()
> foreach ($value_list as $string_to_parse) {
> 	list($num,$text)=preg_split('/([0-9]+)\s([a-zA-Z\s]+)/',
> 					$string_to_parse,-1,PREG_SPLIT_NO_EMPTY| 
> PREG_SPLIT_DELIM_CAPTURE);
> 	echo "&sect;
> 		string: $string_to_parse<br>
> 		num: $num<br>
> 		text: $text<br>";
> }
>
> echo "<hr>";
>
> // show sscanf() doing the same thing
> foreach ($value_list as $string_to_parse) {
> 	sscanf($string_to_parse,'%d %[^$]s',$num,$text);
> 	echo "&sect;
> 		string: $string_to_parse<br>
> 		num: $num<br>
> 		text: $text<br>";
> }
>
> ?>
>
> i replaced the snippet too. http://dbmscan.com/snippets
>
> it's in misc section.
>
> cheers,
> dan
>
>
> On Aug 1, 2006, at 10:57 PM, AC wrote:
>
>> Dan,
>> First let me say thanks for your quick replies.
>> I tried your code:
>>
>> $value_list = array("311 Dell", "12 My Company Name");
>> foreach ($value_list as $string_to_parse) {
>>    list($num,$text)=sscanf($string_to_parse,'%d %s');
>>    echo "num: $num <br>text: $text <br>";
>> }
>>
>> Instead of "My Company Name" I only got "My".  I assume "%d" is a  
>> new line character and there wasn't one after "My Company Name" so  
>> it only grabbed up to the space after "My".
>>
>>
>>
>> What I really wanted was the equivalent of FMP's:
>> // grab up to the space but not the space
>> Left($TheText, Position($TheText, " ", 1, 1) - 1)
>> and
>> // grab everything after the space (excluding the space)
>> Replace($TheText, 1, Position($TheText, " ", 1, 1), "")
>>
>>
>> I think I finally have it using PHP's:
>> // grab up to the space but not the space
>> substr($TheText, 0, strpos($TheText, ' '));
>> and
>> // grab everything after the space (excluding the space)
>> substr_replace($TheText, '', 0, strpos($TheText, ' ') + 1);
>>
>>
>>
>>
>> On Aug 1, 2006, at 5:37 PM, DC wrote:
>>
>>> ok, now we're getting somewhere.
>>>
>>> so, you have a more complicated parsing task.
>>>
>>> you don't just have a space delimited string, you have a number,  
>>> then a space, then some range of characters that could also  
>>> include a space. that, of course, would confound a simple list()  
>>> and explode() parser.
>>>
>>> you now have two choices (besides your clever but opaque use of
>>> strcspn()!), both of which incidentally still use list():
>>>
>>> list() with sscanf()
>>>
>>> OR
>>>
>>> list() with preg_split()
>>>
>>> both of these functions offer VERY powerful parsing for almost  
>>> every situation. it would behoove any new PHP coder to become  
>>> familiar with both.
>>>
>>> try one of these parsing techniques. i think the sscanf() version
>>> is a little cleaner, but some people might be more familiar with
>>> regular expressions than with the sscanf() parsing language. in
>>> this case, though, the sscanf() format string is VERY easy to
>>> read and the preg_split() regular expression is not, and the
>>> preg_split() function requires 2 ugly CONSTANTS to make it
>>> behave. for other tasks preg_split() will be better, but in this
>>> case sscanf() shines.
>>>
>>> i set these up in a repeat loop to show how they successfully  
>>> parse your range of strings.
>>>
>>> <?php
>>> // shows two ways to parse a slightly complicated string
>>>
>>> $value_list = array("311 Dell", "12 My Company Name");
>>>
>>> // parse the strings using preg_split()
>>> foreach ($value_list as $string_to_parse) {
>>> 	echo $string_to_parse.'<br>';
>>> 	list($num,$text)=preg_split('/([0-9]+)\s([a-zA-Z]+)/', 
>>> $string_to_parse,-1,PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
>>> 	echo "num: $num <br>";
>>> 	echo "text: $text <br>";
>>> }
>>>
>>> echo "<hr>";
>>>
>>> // show sscanf() doing the same thing
>>> foreach ($value_list as $string_to_parse) {
>>> 	echo $string_to_parse.'<br>';
>>> 	list($num,$text)=sscanf($string_to_parse,'%d %s');
>>> 	echo "num: $num <br>";
>>> 	echo "text: $text <br>";
>>> }
>>>
>>> ?>
>>>
>>> in case your email reader mangles the code, i put it here in  
>>> Miscellaneous section:
>>> http://www.dbmscan.com/snippets/index.php?cat_select=Miscellaneous
>>>
>>> your choice. let me know if i can be of assistance...
>>>
>>> cheers,
>>> dan
>>>
>>>
>>>
>>> On Aug 1, 2006, at 4:17 PM, AC wrote:
>>>
>>>> Dan,
>>>>
>>>> I needed the contents of a FM valuelist in PHP.
>>>> The values could be "311 Dell" or "12 My Company Name".
>>>> Should I still be using the list() & explode() function to  
>>>> obtain the ID number portion?
>>>> If I should use the substr() function, then what function do I  
>>>> use to determine how many characters the ID is?
>>>>
>>>>
>>>>
>>>>
>>>> On Aug 1, 2006, at 4:03 PM, DC wrote:
>>>>
>>>>> hi AC,
>>>>>
>>>>> just to make sure we're still talking about the same issue here...
>>>>>
>>>>> you want to take a string, say "311 Dell", and assign two  
>>>>> variables, one that holds the first part and one that holds the  
>>>>> second part. the first and second part are separated  
>>>>> (delimited) by a space character. Is that the goal?
>>>>>
>>>>> assuming it is... my advice is that you want to end up with  
>>>>> those two variables with the minimum of fuss and with a  
>>>>> standard PHP code style.
>>>>>
>>>>> list() with explode() is the most standard way of getting two  
>>>>> PHP variables from a simple delimited string:
>>>>>
>>>>> <?php
>>>>> $num_and_text_separated_by_space = "311 Dell";
>>>>> list($num,$text)=explode(' ', $num_and_text_separated_by_space);
>>>>> echo $num;
>>>>> echo $text;
>>>>> ?>
>>>>>
>>>>> The equivalent of FMP
>>>>> Left("text",2)
>>>>>
>>>>> in PHP is:
>>>>> substr("text",0,2);
>>>>>
>>>>> you see here i just hard coded the number two to get the first  
>>>>> two characters of "text". FMP Left() is for getting left  
>>>>> characters. substr() is for getting any substring of  
>>>>> characters. substr() is way more flexible than FMP Left(). By  
>>>>> changing the parameters, it can also be used for FMP's Right()  
>>>>> or Middle(). see the PHP manual for how to use it fully.
>>>>>
>>>>> so, when you ask, "what function should I be using inside  
>>>>> "substr()" instead of the "strcspn()" function?" that is the  
>>>>> wrong question. the right question is: "what is the best,
>>>>> clearest way to get my data from a space-delimited string into
>>>>> PHP variables?" the answer to that is list() with explode().
>>>>>
>>>>> yes, i would say that strcspn() is a specialty function. i  
>>>>> didn't say you shouldn't use it... i said you should have a  
>>>>> good reason to use it that outweighs the confusion it may cause  
>>>>> you or your colleagues down the road because it is an unusual  
>>>>> usage for what you are trying to do.
>>>>>
>>>>> so, the substr(Text, 0, strcspn(Text, " ")) function may seem  
>>>>> to work fine, but to the experienced PHP eye it is quite  
>>>>> opaque. if you want to write clear, concise code then consider  
>>>>> using list() with explode().
>>>>>
>>>>> cheers,
>>>>> dan
>>>>>
>>>>>
>>>>> On Aug 1, 2006, at 2:54 PM, AC wrote:
>>>>>
>>>>>> Not meaning to beat a dead horse here;
>>>>>> - you said the equivalent of the FileMaker "Left()" function  
>>>>>> is the PHP "substr()" function
>>>>>> - you also said the PHP "strcspn()" function is not a common  
>>>>>> function to use
>>>>>> - I'm currently using    substr(Text, 0, strcspn(Text, " "))
>>>>>> - what function should I be using inside "substr()" instead of  
>>>>>> the "strcspn()" function?  (I'm completely new to PHP so I  
>>>>>> assume I'm just missing the obvious answer)
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Jul 31, 2006, at 11:41 AM, DC wrote:
>>>>>>
>>>>>>> well, strcspn() is another obscure function. i've been coding  
>>>>>>> PHP on a daily basis for several years now and i had to look  
>>>>>>> it up! it may have a specialized usage and you've certainly  
>>>>>>> demonstrated one interesting one ;-) it took me a minute to  
>>>>>>> wrap my head around that one.
>>>>>>>
>>>>>>> But, your best bet for code share-ability, maintainability  
>>>>>>> and readability is to know and explore the outer reaches of  
>>>>>>> PHP functions but when writing PHP to accomplish basic tasks,  
>>>>>>> use standard PHP idioms like list with explode. PHP offers  
>>>>>>> multiple paths to the same outcome but if you write code with  
>>>>>>> idiosyncratic idioms you're going to run into trouble down  
>>>>>>> the road - with other PHP coders and with your own memory.
>>>>>>>
>>>>>>> FYI, the equivalent to FMP's left() function in PHP is substr():
>>>>>>>
>>>>>>> http://us2.php.net/manual/en/function.substr.php
>>>>>>>
>>>>>>> Cheers and happy coding,
>>>>>>> dan
>>>>>>>
>>>>>>> On Jul 28, 2006, at 8:04 PM, AC wrote:
>>>>>>>
>>>>>>>> Thanks Andrew & Dan,
>>>>>>>>
>>>>>>>> I had found both strtok() and list() with explode().
>>>>>>>> I guess I'm still thinking in FileMaker terms when I'm doing  
>>>>>>>> things in PHP so I was looking for something like;
>>>>>>>> Left(Text, NumOfChar) and Replace(Text, Start, Size,  
>>>>>>>> ReplacementText)
>>>>>>>> I just figured since PHP had such an easy function to grab  
>>>>>>>> the remainder of the string (the equivalent of the FM  
>>>>>>>> Replace command) that it probably also had an easy  
>>>>>>>> equivalent of the FM Left command.
>>>>>>>> Based on your answers I'm guessing it doesn't and the     
>>>>>>>> substr(Text, 0, strcspn(Text, " "))    command is the  
>>>>>>>> closest equivalent.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jul 28, 2006, at 6:01 PM, DC wrote:
>>>>>>>>
>>>>>>>>> i would stay away from strtok()
>>>>>>>>>
>>>>>>>>> strtok() is a confusing function because it keeps internal
>>>>>>>>> state that 'remembers' where the previous call stopped, so
>>>>>>>>> you have to call it once for each piece that you want to
>>>>>>>>> split off.
>>>>>>>>>
>>>>>>>>> much better and more widely used and standard is to use  
>>>>>>>>> this construction using list() and explode()
>>>>>>>>>
>>>>>>>>> $variable = "311 something";
>>>>>>>>> list($number,$word) = explode(' ',$variable);
>>>>>>>>>
>>>>>>>>> now you'll have two nicely named variables one that has the  
>>>>>>>>> number and one that has the word
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> dan
>>>>>>>>>
>>>>>>>>> On Jul 28, 2006, at 3:59 PM, Andrew Denman wrote:
>>>>>>>>>
>>>>>>>>>> Don't know about your first question, but for the second  
>>>>>>>>>> one, it looks like
>>>>>>>>>> you want to break the string on the spaces.  Check out  
>>>>>>>>>> strtok():
>>>>>>>>>> http://us3.php.net/manual/en/function.strtok.php
>>>>>>>>>>
>>>>>>>>>> Andrew Denman
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: fx.php_list-bounces at mail.iviking.org
>>>>>>>>>> [mailto:fx.php_list-bounces at mail.iviking.org] On Behalf Of AC
>>>>>>>>>> Sent: Friday, July 28, 2006 2:51 PM
>>>>>>>>>> To: FX.php Discussion List
>>>>>>>>>> Subject: [FX.php List] FMPHP API
>>>>>>>>>>
>>>>>>>>>> Anyone know if this API will grab both value fields from  
>>>>>>>>>> value lists
>>>>>>>>>> that are based on 2 fields ex.
>>>>>>>>>> The "Company" value list uses the fields "IDCompany" and  
>>>>>>>>>> "CompanyName".
>>>>>>>>>> In FX only the IDCompany is returned.
>>>>>>>>>> Does this API return both values?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Also, assuming I had   MyVariable = "316 Dell"
>>>>>>>>>> I can use the function    strstr(MyVariable, " ")
>>>>>>>>>> to get the "Dell" part by itself but to get the 316 I'm  
>>>>>>>>>> currently doing
>>>>>>>>>>    substr(MyVariable, 0, strcspn(MyVariable, " "))
>>>>>>>>>> Is there an easier way to get the 316 by itself (assuming  
>>>>>>>>>> it could also
>>>>>>>>>> be text)?
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> FX.php_List mailing list
>>>>>>>>>> FX.php_List at mail.iviking.org
>>>>>>>>>> http://www.iviking.org/mailman/listinfo/fx.php_list


